DaemonForums  

  #1   (View Single Post)  
Old 24th February 2021
Omphalotus_japonicus
New User
 
Join Date: Jan 2021
Posts: 9
I am having trouble receiving uncorrupted file data with unix sockets.

I'm a relative newbie to programming and have only recently taken up unix networking. I first wrote a program that downloads files over HTTP, which worked, and am now trying to write a file downloader that uses the Gemini protocol instead.

The issue is that I cannot reuse the method I relied on in my HTTP downloader to receive uncorrupted data: it used the Content-Length header, which Gemini lacks. I've experimented a little and came up with what I thought could be a solution, but it does not work and I have yet to understand why.

Here's my code:

Code:
/*
 * Simple program which downloads a file originating from the Gemini protocol
 * using unix socket programming and the LibreSSL library.
 */

#include <netdb.h>
#include <libressl/openssl/ssl.h>

#include <cassert>
#include <iostream>
#include <fstream>

int
main()
{
        /*
         * Flag the to-be-created pointer to a linked-list pointer to allow
         * both IPV4 and IPV6 IP addresses and use TCP.
         */

        struct addrinfo flags = {0};
                        flags.ai_family = AF_UNSPEC;
                        flags.ai_socktype = SOCK_STREAM;

        /*
         * Return value for error handling.
         */

        int rv = 0;

        /*
         * Stores given address, port, and above-defined flags into a pointer
         * to a linked-list pointer which contains necessary connection
         * information.
         */

        struct addrinfo *conninfo;
        rv = getaddrinfo("skyjake.fi", "1965", &flags, &conninfo);
        assert(rv == 0);

        /*
         * Creates a communication endpoint and returns a file descriptor
         * relating to that endpoint and assigns it to the "sockfd" int.
         */

        int sockfd = socket(conninfo->ai_family, conninfo->ai_socktype,
                            conninfo->ai_protocol);
        assert(sockfd != -1);

        /*
         * Connects to given address on the socket referred
         * to by the "sockfd" int.
         */

        rv = connect(sockfd, conninfo->ai_addr, conninfo->ai_addrlen);
        assert(rv == 0);

        SSL_library_init();

        /*
         * Creates an SSL structure containing TLS/SSL connection information
         * including the SSL_CTX object which initializes the cipher list,
         * session cache setting, callbacks, and the keys of
         * certificates based off the connection method given, being the
         * TLSv1.2 protocol.
         */

        SSL *ssl = SSL_new(SSL_CTX_new(TLSv1_2_client_method()));
        assert(ssl != nullptr);

        /*
         * Sets the above-defined SSL object's socket file descriptor to the
         * one contained in the "sockfd" int.
         */

        rv = SSL_set_fd(ssl, sockfd);
        assert(rv == 1);


        /*
         * Initializes a TLS/SSL handshake with the connected server.
         */

        rv = SSL_connect(ssl);
        assert(rv == 1);

        std::string req = "gemini://skyjake.fi/lagrange/lagrange_about.png"
                          "\r\n";

        /*
         * Sends a GET request to the connected server for the requested file.
         */

        rv = SSL_write(ssl, req.data(), req.size());
        assert(rv > 0);

        std::string buffer = {0};
        int read_size = 8192;

        /*
         * Inputs received data into a buffer while it's available and with each
         * iteration resize the buffer and read data by an addition of the read
         * size, storing the new data in a new portion of the buffer to prevent
         * any overwriting.... or so I thought
         */

        while (rv != 0)
        {
                int old_size = buffer.size();
                buffer.resize(old_size + read_size);
                rv = SSL_read(ssl, buffer.data() + old_size, old_size + read_size);
        }

        /*
         * Removes the Gemini header from the buffer.
         */

        buffer.erase(0, buffer.find("\r\n") + 2);

        std::ofstream file("image.png", std::ios::binary | std::ios::out
                                      | std::ios::trunc);


        SSL_shutdown(ssl);
        shutdown(sockfd, 2);

        return 0;
}

Last edited by J65nko; 23rd March 2021 at 04:31 AM. Reason: Added [code] and [/code] tags
  #2   (View Single Post)  
Old 30th April 2021
TerryP
Arp Constable
 
Join Date: May 2008
Location: USofA
Posts: 1,547

Quote:
Originally Posted by Omphalotus_japonicus View Post

Code:
        std::string buffer = {0};
        int read_size = 8192;

        /*
         * Inputs received data into a buffer while it's available and with each
         * iteration resize the buffer and read data by an addition of the read
         * size, storing the new data in a new portion of the buffer to prevent
         * any overwriting.... or so I thought
         */

        while (rv != 0)
        {
                int old_size = buffer.size();
                buffer.resize(old_size + read_size);
                rv = SSL_read(ssl, buffer.data() + old_size, old_size + read_size); /* overflow! */
        }
Side warning: string::data() traditionally returns a const pointer; the non-const overload is relatively young (C++17).

buffer.data() + old_size gives you the address you're looking for.

SSL_read() takes the maximum number of bytes to read into the buffer and returns the number actually written to it. So let's say old_size = 0: buffer.data() is then the start of a block of memory of length read_size (8K). You tell it to read old_size + read_size bytes into this buffer, so it tries to read 8K of data, which fits.

Coming back around: old_size = buffer.size() is now 8K, so buffer.data() + old_size is the offset of the next 'chunk' of data. You grow the buffer by another 8K chunk, a la buffer.resize(old_size + read_size). Good.

Telling SSL_read() to insert from the old_size offset thus shoves bytes in the right place, and there is now 8K free space following it.

But you're telling SSL_read() to insert old_size + read_size characters. This overflows the buffer if the request is fulfilled.

Since old_size = 8K, read_size = 8K, and the offset into the 16K buffer is at 8K, you are now trying to write 16K into the 8K of remaining space. Bad things are likely to happen one way or another. And if SSL_read() actually returns fewer bytes than requested, weird stuff will happen because the resize and the offset (buffer.data() + old_size) do not take the partial read into account.
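To put numbers on that, here is a small sketch of the arithmetic. The function name overrun_on_pass is made up for illustration, and it assumes the buffer starts empty and every earlier pass filled its whole chunk, as in the walkthrough above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many bytes past the end of the buffer the original
// loop asks SSL_read() to write on a given (zero-based) pass, assuming the
// buffer started empty and every earlier pass filled its chunk completely.
std::size_t overrun_on_pass(std::size_t pass, std::size_t read_size)
{
        assert(read_size > 0);

        std::size_t old_size  = pass * read_size;     // buffer.size() before resize
        std::size_t new_size  = old_size + read_size; // buffer.size() after resize
        std::size_t free_room = new_size - old_size;  // space after the offset
        std::size_t requested = old_size + read_size; // length given to SSL_read()

        return requested - free_room;                 // always equals old_size
}
```

The first pass is safe, the second can overrun by 8K, the third by 16K, and so on: the overrun is always the amount of data already in the buffer.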

Change "old_size + read_size" to simply read_size, and it would probably work, provided every request is fulfilled in full. I.e. if a read returns less than requested, it is because it hit EOF, not because of some other limitation. For Gemini, based on a short glance at the spec, I believe its definition of EOF is a closed socket connection. With a local file instead of a socket, that change would be enough to solve the corruption, and it probably would be for a well-behaved socket too.
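Because the body only ends when the server closes the connection, the whole response sits in one string once the read loop is done, and the header can be split off afterwards. The posted code does this with buffer.erase(0, buffer.find("\r\n") + 2), which misbehaves when no CRLF is present: find() returns npos, and npos + 2 wraps around to 1. Below is a rough sketch of a more defensive split, assuming the response starts with a single status line terminated by CRLF, which is how I read the Gemini spec. GeminiResponse and split_gemini_response are names invented here, not part of any library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the status line (without CRLF) and the raw body.
struct GeminiResponse {
        bool ok;            // false if no CRLF-terminated header was found
        std::string header; // e.g. "20 image/png"
        std::string body;   // everything after the first CRLF, byte for byte
};

// Splits a complete response at the first CRLF. Later CRLF pairs belong to
// the body and are left untouched.
GeminiResponse split_gemini_response(const std::string &raw)
{
        GeminiResponse out{false, "", ""};

        std::size_t eol = raw.find("\r\n");
        if (eol == std::string::npos)
                return out; // incomplete or malformed response

        out.ok = true;
        out.header = raw.substr(0, eol);
        out.body = raw.substr(eol + 2);
        return out;
}
```

The body can then be written out with file.write(r.body.data(), r.body.size()). Note that the code as posted opens image.png but never writes anything to it.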


Generally, when writing such a "read into buffer by chunks" loop, it is necessary to keep track of how many bytes were actually written into the buffer when computing the next offset. I.e. if SSL_read() returns fewer than read_size bytes but is still positive, the next offset needs to take that into account rather than assuming the data ends at buffer.size(). Odds are you really mean old_size += rv. This matters much more in domains like sockets and device handling than with local files.
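A minimal sketch of such a loop follows. To keep it self-contained it takes a callable read_fn(dst, max_len) standing in for SSL_read(ssl, dst, max_len); read_all and read_fn are names made up for this example. Treating a return of 0 as a clean end of stream follows the original loop; real code would inspect SSL_get_error() whenever SSL_read() returns 0 or less.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Reads chunks until read_fn reports end of stream (0) or an error (< 0).
// `used` counts the bytes actually received, so a short read never leaves a
// gap of unwritten bytes in the middle of the buffer.
// Returns true on end of stream, false on error.
template <typename ReadFn>
bool read_all(ReadFn read_fn, std::string &buffer, int read_size = 8192)
{
        assert(read_size > 0);

        buffer.clear();
        std::size_t used = 0;

        for (;;) {
                // Make room for exactly one more chunk after the valid data.
                buffer.resize(used + read_size);

                // Ask for no more than the free space that was just added.
                int rv = read_fn(&buffer[used], read_size);
                if (rv <= 0) {
                        buffer.resize(used); // drop the unused tail
                        return rv == 0;
                }

                used += static_cast<std::size_t>(rv);
        }
}
```

With the original program this could be called as read_all([&](char *dst, int n) { return SSL_read(ssl, dst, n); }, buffer). Using &buffer[used] instead of buffer.data() also sidesteps the C++17 requirement for a non-const data().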

In kind environments you'll only get a partial read at the end of a data set (e.g., reading a 12K file in 8K chunks). In some environments you may see a partial read simply because of transmission quality over a network or a blip in the hardware. Some domains have more specific definitions of I/O than others. For example, in one system I work with, there is a driver that assures a read will be 0 bytes or a multiple of 188 bytes, because it buffers MPEG2-TS packets coming from a piece of hardware. If you ask this device for 7520 bytes (188*40) you might get 0, 40, or any number of MPEG2-TS packets in between, depending on what the device has in its buffer at that moment. Some environments generate faults more willy-nilly than others, especially when dealing with devices or clever people, like those who abuse signals.
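When a fixed amount of data is needed (say, a whole number of packets), the usual answer to short reads is to keep asking for the remainder. Here is a sketch under the same assumption as before: read_fn(dst, max_len) is a stand-in with SSL_read()/read()-style return values, and read_exact is a name invented for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Keeps calling read_fn until `want` bytes have been stored in dst, or until
// read_fn reports end of stream or an error (<= 0). Returns the number of
// bytes actually stored, which is less than `want` only in those two cases.
template <typename ReadFn>
std::size_t read_exact(ReadFn read_fn, char *dst, std::size_t want)
{
        assert(dst != nullptr);

        std::size_t got = 0;
        while (got < want) {
                // Only ask for what is still missing, at the right offset.
                int rv = read_fn(dst + got, static_cast<int>(want - got));
                if (rv <= 0)
                        break;
                got += static_cast<std::size_t>(rv);
        }
        return got;
}
```

The caller compares the return value with `want` to tell a complete read from a truncated one.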

In most cases it's better to write clean code for partial reads from the get-go, because if assuming perfect read lengths doesn't screw you now, it probably will screw someone later.