DaemonForums  

#1  13th April 2009
IdOp
Too dumb for a smartphone
Join Date: May 2008
Location: twisting on the daemon's fork(2)
Posts: 1,027

fseek() and read() problem

I was working on an audio program on the weekend and had some weird results on BSD. Tracking it down, it seems to be related to fseek() and read(). It occurs on both

* NetBSD 4.0.1 -release i386

* OpenBSD 4.4 -release i386

What happens is this: an input file consists of a whole number of frames (1 frame = 2352 bytes). I fseek() a whole number of frames into the file, and then try to read() the rest of the file frame by frame. I noticed that the wrong number of frames gets read, and the last non-empty read() doesn't return a whole frame (too few bytes are read). From the read(2) man page:

Quote:
The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.

After tracking down the problem a bit, here's a simple demo that reproduces it. First, make an input file:

$ dd if=/dev/zero of=infile bs=2352 count=9

You can vary the count to see what happens. Check the file size:

Code:
$ ls -l infile
-rw-r--r--  1 xxx  users  21168 Apr 13 10:17 infile
Then grab this file trd.c:

Code:
#include <stdio.h>
#include <unistd.h>

int
main( int argc, char* argv[] )
{
    ssize_t     nr;
    char        buf[2352];      /*  = 1 frame */
    int         ifd, j;
    FILE        *ifp;

    ifp = fopen( "infile", "rb" );
    ifd = fileno( ifp );

    printf( "where = %ld\n", ftell( ifp ) );

    if( fseek( ifp, 2352, SEEK_SET ) )    /* seek 1 frame in */
        return 1;

    printf( "where = %ld\n", ftell( ifp ) );

    for( j=1; (nr = read( ifd, (void*)buf, 2352 )) != -1; j++ ) {

        printf( "%d:  nr = %ld\n", j, (long)nr );    /* nr is ssize_t, so cast for printf */
        if( nr==0 ) break;

    }
    fclose( ifp );
    return 0;
}
and compile it

$ gcc -Wall -o trd trd.c

and run it. E.g., with a count of 9:

Code:
$ trd
where = 0
where = 2352
1:  nr = 2352
2:  nr = 2352
3:  nr = 80
4:  nr = 0
So it's reading 2-and-a-partial frames, when it should read 8. I hope it's just me doing something dumb (as usual), but I can't see what it is. Any help or comments appreciated!

#2  13th April 2009
ephemera
Knuth's homeboy
Join Date: Apr 2008
Posts: 537

Try using lseek(2) instead of fseek(3):

    if (lseek( ifd, 2352, SEEK_SET ) == (off_t)-1) /* seek 1 frame in */
        return 1;

Maybe mixing fseek(3) and read(2) is not OK?
Anyway, that's only a wild guess; you should ask on the fbsd ML or freebsdforum.

Last edited by ephemera; 13th April 2009 at 04:19 PM.

#3  13th April 2009
IdOp

Thanks ephemera, that was a good idea. I tried lseek, and in a quick test (for count=9 again) it gave the expected result:

Code:
$ ltrd
where = 0
where = 0
1:  nr = 2352
2:  nr = 2352
3:  nr = 2352
4:  nr = 2352
5:  nr = 2352
6:  nr = 2352
7:  nr = 2352
8:  nr = 2352
9:  nr = 0
BTW, I had also tried fseeko(3), and it had the same problem as fseek(3). I'm still puzzled why fseek doesn't work with read(2), as they both seem to be rather legacy functions of this kind.

As for FreeBSD, I don't have it installed and never used it, so I don't know if the results are the same there. If any of the local FreeBSD users wish to try it and report the result that would be interesting!

#4  13th April 2009
ephemera

I don't know why this is so.

But, notice that ftell(3) didn't report the correct file offset.

> I'm still puzzled why fseek doesn't work with read(2), as they both seem to be rather legacy functions of this kind.

There is a difference: fseek is a stream I/O function of the C standard library, while lseek is the system call for seeking in a file. Maybe they keep different data structures for the file position and perhaps those are not kept in sync? <guess/>
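
One way to test this guess is to ask both layers where they think the file position is. The FILE structure and its buffer live inside the process (in libc), while the kernel only tracks the descriptor's offset, so the two can be compared directly. The sketch below is only an illustration: compare_offsets() is a hypothetical helper name, and whether the two values differ depends on how a given libc implements fseek().

```c
#define _POSIX_C_SOURCE 200809L    /* ask for POSIX declarations such as fileno() */

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Seek a stdio stream to `target`, then report where stdio thinks it is
 * (ftell) and where the descriptor's offset actually is (lseek).
 * Returns 0 on success, -1 on error. */
int
compare_offsets( const char *path, long target, long *stdio_pos, off_t *fd_pos )
{
    FILE *fp = fopen( path, "rb" );

    if( fp == NULL )
        return -1;

    if( fseek( fp, target, SEEK_SET ) != 0 ) {
        fclose( fp );
        return -1;
    }

    *stdio_pos = ftell( fp );                       /* logical stream position */
    *fd_pos = lseek( fileno( fp ), 0, SEEK_CUR );   /* descriptor offset; a zero-byte seek does not move it */

    fclose( fp );
    return ( *stdio_pos == -1L || *fd_pos == (off_t)-1 ) ? -1 : 0;
}
```

Calling compare_offsets( "infile", 2352, &a, &b ) and printing both values shows whether the descriptor has run ahead of the stream. If it has, a following read() on the descriptor would start from b, not from a.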

#5  13th April 2009
IdOp

Quote:
Originally Posted by ephemera
But, notice that ftell(3) didn't report the correct file offset.

Interesting, thanks, that had slipped past me as I peered ahead at the correct number of reads.

#6  13th April 2009
TerryP
Arp Constable
Join Date: May 2008
Location: USofA
Posts: 1,547

Hmm, I wonder if errno gets set to anything useful.

Code:
$ vim t.c
$ dd if=/dev/zero of=infile bs=2352 count=9
$ gcc t.c -o t && ./t
where = 0
where = 2352
1:  nr = 2352
2:  nr = 2352
3:  nr = 2352
4:  nr = 2352
5:  nr = 2352
6:  nr = 2352
7:  nr = 2352
8:  nr = 608
9:  nr = 0
$ uname -rms
7.2-PRERELEASE
__________________
My Journal

Thou shalt check the array bounds of all strings (indeed, all arrays), for surely where thou typest ``foo'' someone someday shall type ``supercalifragilisticexpialidocious''.

#7  13th April 2009
IdOp

Thanks TerryP for trying it on FreeBSD. Looks like you're also getting unexpected behaviour there, but different in detail. My sense is that, assuming the test code is properly written for the various platforms, then they should all give the same output. Since they don't, something is likely wrong.
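
The two outputs can be compared by working backwards from the byte counts. Assuming every read() simply continued from the descriptor's current offset until end-of-file, the offset at which the first read() started is the file size minus the total number of bytes returned. A small sketch of that arithmetic (implied_start() is a hypothetical helper, not part of the demo):

```c
/* Given the file size and the byte counts returned by successive read()
 * calls up to end-of-file, return the offset the first read() started at. */
long
implied_start( long file_size, const long *counts, int n )
{
    long total = 0;
    int  i;

    for( i = 0; i < n; i++ )
        total += counts[i];
    return file_size - total;
}
```

For the NetBSD/OpenBSD run this gives 21168 - (2352 + 2352 + 80) = 16384, and for the FreeBSD run 21168 - (7 * 2352 + 608) = 4096. Neither is the 2352 that ftell() reported. Both are powers of two, which would fit stdio having read a buffer's worth of data during fseek(), although that interpretation is an assumption rather than something the outputs prove.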

Quote:
Originally Posted by TerryP
Hmm, I wonder if errno gets set to anything useful.

Well, since fseek() and read() are tested for their respective error condition (-1 in each case) and those cases aren't entered, probably errno won't be set, right?

#8  14th April 2009
BSDfan666
Real Name: N/A, this is the interweb.
Banned
Join Date: Apr 2008
Location: Ontario, Canada
Posts: 2,223

One would typically use fread(3) instead of read(2) when using fseek(3).

In fact, you should probably use fseeko(3) and ftello(3) as well. The other functions take a long offset, so they do not handle 64-bit file offsets on i386.
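
As a sketch of what that suggestion could look like, the frame loop from the demo can be written with stdio calls only, so that the stream's own position is the only one in play. The names count_frames_stdio() and FRAME_SIZE are made up for this example; only fopen(), fseeko(), fread(), ferror() and fclose() are real library calls.

```c
#define _POSIX_C_SOURCE 200809L    /* ask for POSIX declarations such as fseeko() */

#include <stdio.h>
#include <sys/types.h>

#define FRAME_SIZE 2352

/* Skip `skip_frames` frames, then count the whole frames that remain,
 * using only stdio calls on the stream. Returns -1 on error. */
long
count_frames_stdio( const char *path, long skip_frames )
{
    unsigned char buf[FRAME_SIZE];
    long          frames = 0;
    FILE         *fp = fopen( path, "rb" );

    if( fp == NULL )
        return -1;

    if( fseeko( fp, (off_t)skip_frames * FRAME_SIZE, SEEK_SET ) != 0 ) {
        fclose( fp );
        return -1;
    }

    /* fread() returns the number of complete items read (here 0 or 1) */
    while( fread( buf, FRAME_SIZE, 1, fp ) == 1 )
        frames++;

    if( ferror( fp ) )
        frames = -1;

    fclose( fp );
    return frames;
}
```

With the 9-frame infile from the first post, count_frames_stdio( "infile", 1 ) should count the 8 frames that follow the first one. A trailing partial frame, if any, is not counted.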

#9  14th April 2009
TerryP

Quote:
Originally Posted by IdOp
Well, since fseek() and read() are tested for their respective error condition (-1 in each case) and those cases aren't entered, probably errno won't be set, right?

FWIW: I tried creating a version that checks errno after each call via a macro, only to have it segfault on run. Then I yanked the version in your post (again) to a temp file, compiled and ran it as in the last post, and it segfaulted exactly the same way (same machine).

Code:
Terry@dixie$ gcc -ggdb3 -Wall /tmp/t.c -o /tmp/t && gdb /tmp/t
GNU gdb 6.1.1 [FreeBSD]
...
This GDB was configured as "i386-marcel-freebsd"...
(gdb) run
Starting program: /tmp/t 

Program received signal SIGSEGV, Segmentation fault.
0x08048587 in main () at /tmp/t.c:13
13          ifd = fileno( ifp );
(gdb)
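
A crash on the fileno() line would be consistent with fopen() having returned NULL, for instance if infile was not in the directory the program was run from, since the demo does not check that return value. That is only a guess; the gdb output above does not prove it. A minimal sketch of the missing check (open_checked() is a hypothetical helper name):

```c
#define _POSIX_C_SOURCE 200809L    /* ask for POSIX declarations such as fileno() */

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Open `path` for binary reading and store its descriptor in *fd_out.
 * Returns the stream, or NULL (after a message on stderr) if the open
 * fails, so that fileno() is never called on a NULL stream. */
FILE *
open_checked( const char *path, int *fd_out )
{
    FILE *fp = fopen( path, "rb" );

    if( fp == NULL ) {
        fprintf( stderr, "fopen %s: %s\n", path, strerror( errno ) );
        return NULL;
    }
    *fd_out = fileno( fp );
    return fp;
}
```

In the demo this would replace the fopen()/fileno() pair, with main() returning 1 when open_checked() returns NULL.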

== beyond that ==

I've never tried to mix standard I/O functions with I/O system calls (why does anyone need to do that, normally?), but I remember a comment in the book Programming Perl: a warning that mixing things like read() and sysread() should only be done if you are into wizardry, pain, or both. (read() and sysread() in Perl are basically equivalents of Unix/C's fread() and read() respectively.) I would reckon that if you manipulate the file descriptor without updating the structure on the other side of a FILE *, as the f*() functions should do, things could get out of sync between the integer file descriptor and the FILE * stream, and get pissed off accordingly if certain ops were done. Hypothetically, anyway.

I'd really suggest trying it with fread() and such instead, as BSDFan suggests.

== other ==

The documentation says the read() system call returns the number of bytes read, 0 if the read hit EOF, or -1 if a cork popped, in which case it sets errno. So if it's not reading the specified amount, I would rather assume it hit EOF and returned what was read up to that point (i.e. a number of bytes that is > 0 but < 2352)

edit: yep

Quote:
Originally Posted by The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition; System Interfaces; ssize_t read(int fildes, void *buf, size_t nbyte);
Upon successful completion, where nbyte is greater than 0, read() shall mark for update the st_atime field of the file, and shall return the number of bytes read. This number shall never be greater than nbyte. The value returned may be less than nbyte if the number of bytes left in the file is less than nbyte, if the read() request was interrupted by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading. For example, a read() from a file associated with a terminal may return one typed line of data.
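
Since the specification allows read() to return fewer bytes than requested, code that needs whole frames usually loops until it has them or reaches end-of-file. The following is a sketch of such a loop, not code from this thread; read_full() is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `want` bytes, retrying after short reads and EINTR.
 * Returns the number of bytes actually read (less than `want` only at
 * end-of-file), or -1 on error with errno set by read(). */
ssize_t
read_full( int fd, void *buf, size_t want )
{
    size_t  got = 0;
    ssize_t nr;

    while( got < want ) {
        nr = read( fd, (char *)buf + got, want - got );
        if( nr == 0 )
            break;                  /* end of file */
        if( nr == -1 ) {
            if( errno == EINTR )
                continue;           /* interrupted before any data arrived: retry */
            return -1;
        }
        got += (size_t)nr;
    }
    return (ssize_t)got;
}
```

With this helper, a return value between 1 and 2351 for a 2352-byte request can only mean that the file ended partway through a frame.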

Last edited by TerryP; 14th April 2009 at 07:40 AM.

14th April 2009
IdOp

Thanks BSDfan666 and TerryP again, that is really helpful stuff. It looks like fread() may be the missing piece. In trying to understand the origins of my confusion on this there seem to be 3 factors:

1) There are a lot of similar functions available here. Although I was aware of the distinction between those using stdio FILE*'s and the lower level ones using file descriptors, I wasn't aware of lseek(), and it seems I had forgotten about fread() due to:

2) I don't work with these things on a very regular basis, so things get fuzzy.

3) The program was originally developed on Linux, where fseek() and read() seem to work together ok. (BTW a quick check on SunOS showed it was ok there too.) This is good in a way, but it led to a false sense of security as to the general situation.

Quote:
Originally Posted by TerryP
I've never tried to mix standard I/O functions with I/O system calls (why does anyone need to do that, normally?)
I guess in my case I just found the read() interface a bit cleaner, combined with not having problems with it previously. Live and learn ...

Quote:
So if it's not reading the specified amount, I would rather assume it hit EOF and returned what was read up to that point (i.e. a number of bytes that is > 0 but < 2352)
Agreed, that was my assumption too. In the little demo program I wanted to do an extra read just to make sure there was no more. Of course it could have checked for EOF explicitly then too, but this was getting beyond the first-order problem and I wanted to keep it short and clear.

So ... yesterday I re-wrote the thing to use lseek() [pointed out by ephemera]. But it seems I should really use fread() and re-assess things concerning lseek vs fseeko.

Thanks again for all the patient replies.

15th April 2009
IdOp

Ok, I sorted through the relevant functions and made a little summary. Here it is in case anyone ever finds it helpful. Notes:

* there are many other functions related to file access
* BSD = NetBSD 4.0.1 and OpenBSD 4.4
* Linux = Slackware 11.0 and 12.2
* comments on speed are on my i386 machines (take with a grain of salt)
* corrections etc. are welcome

Functions using the stdio library interface and FILE structures:

* seek - 16-bit (K&R) or 18-bit (PDP-7) offset; does not exist on Linux or BSD
* fseek, ftell - 32-bit offset
* fseeko, ftello - 32-bit offset on Linux (can be changed to 64 by a #define); 64-bit on BSD. ftello does not show the result of lseek [in either 32- or 64-bit mode (Linux)]
* fread

Functions using system/kernel calls with file descriptors:

* lseek - 32-bit offset on Linux (can be changed to 64 by a #define); 64-bit on BSD
* lseek64 = llseek - 64-bit offset (Linux)
* "ltell" - does not exist; ftell[o] don't work here
* read - faster than fread (Linux, NetBSD); not faster than fread (OpenBSD)

-----------------------------------

At the moment, I think I'll stick with read() over fread(), since it can be faster. A lot of reads are done, and portability to non-Unix-like is not important.

So lseek() must be used so as not to mix functions from the two categories (originally done via blundering).
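
A descriptor-only version of the frame loop can be sketched as follows. The missing "ltell" can be had by seeking zero bytes from the current position, which returns the offset without moving it. The names fd_tell(), count_frames_fd() and FRAME_SIZE are made up for this example; open(), lseek(), read() and close() are the real calls.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define FRAME_SIZE 2352

/* "ltell": the current offset of a descriptor, obtained by seeking
 * zero bytes from the current position. Returns (off_t)-1 on error. */
off_t
fd_tell( int fd )
{
    return lseek( fd, 0, SEEK_CUR );
}

/* Skip `skip_frames` frames, then count the whole frames that remain,
 * using only descriptor calls. Assumes a regular file, where read()
 * returns a full frame whenever one is left. Returns -1 on error. */
long
count_frames_fd( const char *path, long skip_frames )
{
    char    buf[FRAME_SIZE];
    long    frames = 0;
    ssize_t nr;
    int     fd = open( path, O_RDONLY );

    if( fd == -1 )
        return -1;

    if( lseek( fd, (off_t)skip_frames * FRAME_SIZE, SEEK_SET ) == (off_t)-1 ) {
        close( fd );
        return -1;
    }

    /* stop at end-of-file (0), a trailing partial frame, or an error (-1) */
    while( (nr = read( fd, buf, FRAME_SIZE )) == FRAME_SIZE )
        frames++;

    close( fd );
    return ( nr == -1 ) ? -1 : frames;
}
```

Because no FILE stream is involved, the descriptor's offset is the only file position there is, so nothing can get out of sync.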

On 32-bit Linux, lseek() takes a 32-bit offset by default, which is good enough for now and can easily be changed if needed. Not a big downside.

Last edited by IdOp; 15th April 2009 at 03:56 AM. Reason: added 18 bits and attempt to clarify headings

15th April 2009
BSDfan666

Short history lesson: the first Unix system ran on the PDP-7, which had 18-bit integers, not 16-bit.

Also, fseeko/ftello use off_t. Under OpenBSD this type is always 64-bit, but you'll need to define _FILE_OFFSET_BITS=64 on Linux.
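
For illustration, a sketch of how that define is typically applied, assuming glibc on Linux: the macro has to be defined before the first system header is included, or passed on the compiler command line as -D_FILE_OFFSET_BITS=64. The helper off_t_bits() is a made-up name used only to show the effect.

```c
#define _FILE_OFFSET_BITS 64    /* must come before any system header */

#include <stdio.h>
#include <sys/types.h>

/* Width of off_t, in bits, for this build. */
int
off_t_bits( void )
{
    return (int)( sizeof( off_t ) * 8 );
}
```

On a 32-bit Linux build, printing off_t_bits() with and without the define is a quick way to confirm that it took effect. On systems where off_t is always 64-bit, the define makes no difference.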

15th April 2009
IdOp

Quote:
Originally Posted by BSDfan666
Short history lesson: the first Unix system ran on the PDP-7, which had 18-bit integers, not 16-bit.

Interesting, as that's an odd size. At any rate, my comment re K&R referred to page 164 where they say:

Quote:
Originally Posted by K&R
In pre-version 7 UNIX, the basic entry point to the I/O system is called seek. seek is identical to lseek, except that its offset argument is an int rather than a long. Accordingly, since PDP-11 integers have only 16 bits, the offset specified by seek is limited to 65,535.
I'll try to edit the post to include 18-bits as well.

Quote:
Also, fseeko/ftello use off_t. Under OpenBSD this type is always 64-bit, but you'll need to define _FILE_OFFSET_BITS=64 on Linux.

Correct, that is what I wrote too, but maybe the headings were not clear. I'll see if I can punctuate them better or something. Thanks.

15th April 2009
TerryP

What's so odd? DEC made Programmed Data Processors (PDPs) in 12-, 16-, 18-, and 36-bit versions, among other interesting gizmos over the years.

15th April 2009
IdOp

heh, well odd for those of us weaned on registers commensurate with 8-bit bytes and unaware of the history.
Content copyright © 2007-2010, the authors