#5
7th April 2012
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 7,983

Back to the thread's topic -- your "temporary" I/O error: what might have been the problem, and why did it suddenly resolve after so much time had passed?

Let us assume, for the sake of this discussion, that the timeout was 5 seconds, and your drive spins at a typical 7200rpm.
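To make those assumptions concrete, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and the 5-second timeout is only the assumed figure from above, not a measured value.

```python
def revolutions_during(timeout_s, rpm):
    """Number of platter revolutions that fit into a timeout window."""
    revolutions_per_second = rpm / 60
    return revolutions_per_second * timeout_s


# Assumed values from the discussion: 5 s timeout, 7200 rpm drive.
print(revolutions_during(5, 7200))  # 600.0
```

At 7200 rpm the platter turns 120 times per second, so any given sector comes around roughly 600 times in a 5-second window.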

During those 5 seconds, the sector passed underneath the head about 600 times (7200 rpm is 120 revolutions per second), and each time the write failed. Why would it fail ~600 times and then ... suddenly succeed? Keep in mind:
  • Writes are not tested after being written to the physical media. You won't find out if you've written to a bad sector until you later attempt to read it. However, if the sector is n
  • Writes and reads typically share a cache buffer on the drive electronics. This chunk of RAM improves performance, though on write operations, the drive does not report that the write is complete until the block(s) are actually written to the disk.
  • If the drive had been struggling with a prior read, retrying it and holding the write in a queue until the read either succeeded or failed, you would have seen other indications of this, such as a read timeout.
There are two possibilities: 1) The drive electronics could not locate the sector, or the sector header was damaged and could not be read. In that case, the sudden success would be due to the electronics selecting a replacement sector from the drive's built-in spares and remapping the sector. Or .... 2) The drive was unable to seek to the track, and it conducted a realignment exercise to correct the error, which took five seconds.

If you are interested in finding out which it was, your drive electronics may be able to tell you. The drive keeps diagnostic data in its electronics, exposed through a technology called SMART. The atactl(8) program can read it, but I find sysutil/smartmontools much easier to use.
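As a rough sketch of how the two possibilities might be told apart with smartmontools, the Python below runs `smartctl -A` (the vendor attribute table) and picks out a few attributes. The helper names and the list of watched attributes are illustrative choices, not part of smartmontools. Attribute names vary between drive vendors, and the raw value of Seek_Error_Rate in particular is vendor-specific, so treat the result as a hint rather than a diagnosis.

```python
import subprocess

# Attributes that plausibly relate to the two possibilities above.
# Names differ between vendors; this list is an assumption.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Seek_Error_Rate")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value_string} from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        # Attribute rows have 10 columns and start with a numeric ID;
        # the last column (RAW_VALUE) may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9].strip()
    return attrs


def read_watched_attributes(device):
    """Run smartctl on `device` and return only the watched attributes."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes for warnings too
    )
    attrs = parse_smart_attributes(result.stdout)
    return {name: attrs[name] for name in WATCHED if name in attrs}
```

Calling `read_watched_attributes("/dev/sd0c")` would be the typical use; the device name is only an example and the call usually needs root. A non-zero Reallocated_Sector_Ct would be consistent with the remapping explanation, while seek-related attributes would be the place to look for the second one.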

Last edited by jggimi; 7th April 2012 at 01:14 AM. Reason: typo, addition of SMART paragraph and link. Added seek error as a plausible root cause