DaemonForums  

  #1   (View Single Post)  
Old 9th August 2008
PeterSteele PeterSteele is offline
Port Guard
 
Join Date: Jul 2008
Posts: 43
Thanked 0 Times in 0 Posts
What is best way to monitor for bad sectors?

We are developing an appliance based on FreeBSD 7.0. We will be running a custom FS on top of both UFS and raw partitions and will need a way to deal with bad sectors. Ideally we'd like to subscribe to a system event and have FreeBSD alert us when a bad sector is detected, but I don't think there is a mechanism to do this. We've noted that the system does seem to log a message in /var/log/dmesg when a bad sector is encountered. For example, these messages are occurring on one of our systems right now:

ad8: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=343237347
g_vfs_done():ad8s1d[READ(offset=143525234688, length=2048)]error = 5
ad8: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=343237347
g_vfs_done():ad8s1d[READ(offset=143525234688, length=2048)]error = 5
ad8: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=343237347
g_vfs_done():ad8s1d[READ(offset=143525234688, length=2048)]error = 5

I assume these are due to bad sectors. So we thought one alternative could simply be to have a process that periodically scans /var/log/dmesg and reports back to our management system when one of these errors is detected.
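
As a rough illustration of that log-scanning idea, here is a minimal Python sketch. It assumes the failure lines always look like the `ad8: FAILURE - ...` samples quoted above; the names `find_bad_sectors` and `scan_log` are made up for this example, and the log path is left as a parameter. Repeated reports of the same LBA (as in the excerpt, where the same sector is logged three times) are collapsed into one entry.

```python
import re

# Pattern for the ATA failure lines quoted in the post, e.g.
#   ad8: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=343237347
# Assumption: the format is stable; a syslog prefix before "ad8:" is tolerated.
ATA_FAILURE = re.compile(
    r"\b(?P<dev>ad\d+): FAILURE - (?P<cmd>\S+) "
    r"status=\S+ error=[0-9a-fA-F]+<(?P<flags>[^>]*)> LBA=(?P<lba>\d+)"
)


def find_bad_sectors(lines):
    """Return sorted, de-duplicated (device, lba) pairs flagged UNCORRECTABLE."""
    seen = set()
    for line in lines:
        match = ATA_FAILURE.search(line)
        if match and "UNCORRECTABLE" in match.group("flags").split(","):
            seen.add((match.group("dev"), int(match.group("lba"))))
    return sorted(seen)


def scan_log(path):
    """Scan one log file (hypothetical helper; the path is supplied by the caller)."""
    with open(path, errors="replace") as handle:
        return find_bad_sectors(handle)
```

A poller would call `scan_log()` on a timer and report any pairs it has not seen before. Note that this only finds sectors that something has already tried to read, so it cannot warn about a bad sector in a region nobody touches.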

Is this a reliable/reasonable approach? Is there a better way to monitor a system for bad sectors?

I should add that we do have SMART installed and operational, but our experience so far with SMART is that it will not alert us to a sector that goes bad without warning. If we're mistaken about this, we'd appreciate some feedback. Thanks.

Last edited by PeterSteele; 9th August 2008 at 10:57 PM.
  #2   (View Single Post)  
Old 10th August 2008
PeterSteele PeterSteele is offline
Port Guard
 
Join Date: Jul 2008
Posts: 43
Thanked 0 Times in 0 Posts

I should also point out that these are SATA drives. A preferred option of course would be to query the SATA driver directly to pull this kind of information. Is there an ioctl for this kind of query? I don't know for sure, but I assume we are using the generic FreeBSD SATA driver.
  #3   (View Single Post)  
Old 10th August 2008
J65nko J65nko is offline
Administrator
 
Join Date: May 2008
Location: Budel - the Netherlands
Posts: 3,116
Thanked 182 Times in 149 Posts

You could have a look at the source code of http://www.freebsd.org/cgi/url.cgi?p...ools/pkg-descr and/or the source of the OpenBSD atactl(8) utility.

If this doesn't get you any further, you could consider posting on one of the FreeBSD mailing lists, where the developers hang out.

For the reader who is interested in SMART see http://en.wikipedia.org/wiki/Self-Mo...ing_Technology
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump
  #4   (View Single Post)  
Old 10th August 2008
lvlamb lvlamb is offline
Real Name: Louis V. Lambrecht
Spam Deminer
 
Join Date: May 2008
Location: .be
Posts: 221
Thanked 25 Times in 24 Posts

Yeah! In theory bad sectors are managed through S.M.A.R.T., but frankly, if bad sectors are a concern and have to be monitored, junk that bloody hard drive.
Once S.M.A.R.T. signals unrecoverable errors, the drive is already dead and begs for a sysutils/testdisk of last resort.
__________________
da more I know I know I know nuttin'
  #5   (View Single Post)  
Old 11th August 2008
PeterSteele PeterSteele is offline
Port Guard
 
Join Date: Jul 2008
Posts: 43
Thanked 0 Times in 0 Posts

I will have to investigate S.M.A.R.T. further. We do have some monitoring currently in place, but a recent case of bad sectors did not get detected by our SMART subsystem. There may be more that we can do here; I've just taken over this area of our code and I have some learning to do...
  #6   (View Single Post)  
Old 11th August 2008
lvlamb lvlamb is offline
Real Name: Louis V. Lambrecht
Spam Deminer
 
Join Date: May 2008
Location: .be
Posts: 221
Thanked 25 Times in 24 Posts

IDE drives have dedicated spare cylinder(s) to which bad sectors are remapped. When these cylinders are full, bad sectors can't be remapped anymore and just get flagged.
In theory, S.M.A.R.T. should issue warnings, if well implemented by both you and the manufacturer.
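
To make the remapping counters concrete: the attribute table printed by `smartctl -A` (from smartmontools) usually includes rows such as `Reallocated_Sector_Ct` and `Current_Pending_Sector`. The Python sketch below parses that table from text that has already been captured. The attribute names are the common ones but vary by vendor, and the function names are invented for this example, so treat it as a sketch under those assumptions rather than a complete monitor.

```python
# Attribute names commonly tied to bad or remapped sectors.
# Assumption: the drive reports them under these names; vendors differ.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def parse_smart_attributes(text):
    """Extract raw values of the watched attributes from `smartctl -A` output."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows begin with a numeric ID and have ten columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit():
                counts[fields[1]] = int(raw)
    return counts


def drive_is_suspect(counts):
    """Flag the drive as soon as any watched raw counter is non-zero."""
    return any(value > 0 for value in counts.values())
```

Watching the raw counters is stricter than waiting for the drive's overall health verdict, which can still report a pass while a few sectors are pending or already remapped.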

What I would do is run the manufacturer's diagnostic utility. A warranty test or RMA test compares the manufacturer's specs to the real status of the drive and eventually generates a report (or just a code number) of the defects found.
Sometimes, "zeroing" the drive (long process, writes patterns on the whole surface) can help revive the drive.
For a while. As long as new bad sectors aren't found.
You could still use the drive for testing purposes or non-critical backups, but not anymore for intensive read-writes.

In any case, bad sectors mean the drive isn't fit for production any longer.
__________________
da more I know I know I know nuttin'
  #7   (View Single Post)  
Old 12th August 2008
PeterSteele PeterSteele is offline
Port Guard
 
Join Date: Jul 2008
Posts: 43
Thanked 0 Times in 0 Posts

The issue is that these systems will be running at customer sites and we need our software to be able to detect failing drives and take corrective action. It's an HA system, so when a drive is deemed unusable another drive will have to take over. We can detect a completely failed drive; it's just these bad sectors we need to find a way to deal with. We ultimately want to treat a drive on which we detect bad sectors the same way as a drive that fails completely. Essentially, we want to be alerted as soon as a sector goes bad, or at least within a very short period of time...
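
The policy described here (the first bad sector counts as a drive failure) could be sketched in Python roughly as follows. `DriveMonitor` and the `on_failed` callback are hypothetical names for this illustration; how bad sectors are detected and what failover actually does are left to the caller.

```python
class DriveMonitor:
    """Treat a drive as failed as soon as one bad sector is reported on it."""

    def __init__(self, on_failed):
        self._on_failed = on_failed  # callback invoked once per failed device
        self._bad_lbas = {}          # device name -> set of bad LBAs seen so far
        self._failed = set()         # devices already handed to failover

    def report_bad_sector(self, device, lba):
        """Record a bad sector and trigger failover on the first one per device."""
        self._bad_lbas.setdefault(device, set()).add(lba)
        if device not in self._failed:
            self._failed.add(device)
            self._on_failed(device)

    def is_failed(self, device):
        return device in self._failed

    def bad_sectors(self, device):
        """Return the sorted bad LBAs recorded for a device."""
        return sorted(self._bad_lbas.get(device, ()))
```

Firing the callback only once per device keeps repeated reports of the same sector, or later sectors on the same drive, from triggering failover again.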
  #8   (View Single Post)  
Old 13th August 2008
phoenix phoenix is offline
Risen from the ashes
 
Join Date: May 2008
Posts: 699
Thanked 90 Times in 81 Posts

Sounds like you are trying to recreate RAID in software. Wouldn't it make more sense to invest in good-quality hardware RAID controllers that already have this kind of monitoring built in, along with e-mail and/or SMS alerts?

We've been using AMCC/3Ware 9000-, 9500-, and 9600-series RAID controllers for just this reason, and the onboard e-mail alerts have allowed us to replace 4 drives so far, before they died completely (and still within the manufacturer's warranty period). These do SMART monitoring along with a bunch of other stuff. If you don't need the RAID features, you can create "Single Disk" arrays that allow you to see/use each drive individually, but with all the onboard cache and monitoring features of the controller (we do this in our ZFS storage servers).
__________________
Freddie

Help for FreeBSD: Handbook, FAQ, man pages, mailing lists.

Last edited by phoenix; 13th August 2008 at 07:26 AM. Reason: Add info on 3Ware
  #9   (View Single Post)  
Old 14th August 2008
PeterSteele PeterSteele is offline
Port Guard
 
Join Date: Jul 2008
Posts: 43
Thanked 0 Times in 0 Posts

Yes, it does seem like RAID would be the obvious solution, but the nature of our product precludes this option...
Old 16th August 2008
phoenix phoenix is offline
Risen from the ashes
 
Join Date: May 2008
Posts: 699
Thanked 90 Times in 81 Posts

Even a low-profile, 4-port version?
__________________
Freddie

Help for FreeBSD: Handbook, FAQ, man pages, mailing lists.