What is best way to monitor for bad sectors?
We are developing an appliance based on FreeBSD 7.0. We will be running a custom FS on top of both UFS and raw partitions and will need to have a way to deal with bad sectors. Ideally we'd like to be able to subscribe to a system event and have FreeBSD alert us when a bad sector is detected. I don't think there is a mechanism to do this though. We've noted that the system does seem to log a message in /var/log/dmesg when a bad sector is encountered. For example, these messages are occurring on one of our systems right now:
ad8: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=343237347
g_vfs_done():ad8s1d[READ(offset=143525234688, length=2048)]error = 5
ad8: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=343237347
g_vfs_done():ad8s1d[READ(offset=143525234688, length=2048)]error = 5
ad8: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=343237347
g_vfs_done():ad8s1d[READ(offset=143525234688, length=2048)]error = 5
I assume these are due to bad sectors. So we thought one alternative could simply be to have a process that periodically scans /var/log/dmesg and reports back to our management system when one of these errors is detected. Is this a reliable/reasonable approach? Is there a better way to monitor a system for bad sectors?
I should add that we do have SMART installed and operational, but our experience so far is that SMART will not alert us to a sector that goes bad without warning. If we're mistaken about this, we'd appreciate some feedback. Thanks.
Last edited by PeterSteele; 9th August 2008 at 10:57 PM.
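A minimal sketch of the log-scanning idea described in this post, in Python. The pattern is derived only from the sample lines quoted above, so other drivers or FreeBSD releases may well log a different format. The function names (find_bad_sectors, scan_log) are made up for illustration, and the log path is left as a parameter because which file actually receives these messages depends on the syslog setup.

```python
import re

# Matches the ATA failure lines quoted in the post. The format is inferred
# from those samples only; treat it as an assumption, not a specification.
ATA_FAILURE = re.compile(
    r"(?P<dev>ad\d+): FAILURE - (?P<cmd>\S+) "
    r"status=(?P<status>[0-9a-fA-F]+)<(?P<status_flags>[^>]*)> "
    r"error=(?P<error>[0-9a-fA-F]+)<(?P<error_flags>[^>]*)> "
    r"LBA=(?P<lba>\d+)"
)


def find_bad_sectors(log_text):
    """Return {device: set of LBAs} for each UNCORRECTABLE failure found."""
    bad = {}
    for match in ATA_FAILURE.finditer(log_text):
        if "UNCORRECTABLE" in match.group("error_flags"):
            lba = int(match.group("lba"))
            bad.setdefault(match.group("dev"), set()).add(lba)
    return bad


def scan_log(path):
    """Read a log file (path supplied by the caller) and scan it."""
    with open(path, "r", errors="replace") as handle:
        return find_bad_sectors(handle.read())
```

A poller could call scan_log() on a timer and report any device whose set of LBAs has grown since the previous pass. Collecting LBAs in a set means the repeated retries seen above count as one bad sector rather than three.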
|
|||
I should also point out that these are SATA drives. A preferred option of course would be to query the SATA driver directly to pull this kind of information. Is there an ioctl for this kind of query? I don't know for sure but I assume we are using the generic FreeBSD SATA driver.
|
|
|||
You could have a look at the source code of http://www.freebsd.org/cgi/url.cgi?p...ools/pkg-descr and/or the source of the OpenBSD atactl(8) utility.
If this doesn't get you any further, you could consider posting on one of the FreeBSD mailing lists, where the developers hang out. For readers interested in SMART, see http://en.wikipedia.org/wiki/Self-Mo...ing_Technology
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump |
|
|||
I will have to investigate S.M.A.R.T. further. We do have some monitoring currently in place, but a recent case of bad sectors was not detected by our SMART subsystem. There may be more that we can do here; I've just taken over this area of our code and I have some learning to do...
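One thing that might be worth checking is whether the monitoring looks at the raw sector counters rather than only the overall SMART health verdict. The sketch below is a rough illustration under several assumptions not stated in this thread: that the smartctl utility from smartmontools is installed, that its -A attribute table has the usual ten columns, and that the drive reports the attribute names listed in WATCHED (these vary by vendor). The function names are hypothetical.

```python
import subprocess

# Attribute names commonly tied to bad sectors. This list is an assumption;
# vendors differ in which attributes they report and how they name them.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_smart_attributes(report):
    """Extract raw values of the watched attributes from `smartctl -A` text."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; column 10 is the raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            try:
                values[fields[1]] = int(fields[9])
            except ValueError:
                pass  # some drives print non-numeric raw values
    return values


def drive_is_suspect(values):
    """Flag the drive if any watched counter is above zero."""
    return any(count > 0 for count in values.values())


def read_smart_report(device):
    """Run smartctl against a device node chosen by the caller."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout
```

A nonzero threshold is the strictest possible policy. Whether the counters move soon enough after a failed read to meet the alerting requirement would need to be verified on the actual drives.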
|
|
||||
IDE drives have dedicated cylinder(s) of spare sectors to which bad sectors are remapped. When these cylinders are full, bad sectors can't be remapped anymore and just get flagged.
In theory, S.M.A.R.T. should issue warnings, if well implemented by both you and the manufacturer. What I would do is run the manufacturer's utility disk. A warranty test or RMA test would compare the manufacturer's specs to the real status of the drive and eventually generate a report (or just a code number) of the defects found. Sometimes "zeroing" the drive (a long process that writes patterns over the whole surface) can help revive it. For a while. As long as new bad sectors aren't found. You could still use the drive for testing purposes or non-critical backups, but not for intensive read-writes anymore. In any case, bad sectors mean the drive is no longer fit for production.
__________________
da more I know I know I know nuttin' |
|
|||
The issue is that these systems will be running at customer sites and we need our software to be able to detect failing drives and take corrective action. It's an HA system, so when a drive is deemed unusable another drive will have to take over. We can detect a completely failed drive; it's just these bad sectors we need to find a way to deal with. We ultimately want to treat a drive on which we detect bad sectors the same way as a drive that fails completely. Essentially, we want to be alerted as soon as a sector goes bad, or at least within a very short period of time...
|
|
||||
Sounds like you are trying to recreate RAID in software. Wouldn't it make more sense to invest in good quality hardware RAID controllers, that already have this kind of monitoring built-in, along with e-mail and/or SMS alerts?
We've been using AMCC/3Ware 9000-, 9500-, and 9600-series RAID controllers for just this reason, and the onboard e-mail alerts have allowed us to replace 4 drives so far, before they died completely (and still within the manufacturer's warranty period). These do SMART monitoring along with a bunch of other stuff. If you don't need the RAID features, you can create "Single Disk" arrays that allow you to see/use each drive individually, but with all the onboard cache and monitoring features of the controller (we do this in our ZFS storage servers).
Last edited by phoenix; 13th August 2008 at 07:26 AM. Reason: Add info on 3Ware
|
|||
Yes, it does seem like RAID would be the obvious solution but the nature of our product precludes this option...
|
|