How to force a file system clean?
Hi!
After a power failure my OpenBSD server halts on boot. One of my drives has an "inconsistency" according to the prompt. This is what I get after the file system check is done:

...
CANNOT READ: BLK 28196704
/dev/rwd1i: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
...

I tried "fsck_ffs /dev/rwd1i" and then I got:

...recal drive fault...
...(more numbers and stuff)...
CANNOT READ: BLK 128
CONTINUE? [Fyn?]

If I type "y" it keeps listing other blocks and I finally end up at a prompt. What should I do to fix this (to repair my file system, or force marking it clean)?

/Quaxo
|
|||
I'm getting nervous now. I have no backup of this drive.
I'm not sure what you mean by "spare sectors". The drive isn't 100% full of data, but the entire drive (500 GB) was allocated to wd1i when I installed and mounted OpenBSD. What should I do?

/Quaxo
|
||||
Regarding spare sectors, managed by drive electronics, and not by you or the OS, see http://en.wikipedia.org/wiki/Bad_sector.

You can force mount the partition while in this state, and attempt to copy off everything you want saved to another drive. If you mount it "dirty", you should do so read-only. Please see the mount(8) manual for details.

If it were my drive, I would take the drive out of service, and write to, then read, every sector on the hard drive, to ensure that spare sectors have successfully replaced all the bad sectors, before restoring data and returning it to service. Of course, I have backups of all my drives. You should, too. But you know that....now.

(The "badblocks" program is included in the e2fsprogs package, which I prefer, or you could use dd(1) with /dev/rwd1c.)

Last edited by jggimi; 23rd July 2011 at 11:41 PM.
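As a rough sketch of the "force mount read-only, then copy off" advice above: the script below only prints the commands by default, so they can be reviewed before anything touches the drive. The destination directory /mnt/rescue is a hypothetical path on a healthy drive, not something from this thread, and the `run` and `rescue` helpers are names made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the rescue commands instead of executing them.
# DEST is a hypothetical directory on a healthy drive (an assumption).
DEV=${DEV:-/dev/wd1i}
MNT=${MNT:-/backup}
DEST=${DEST:-/mnt/rescue}
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rescue() {
    # Force the mount despite the dirty flag, and keep it read-only.
    run mount -t ffs -o force,rdonly "$DEV" "$MNT"
    # Copy whatever can still be read, preserving modes and times.
    run cp -Rp "$MNT/." "$DEST/"
    run umount "$MNT"
}

rescue
```

Setting DRY_RUN=0 would run the commands for real; on a failing drive the copy may stall for a long time on unreadable files.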
|
||||
Just some additional thoughts for you to consider.

Writing to the drive will, of course, destroy what is there, so use dd or badblocks only after recovering what you can.

You do not have other partitions on the drive, only "i". But if you did have other partitions, you should expect you might have bad sectors among them as well.

Kernel messages regarding sector numbers will be the physical sector numbers used by the drive. Userland programs, such as fsck_ffs, will report sector numbers within the partition. They will not be identical, unless you are running a program against the "c" partition, which is the whole drive.

Use raw partitions with dd or badblocks, to maximize performance.

You can communicate with the drive electronics (SMART) using atactl(8); I prefer the smartmontools package, as I find it easier to use. This may be able to provide you with a better understanding of the state of your drive. What comes from SMART is up to the drive vendor; some manufacturers produce more information than others.
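For the read side of a dd check, a minimal non-destructive sketch could look like the following. It reads one block at a time and prints the index of each block for which dd reports an error; nothing is written to the target. The function name `scan_blocks` and its arguments are made up for illustration, and the block count has to be supplied by hand (for a real drive it would come from the drive's actual size, for example as shown by disklabel(8)).

```shell
#!/bin/sh
# scan_blocks TARGET BLOCKSIZE COUNT
# Reads COUNT blocks of BLOCKSIZE bytes, one dd call per block, and prints
# the index of every block for which dd exits with an error.
# Read-only: TARGET is never written to.
scan_blocks() {
    target=$1
    bs=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! dd if="$target" of=/dev/null bs="$bs" skip="$i" count=1 2>/dev/null; then
            echo "$i"
        fi
        i=$((i + 1))
    done
}

# Hypothetical use against the raw whole-drive device:
#   scan_blocks /dev/rwd1c 65536 <number of 64 KB blocks on the drive>
```

One dd process per block is slow, and on a failing drive each bad block can take a long time while the drive retries, so this is more useful for probing a known range than for a full pass.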
|
|||
This drive is my backup drive. However, I wasn't finished with my custom backup system yet, so I wasn't ready for this, and this drive has unique files.
Hmm, this was tricky. Do I have to mount it successfully once in order to be able to reboot (like marking the file system OK)? Would it work to just edit fstab to set the drive as read-only and then just reboot to get the server running again? The other drive is OK and contains all the important system stuff (root, usr, var...).

/Quaxo
|
||||
One last thought -- if you cannot mount successfully with -o force,rdonly due to a "bad superblock" -- the first block of metadata about the filesystem, usually sector #8 within the partition -- the alternate/spare superblock may be usable. This is usually sector #32. See the -b option of fsck_ffs(8).
|
|
|||
OK, I'll try using ed (not my favourite editing program) tomorrow; I must get some sleep now.
I'll be back...
|
||||
I'd mentioned SMART -- the drive electronics standard -- and smartmontools. On all my systems, I set up daily "short offline" tests to have the electronics do self-tests and test the data bus, and weekly "long offline" (also called "extended offline") tests to have the electronics read every sector.

On my RAID systems, it's easy enough to take a problem drive out of service and run badblocks on it, then put it back in service if it is still usable.

On my single-drive systems, what I do will depend on what sectors have failed. And that's a manual process. I would map the drive sectors to partition sectors, then map those to block numbers, then determine if the blocks are unassigned, or if assigned, to which inodes. Not easy, but dumpfs(8) can help.

I haven't done this in several years, because my single-drive OBSD systems are now down to a grand total of one, and so far .... no tests have reported any errors.
|
|||
I have managed to boot the server now.
What I did first was try to set the drive to read-only mode by changing the line in fstab (the drive is FFS according to fstab):
Code:
/dev/wd1i /backup ffs rw,nodev,nosuid 1 2
Code:
/dev/wd1i /backup ffs ro,nodev,nosuid 1 2
Code:
#/dev/wd1i /backup ffs ro,nodev,nosuid 1 2
I commented out the entire line instead of adding the word "force" to it, as you suggested. I didn't find this particular setting in the manual for fstab, so I just commented out the entire line instead.

The major part of the problem still exists though: my drive is damaged. Now the drive (the directory, rather) is present in the root directory listing ("ls /") but the directory is empty. I'm surprised that the directory showed up at all when the line is commented out in fstab.

So, as you mentioned, I could now mount the drive as read-only. I tried
Code:
mount -t ffs -f -r /dev/wd1i /backup
|
||||
Mounts are made to "mount points", existing directories. You must have created the /backup directory at some point.

What do you mean by "halts"? Does your computer hang, or does it panic? (You won't be able to see a panic if you are running X; do this from the console.)

You might consider installing smartmontools and reviewing what the electronics on the drive can tell you. If you do, share the output of

# smartctl -a /dev/wd1c

with us. Pipe the output to a file and attach it.

Last edited by jggimi; 24th July 2011 at 11:56 AM.
|
|||
After the mount of /dev/wd1i it seemed to work, but when I did an "ls /backup" the prompt jumped down to a new line (as I pressed ENTER) and then nothing happened.
I installed smartmontools; the output is as follows:
Code:
smartctl -a /dev/wd1i
smartctl version 5.38 [i386-unknown-openbsd4.5] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: [No Information Found]
Serial Number: [No Information Found]
Firmware Version: [No Information Found]
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 1
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sun Jul 24 14:55:34 2011 CEST
SMART is only available in ATA Version 3 Revision 3 or greater.
We will try to proceed in spite of this.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.
Checking for SMART support by trying SMART ENABLE command.
SMART ENABLE appeared to work! Continuing.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don't show if SMART is enabled.
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
Code:
atactl: ATA device returned Aborted Command
In my message log I see the following:
Code:
cat /var/log/messages
Jul 24 12:00:02 server1 newsyslog[32278]: logfile turned over
Jul 24 12:00:02 server1 syslogd: restart
Jul 24 12:09:44 server1 /bsd: WARNING: /backup was not properly unmounted
Jul 24 12:09:54 server1 /bsd: wd1i: device fault reading fsbn 656400560 of 656400544-656400575 (wd1 bn 656400623; cn 40859 tn 12 sn 32), retrying
Jul 24 12:09:55 server1 /bsd: pciide1:0:1: recal drive fault
...
Jul 24 12:10:19 server1 /bsd: wd1i: device fault reading fsbn 656400544 of 656400544-656400575 (wd1 bn 656400607; cn 40859 tn 12 sn 16), retrying
Jul 24 12:10:20 server1 /bsd: pciide1:0:1: recal drive fault
Jul 24 12:10:20 server1 /bsd: wd1i: device fault reading fsbn 656400544 of 656400544-656400575 (wd1 bn 656400607; cn 40859 tn 12 sn 16)
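
The kernel lines in the log above report each failing block twice: once relative to the partition (fsbn) and once relative to the whole drive (wd1 bn). A small sketch, assuming both numbers are in the same sector units, shows how the two relate by pulling them out of one of those lines and subtracting:

```shell
#!/bin/sh
# Kernel message copied from the log above.
line='wd1i: device fault reading fsbn 656400560 of 656400544-656400575 (wd1 bn 656400623; cn 40859 tn 12 sn 32), retrying'

# Extract the partition-relative block (fsbn) and the whole-drive block (bn).
fsbn=$(printf '%s\n' "$line" | sed -n 's/.*reading fsbn \([0-9][0-9]*\) .*/\1/p')
bn=$(printf '%s\n' "$line" | sed -n 's/.*(wd1 bn \([0-9][0-9]*\);.*/\1/p')

# The difference is the drive sector at which partition wd1i starts.
offset=$((bn - fsbn))
echo "wd1i starts at drive sector $offset"
```

This prints "wd1i starts at drive sector 63". The second logged fault gives the same difference (656400607 - 656400544), so a block number reported by a program run against wd1i can be converted to a whole-drive number by adding that offset.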
|
|||
"smartctl -a /dev/wd1c" yields the same as before.
wd0c works but that's the other drive. Could something be wrong with my mount? |
|
||||
If what you describe is accurate -- and that is only what I can determine from your posting here -- your drive's electronics are not operating properly. If your power failure included either a power surge or a period of low voltage (or missing phases of AC), this might be the cause. But your drive's electronics are at least functioning, as the device does respond to I/O requests, sometimes successfully.

As for your question on "mounts" -- whatever is occurring has nothing to do with a mount. Mounts are merely logical attachments of filesystems to the OS.

---

I do not clearly understand how you ended up with an "i" partition on wd1, but based on your technical background as presented in this thread, I will assume you are using a foreign MBR partition rather than a BSD disklabel on the drive -- the OS will create a virtual disklabel and assign foreign partitions it recognizes to BSD partitions beginning with "i". And whether or not this partition was ever actually an FFS filesystem is now immaterial -- your kernel messages alone are proof of hardware problems -- an inability to read some sectors.

Since your forced, read-only mount succeeded, the drive was able to return the sectors containing the primary superblock, which begins at sector #8. When attempting to read the root directory (inode #2, if this is an FFS filesystem), your "ls" command appeared to hang.

Understand that as the drive spins, the electronics may attempt to read the same sector repeatedly, in an attempt to extract valid information. Dozens, or hundreds of times. It must wait for a complete rotation of the drive each time it tries, and that is relatively slow. Eventually, kernel messages will be produced, showing timeouts (when the OS gives up waiting while reads are retried over and over) and read errors (when the electronics on the drive give up before the OS does). If you issue the "ls" command from the console, you will see these kernel messages appear while you wait.

Unfortunately, with the root inode unreadable, there is not much further you yourself will be able to do to extract useful data from the drive. If the root directory were available, you might be able to extract undamaged files, and traverse other undamaged directories. But it is not.

A skilled technician may be able to copy undamaged sectors from the drive, and reassemble some of the content into meaningful files. But that would be a manual, difficult, and long effort, with no guarantees.

As for your unreadable sectors, some of them might be readable by commercial laboratories that specialize in data recovery from disk drives. This would cost many thousands of dollars or euros, and of course there are no guarantees, depending upon the underlying physical damage to the media.

---

If you wish to give up on the existing data on the drive, you may start destructive testing, and see if the drive can be returned to useful function. To do that, install e2fsprogs, dismount /backup, and use the badblocks program against the entire drive, rwd1c or wd1c; I can't remember which badblocks prefers. Use -p 1, so that badblocks continues to run until no new failures are discovered -- so that all bad blocks have been successfully replaced with spare sectors -- and -w, so that badblocks writes and tests various bit patterns. See badblocks(8) for details.

Last edited by jggimi; 24th July 2011 at 07:50 PM.
|
|||
It was a thunderstorm that caused the power failure, and the lights did blink for a few seconds like in a horror movie. I could understand if some files were lost due to an uneven current, but a complete disk destroyed!
I don't know what to do now. I'll try fiddling with it a bit more and see if I can get it to list some files. Thank you very much for your help!!! I'll post again later on (perhaps in a few days) when I have tried some more.

/Quaxo

Edit: Oh, the letter "i". I don't remember, but this was done during installation and I didn't make any notes about this particular part. So it's probably as you say, a virtual disklabel. If it helps, my fstab looks as follows:
Code:
cat /etc/fstab
/dev/wd0a / ffs rw 1 1
/dev/wd0h /home ffs rw,nodev,nosuid 1 2
/dev/wd0i /home/someuser ffs rw,nodev,nosuid 1 2
#/dev/wd1i /backup ffs ro,nodev,nosuid 1 2
/dev/wd0d /tmp ffs rw,nodev,nosuid 1 2
/dev/wd0g /usr ffs rw,nodev 1 2
/dev/wd0e /var ffs rw,nodev,nosuid 1 2

Last edited by Quaxo; 24th July 2011 at 08:48 PM. Reason: The "i" letter