DaemonForums: OpenBSD General

How to force a file system clean?

#1 Quaxo, 23rd July 2011
Port Guard, Join Date: Jun 2008, Posts: 29

Hi!

After a power failure my OpenBSD server halts on boot. One of my drives has an "inconsistency" according to the prompt. This is what I get after the file system check is done:
...
CANNOT READ: BLK 28196704
/dev/rwd1i: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
...

I tried "fsck_ffs /dev/rwd1i" and then I got:
...recal drive fault...
...(more numbers and stuff)...
CANNOT READ: BLK 128
CONTINUE? [Fyn?]

If I type "y" it keeps listing other blocks and I finally end up at a prompt.

What should I do to fix this (to repair my file system, or force marking it clean)?

/Quaxo

#2 jggimi, 23rd July 2011
More noise than signal, Join Date: May 2008, Location: USA, Posts: 7,975

You have one or more sectors that cannot be read. In other words, a damaged drive. If there are sufficient spare sectors, the disk will reassign those sectors to the spares, once they are rewritten. But not until they are rewritten.

Do you have a backup of wd1i?

#3 Quaxo, 23rd July 2011

I'm getting nervous now; I have no backup of this drive.

I'm not sure what you mean by "spare sectors". The drive isn't 100% full of data, but the entire drive (500 GB) was allocated to wd1i when I installed and mounted OpenBSD.

What should I do?

/Quaxo

#4 jggimi, 23rd July 2011

Regarding spare sectors, managed by drive electronics, and not by you or the OS, see http://en.wikipedia.org/wiki/Bad_sector.

You can force mount the partition while in this state, and attempt to copy off everything you want saved, to another drive. If you mount it "dirty", you should do so read-only. Please see the mount(8) manual for details.
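
As a rough sketch of that rescue sequence (not a tested procedure): the device and mount point are the ones from this thread, while /mnt/rescue, the DRY_RUN switch and the run/rescue helpers are made up for illustration. The -f and -r flags are the force and read-only flags described in mount(8). By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: mount a dirty filesystem read-only and copy its contents elsewhere.
# DEV and MNT come from this thread; DEST is a hypothetical directory on a
# healthy drive. Commands are only printed unless DRY_RUN=0 is set.
DEV=${DEV:-/dev/wd1i}
MNT=${MNT:-/backup}
DEST=${DEST:-/mnt/rescue}
DRY_RUN=${DRY_RUN:-1}

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rescue() {
    # -f forces the mount of a filesystem not marked clean; -r keeps it read-only.
    run mount -t ffs -f -r "$DEV" "$MNT" || return 1
    # Copy the tree, preserving modes and times.
    run cp -Rp "$MNT/." "$DEST/"
    run umount "$MNT"
}
```

Calling rescue as root with DRY_RUN=0 would actually run the three commands; expect cp to report I/O errors for any file that sits on unreadable sectors.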

If it were my drive, I would take the drive out of service, and write to, then read, every sector on the hard drive, to ensure that spare sectors have successfully replaced all the bad sectors, before restoring data and returning it to service. Of course, I have backups of all my drives. You should, too. But you know that....now.

(The "badblocks" program, which I prefer, is included in the e2fsprogs package; or you could use dd(1) with /dev/rwd1c.)

Last edited by jggimi; 23rd July 2011 at 11:41 PM.

#5 jggimi, 23rd July 2011

Just some additional thoughts for you to consider.

Writing to the drive will, of course, destroy what is there, so use dd or badblocks only after recovering what you can.

You do not have other partitions on the drive, only "i". But if you did have other partitions, you should expect you might have bad sectors among them as well.

Kernel messages regarding sector numbers will be the physical sector numbers used by the drive. Userland programs, such as fsck_ffs, will report sector numbers within the partition. They will not be identical, unless you are running a program against the "c" partition, which is of the whole drive.

Use raw partitions with dd or badblocks, to maximize performance.

You can communicate with the drive electronics (SMART) using atactl(8); I prefer the smartmontools package, as I find it easier to use. This may be able to provide you with a better understanding of the state of your drive. What comes from SMART is up to the drive vendor; some manufacturers produce more information than others.

#6 jggimi, 24th July 2011

One last thought -- if you cannot mount successfully with -o force,rdonly due to a "bad superblock" -- the first block of metadata about the filesystem, usually sector #8 within the partition -- the alternate/spare superblock may be usable. This is usually sector #32. See the -b option of fsck_ffs(8).

#7 Quaxo, 24th July 2011

This drive is my backup drive. However, I wasn't finished with my custom backup system yet, so I wasn't ready for this, and this drive has unique files.

Hmm, this was tricky. Do I have to mount it successfully once in order to be able to reboot (like marking the file system OK)?
Would it work to just edit fstab to set the drive as read-only and then just reboot to get the server running again?

The other drive is OK and contains all the important system stuff (root, usr, var...).

/Quaxo

#8 jggimi, 24th July 2011

You could place the force and rdonly options in your fstab, or, remove the partition from your fstab entirely.
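
Taking the partition out of fstab does not need an interactive editor. A minimal sketch, assuming the entry starts at the beginning of the line with the device name; comment_out_fs is a hypothetical helper, not an OpenBSD tool. It writes the edited table to standard output, so nothing changes until the result is copied into place.

```shell
# Print a copy of an fstab with the line for one device commented out.
# Usage: comment_out_fs DEVICE FSTAB_FILE
comment_out_fs() {
    # '|' is the sed delimiter because device paths contain '/'.
    # '&' re-inserts the matched text, so the line just gains a leading '#'.
    sed "s|^$1[[:space:]]|#&|" "$2"
}
```

For example, comment_out_fs /dev/wd1i /etc/fstab > /tmp/fstab.new, then inspect /tmp/fstab.new before copying it over /etc/fstab. fstab(5) also documents that a zero in the sixth field makes fsck skip that filesystem at boot.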

#9 Quaxo, 24th July 2011

OK, I'll try to use ed (not my favourite editor) tomorrow; I must get some sleep now.

I'll be back.

jggimi, 24th July 2011

Boot into single user mode, mount -a, then use your $EDITOR of choice. Or, wait for rc to fail, then mount -a and use $EDITOR.

jggimi, 24th July 2011

I'd mentioned SMART -- the drive electronics standard -- and smartmontools. On all my systems, I set up daily "short offline" tests to have the electronics do self-tests and test the data bus, and weekly "long offline" (also called "extended offline") tests to have the electronics read every sector.

On my RAID systems, it's easy enough to take a problem drive out of service and run badblocks on it, then put it back in service if it is still usable.

On my single-drive systems, what I do will depend on what sectors have failed. And that's a manual process. I would map the drive sectors to partition sectors, then map those to block numbers, then determine if the blocks are unassigned, or if assigned, to which inodes. Not easy, but dumpfs(8) can help. I haven't done this in several years, because my single-drive OBSD systems are now down to a grand total of one, and so far no tests have reported any errors.

jggimi, 24th July 2011

Now I have a stupid question, because I was just thinking...this is a single partition drive, and that partition is "i". Is this a foreign filesystem, and not FFS? That could be the root cause of fsck barfing up errors.

Quaxo, 24th July 2011

I have managed to boot the server now.

What I did first was try to set the drive to read-only mode by changing the line in fstab (the drive is FFS according to fstab):
Code:
/dev/wd1i /backup ffs rw,nodev,nosuid 1 2
to:
Code:
/dev/wd1i /backup ffs ro,nodev,nosuid 1 2
but that still halted my boot sequence with the same message as in the first post, so I just commented out the entire line like:
Code:
#/dev/wd1i /backup ffs ro,nodev,nosuid 1 2
and now the server boots up completely.

I commented out the entire line instead of adding the word "force" to it, as you suggested, because I didn't find this particular setting in the manual for fstab.

The major part of the problem still exists though, my drive is damaged.

Now the drive (the directory rather) is present in the root directory list ("ls /") but the directory is empty. I'm surprised that the directory showed up at all when the line is commented in fstab.

So, as you mentioned I could now mount the drive as read-only. I tried
Code:
mount -t ffs -f -r /dev/wd1i /backup
but that didn't work. The computer halts when I do "ls /backup", so I'm stuck at this point.

jggimi, 24th July 2011

Mounts are made to "mount points", existing directories. You must have created the /backup directory at some point.

What do you mean by "halts"? Does your computer hang, or does it panic? (You won't be able to see a panic if you are running X; do this from the console.)

You might consider installing smartmontools and reviewing what the electronics on the drive can tell you. If you do, share the output of

# smartctl -a /dev/wd1c

with us. Pipe the output to a file and attach it.

Last edited by jggimi; 24th July 2011 at 11:56 AM.

jggimi, 24th July 2011

Oh, yes...do examine /var/log/messages, and see if you have missed kernel messages about wd1.

Quaxo, 24th July 2011

The mount of /dev/wd1i seemed to work, but when I then ran "ls /backup" the prompt jumped down to a new line (as I pressed ENTER) and then nothing happened.

I installed smartmontools; the output is as follows:
Code:
smartctl -a /dev/wd1i
smartctl version 5.38 [i386-unknown-openbsd4.5] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     [No Information Found]
Serial Number:    [No Information Found]
Firmware Version: [No Information Found]
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   1
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Jul 24 14:55:34 2011 CEST
SMART is only available in ATA Version 3 Revision 3 or greater.
We will try to proceed in spite of this.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.
                  Checking for SMART support by trying SMART ENABLE command.
                  SMART ENABLE appeared to work!  Continuing.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don't show if SMART is enabled.
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

My drive is a Seagate Barracuda ES.2 ST3500320NS 500 GB SATA. If I type "atactl /dev/wd1i smartread" I get:
Code:
atactl: ATA device returned Aborted Command
I think the atactl command did print some output before, but I'm not sure; it was a long time ago.

In my message log I see the following:
Code:
cat /var/log/messages
Jul 24 12:00:02 server1 newsyslog[32278]: logfile turned over
Jul 24 12:00:02 server1 syslogd: restart
Jul 24 12:09:44 server1 /bsd: WARNING: /backup was not properly unmounted
Jul 24 12:09:54 server1 /bsd: wd1i: device fault reading fsbn 656400560 of 656400544-656400575 (wd1 bn 656400623; cn 40859 tn 12 sn 32), retrying
Jul 24 12:09:55 server1 /bsd: pciide1:0:1: recal drive fault
...
Jul 24 12:10:19 server1 /bsd: wd1i: device fault reading fsbn 656400544 of 656400544-656400575 (wd1 bn 656400607; cn 40859 tn 12 sn 16), retrying
Jul 24 12:10:20 server1 /bsd: pciide1:0:1: recal drive fault
Jul 24 12:10:20 server1 /bsd: wd1i: device fault reading fsbn 656400544 of 656400544-656400575 (wd1 bn 656400607; cn 40859 tn 12 sn 16)
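
The two block numbers in each of those kernel lines can be reconciled with a little shell arithmetic. This is only a sketch using the values logged above; the variable names are chosen here for illustration.

```shell
# Values copied from the first kernel message above.
fsbn=656400560      # block number relative to the start of wd1i
disk_bn=656400623   # the same block, numbered from the start of wd1

# The difference is where the partition begins on the drive.
offset=$((disk_bn - fsbn))
echo "wd1i starts at sector $offset"    # prints: wd1i starts at sector 63

# Converting another partition-relative number works the same way;
# this reproduces the "wd1 bn" value of the later messages.
echo $((656400544 + offset))            # prints: 656400607
```

So a block number reported against wd1i has to have 63 added before it can be compared with a number reported against the whole drive.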

jggimi, 24th July 2011

You must run SMART commands with the drive, not your partition. Use wd1c.

Quaxo, 24th July 2011

"smartctl -a /dev/wd1c" yields the same as before.

wd0c works but that's the other drive.

Could something be wrong with my mount?

jggimi, 24th July 2011

If what you describe is accurate -- and that is only what I can determine from your posting here -- your drive's electronics are not operating properly. If your power failure included either a power surge or a period of low voltage (or missing phases of AC), this might be the cause. But your drive's electronics are at least functioning, as the device does respond to I/O requests, sometimes successfully.

As for your question on "mounts" -- Whatever is occurring has nothing to do with a mount. Mounts are merely logical attachments of filesystems to the OS.

---

I do not clearly understand how you ended up with an "i" partition on wd1, but based on your technical background as presented in this thread, I will assume you are using a foreign MBR partition rather than a BSD disklabel on the drive -- the OS will create a virtual disklabel and assign foreign partitions it recognizes to BSD partitions beginning with "i". And whether or not this partition was ever actually an FFS filesystem is now immaterial -- your kernel messages alone are proof of hardware problems -- an inability to read some sectors.

Since your forced, read only mount succeeded, the drive was able to return the sectors containing the primary superblock, which begins at sector #8.

When attempting to read the root directory (inode #2, if this is an FFS filesystem), your "ls" command appeared to hang. Understand that as the drive spins, the electronics may attempt to read the same sector repeatedly, in an attempt to extract valid information. Dozens, or hundreds of times. It must wait for a complete rotation of the drive each time it tries, and that is relatively slow. Eventually, kernel messages will be produced, showing timeouts (when the OS gives up waiting on repeated retries) and read errors (when the electronics on the drive give up before the OS does). If you issue the "ls" command from the console, you would see these kernel messages appear while you waited.

Unfortunately, with the root inode unreadable, there is not much further you yourself will be able to do to extract useful data from the drive. If the root directory were available, you might be able to extract undamaged files, and traverse other undamaged directories. But it is not.

A skilled technician may be able to copy undamaged sectors from the drive, and reassemble some of the content into meaningful files. But that would be a manual, difficult, and long effort, with no guarantees.
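
The first step of such a salvage, copying whatever still reads into an image file on a healthy disk, can be sketched with dd(1). This is an illustration, not a recovery procedure: image_drive is a hypothetical helper, and the destination needs room for the whole 500 GB drive. noerror and sync are standard dd conversions.

```shell
# Copy every readable sector of SRC into the image file DST.
# Usage: image_drive SRC DST
image_drive() {
    # bs=512       read one sector at a time, so a bad sector costs only itself
    # conv=noerror keep going after a read error instead of stopping
    # conv=sync    pad short or failed reads with NUL bytes so offsets stay aligned
    dd if="$1" of="$2" bs=512 conv=noerror,sync
}
```

For this thread the call would be something like image_drive /dev/rwd1c /mnt/rescue/wd1.img, where /mnt/rescue is an assumed path on another drive. Reading sector by sector is slow, but each unreadable spot then shows up as a single zero-filled sector in the image.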

As for your unreadable sectors, some of them might be readable by commercial laboratories that specialize in data recovery from disk drives. This would be many thousands of Dollars or Euros, and of course there are no guarantees, depending upon the underlying physical damage to the media.

---

If you wish to give up on the existing data on the drive, you may start destructive testing, and see if the drive can be returned to useful function. To do that, install e2fsprogs, dismount /backup, and use the badblocks program against the entire drive, rwd1c or wd1c, I can't remember which badblocks prefers. Use -p 1, so that badblocks continues to run until no new failures are discovered -- so that all bad blocks have been successfully replaced with spare sectors, and -w, so that badblocks writes and tests various bit patterns. See badblocks(8) for details.
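
Put together, the destructive test described above might look like the following. A sketch only: wipe_test and the CONFIRM guard are invented here so the command cannot run by accident; the badblocks flags are the -w and -p 1 mentioned above, plus -s and -v for progress and detail. Which device node badblocks wants is left open, as in the post.

```shell
# Print (or, with CONFIRM=yes, run) a destructive write test of a whole drive.
# Usage: wipe_test RAW_DEVICE
wipe_test() {
    # -w    write test patterns and read them back (destroys all data)
    # -p 1  repeat until one full pass finds no new bad blocks
    # -s -v show progress and report details
    set -- badblocks -w -p 1 -s -v "$1"
    if [ "${CONFIRM:-no}" = "yes" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}
```

Running wipe_test /dev/rwd1c only prints the command line; setting CONFIRM=yes in the environment makes it execute, and everything on the drive is then overwritten.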

Last edited by jggimi; 24th July 2011 at 07:50 PM.

Quaxo, 24th July 2011

It was a thunderstorm that caused the power failure, and the lights did blink for a few seconds like in a horror movie. I could understand if some files were lost due to an uneven current, but a complete disc destroyed!

I don't know what to do now. I'll try to fiddle with it a bit more and see if I can get it to list some files.

Thank you very much for your help!!! I'll post again later on (perhaps in a few days) when I have tried some more.

/Quaxo

Edit: Oh the letter "i". I don't remember but this was done during installation and I didn't make any notes about this particular part. So it's probably as you say, a virtual disklabel. If it helps my fstab looks as follows:
Code:
cat /etc/fstab
/dev/wd0a / ffs rw 1 1
/dev/wd0h /home ffs rw,nodev,nosuid 1 2
/dev/wd0i /home/someuser ffs rw,nodev,nosuid 1 2
#/dev/wd1i /backup ffs ro,nodev,nosuid 1 2
/dev/wd0d /tmp ffs rw,nodev,nosuid 1 2
/dev/wd0g /usr ffs rw,nodev 1 2
/dev/wd0e /var ffs rw,nodev,nosuid 1 2

Last edited by Quaxo; 24th July 2011 at 08:48 PM. Reason: The "i" letter