DaemonForums  

FreeBSD > FreeBSD General
#1
2nd June 2008
lil_elvis2000
Port Guard
Join Date: May 2008
Location: The armpit of England
Posts: 21

Mirror device detached on large file copy

Hello, I'm running FreeBSD 7.0 on a Celeron 800 MHz. I copied a 1.8 GB file across from my Windows box to the FreeBSD box, and either at the end of the copy or during it I got the following:

kernel: ad6: FAILURE - device detached
kernel: subdisk6: detached
kernel: ad6: detached
kernel: GEOM_MIRROR: device dat: provider ad6 disconnected.
kernel: g_vfs_done():mirror/dats1d[READ(offset=196937613312, length=16384)]error=6

My only guess is that perhaps it's a timeout?

The configuration is roughly:
- ad4 - 320 GB SATA
- ad6 - 320 GB SATA

I ran newfs on ad4 but not on ad6, thinking gmirror would take care of that. I have copied other files of several hundred megabytes without a problem, but this error happened when I tried the 1.8 GB file.

This is the first problem I have had with the mirror.

Any tips?
#2
6th June 2008
lil_elvis2000

Well, I've had further problems. I was extracting files from the large gzip archive and the mirror crashed again. This time it was ad4.
I did a "gmirror forget" and then tried to re-insert ad4, but could not: ad4 did not appear in /dev either.
So I rebooted, and to my horror gmirror brought the mirror back with ad4, but not ad6, in a DEGRADED state.
When I ran gmirror status, the screen filled with a stream of g_vfs_done ERROR messages, and I had to hit the power switch.
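For reference, the forget-and-insert sequence described above can be sketched as follows. This is a minimal sketch assuming the mirror name dat and the disk ad4 from this thread; the helper name reinsert_disk and the RUN dry-run variable are hypothetical, not part of gmirror. gmirror insert needs the disk to be visible under /dev again, which is consistent with the re-insert failing while ad4 was missing.

```sh
#!/bin/sh
# Hypothetical helper: re-add a disk that dropped out of a gmirror mirror.
# Set RUN=echo for a dry run that prints the commands instead of running them.
RUN=${RUN-}

reinsert_disk() {
    local mirror="$1" disk="$2"
    # gmirror insert needs the provider to exist again (e.g. /dev/ad4).
    if [ -z "$RUN" ] && [ ! -c "/dev/$disk" ]; then
        echo "reinsert_disk: /dev/$disk does not exist" >&2
        return 1
    fi
    # Drop components that are no longer connected from the mirror metadata.
    $RUN gmirror forget "$mirror" || return 1
    # Add the disk back; gmirror then resynchronizes it from the active disk.
    $RUN gmirror insert "$mirror" "$disk" || return 1
    # Show the mirror state (DEGRADED while the rebuild runs).
    $RUN gmirror status "$mirror"
}
```

Run as root, e.g. `reinsert_disk dat ad4`, or with `RUN=echo` set to only print the commands first.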

I managed to get the system back up, then unmounted the mirror and destroyed it. That was a bit tricky, as gmirror kept bringing it back after a few seconds; I had to quickly run "gmirror unload".
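A likely reason the mirror kept coming back: gmirror stores its metadata in the last sector of each component, so when the geom_mirror class tastes the disks again it can reassemble the device. Clearing that metadata prevents this. A minimal sketch, assuming the names dat, ad4, ad6 and the /home mount point from this thread; destroy_mirror and RUN are hypothetical helpers. gmirror clear removes only the mirror metadata, not the filesystem data, but it should still only be run after a backup, as was done here.

```sh
#!/bin/sh
# Hypothetical helper: permanently take a gmirror device apart.
# Set RUN=echo for a dry run that prints the commands instead of running them.
RUN=${RUN-}

destroy_mirror() {
    local mirror="$1" mountpoint="$2"
    shift 2
    # 1. Unmount the filesystem that lives on /dev/mirror/<name>.
    $RUN umount "$mountpoint" || return 1
    # 2. Stop the mirror device.
    $RUN gmirror stop "$mirror" || return 1
    # 3. Clear the gmirror metadata (last sector) on every component
    #    so the mirror is not reassembled when the disks are tasted again.
    $RUN gmirror clear "$@"
}
```

Usage: `destroy_mirror dat /home ad4 ad6`. If `geom_mirror_load="YES"` is set in /boot/loader.conf, it can be removed too once the mirror is gone for good.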

So, after making a backup, I relabelled:
gmirror label -b split -s 2048 dat ad4 ad6
and ran newfs on the mirror:
newfs -U /dev/mirror/dat

I mounted it and tried to copy the data back. The mirror broke again. I rebuilt the mirror, and this time I left off soft-updates, as I suspect there is a problem there. I remounted it, copied back all the data, and the mirror held together.

I don't know if that is a coincidence or not. Is there an issue with soft-updates and gmirror?
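On the soft updates question: they can be left off by running newfs without -U, or switched on or off later with tunefs -n on an unmounted filesystem. A minimal sketch, assuming the mirror device /dev/mirror/dat is mounted on /home; set_softupdates and RUN are hypothetical helpers.

```sh
#!/bin/sh
# Hypothetical helper: turn soft updates on or off for the UFS filesystem
# on the mirror. Set RUN=echo for a dry run that only prints the commands.
RUN=${RUN-}

set_softupdates() {
    local state="$1" dev="${2:-/dev/mirror/dat}" mountpoint="${3:-/home}"
    case "$state" in
        enable|disable) ;;
        *) echo "usage: set_softupdates enable|disable [device] [mountpoint]" >&2
           return 2 ;;
    esac
    # tunefs is meant to be run on an unmounted filesystem.
    $RUN umount "$mountpoint" || return 1
    # -n sets the soft updates flag in the superblock.
    $RUN tunefs -n "$state" "$dev" || return 1
    $RUN mount "$dev" "$mountpoint"
}
```

Afterwards, `tunefs -p /dev/mirror/dat` prints the current settings, including soft updates, so the change can be checked.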
#3
6th June 2008
radcapricorn
Port Guard
Join Date: Jun 2008
Posts: 15

Are you using the whole drive for the root (/) partition? If so, there may indeed be some trouble with soft-updates, because using soft-updates on the root partition is not recommended (sysinstall even disables soft-updates for this partition by default when you create it).
#4
6th June 2008
lil_elvis2000

I am using the entire drive, but it is mounted at /home.
My other drive, a small 11 GB one, holds / and swap.
#5
6th June 2008
gkontos
Real Name: George
Port Guard
Join Date: May 2008
Location: Greece
Posts: 41

If the hard drive did not appear in the /dev directory, then it might be a hardware problem with the drive. Run the disk-check utilities from the drive's manufacturer.

George
__________________
...when you have excluded the impossible, whatever remains, however improbable, must be the truth.
#6
9th June 2008
lil_elvis2000

When the drive "failed" in the mirror, it is reasonable that BSD removed it from /dev. But on reboot the drive reappeared, and I have since rebuilt the mirror and it is working perfectly.

I suspect a problem with soft-updates and gmirror, perhaps in certain configurations.

I haven't had a chance yet to look through the log file. I wonder how I could submit this as a problem to the FreeBSD developers.
#7
9th June 2008
ohauer
Port Guard
Join Date: May 2008
Location: Germany
Posts: 32

I had the same problem on two of eight machines with a fresh 7.0 install. All the machines have identical hardware and setups.
There were no SMART disk failures or other errors, only a timeout and then a lost mirror.
I scratched my head for 3 weeks over RAID rebuilds, since I could still see the device in /dev, zero it with dd and rebuild.
One day the power supply in one of the affected machines went up in smoke; after replacing it, the error was gone.
The solution for both machines was a power supply replacement.

Maybe this is also the case for you?
#8
10th June 2008
lil_elvis2000

I have just checked the power supply in the unit. It looks like a 235 W unit. The machine has:
- two fans
- three PCI cards
- three HDs (one IDE, two SATA)
- one CD-RW

Is 235 W not enough to run all this? I could easily put in a 350 W unit for not much outlay.

The last error I got was about being unable to write metadata:

Jun 10 12:35:49 ChamRAID01 kernel: ad4: FAILURE - device detached
Jun 10 12:35:49 ChamRAID01 kernel: subdisk4: detached
Jun 10 12:35:49 ChamRAID01 kernel: ad4: detached
Jun 10 12:35:49 ChamRAID01 kernel: GEOM_MIRROR: Cannot write metadata on ad4 (device=dat, error=6).
Jun 10 12:35:49 ChamRAID01 kernel: GEOM_MIRROR: Cannot update metadata on disk ad4 (error=6).
Jun 10 12:35:49 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad4 disconnected.

I was doing another install of Oracle, which involves extracting from the gzip archive. The mirror is NFS-exported.

When I reboot, the mirror comes back, but with ad6 detached and ad4 active in a DEGRADED state, which has happened to me many times. Why does ad4 come back on reboot? It shouldn't. The mirror should come back on reboot with ad6 in a COMPLETE state, and then I could insert ad4 back in. Am I wrong?

The next time the disk detached (it always seems to be ad4), I just rebooted. The mirror started, and ad4 was re-inserted automatically and began rebuilding. Strange. Both drives are busy for about 4 hours during the rebuild, so I can't see it being the power supply.

I just wonder whether I have a corrupt install CD, or FreeBSD has an issue with my SATA card (SiL 3512), or something else. I'm really scratching my head now. Ironically, I chose FreeBSD for its stability.

As many people seem to have no issues, I may just junk this machine (it also has problems with my KVM), get another motherboard and power supply, and hope that cures things.

I'm just so short of cash at the moment.

Last edited by lil_elvis2000; 10th June 2008 at 02:07 PM.
#9
10th June 2008
halber_mensch
Real Name: halber mensch
Port Guard
Join Date: Jun 2008
Location: Sapulpa, OK
Posts: 14

I too lean towards a power problem here. SATA seems to be very fickle where power is concerned. If I soft-reset my machine (AMD64 3000+, 1 GB RAM, 2x SATA 160 GB, 1x Memorex CD/DVD burner on a 450 W supply), I get READ_DMA timeouts from the disks. It's especially a pain after a power failure, because my gmirror rebuild and filesystem fscks spit out tons of READ_DMA timeout errors, and most likely the mirror rebuild will fail with a READ_DMA failure. However, if I completely power the machine down and cold boot, the errors go away. I can't explain it. So now I've got my box on a UPS running apcupsd, so it never goes down ungracefully.
__________________
perl -e "eval pack(q{H*}, join q{},qw{7072696e74207061636b28717b482a7d2c717b34393 23036333631366532303666366536633739323036313733373 33735366436353230373936663735323036353738373036353 63337343635363432303734363836393733323037343666323 03632363532303631323036633639373437343663363532303 66436663732363532303635366537343635373237343631363 93665363936653637326530617d293b})"
10th June 2008
gkontos

Strange, I have never had this problem on my home file server (3 years now). It is an HP ML110 G3, a very low-end entry-level model. Of course I use a cheap UPS for temporary power failures, but I have had my share of hard reboots.

It could be a motherboard problem with the SATA drives. Especially if the board is old, it might not handle SATA differently from IDE, and as a result soft updates may not report correctly. What happens if you disable soft updates? Do you get the same issues?

George

10th June 2008
Weaseal
Package Pilot
Join Date: May 2008
Location: East Coast, US
Posts: 177

Please, please be sure to send-pr this (i.e. file a problem report). This sounds like a big issue, especially since multiple people are experiencing it. It sounds high-priority to me.
__________________
FreeBSD addict since 4.2-RELEASE.
My FreeBSD wiki.
10th June 2008
lil_elvis2000

Well, I've ordered a power supply, so let's see what happens there first. It's a 380 W unit, so it should be able to handle the load. It seems interesting that the rebuild, which hits both disks heavily, is okay, but large writes and reads (especially reads) fail and break gmirror.

This will be okay until I decide to replace the motherboard; then I'll probably need to upgrade the PSU again.

This was supposed to be a cheap project...
11th June 2008
lil_elvis2000

Quote:
Originally Posted by halber_mensch
I too lean towards a power problem here. SATA seems to be very fickle where power is concerned... it seems to me that if I soft reset my machine (AMD64 3000+, 1G ram, 2xSATA 160G, 1xMemorex CD/DVD burner on 450W supply), I get READ_DMA timeouts from the disks - it's especially a pain if I've had a power failure because my gmirror rebuild and file system fscks spit out tons of READ_DMA timeout errors, and most likely the mirror rebuild will fail with a READ_DMA failure. However, it seems that if I completely power the machine down and cold boot, the error go away. I can't explain it. So now I've got my box on a UPS running apcupsd so it never goes down ungracefully.
This sounds like a bug to me. I would pass this on.
11th June 2008
lil_elvis2000

Quote:
Originally Posted by gkontos
Strange, I never had this problem on my home fileserver (3 years now). It is an HP ML 110G3 very low entry level model. Of course I use a cheap ups for temp power failures but I had my experiences with hard reboots.

It could be a MB problem with the SATA drives. Especially if it is old it might not be able to handle SATA differently than IDE thus soft updates do not report correct. What happens if you remove soft-updates ? Do you have the same issues?

George
Yes, the same issues, whether soft-updates are enabled or not. The SATA drives are on a PCI SATA card. I can't imagine that being an issue: before controllers were integrated on the motherboard, millions of PCs had to have IDE cards to run their hard disks. I have NOT configured RAID 1 in the SATA card's BIOS.

However, it could be a power issue: I have consulted a couple of other people and they both think 235 W isn't enough. I wonder if this also affects the USB ports and USB KVM on this machine.
17th June 2008
lil_elvis2000

Not power... :(

Well, I installed the new power supply and... nope, it didn't help.

Here's what I got when I extracted a file from one Samba share to another Samba share on the same mirror:

Jun 17 10:22:48 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:22:48 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 120 bytes
Jun 17 10:41:33 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:41:33 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 180 bytes
Jun 17 10:54:49 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:54:49 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 240 bytes
Jun 17 10:55:12 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:55:12 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 300 bytes
Jun 17 10:56:22 ChamRAID01 kernel: ad4: FAILURE - device detached
Jun 17 10:56:22 ChamRAID01 kernel: subdisk4: detached
Jun 17 10:56:22 ChamRAID01 kernel: ad4: detached
Jun 17 10:56:22 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad4 disconnected.
Jun 17 10:56:22 ChamRAID01 kernel: g_vfs_done():mirror/dat[READ(offset=267860606976, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: ad6: FAILURE - device detached
Jun 17 10:56:41 ChamRAID01 kernel: subdisk6: detached
Jun 17 10:56:41 ChamRAID01 kernel: ad6: detached
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad6 disconnected.
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider mirror/dat destroyed.
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat destroyed.
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268466061312, length=16384)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467388416, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467519488, length=131072)]error = 6
.... A LOT of these ....
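The error = 6 at the end of each g_vfs_done() line is errno 6, which on FreeBSD is ENXIO ("Device not configured"), consistent with the disks having detached underneath the filesystem. When a log fills with thousands of these, a condensed summary is easier to attach to a problem report. A minimal sketch, assuming the message format shown above; the function name summarize_gvfs_errors is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: condense g_vfs_done() I/O errors from a syslog file
# into one line per (provider, operation, error) with a count, most frequent first.
# The sed expression keeps only "<provider> <READ|WRITE> error=<n>" from each
# message and accepts both "error=6" and "error = 6".
summarize_gvfs_errors() {
    local log="$1"
    grep 'g_vfs_done()' "$log" |
        sed -n 's/.*g_vfs_done():\([^[]*\)\[\([A-Z]*\)(.*error *= *\([0-9]*\).*/\1 \2 error=\3/p' |
        sort | uniq -c | sort -rn
}
```

Usage: `summarize_gvfs_errors /var/log/messages`.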

The machine then crashed; I'm not sure where that's logged.
It rebooted itself and then complained about file blocks on the mirror, so I ran fsck on it. The fsck completed and then I attempted a reboot, but the machine crashed, dumped a core and rebooted.

Time to give up?
17th June 2008
halber_mensch

Are those xl0 transmission underruns possibly related? Try dd-ing a large file to your mirror and see if it causes trouble. I'm suspicious about how close those underruns are to the mirror failure.
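A minimal sketch of that test, assuming the mirror is mounted on /home: write a large file with dd, sync, read it back, and meanwhile watch /var/log/messages (for example with tail -f in a second terminal) for detach messages. The function name, the temporary file name and the 2048 MB default (roughly the size of the 1.8 GB file from the first post) are illustrative choices.

```sh
#!/bin/sh
# Hypothetical stress test: write a large file into DIR with dd, then read it
# back, so the disks underneath see sustained sequential I/O.
# Usage: dd_stress_test [dir] [size_in_MB]; prints the number of bytes written.
dd_stress_test() {
    local dir="${1:-/home}" size_mb="${2:-2048}"
    local file="$dir/dd_stress_test.$$"
    # Sequential write in 1 MB blocks (1048576 bytes works with any dd).
    if ! dd if=/dev/zero of="$file" bs=1048576 count="$size_mb"; then
        rm -f "$file"; return 1
    fi
    # Flush dirty buffers so the data actually reaches the disks.
    sync
    # Sequential read of the same data (mostly from disk when the file is larger than RAM).
    if ! dd if="$file" of=/dev/null bs=1048576; then
        rm -f "$file"; return 1
    fi
    # Report the size written, then clean up.
    wc -c < "$file" | tr -d ' '
    rm -f "$file"
}
```

dd prints its transfer statistics on stderr, so throughput for each pass stays visible on the console.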

18th June 2008
lil_elvis2000

Quote:
Originally Posted by halber_mensch
Are those xl0 transmission underruns possibly related? Try dding a large file to your mirror and see if it causes trouble.. I'm suspicious about the proximity of those underruns to the mirror failure.
I tried a large copy from the console, from one location to another on the mirror. Same effect, just faster! Data underrun/overrun? PCI card incompatibility?

I've turned off the mirror and installed a cron job to copy the files over at the end of the day. I don't have a debugging kernel either, so I can't debug it, but I have a core dump (a couple of them, in fact) in case someone can have a look at them.
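A minimal sketch of such an end-of-day copy job, assuming the data lives on /home and the second disk is mounted on /backup; the script path, function name and schedule are hypothetical. It sticks to cp from the base system; rsync from ports would copy only changed files and could also remove deleted ones.

```sh
#!/bin/sh
# Hypothetical nightly copy job replacing the mirror: copy everything
# under SRC to DST, preserving modes and times.
# Example crontab entry (crontab -e), running at 23:30 every day:
#   30 23 * * * /usr/local/sbin/nightly_copy.sh /home /backup
nightly_copy() {
    local src="$1" dst="$2"
    [ -d "$src" ] || { echo "nightly_copy: source $src missing" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # The trailing /. copies the contents of SRC, including dotfiles.
    cp -Rp "$src/." "$dst/"
}

# Run only when called with two arguments, e.g. from cron.
if [ "$#" -eq 2 ]; then
    nightly_copy "$1" "$2"
fi
```

Unlike the mirror, this does not remove files deleted from /home, and anything written after the job runs is unprotected until the next night.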

I'm now considering switching to openSUSE or Xubuntu, but I've no time to do that for a couple of months. I'll just have to keep it in "crippled" mode for now.
18th June 2008
halber_mensch

Wait, you got xl0 underruns copying within the mirror? Your NIC should not be in the picture at that point.

18th June 2008
lil_elvis2000

Sorry for the confusion. I got the underruns copying from one Samba share to another on the same mirror, from my Windows box. Then I did the same file copy again from the console, to see whether it was the NIC or Samba. gmirror broke again.

I can't keep having these problems, so I have got rid of gmirror and installed a cron copy job. Ugly, I know, but at least the system is stable now.

I'll now either swap the SiL 3512 card for something else (an ICH5-based card, maybe?) and try again, or change to Linux. At any rate, I'm very busy with an Oracle project for the next month and a half, so I have no cycles to spare.

The most I will do is check and clean all the drive and card connectors.
18th June 2008
Weaseal

Have you sent it in with send-pr yet? There's a chance that, with an issue this urgent, they'd hurry a patch into -CURRENT that you could roll in.

All times are GMT.
Content copyright © 2007-2010, the authors