mirror device detached on large file copy
Hello, I'm running FreeBSD 7.0 on a Celeron 800MHz, and I copied a 1.8G file across from my Windows box to the FreeBSD box. Either during the copy or just as it finished, I got the following:
kernel : ad6 : FAILURE - device detached
kernel : subdisk6: detached
kernel : ad6 : detached
kernel : GEOM_MIRROR : device dat : provider ad6 disconnected.
kernel : g_vfs_done():mirror/dats1d[READ(offset=196937613312, length=16384)]error=6
I can only think that perhaps it's a timeout? The configuration is roughly this:
ad4 - 320G SATA
ad6 - 320G SATA
I ran newfs on ad4 but not on ad6, thinking gmirror would take care of that. I have copied other files of several hundred megabytes without a problem, but when I tried this 1.8G file, this error happened. This is the first problem I have had with the mirror. Any tips?
|
|||
Well, I've had further problems. I was extracting files from the large gzip and the mirror crashed again. This time it was ad4.
I did a "gmirror forget" and then tried to re-insert ad4, but could not; ad4 did not appear in the /dev folder either. So I did a reboot, and to my horror gmirror remounted ad4, but not ad6, in a DEGRADED state. When I did a "gmirror status", the screen filled with a stream of g_vfs_done error messages, and I had to hit the power switch.
I managed to get the system back up, unmounted the mirror and destroyed it. That was a bit tricky, as gmirror kept remounting it after a few seconds; I had to quickly do a "gmirror unload". So, after making a backup, I relabelled:
gmirror label -b split -s 2048 dat ad4 ad6
and newfs'd the mirror:
newfs -U /dev/mirror/dat
Then I mounted it and tried to copy the data back. The mirror broke again. I rebuilt the mirror and this time left off soft-updates, as I suspect there is a problem there. I remounted and copied back all the data, and the mirror held together. I don't know if that is a coincidence or not. Is there an issue with soft-updates and gmirror?
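For anyone retracing the rebuild described above, here is a minimal dry-run sketch of the sequence. The mirror name "dat" and the disks ad4/ad6 come from the post; the /home mount point, the DRY_RUN switch and the helper names are assumptions made for illustration. By default it only prints the commands, since the real ones destroy whatever is on the disks.

```shell
#!/bin/sh
# Sketch only: prints the rebuild steps unless DRY_RUN=0 is set.
# WARNING: with DRY_RUN=0 these commands wipe the named disks.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: label a two-disk mirror, create a filesystem, mount it.
rebuild_mirror() {
    name="$1"; disk_a="$2"; disk_b="$3"; mnt="$4"
    run gmirror label -v -b split -s 2048 "$name" "$disk_a" "$disk_b"
    run newfs "/dev/mirror/$name"        # no -U here: soft-updates left off
    run mount "/dev/mirror/$name" "$mnt"
    run gmirror status "$name"
}

rebuild_mirror dat ad4 ad6 /home
```

Leaving out -U mirrors the second rebuild in the post, where soft-updates were left off; adding it back to the newfs line would reproduce the first attempt.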
|
|||
Are you using the whole drive for the root (/) partition? If so, there may indeed be some trouble with soft-updates, because using soft-updates on the root partition is not recommended (sysinstall even disables soft-updates for this partition by default when you create it).
|
|
|||
I am using the entire drive, but it is mounted at /home.
My other drive, a small 11GB one, holds / and swap.
|
|||
When the drive "failed" in the mirror, it is reasonable that BSD removed it from the /dev folder. But on the reboot the drive reappeared, and I have rebuilt the mirror and it is working perfectly.
I suspect a problem with soft-updates and gmirror, perhaps only in certain configurations. I haven't had a chance yet to look through the log file. I wonder how I could submit this as a problem to the BSD developers.
|
||||
I too lean towards a power problem here. SATA seems to be very fickle where power is concerned. If I soft reset my machine (AMD64 3000+, 1G RAM, 2x SATA 160G, 1x Memorex CD/DVD burner on a 450W supply), I get READ_DMA timeouts from the disks. It's especially a pain after a power failure, because my gmirror rebuild and file system fscks spit out tons of READ_DMA timeout errors, and most likely the mirror rebuild will fail with a READ_DMA failure. However, if I completely power the machine down and cold boot, the errors go away. I can't explain it. So now I've got my box on a UPS running apcupsd so it never goes down ungracefully.
__________________
perl -e "eval pack(q{H*}, join q{},qw{7072696e74207061636b28717b482a7d2c717b34393 23036333631366532303666366536633739323036313733373 33735366436353230373936663735323036353738373036353 63337343635363432303734363836393733323037343666323 03632363532303631323036633639373437343663363532303 66436663732363532303635366537343635373237343631363 93665363936653637326530617d293b})" |
|
||||
Strange, I have never had this problem on my home fileserver (3 years now). It is an HP ML 110G3, a very low entry-level model. Of course I use a cheap UPS for temporary power failures, but I have had my share of hard reboots.
It could be a motherboard problem with the SATA drives. Especially if the board is old, it might not handle SATA any differently than IDE, so soft updates may not report correctly. What happens if you remove soft-updates? Do you have the same issues?
George
__________________
...when you have excluded the impossible, whatever remains, however improbable, must be the truth. |
|
|||
However, it could be a power issue. I have consulted a couple of other people and they both think 235W isn't enough. I wonder if this also affects the USB ports and USB KVM on this machine.
|
|||
Not power... :(
Well, I installed the new power supply and... nope, it didn't help.
Here's what I got when I extracted a file from one Samba share to another Samba share on the same mirror:
Jun 17 10:22:48 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:22:48 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 120 bytes
Jun 17 10:41:33 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:41:33 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 180 bytes
Jun 17 10:54:49 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:54:49 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 240 bytes
Jun 17 10:55:12 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:55:12 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 300 bytes
Jun 17 10:56:22 ChamRAID01 kernel: ad4: FAILURE - device detached
Jun 17 10:56:22 ChamRAID01 kernel: subdisk4: detached
Jun 17 10:56:22 ChamRAID01 kernel: ad4: detached
Jun 17 10:56:22 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad4 disconnected.
Jun 17 10:56:22 ChamRAID01 kernel: g_vfs_done():mirror/dat[READ(offset=267860606976, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: ad6: FAILURE - device detached
Jun 17 10:56:41 ChamRAID01 kernel: subdisk6: detached
Jun 17 10:56:41 ChamRAID01 kernel: ad6: detached
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad6 disconnected.
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider mirror/dat destroyed.
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat destroyed.
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268466061312, length=16384)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467388416, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467519488, length=131072)]error = 6
.... A LOT of these ....
The machine then crashed. I'm not sure where that's logged.
It rebooted itself and then complained about file blocks on the mirror, so I ran fsck on it. The fsck completed and then I attempted a reboot, but the machine crashed, output a core and rebooted. Time to give up?
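With this many lines scrolling past, it can help to pull out only the moments when a disk dropped off. The following is a small sketch that assumes syslog-style lines in the same layout as the ones quoted above (month, day, time, host, "kernel:", device); the function name and the temporary sample file are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the timestamp and device for every "device detached" failure.
# Assumes the field layout of the log lines quoted in this post.
detach_events() {
    # $1: path to a log file, e.g. /var/log/messages
    awk '/FAILURE - device detached/ {
        dev = $6
        sub(/:$/, "", dev)            # field 6 looks like "ad4:"; strip the colon
        print $1, $2, $3, dev
    }' "$1"
}

# Small demo on three lines copied from the log above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
Jun 17 10:55:12 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:56:22 ChamRAID01 kernel: ad4: FAILURE - device detached
Jun 17 10:56:41 ChamRAID01 kernel: ad6: FAILURE - device detached
EOF
detach_events "$sample"
rm -f "$sample"
```

On the sample this lists ad4 at 10:56:22 and ad6 at 10:56:41, which makes the 19-second gap between the two disks dropping easy to see.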
|
|||
Well, I've ordered a power supply, so let's see what happens there first. It's a 380W unit, so it should be able to handle the load. It seems interesting that the rebuild is okay, even though it hits both disks heavily, but large writes and reads (especially reads) fail and break gmirror.
This will be okay until I decide to replace the MB, at which point I'll probably need to upgrade the PSU again. This was supposed to be a cheap project.....
|
||||
Are those xl0 transmission underruns possibly related? Try dd'ing a large file to your mirror and see if it causes trouble. I'm suspicious about the proximity of those underruns to the mirror failure.
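A minimal sketch of that dd test, taking the network out of the picture entirely: it writes zeros straight to a file and checks the resulting size. The function name, the demo path and the sizes are assumptions; on the real box the target would be a file on the mirror, with a count large enough to approach the 1.8G that triggered the failure.

```shell
#!/bin/sh
# Sketch: write N MiB of zeros to a target file with dd, then confirm the size.
dd_write_test() {
    target="$1"; mib="$2"
    # bs is given in bytes so the same line works with BSD and GNU dd
    dd if=/dev/zero of="$target" bs=1048576 count="$mib" 2>/dev/null || return 1
    size=$(wc -c < "$target" | tr -d ' ')
    [ "$size" -eq $((mib * 1048576)) ]
}

# Small local demo (4 MiB in a temp directory). A path such as
# /home/ddtest.bin with a count near 1800 is a hypothetical real-world target.
demo="${TMPDIR:-/tmp}/ddtest.$$"
if dd_write_test "$demo" 4; then
    echo "write ok"
else
    echo "write failed"
fi
rm -f "$demo"
```

Since reads were reported as the worse case, reading the file back with dd to /dev/null while watching /var/log/messages would be a natural second step.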
|
|||
I've turned off the mirror and have installed a cron job to copy the files over at the end of the day. I don't have a debugging kernel either, so I can't debug it. But I have a core, a couple of them, in case someone can have a look at them. I'm now considering whether to switch to openSUSE or Xubuntu, but I've no time to do that for a couple of months. I'll just have to keep it in "crippled" mode for now.
|
||||
Wait.. you got xl0 underruns copying within the mirror? Your NIC should not be in the picture at that point.
|
|||
Sorry for the confusion. I got the underruns copying from one Samba share to another on the same mirror, from my Windows box. Then I did the same file copy again from the console, to see if it was the NIC or Samba. gmirror broke again.
I can't keep having these problems, so I have got rid of gmirror for now and installed a cron copy job. Ugly, I know, but at least the system is stable now. My options now are either to swap the SiI 3512 card for something else (an ICH5-based card, maybe?) and try again, or to change to Linux. At any rate, I'm very busy with an Oracle project for the next month and a half, so I've got no cycles to spare. The most I will do is check and clean all the drive and card connectors.
|
|||
Yes, I have sent a PR, but I don't have a core that I can debug to supply a backtrace (someone asked). I have two cores which I believe were caused by GEOM, but I don't know what to do with them.
IMHO there is an issue with g_vfs_done in my specific configuration. Or maybe it's the configuration of the drives: I got that "cannot use BIOS cyl/head/track calculating my own" message from BSD, and I just went with what BSD suggested. It's too bad, because BSD performs very, very well.
|
|||
As an update, I have managed to build a debug kernel and look at the cores. Nothing spectacular; it looks pretty random to me. Today I got a core dump as well, and I'm not running gmirror! So the problem is more fundamental. I have done some tests and the failures seem fairly random, as if there is a random memory or metadata problem.
I was running the disks in what I think is dedicated mode, where my mount is something like /dev/ad4 instead of /dev/ad4s1d. The bsdlabel looked pretty funny to me: two partitions, one large one at a small offset and then a large one at 0, which looked suspicious. Probably a leftover from gmirror label? So I've now gone into sysinstall, used the fdisk and label options, and redone the disks. I will have to see how that runs. I've still got to finish it off tomorrow...