Hello. I'm running FreeBSD 7.0 on a Celeron 800MHz, and I copied a 1.8G file across from my Windows box to the FreeBSD box. When the copy finished, or possibly during the copy, I got the following:
kernel: ad6: FAILURE - device detached
kernel: subdisk6: detached
kernel: ad6: detached
kernel: GEOM_MIRROR: device dat: provider ad6 disconnected.
kernel: g_vfs_done():mirror/dats1d[READ(offset=196937613312, length=16384)]error=6
I can only think that perhaps it's a timeout? The configuration is roughly thus:
ad4 - 320G SATA
ad6 - 320G SATA
I put a newfs on ad4 but not ad6; I thought gmirror would take care of that. I have copied other files of several hundred megabytes without a problem, but when I tried this 1.8G file this error happened. This is the first problem I have had with the mirror. Any tips?
Well, I've had further problems. I was extracting files from the large gzip and the mirror crashed again. This time it was ad4.
I did a "gmirror forget" and then tried to re-insert ad4, but could not; ad4 did not appear in the /dev folder either. So I did a reboot and, to my horror, gmirror remounted ad4, but not ad6, in a DEGRADED state. When I did a "gmirror status", the screen filled with a stream of g_vfs_done ERROR messages, and I had to hit the power switch. I managed to get the system back up, unmounted the mirror and destroyed it. That was a bit tricky, as gmirror kept remounting it after a few seconds; I had to quickly do a "gmirror unload". So, after making a backup, I relabelled:
gmirror label -b split -s 2048 dat ad4 ad6
and newfs'd the mirror:
newfs -U /dev/mirror/dat
Then I mounted it and tried to copy the data back. The mirror broke again. I rebuilt the mirror, and this time I left off soft-updates as I suspect there is a problem there. I remounted and copied back all the data, and the mirror held together. I don't know if that is a coincidence or not. Is there an issue with soft-updates and gmirror?
Are you using the whole drive for the root (/) partition? In that case there may indeed be some trouble with soft-updates, because using soft-updates for the root partition is not recommended (sysinstall even disables soft-updates for this partition by default when you create it).
I am using the entire drive, but it is mounted at /home.
My other drive, a small 11GB one, holds / and swap.
When the drive "failed" in the mirror, it is reasonable that BSD removes it from the /dev folder. But on the reboot the drive reappeared, and I have rebuilt the mirror and it is working perfectly.
I suspect a problem with soft-updates and gmirror, perhaps only in certain configurations. I haven't had a chance yet to look through the log file. I wonder how I could submit this as a problem to the BSD developers.
I had the same problem on two of 8 machines with a fresh 7.0 install. All the machines have identical hardware and setups.
There were no SMART disk failures or other errors, only a timeout and then a lost mirror. I scratched my head for over 3 weeks of RAID rebuilds, since I could see the device in /dev, null it with dd and rebuild. One day the power supply in one of the affected machines went up in smoke; after it was replaced the error was gone. The solution for both machines was a power supply replacement. Maybe this could also be the case for you?
I have just checked the power supply in the unit. It looks like 235W.
There are two fans, three PCI cards, three HDs (one IDE, two SATA) and one CD-RW. Is 235W not enough to run all this? I could easily put in a 350W for not much outlay. The last error I got was "unable to write meta data":
Jun 10 12:35:49 ChamRAID01 kernel: ad4: FAILURE - device detached
Jun 10 12:35:49 ChamRAID01 kernel: subdisk4: detached
Jun 10 12:35:49 ChamRAID01 kernel: ad4: detached
Jun 10 12:35:49 ChamRAID01 kernel: GEOM_MIRROR: Cannot write metadata on ad4 (device=dat, error=6).
Jun 10 12:35:49 ChamRAID01 kernel: GEOM_MIRROR: Cannot update metadata on disk ad4 (error=6).
Jun 10 12:35:49 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad4 disconnected.
I was doing another install of Oracle, which involves extracting from the gzip. The mirror is NFS exported. When I reboot, the mirror comes back, but with ad6 detached and ad4 mounted in a DEGRADED state. This has happened to me many times. Why does ad4 come back on reboot? It shouldn't. The mirror should come back on reboot with ad6 in COMPLETE, and then I can insert ad4 back in. Am I wrong?
The next time a disk detached (it always seems to be ad4???) I just did a reboot. The mirror started, and ad4 was re-inserted automatically and began rebuilding. Strange. And both drives are busy for about 4 hours during the rebuild, so I can't see it being the power supply. I just wonder if I have a corrupt CD, or FreeBSD has an issue with my SATA card (SiL 3512), or something else? I'm really scratching my head now. Ironically, I chose FreeBSD for its stability. As many people seem to have no issues, I may just junk this machine (it also has problems with my KVM) and get another MB and power supply, and hopefully that cures things. I'm just so short of cash at the moment.
Last edited by lil_elvis2000; 10th June 2008 at 02:07 PM.
I too lean towards a power problem here. SATA seems to be very fickle where power is concerned. It seems to me that if I soft reset my machine (AMD64 3000+, 1G RAM, 2x SATA 160G, 1x Memorex CD/DVD burner on a 450W supply), I get READ_DMA timeouts from the disks. It's especially a pain if I've had a power failure, because my gmirror rebuild and file system fscks spit out tons of READ_DMA timeout errors, and most likely the mirror rebuild will fail with a READ_DMA failure. However, if I completely power the machine down and cold boot, the errors go away. I can't explain it. So now I've got my box on a UPS running apcupsd, so it never goes down ungracefully.
__________________
perl -e "eval pack(q{H*}, join q{},qw{7072696e74207061636b28717b482a7d2c717b34393 23036333631366532303666366536633739323036313733373 33735366436353230373936663735323036353738373036353 63337343635363432303734363836393733323037343666323 03632363532303631323036633639373437343663363532303 66436663732363532303635366537343635373237343631363 93665363936653637326530617d293b})" |
Strange, I have never had this problem on my home fileserver (3 years now). It is an HP ML 110G3, a very low entry-level model. Of course I use a cheap UPS for temporary power failures, but I have had my experiences with hard reboots.
It could be a MB problem with the SATA drives. Especially if it is old, it might not handle SATA any differently from IDE, so soft-updates may not report correctly. What happens if you remove soft-updates? Do you have the same issues?
George
__________________
...when you have excluded the impossible, whatever remains, however improbable, must be the truth.
Well, I've ordered a power supply, so let's see what happens there first. It's a 380W unit, so it should be able to handle the load. It seems interesting that the rebuild is okay, even though it hits both disks heavily, but large writes and reads (especially reads) fail and break gmirror.
This will be okay until I decide to replace the MB; then I'll probably need to upgrade the PSU again. This was supposed to be a cheap project...
However, it could be a power issue. I have consulted with a couple of other people and they both think 235W isn't enough. I wonder if this also affects my USB ports and USB KVM on this machine.
Well, I installed the new power supply and... nope, it didn't help.
Here's what I got when I extracted a file from one Samba share to another Samba share on the same mirror:
Jun 17 10:22:48 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:22:48 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 120 bytes
Jun 17 10:41:33 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:41:33 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 180 bytes
Jun 17 10:54:49 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:54:49 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 240 bytes
Jun 17 10:55:12 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:55:12 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 300 bytes
Jun 17 10:56:22 ChamRAID01 kernel: ad4: FAILURE - device detached
Jun 17 10:56:22 ChamRAID01 kernel: subdisk4: detached
Jun 17 10:56:22 ChamRAID01 kernel: ad4: detached
Jun 17 10:56:22 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad4 disconnected.
Jun 17 10:56:22 ChamRAID01 kernel: g_vfs_done():mirror/dat[READ(offset=267860606976, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: ad6: FAILURE - device detached
Jun 17 10:56:41 ChamRAID01 kernel: subdisk6: detached
Jun 17 10:56:41 ChamRAID01 kernel: ad6: detached
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad6 disconnected.
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider mirror/dat destroyed.
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat destroyed.
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268466061312, length=16384)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467388416, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467519488, length=131072)]error = 6
.... A LOT of these ....
The machine then crashed. I'm not sure where that's logged.
It rebooted itself and then complained about file blocks on the mirror, so I did an fsck on it. The fsck completed and then I attempted a reboot, but the machine crashed, output a core and rebooted. Time to give up?
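To see how close the xl0 underruns are to the disk detaches in the log above, the timestamps can be compared with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, only a handful of the log lines are included, and the year (2008) is an assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# A subset of the kernel lines from the log above.
LOG = """\
Jun 17 10:22:48 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 120 bytes
Jun 17 10:41:33 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 180 bytes
Jun 17 10:54:49 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 240 bytes
Jun 17 10:55:12 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 300 bytes
Jun 17 10:56:22 ChamRAID01 kernel: ad4: FAILURE - device detached
Jun 17 10:56:41 ChamRAID01 kernel: ad6: FAILURE - device detached
"""

# "<Mon> <day> <hh:mm:ss> <host> kernel: <device>: <message>"
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: (\w+): (.*)$")


def parse(log, year=2008):
    """Return (timestamp, device, message) tuples for matching kernel lines.

    The year is an assumption; it is not present in the log itself.
    """
    events = []
    for line in log.splitlines():
        m = LINE.match(line)
        if m:
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((stamp, m.group(2), m.group(3)))
    return events


def gaps_before_detach(events):
    """Seconds between the most recent NIC underrun and each disk detach."""
    last_underrun = None
    gaps = {}
    for stamp, dev, msg in events:
        if "tx underrun" in msg:
            last_underrun = stamp
        elif msg.startswith("FAILURE - device detached") and last_underrun:
            gaps[dev] = int((stamp - last_underrun).total_seconds())
    return gaps


print(gaps_before_detach(parse(LOG)))  # {'ad4': 70, 'ad6': 89}
```

So the last underrun comes a little over a minute before ad4 drops. That is close, but it does not by itself show that one causes the other.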
Are those xl0 transmission underruns possibly related? Try dding a large file to your mirror and see if it causes trouble. I'm suspicious about the proximity of those underruns to the mirror failure.
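If dd is not handy, a purely local write test along the same lines can be sketched in Python. This is an illustration only: `write_test_file` is a hypothetical helper, and the demo writes a few megabytes to a temporary directory. For a real test you would point the path at a file on the mirror's mount point and use a size closer to the 1.8G file that triggered the problem.

```python
import os
import tempfile


def write_test_file(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes of zeros to path in chunks, then fsync.

    Returns the number of bytes written. No network is involved, so a
    failure here would point at the disks or controller, not the NIC.
    """
    chunk = bytes(chunk_bytes)
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # push the data out to the device
    return written


# Small demo in a temporary directory (not on the mirror).
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "ddtest.bin")
    size = 5 * 1024 * 1024 + 123
    print(write_test_file(target, size) == os.path.getsize(target))
```

Watching /var/log/messages while a large run is in progress should show whether the detach happens without any xl0 traffic.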
I've turned off the mirror and have installed a CRON job to copy the files over at the end of the day. I don't have a debugging kernel either, so I can't debug it. But I have a core, a couple of them, in case someone can have a look at them. I'm now considering whether openSUSE or Xubuntu would be a switch to make, but I've no time to do that for a couple of months. I'll just have to keep it in "crippled" mode for now.
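For anyone wanting to do something similar, here is a rough sketch of what such an end-of-day copy job might look like. It is not the actual job used here (that is not shown in the thread): `copy_newer` and the directory layout are hypothetical, and it only copies files that are missing or older on the destination. It never deletes anything.

```python
import os
import shutil
import tempfile


def copy_newer(src_dir, dst_dir):
    """Copy files from src_dir to dst_dir when missing or older there.

    Returns the sorted list of relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        rel_root = os.path.relpath(root, src_dir)
        out_root = os.path.normpath(os.path.join(dst_dir, rel_root))
        os.makedirs(out_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(out_root, name)
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                shutil.copy2(src, dst)  # copy2 also copies the modification time
                copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return sorted(copied)


# Demo with temporary directories standing in for the two disks.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    with open(os.path.join(src, "a.txt"), "w") as f:
        f.write("hello")
    print(copy_newer(src, dst))  # first run copies the file
    print(copy_newer(src, dst))  # second run finds nothing to do
```

Run from cron once a day, this gives a second copy of the data, though unlike a mirror anything written since the last run is lost if the first disk dies.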
Wait.. you got xl0 underruns copying within the mirror? Your NIC should not be in the picture at that point.
Sorry to confuse. I got the underruns copying from one Samba share to another on the same mirror, from my Windows box. Then I did the same file copy again from the console, to see if it was the NIC or Samba. gmirror broke again.
I can't keep having these problems, so I have got rid of gmirror for now and installed a CRON copy job. Ugly, I know, but at least the system is stable now. My options now are either to swap the SiL 3512 card for something else (an ICH5-based card, maybe?) and try again, or to change to Linux. At any rate, I'm very busy with an Oracle project for the next month and a half, so I've got no cycles to spare. The most I will do is check and clean all the drive and card connectors.