[CRITICAL] zfs pool in DEGRADED condition: how to solve it?
Hello Savvy Admins/Devs,

I am using the OpenZFS file system on OSX (from https://openzfsonosx.org).

SYSTEM CONFIGS: I'm in a very difficult situation with an OpenZFS data pool (two mirrors of two partitions each, four disks in total) on an external NAS with four 3.5-inch mechanical hard disks (Seagate pro quality), driven by macOS 10.15 (Catalina) with ZFS version zfs-macOS-2.2.3-rc4.

PROBLEM: A first crash (with a few data errors; for the zpool output, read the OpenZFS message at https://openzfs.github.io/openzfs-do...-HC/index.html) left one partition of the 2nd mirror (mirror-1, partition 1 of 2) in the 'REMOVED' state and the pool as a whole in the 'SUSPENDED' state.

INITIAL SOLUTION ATTEMPT: That got temporarily resolved with these CLI commands:
$ sudo zpool online [poolName] mirror-1
(reboot of the OS)
$ sudo zpool clear [poolName]

After another reboot, the pool was initially 'ONLINE' and accessible [SEE the attached image], and ZFS started resilvering the data. But then the resilvering suddenly stops and all I/O commands hang with no way to proceed, leaving the 'mirror-1' partition in the 'REMOVED' state and the pool in a final 'DEGRADED' status. I have tried more than 10 times to reboot, 'online' the 'mirror-1' partition, 'clear' the pool and wait for the resilvering to finish, but to no avail: every time the pool is accessible at first, but soon the resilvering stops and the I/O hangs indefinitely. The final state of the pool is shown in the attached image.

SOLUTION? My (superficial) guess is that one of the hard drives (the one tagged as 'mirror-1 media-etc') is faulty and crashes unexpectedly during resilvering. Should I probably substitute it with a new hard disk and resilver it with the $ zpool replace command?

HELP: I kindly ask for your expertise and help. I'm unable to proceed and I'm very worried about the data.

Thank you in advance, with Best Regards
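For readers following along, the replacement idea at the end of this post could look roughly like the sketch below. It is a dry run that only prints the commands, so nothing touches the pool until each line has been reviewed. The pool name `tank`, the old device label `media-XXXX` and the new device `diskN` are placeholders invented for illustration; the real names have to be read from `zpool status`.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# POOL, OLD_DEV and NEW_DEV are hypothetical placeholders, not values from the thread.
POOL="tank"            # hypothetical pool name
OLD_DEV="media-XXXX"   # hypothetical label of the failing mirror-1 device
NEW_DEV="diskN"        # hypothetical identifier of the replacement disk

plan_replace() {
    # 1. Show per-device state and list any files with errors.
    echo "sudo zpool status -v $POOL"
    # 2. Swap the failing device for the new disk; this starts a resilver.
    echo "sudo zpool replace $POOL $OLD_DEV $NEW_DEV"
    # 3. Check resilver progress; repeat until it reports completion.
    echo "sudo zpool status $POOL"
}

plan_replace
```

Printing first and running by hand afterwards is a deliberate choice here: with a pool that is already degraded, a mistyped device name is the last thing anyone wants.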
|
|||
There are not any ZFS experts here. Many years ago, I played with ZFS for a while.
You figured out the remedy already: remove the troublesome mirror-1 disk and replace it with a new one. Another option, which assumes that mirror-0 is OK, is to remove or disable mirror-1 from the pool and use "zfs send" to copy the mirror-0 pool data to a single big (multi-TB) disk. Then you will have another copy of your data. That disk could be a portable USB 3.0 or 3.1 one. What I remember is that "zfs send" is quite flexible in the choice of destination. It could also be on your network or in another free slot of your NAS.
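A minimal sketch of the "zfs send" copy described above, again as a dry run that only prints the commands. It assumes the degraded pool is still imported and writable enough to take a snapshot, and that a second pool already exists on the big disk. The names `tank`, `backup` and `rescue` are placeholders, not values from this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# POOL, BACKUP and SNAP are hypothetical placeholders, not values from the thread.
POOL="tank"        # hypothetical name of the degraded pool
BACKUP="backup"    # hypothetical pool already created on the big disk
SNAP="rescue"      # hypothetical snapshot name

plan_backup() {
    # 1. Take a recursive snapshot so every dataset is frozen at the same point.
    echo "sudo zfs snapshot -r $POOL@$SNAP"
    # 2. Send the whole snapshot tree into a dataset on the backup pool.
    #    -R includes child datasets and properties; -u leaves the copy unmounted.
    echo "sudo zfs send -R $POOL@$SNAP | sudo zfs receive -u $BACKUP/$POOL"
}

plan_backup
```

For a destination on the network, the same stream can be piped through ssh to a `zfs receive` on the other machine.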
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump |
|
|||
Thank you for the suggestions
Hello 'J65nko'
thank you very much for your detailed info. I am collecting as many technical details as possible so as to be able to perform the data recovery and the disk resilvering correctly. For the sake of general information: hard disk producers, when dealing with high quality disks covered by warranty, have a Data Recovery Plan in place, so they are set up with everything needed for difficult data recovery operations. Let's hope for the best! with Best Regards
|
|||
It is quite common for another disk in the array or pool to die during resilvering. I strongly advise using 'zfs send' to copy the data from your still-working mirror-0 to another disk. Do not rely on an external Data Recovery Plan ;-)
BTW Reddit has a ZFS subreddit: https://www.reddit.com/r/zfs/
|
|||
OK, checked that.
Hello 'J65nko'
sure, I am following your suggestions. I have already made a thorough backup from the working mirror-0, and I will try to resilver mirror-1 with another HD in place of the faulty one. Thanks again for the precious help, with Best Regards