Replacing a Failed Drive in a ZFS Zpool (on Proxmox)
Dec 12, 2016 · 5 minute read · Category: linux
Recently we had one of our Proxmox machines suffer a failed disk drive.
Thankfully, replacing a failed disk in a ZFS zpool is remarkably simple if you know how.
In this example, we are using the ZFS configuration as set up by the Proxmox installer, which also creates a boot partition that is not part of the zpool. Seems like a pretty sensible idea to me.
Here is how we can look at the status of our zpool and see that it has a failed disk:
root@cluster1 zpool status -v
pool: rpool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: none requested
config:
    NAME                      STATE     READ WRITE CKSUM
    rpool                     DEGRADED     0     0     0
      raidz1-0                DEGRADED     0     0     0
        sdb2                  ONLINE       0     0     0
        sdc2                  ONLINE       0     0     0
        sdd2                  ONLINE       0     0     0
        14456048953908038050  FAULTED      0     0     0  was /dev/sdd2
So you can see that /dev/sdd2 has died and is no longer available. The numeric ID shown in place of sdd2 is important, so make a note of it.
Now let's assume that you have figured out which drive is the failed one, whipped it out and slotted in a shiny new replacement drive that is at least as big as the one it replaces. The next step is to actually add the new drive in.
Step one: Know your Drive IDs
To avoid misery, you need to make absolutely sure you know which drives are which. If you replace a drive then the IDs (sda, sdb, etc.) can get shuffled around, so you need to double-check.
The easiest way, I think, is to look in /dev/disk/by-id - in there you should notice one disk that has no partitions, and that is your new one.
root@cluster1 cd /dev/disk/by-id/
/dev/disk/by-id
root@cluster1 ll
total 0
drwxr-xr-x 2 root root 560 Dec 12 12:08 .
drwxr-xr-x 6 root root 120 Dec 12 12:08 ..
lrwxrwxrwx 1 root root 9 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRREL -> ../../sdd
lrwxrwxrwx 1 root root 10 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRREL-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRREL-part2 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRREL-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRS0J -> ../../sdc
lrwxrwxrwx 1 root root 10 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRS0J-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRS0J-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRS0J-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRV9T -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRV9T-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRV9T-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YCRV9T-part9 -> ../../sdb9
lrwxrwxrwx 1 root root 9 Dec 12 12:08 ata-ST1000DX001-1NS162_Z4YE995W -> ../../sda
lrwxrwxrwx 1 root root 9 Dec 12 12:08 wwn-0x5000c50090cca172 -> ../../sdc
lrwxrwxrwx 1 root root 10 Dec 12 12:08 wwn-0x5000c50090cca172-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Dec 12 12:08 wwn-0x5000c50090cca172-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Dec 12 12:08 wwn-0x5000c50090cca172-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 Dec 12 12:08 wwn-0x5000c50090cd24c4 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 12 12:08 wwn-0x5000c50090cd24c4-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 12 12:08 wwn-0x5000c50090cd24c4-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Dec 12 12:08 wwn-0x5000c50090cd24c4-part9 -> ../../sdb9
lrwxrwxrwx 1 root root 9 Dec 12 12:08 wwn-0x5000c50090cd2ff2 -> ../../sdd
lrwxrwxrwx 1 root root 10 Dec 12 12:08 wwn-0x5000c50090cd2ff2-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Dec 12 12:08 wwn-0x5000c50090cd2ff2-part2 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Dec 12 12:08 wwn-0x5000c50090cd2ff2-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 Dec 12 12:08 wwn-0x5000c50092c1f5b2 -> ../../sda
So in our example, the new disk is sda.
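If you want a second opinion before touching anything, lsblk gives a quick cross-check - the new disk should show no child partitions and a serial number that doesn't appear against the existing pool members. A minimal sketch, assuming a util-linux lsblk that supports these columns:
# Cross-check the candidate disk: the new drive should have no partitions
# listed under it and an unfamiliar serial number
lsblk -o NAME,SIZE,SERIAL,MODEL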
Step two: Partitions
We need to get our new drive set up with the right partition table. Thankfully this is easy enough because we can simply copy the table from a healthy drive.
Warning: make sure you have the next command the right way around before running it - sgdisk will happily overwrite the partition table of whichever disk you point it at.
# Use these variables to make sure you have this the right way around
newDisk='/dev/sda'
healthyDisk='/dev/sdb'
# Copy the partition table from the healthy disk onto the new disk
sgdisk -R "$newDisk" "$healthyDisk"
# Randomise the GUIDs so the new disk does not clash with the disk it was cloned from
sgdisk -G "$newDisk"
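Optionally, as a sanity check, you can print both partition tables and confirm they now match. This is a read-only check (sgdisk -p only prints the table, it changes nothing):
# Sanity check: the new disk's table should now mirror the healthy one
sgdisk -p "$healthyDisk"
sgdisk -p "$newDisk"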
Step three: Boot partition
In our example, partition one is the boot partition and can simply be copied directly from a healthy disk (we will sort out the ZFS partition later).
# Use these variables to make sure you have this the right way around
newDiskBootPartition='/dev/sda1'
healthyDiskBootPartition='/dev/sdb1'
# Clone the boot partition byte-for-byte from the healthy disk
dd if="$healthyDiskBootPartition" of="$newDiskBootPartition" bs=512
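If you want reassurance that the copy worked, a byte-for-byte comparison is a simple check - cmp prints nothing when the two partitions are identical (which works here because we replicated the partition table, so both partitions are the same size):
# Verify the boot partition copy - no output means the two partitions match
cmp "$healthyDiskBootPartition" "$newDiskBootPartition"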
Step four: Add to zpool
Now we are going to add the new disk to the zpool and replace the failed one.
newDiskZFSPartition='/dev/sda2'
# Put your failed disk ID here - as reported in `zpool status -v` - eg 14456048953908038050
failedDiskPartitionID=''
zpool replace rpool "$failedDiskPartitionID" "$newDiskZFSPartition"
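As an aside, if you would rather the pool referenced a stable device name instead of an sdX letter (which, as we saw, can get shuffled), you can point zpool replace at the /dev/disk/by-id path for the new ZFS partition instead. A sketch using the new disk's ID from the listing above:
# Alternative (sketch): use the stable by-id path so the pool entry is not
# tied to the sdX letter, which can change between boots
zpool replace rpool "$failedDiskPartitionID" /dev/disk/by-id/ata-ST1000DX001-1NS162_Z4YE995W-part2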
Either way, zpool replace should give you the warning: Make sure to wait until resilver is done before rebooting.
You can keep track of the resilvering process by running zpool status -v, eg:
root@cluster1 zpool status -v
pool: rpool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Mon Dec 12 12:14:43 2016
91.9M scanned out of 1.87T at 7.66M/s, 71h0m to go
22.6M resilvered, 0.00% done
config:
    NAME                        STATE     READ WRITE CKSUM
    rpool                       DEGRADED     0     0     0
      raidz1-0                  DEGRADED     0     0     0
        sdb2                    ONLINE       0     0     0
        sdc2                    ONLINE       0     0     0
        sdd2                    ONLINE       0     0     0
        replacing-3             UNAVAIL      0     0     0
          14456048953908038050  FAULTED      0     0     0  was /dev/sdd2
          sda2                  ONLINE       0     0     0  (resilvering)
errors: No known data errors
A note on this: the estimated time to go (71h0m here) was wildly pessimistic - the resilver actually took around four and a half hours, as the final status below shows.
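If you don't fancy re-running zpool status by hand while you wait, something like watch does the job (just a convenience sketch - adjust the interval to taste):
# Refresh the pool status every 60 seconds while the resilver runs
watch -n 60 zpool status -v rpool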
Step five: Reboot
Once the resilvering process has finished, you can reboot the machine and check that everything is back to normal health:
root@cluster1 zpool status -v
pool: rpool
state: ONLINE
scan: resilvered 456G in 4h42m with 0 errors on Mon Dec 12 16:57:37 2016
config:
    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        sdc2    ONLINE       0     0     0
        sdd2    ONLINE       0     0     0
        sde2    ONLINE       0     0     0
        sdb2    ONLINE       0     0     0
errors: No known data errors
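As an optional final step, it doesn't hurt to kick off a scrub once things have settled, so that ZFS re-verifies checksums across the whole pool - just a suggestion, the resilver itself doesn't require it:
# Optional: re-verify checksums across the whole pool after the replacement
zpool scrub rpool
# ...then check on its progress the same way as before
zpool status -v rpool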