Yesterday evening I got an email that on my Proxmox server a disk has failed. In my ZFS Raidz1 I have 4 different drives of two manufactures: 2x HGST and 2x Seagate.
In the last 7 years I also used some Western Digitals. The only faulty hard drives I had in this years were from Seagate. This was the third… So this morning I bought a new hard disk, this time a Western Digital Red, and replaced the failed disk.
SSH into my server and checked the zpool data. Because I already removed the failed disk, it’s marked as unavailable.
failed disk: wwn-0x5000c5009c14365b
Now I had to find the Id of my new disk. With fdisk -l, I found my new disk as /dev/sde, but there was no id listed.
sudo fdisk -l
To be sure I checked again with:
sudo lsblk -f
With disk by-id I now got the Id.
ls /dev/disk/by-id/ -l | grep sde
new disk: ata-WDC_WD40EFRX-68N32N0_WD-WCC7K1CSDLRT
and again the failed disk: wwn-0x5000c5009c14365b
Before replacing the disks, I did a short SMART test.
sudo smartctl -a /dev/sde sudo smartctl -t short /dev/sde sudo smartctl -a /dev/sde
The new disk had no errors. And because it is a new disk, I don’t had to wipe any file systems from it.
So first I took the failed disk offline. Not sure if that was necessary, but to be on the safe side…
sudo zpool offline data 2664887927330352988
Next run the replace command.
sudo zpool replace data /dev/disk/by-id/wwn-0x5000c5009c14365b-part2 /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K1CSDLRT
The resilver process for the 3TB disk took about 10 hours.