Yesterday evening I got an email that on my Proxmox server a disk had failed. In my ZFS Raidz1 I have 4 different drives of two manufactures: 2x HGST and 2x Seagate.
In the last 7 years I also used some Western Digitals. The only faulty hard drives I had in this years were from Seagate. This was the third… So this morning I bought a new hard disk, this time a Western Digital Red, and replaced the failed disk.
SSH into my server and checked the zpool data. Because I already removed the failed disk, it’s marked as unavailable.
failed disk: wwn-0x5000c5009c14365b
Now I had to find the Id of my new disk. With fdisk -l
, I found my new disk as /dev/sde, but there was no id listed.
sudo fdisk -l
To be sure I checked again with:
sudo lsblk -f
With disk by-id I now got the Id.
ls /dev/disk/by-id/ -l | grep sde
new disk: ata-WDC_WD40EFRX-68N32N0_WD-WCC7K1CSDLRT
and again the failed disk: wwn-0x5000c5009c14365b
Before replacing the disks, I did a short SMART test.
sudo smartctl -a /dev/sde
sudo smartctl -t short /dev/sde
sudo smartctl -a /dev/sde
The new disk had no errors. And because it is a new disk, I don’t had to wipe any file systems from it.
So first I took the failed disk offline. Not sure if that was necessary, because I already had removed the disk.
sudo zpool offline data 2664887927330352988
Next run the replace command.
sudo zpool replace data /dev/disk/by-id/wwn-0x5000c5009c14365b-part2
/dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K1CSDLRT
The resilver process for the 3TB disk took about 10 hours.
Pingback: ProxmoxでZFS RAID10とRAIDZ1の検証④ ~復旧テスト編~ | 鯖屋さん – 初心者インフラエンジニアの構築メモ