Homelab, Linux, JS & ABAP (~˘▾˘)~
 

[ZFS] Replace failed disk on my Proxmox Host


Yesterday evening I got an email that on my Proxmox server a disk had failed. In my ZFS Raidz1 I have 4 different drives of two manufactures: 2x HGST and 2x Seagate.
In the last 7 years I also used some Western Digitals. The only faulty hard drives I had in this years were from Seagate. This was the third… So this morning I bought a new hard disk, this time a Western Digital Red, and replaced the failed disk.

SSH into my server and checked the zpool data. Because I already removed the failed disk, it’s marked as unavailable.

failed disk: wwn-0x5000c5009c14365b

Now I had to find the Id of my new disk. With fdisk -l, I found my new disk as /dev/sde, but there was no id listed.

sudo fdisk -l

To be sure I checked again with:

sudo lsblk -f

With disk by-id I now got the Id.

ls /dev/disk/by-id/ -l | grep sde

new disk: ata-WDC_WD40EFRX-68N32N0_WD-WCC7K1CSDLRT
and again the failed disk: wwn-0x5000c5009c14365b

Before replacing the disks, I did a short SMART test.

sudo smartctl -a /dev/sde
sudo smartctl -t short /dev/sde
sudo smartctl -a /dev/sde

The new disk had no errors. And because it is a new disk, I don’t had to wipe any file systems from it.

So first I took the failed disk offline. Not sure if that was necessary, because I already had removed the disk.

sudo zpool offline data 2664887927330352988

Next run the replace command.

sudo zpool replace data /dev/disk/by-id/wwn-0x5000c5009c14365b-part2
/dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K1CSDLRT

The resilver process for the 3TB disk took about 10 hours.

One comment

  1. Pingback: ProxmoxでZFS RAID10とRAIDZ1の検証④ ~復旧テスト編~ | 鯖屋さん – 初心者インフラエンジニアの構築メモ

Leave a Reply

Your email address will not be published. Required fields are marked *