The Proxmox VE 6.3 web interface doesn’t allow you to replace a disk in a ZFS pool. Instead, you’ll need to do it through the command line. I wrote this guide as I performed the replacement myself, and it applies to any ZFS system, not just Proxmox.
What’s Being Replaced
I have two 1TB drives, both purchased well over a decade ago, back when drives like these were about $100 retail. So, very old. Both drives have been serving as one of the backup pools for VMs in my Proxmox server, configured as a two-disk mirror, meaning one drive can fail and the data will still be safe. Which is exactly what happened here.
Proxmox had flagged one of the drives as having some SMART errors, and marked the pool as degraded. Knowing the age of these drives, I knew it was time to replace them both.
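Before touching anything, it’s worth confirming the pool state and the suspect drive’s health from the command line. A quick check could look like this (the device name shown is just an example):

# Show only pools that are not healthy
zpool status -x

# Overall SMART health verdict for the suspect drive
smartctl -H /dev/sdf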
While I had issues with White Label drives when I first built my TrueNAS server a few years ago, I chose to try them again. The same Amazon seller, GoHardDrive, had 2TB NAS HDDs for $45.
Adding the New Drives
Luckily, this drive didn’t fail entirely, so I’ll be offlining the disk first and then replacing the devices. This will be a lengthy process because I can only replace one disk at a time: the pool has to resilver (copy the data) to each new disk before I can move on to the next one.
I thought about creating just another pool with these new disks and removing the old one. That’s the smarter thing to do, it really is. However, I needed to challenge myself.
I had two empty SATA ports, which makes this process simpler: I only have to shut down the server once to add the new drives, and then once more after I remove the old drives.
First, you need to find the device names and device IDs. I’m using lsblk and grepping for “sd” to only display SATA devices.
lsblk | grep sd
sda      8:0    0 223.6G  0 disk
├─sda1   8:1    0 223.6G  0 part
└─sda9   8:9    0     8M  0 part
sdb      8:16   0 223.6G  0 disk
├─sdb1   8:17   0 223.6G  0 part
└─sdb9   8:25   0     8M  0 part
sdc      8:32   0   1.8T  0 disk
├─sdc1   8:33   0   1.8T  0 part
└─sdc9   8:41   0     8M  0 part
sdd      8:48   0 931.5G  0 disk
├─sdd1   8:49   0 931.5G  0 part
└─sdd9   8:57   0     8M  0 part
sde      8:64   0   1.8T  0 disk
sdf      8:80   0 931.5G  0 disk
├─sdf1   8:81   0 931.5G  0 part
└─sdf9   8:89   0     8M  0 part
sdg      8:96   0 111.8G  0 disk
├─sdg1   8:97   0  1007K  0 part
├─sdg2   8:98   0   512M  0 part
└─sdg3   8:99   0 111.3G  0 part
sdh      8:112  0 111.8G  0 disk
├─sdh1   8:113  0  1007K  0 part
├─sdh2   8:114  0   512M  0 part
└─sdh3   8:115  0 111.3G  0 part
I know that I’m replacing 1TB drives with 2TB drives. Therefore, the old disks are sdd and sdf, and the new disks are sdc and sde.
Now I need to find their device IDs. I ran the following command for each drive, changing the device name (sdc) each time.
root@johnny5:~# ls -la /dev/disk/by-id | grep sdc
lrwxrwxrwx 1 root root  9 Dec 29 17:16 ata-WL2000GSA6454_WD-WMAY02272888 -> ../../sdc
lrwxrwxrwx 1 root root 10 Dec 29 17:16 ata-WL2000GSA6454_WD-WMAY02272888-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Dec 29 17:16 ata-WL2000GSA6454_WD-WMAY02272888-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 Dec 29 17:16 wwn-0x50014ee6abcf8e35 -> ../../sdc
lrwxrwxrwx 1 root root 10 Dec 29 17:16 wwn-0x50014ee6abcf8e35-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Dec 29 17:16 wwn-0x50014ee6abcf8e35-part9 -> ../../sdc9
Now I can see that sdc’s device ID is wwn-0x50014ee6abcf8e35. I ran this three more times, once for each of the remaining drives:
2TB  sdc  wwn-0x50014ee6abcf8e35
2TB  sde  wwn-0x50014ee05808492d
1TB  sdd  wwn-0x50014ee2ad4b90e8
1TB  sdf  wwn-0x5000c500116662ed
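If you’d rather gather all four mappings in one pass, a small shell loop works as well (a sketch; adjust the device names to match your system):

# Print the by-id symlinks that point at each whole disk (partitions excluded)
for d in sdc sdd sde sdf; do
    ls -la /dev/disk/by-id | grep -E "/${d}$"
done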
Now that I have this information, I can take the first old disk offline:

zpool offline hdd_pool wwn-0x5000c500116662ed
This removes the old device from operation within the pool; however, the disk still remains a member of the pool. Next, I replace the old drive with the new drive:

zpool replace hdd_pool wwn-0x5000c500116662ed wwn-0x50014ee6abcf8e35
hdd_pool is the name of this pool; it’s the pool on this server that uses HDDs. The replace command immediately starts resilvering data onto the new drive. I can watch progress using zpool status hdd_pool:
root@johnny5:~# zpool status hdd_pool
  pool: hdd_pool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Dec 29 19:15:59 2021
        471G scanned at 4.58G/s, 7.08G issued at 70.4M/s, 471G total
        7.08G resilvered, 1.50% done, 0 days 01:52:31 to go
config:

        NAME                          STATE     READ WRITE CKSUM
        hdd_pool                      DEGRADED     0     0     0
          mirror-0                    DEGRADED     0     0     0
            replacing-0               DEGRADED     0     0     0
              wwn-0x5000c500116662ed  OFFLINE      0     0     1
              wwn-0x50014ee6abcf8e35  ONLINE       0     0     0  (resilvering)
            wwn-0x50014ee2ad4b90e8    ONLINE       0     0     0

errors: No known data errors
You can also watch the progress of the resilvering within the Proxmox UI. Go to Datacenter > Node > Disks > ZFS, then double-click the pool in question.
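If you’d rather keep watching from the shell, the standard watch utility can refresh the status for you (a simple sketch):

# Re-run zpool status every 10 seconds; stop with Ctrl+C
watch -n 10 zpool status hdd_pool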
It took about 2.5 hours to resilver each drive. Of course, now I have to do this whole thing again with the other two drives:
zpool offline hdd_pool wwn-0x50014ee2ad4b90e8;zpool replace hdd_pool wwn-0x50014ee2ad4b90e8 wwn-0x50014ee05808492d
Another 2 hours later, and another zpool status hdd_pool displays:
root@johnny5:~# zpool status hdd_pool
  pool: hdd_pool
 state: ONLINE
  scan: resilvered 472G in 0 days 01:13:55 with 0 errors on Wed Dec 29 22:46:55 2021
config:

        NAME                        STATE     READ WRITE CKSUM
        hdd_pool                    ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x50014ee6abcf8e35  ONLINE       0     0     0
            wwn-0x50014ee05808492d  ONLINE       0     0     0

errors: No known data errors
Now, our two new disks have fully resilvered and our pool is working as expected.
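For extra peace of mind, a scrub will re-read every block in the pool and verify it against its checksums. It’s optional, but it’s a cheap way to confirm the newly resilvered data is sound:

# Start a scrub, then check on it later
zpool scrub hdd_pool
zpool status hdd_pool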
Expand Disks to Use All Available Disk Space
Next, we need to expand the pool to use all of the available disk space on the new drives. Otherwise, the pool will continue to use only the 1TB of space that the original two drives offered:
root@johnny5:~# zpool list
NAME       SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
hdd_pool   928G   471G   472G   1.35T    -         -     1%   25%    1.00x   ONLINE  -
rpool      111G   17.0G  94.0G  -        -         6%    15%  1.00x  ONLINE  -
sdd_pool   222G   70.9G  151G   -        -         36%   31%  1.00x  ONLINE  -
vm_pool    3.62T  589G   3.05T  -        -         22%   15%  1.00x  ONLINE  -
To do this, I ran the following:
zpool online -e hdd_pool wwn-0x50014ee6abcf8e35
zpool online -e hdd_pool wwn-0x50014ee05808492d
These commands expand each disk within hdd_pool, identified by its device ID, to use its full capacity.
root@johnny5:~# zpool list
NAME       SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
hdd_pool   1.81T  472G   1.35T  -        -         1%    25%  1.00x  ONLINE  -
rpool      111G   17.0G  94.0G  -        -         6%    15%  1.00x  ONLINE  -
sdd_pool   222G   70.9G  151G   -        -         36%   31%  1.00x  ONLINE  -
vm_pool    3.62T  589G   3.05T  -        -         22%   15%  1.00x  ONLINE  -
Now you can see that hdd_pool has expanded to use both disks fully.
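As a side note, ZFS can perform this expansion automatically when larger disks are resilvered in, provided the pool’s autoexpand property is enabled. Something like the following would set that up for next time (a sketch, not a step I ran here):

# Enable automatic expansion for future disk upgrades, then confirm the property
zpool set autoexpand=on hdd_pool
zpool get autoexpand hdd_pool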
Run SMART Tests
Honestly, this should have been the first step, but I forgot to do it. This step runs a long SMART test:
smartctl -t long /dev/sdc; smartctl -t long /dev/sde
Please wait 295 minutes for test to complete.
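The long test runs inside the drive’s own firmware, so there’s no need to keep a terminal open; its progress can be checked at any point. One way to do that (a sketch, using the same device names) is to look at the self-test execution status:

# The "Self-test execution status" line reports how much of the test remains
smartctl -a /dev/sdc | grep -A 1 "Self-test execution"
smartctl -a /dev/sde | grep -A 1 "Self-test execution"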
The smartctl -t long command starts the SMART long test on both sdc and sde, the two new disks. While the tests were running, I decided to run a quick report on both drives:
smartctl -a /dev/sdc;smartctl -a /dev/sde
=== START OF INFORMATION SECTION ===
Device Model:     WL2000GSA6454
Serial Number:    WD-WMAY02097019
LU WWN Device Id: 5 0014ee 05808492d
Firmware Version: 00.0NS03
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Wed Dec 29 23:22:50 2021 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
The Rotation Rate and SATA Version values show that I received the wrong disks. I was supposed to receive two 2TB SATA III drives at 5400 RPM; instead, these report 7200 rpm and SATA 2.6 at 3.0 Gb/s. This is the second time that I received drives that were incorrectly described by this White Label drive manufacturer. I sent the Amazon seller a question asking why the drives report these values, so I’ll wait to hear back.
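To pull out just those two fields for both drives, a quick check like this works (a sketch using the same device names):

# -i prints only the drive information section; grep narrows it to the disputed fields
smartctl -i /dev/sdc | grep -E "Rotation Rate|SATA Version"
smartctl -i /dev/sde | grep -E "Rotation Rate|SATA Version"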
Conclusion
Regardless of the incorrectly described product, these two drives are working, quiet, and have increased my backup capacity. This pool is used strictly as a destination for VM backups. The same backups are also sent to my TrueNAS device, which is my primary backup destination.
I hope this article was helpful if you’re replacing the disks in your own ZFS pool.