
Proxmox ZFS Disk Replacement and Drive Expansion

December 30, 2021 by Aaron Weiss

The Proxmox VE 6.3 web interface doesn’t allow you to replace a disk in a ZFS pool; instead, you’ll need to do it through the command line. I wrote this guide as I needed to perform this replacement myself. It works on any ZFS system, not just Proxmox.

What’s Being Replaced

I’ve had two 1TB drives that were both purchased well over a decade ago, back when drives like these retailed for about $100. So, very old. Both have been serving as one of the backup pools for VMs in my Proxmox server, configured as a two-disk ZFS mirror, meaning one drive can fail and the data will still be safe. Which is exactly what happened here.

Proxmox had flagged one of the drives as having some SMART errors, and marked the pool as degraded. Knowing the age of these drives, I knew it was time to replace them both.

While I had issues with White Label drives when I first built my TrueNAS server a few years ago, I chose to try them again. The same Amazon seller, GoHardDrive, had 2TB NAS HDDs for $45.

Adding the New Drives

Luckily, the flagged drive didn’t fail entirely, so I’ll be offlining it first and then replacing the devices. This will be a lengthy process because I can only replace one disk at a time: the pool has to resilver – that is, copy the data – onto each new disk.

I thought about creating just another pool with these new disks and removing the old one. That’s the smarter thing to do, it really is. However, I needed to challenge myself.

I had two empty SATA ports, which makes this process simpler: I only have to shut down the machine once to add the new drives, and once more after I remove the old ones.

First, you need to find the device names and device IDs. I’m using lsblk and grepping for “sd” to display only SATA devices.

lsblk | grep sd
sda 8:0 0 223.6G 0 disk
├─sda1 8:1 0 223.6G 0 part
└─sda9 8:9 0 8M 0 part
sdb 8:16 0 223.6G 0 disk
├─sdb1 8:17 0 223.6G 0 part
└─sdb9 8:25 0 8M 0 part
sdc 8:32 0 1.8T 0 disk
├─sdc1 8:33 0 1.8T 0 part
└─sdc9 8:41 0 8M 0 part
sdd 8:48 0 931.5G 0 disk
├─sdd1 8:49 0 931.5G 0 part
└─sdd9 8:57 0 8M 0 part
sde 8:64 0 1.8T 0 disk
sdf 8:80 0 931.5G 0 disk
├─sdf1 8:81 0 931.5G 0 part
└─sdf9 8:89 0 8M 0 part
sdg 8:96 0 111.8G 0 disk
├─sdg1 8:97 0 1007K 0 part
├─sdg2 8:98 0 512M 0 part
└─sdg3 8:99 0 111.3G 0 part
sdh 8:112 0 111.8G 0 disk
├─sdh1 8:113 0 1007K 0 part
├─sdh2 8:114 0 512M 0 part
└─sdh3 8:115 0 111.3G 0 part

I know that I’m replacing 1TB drives with 2TB drives. Therefore, the old disks are sdd and sdf, and the new disks are sdc and sde.

Now I need to find each drive’s device ID. I used the following command for each drive, replacing sdc with the appropriate device name.

root@johnny5:~# ls -la /dev/disk/by-id | grep sdc
lrwxrwxrwx 1 root root 9 Dec 29 17:16 ata-WL2000GSA6454_WD-WMAY02272888 -> ../../sdc
lrwxrwxrwx 1 root root 10 Dec 29 17:16 ata-WL2000GSA6454_WD-WMAY02272888-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Dec 29 17:16 ata-WL2000GSA6454_WD-WMAY02272888-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 Dec 29 17:16 wwn-0x50014ee6abcf8e35 -> ../../sdc
lrwxrwxrwx 1 root root 10 Dec 29 17:16 wwn-0x50014ee6abcf8e35-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Dec 29 17:16 wwn-0x50014ee6abcf8e35-part9 -> ../../sdc9

Now I can see that sdc’s device ID is wwn-0x50014ee6abcf8e35. I ran this three more times to cover all four drives:

2 TB  sdc  wwn-0x50014ee6abcf8e35
2 TB  sde  wwn-0x50014ee05808492d
1 TB  sdd  wwn-0x50014ee2ad4b90e8
1 TB  sdf  wwn-0x5000c500116662ed
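
To avoid running the lookup once per drive, a short shell loop does the same thing (a minor convenience; it filters the by-id listing down to each whole-disk entry):

for d in sdc sdd sde sdf; do
    echo "== $d =="
    ls -la /dev/disk/by-id | grep "${d}$"
done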

Now that I have this information, I can take the old device offline:

zpool offline hdd_pool wwn-0x5000c500116662ed

This removes the old device from operation, although it still remains part of the pool. Next, I replace the old drive with the new one:

zpool replace hdd_pool wwn-0x5000c500116662ed wwn-0x50014ee6abcf8e35

hdd_pool is the name of this pool because it’s the pool on this server that uses HDDs. The replace command immediately starts the resilver process from the surviving disk onto the new drive. I can watch progress using zpool status hdd_pool:

root@johnny5:~# zpool status hdd_pool
  pool: hdd_pool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Dec 29 19:15:59 2021
        471G scanned at 4.58G/s, 7.08G issued at 70.4M/s, 471G total
        7.08G resilvered, 1.50% done, 0 days 01:52:31 to go
config:
        NAME                          STATE     READ WRITE CKSUM
        hdd_pool                      DEGRADED     0     0     0
          mirror-0                    DEGRADED     0     0     0
            replacing-0               DEGRADED     0     0     0
              wwn-0x5000c500116662ed  OFFLINE      0     0     1
              wwn-0x50014ee6abcf8e35  ONLINE       0     0     0  (resilvering)
            wwn-0x50014ee2ad4b90e8    ONLINE       0     0     0
errors: No known data errors

Alternatively, you can watch the progress of the resilvering within the Proxmox UI: go to Datacenter > Node > Disks > ZFS, then double-click the pool in question.

Proxmox ZFS Pool Detail
Example of Proxmox’s ZFS Pool Details and the resilvering process.
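
If you’d rather stay in the terminal, polling the status with watch works just as well (the 30-second interval is arbitrary):

watch -n 30 zpool status hdd_pool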

It took about 2.5 hours to resilver each drive. Of course, now I have to do the whole thing again with the other pair of old and new drives:

zpool offline hdd_pool wwn-0x50014ee2ad4b90e8; zpool replace hdd_pool wwn-0x50014ee2ad4b90e8 wwn-0x50014ee05808492d

Another two hours later, and another zpool status hdd_pool displays:

root@johnny5:~# zpool status hdd_pool 
  pool: hdd_pool 
 state: ONLINE 
  scan: resilvered 472G in 0 days 01:13:55 with 0 errors on Wed Dec 29 22:46:55 2021 
config:
        NAME                        STATE     READ WRITE CKSUM
        hdd_pool                    ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x50014ee6abcf8e35  ONLINE       0     0     0
            wwn-0x50014ee05808492d  ONLINE       0     0     0
errors: No known data errors

Now, our two new disks have fully resilvered and our pool is working as expected.

Expand Disks to Use All Available Disk Space

Next, we need to expand the disks to use all the available disk space. Otherwise, the pool will continue to use only the 1TB of space that the original two drives offered:

Proxmox ZFS Disk Pre-expanded
Example of the pool working, but only showing the original usable disk space in the pool
root@johnny5:~# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
hdd_pool   928G   471G   472G        -         -     1%    25%  1.00x    ONLINE  -
rpool      111G  17.0G  94.0G        -         -     6%    15%  1.00x    ONLINE  -
sdd_pool   222G  70.9G   151G        -         -    36%    31%  1.00x    ONLINE  -
vm_pool   3.62T   589G  3.05T        -         -    22%    15%  1.00x    ONLINE  -

To do this, I ran the following:

zpool online -e hdd_pool wwn-0x50014ee6abcf8e35
zpool online -e hdd_pool wwn-0x50014ee05808492d

This command expands the disks within the hdd_pool for each of the device-ids.

root@johnny5:~# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
hdd_pool  1.81T   472G  1.35T        -         -     1%    25%  1.00x    ONLINE  -
rpool      111G  17.0G  94.0G        -         -     6%    15%  1.00x    ONLINE  -
sdd_pool   222G  70.9G   151G        -         -    36%    31%  1.00x    ONLINE  -
vm_pool   3.62T   589G  3.05T        -         -    22%    15%  1.00x    ONLINE  -

Now you can see that hdd_pool has expanded to use both disks fully:

Proxmox Pool expanded
Pool now uses all the available disk space of the new disks.
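
As an aside, ZFS also has an autoexpand pool property; had it been set to on before the replacements, this step would have happened automatically. Something like:

zpool set autoexpand=on hdd_pool
zpool get autoexpand hdd_pool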

Run SMART Tests

Honestly, this should have been the first step, but I forgot to do it. This step runs a long SMART test:

smartctl -t long /dev/sdc; smartctl -t long /dev/sde

Please wait 295 minutes for test to complete.
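
smartctl returns immediately; the long test runs inside the drive itself. To check later whether it has finished, you can read each drive’s self-test log:

smartctl -l selftest /dev/sdc; smartctl -l selftest /dev/sde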

The smartctl -t long command above runs the SMART long test on both sdc and sde, the two new disks. While it was running, I decided to run a quick report on both of them:

smartctl -a /dev/sdc; smartctl -a /dev/sde

=== START OF INFORMATION SECTION ===
Device Model: WL2000GSA6454
Serial Number: WD-WMAY02097019
LU WWN Device Id: 5 0014ee 05808492d
Firmware Version: 00.0NS03
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Wed Dec 29 23:22:50 2021 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The Rotation Rate and SATA Version values above show that I received the wrong disks: I was supposed to receive two 2TB SATA III drives at 5400 RPM, but these report 7200 rpm and SATA 2.6 at 3.0 Gb/s. This is the second time that I received drives that were incorrectly described by this White Label drive manufacturer. I sent the Amazon seller a question asking why, so I’ll wait to hear back.

Conclusion

Regardless of the incorrectly described product, these two drives are working, quiet, and have increased my backup capacity. This pool is used strictly as a destination for VM backups. The same backups are also sent to my TrueNAS device, which is my primary backup destination.

I hope this article was helpful for replacing the disks in your own ZFS pool.

Filed Under: Proxmox Tagged With: backups, hard drive, hdd, proxmox, resilvering, zfs, zfs mirror

How My WordPress Website Got Hacked and How I Recovered

November 26, 2019 by Aaron Weiss

My WordPress website was hacked, and it was super embarrassing.

Just when my recent blog post about why you shouldn’t download nulled versions of BackupBuddy was starting to rank well for various keywords and gain some decent traffic, my site began to redirect to another website. I couldn’t log into my website at all, and I wasn’t able to find much information about this particular hack, especially since I couldn’t gain access to my site.

However, I still had access to my server, and because I had a solid disaster recovery plan, I was able to get my website back to a running state quickly.

Why did my Website get Hacked?

I have not figured out exactly what happened. It could have been a bad plugin, which is making me reconsider which plugins are really necessary. I’ve always felt that the plugins I chose were solid, but it’s time to weed out plugins whose features can be moved to a functions.php file or another implementation.

I had also moved to Austin, TX, and had not updated my site as I normally would have. I’d say this was my biggest mistake. I should have found time to maintain my website. I knew this in the back of my mind, and I didn’t commit to it.

How I Recovered My Site

Typically, I would have run a BackupBuddy recovery using importbuddy.php. However, since my website and dashboard were redirecting to another website, I was unable to access my site from a browser, so that option was out of the picture.

Since I still had access to my server, I was able to use DigitalOcean’s backups and recover my site from a version that was less than one week old. Given that I hadn’t published anything new or made any changes to the website, this worked fine.

What are the Plans for the Future?

Essentially, better maintenance: updating the website and platform on a more regular, automated basis.

I’ve previously created Bash scripts that check the site’s core installation, theme, and plugins for any known CVE vulnerabilities, create a full site backup, optimize the database, and notify me by email when updates are available. However, the CVE vulnerability check stopped working, and since I was busy moving, I never noticed the gap. That has since been corrected. A rough sketch of this kind of maintenance check is below.
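
My scripts aren’t included in this post, but here is a minimal sketch of the update-check and backup portion using WP-CLI (the site path is hypothetical, and the CVE lookup would require a separate tool, so it’s omitted here):

#!/usr/bin/env bash
# Minimal sketch of a WordPress maintenance check using WP-CLI (https://wp-cli.org/).
# The site path below is hypothetical; adjust it to your installation.
SITE=/var/www/example.com
cd "$SITE" || exit 1

# Report available core, plugin, and theme updates
wp core check-update
wp plugin list --update=available
wp theme list --update=available

# Export the database as a dated backup, then optimize it
wp db export "/tmp/db-backup-$(date +%F).sql"
wp db optimize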

I don’t believe in automatic updates, as any update can cause problems, and I like to test updates, especially core and theme updates, very carefully before I commit. My future automation will take that into consideration.

How Do I Feel About This Now?

I’m okay with it. It’s embarrassing, but I’ve also realized that it’s okay. It happens. I had a plan to recover and executed it perfectly. This happens with WordPress websites, and it gave me a chance to recognize the gaps in my WordPress maintenance and re-commit to what’s necessary for my website.

The absolute worst thing about this is that I lost a lot of momentum with the SEO traffic to my BackupBuddy article, but that’s the name of the game. I believe that if I continue to work on creating a great website, I don’t have much to worry about long-term, and my rankings will return.

Filed Under: Website Administration, WordPress Tagged With: backups, hacked, wordpress, wordpress maintenance

You Need a Backup and Disaster Recovery Plan

June 15, 2019 by Aaron Weiss

If you run a website, a backup and disaster recovery plan will help you sleep better at night, even if you never have to recover anything.

Recently, two hosting platforms and their users suffered missteps.

a2 Hosting, a shared hosting provider I’ve been using since 2013, has had its Windows servers shut down for over a week as the company deals with a ransomware attack. Additionally, the backups the company has available for customers appear to be over two months old.

DigitalOcean mistook a user’s script for a crypto-mining operation and shut down a startup’s servers.

I’m fortunate not to be affected, as my a2 Hosting account is Linux-based and my DigitalOcean VPS is a low-risk profile. However, this is devastating for these companies and their users. I’m sure there are terms-of-service policies that cover the hosting companies in situations like these, to a certain extent.

There’s much to learn from these situations, and this is a good time to reflect on having plans for your own website should something similar happen.

Restorable Backup Plan

There’s no excuse not to have a backup plan and infrastructure for your computer, websites, and any important data. Here are some of the backups I have set in my digital life:

  • For my main computer, I have a full weekly backup with daily incremental backups, which are synced to my FreeNAS box and, from there, to a Backblaze B2 bucket.
  • For my FreeNAS server, the config file is backed up to Dropbox and Backblaze, and mirrored on a second USB drive.
  • For my websites, my entire cPanel host instance is backed up each week and then downloaded to my FreeNAS server. The individual websites have BackupBuddy backups on weekly or daily schedules relative to their respective performance, which are then synced to Dropbox. Some sites also back up to Amazon S3. (A sketch of one such sync job follows below.)
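
None of those sync jobs are shown here, but as an illustrative sketch, a scheduled rclone job pushing a backup directory to a B2 bucket might look like this (the remote name, bucket, and paths are hypothetical):

# Assumes an rclone remote named "b2" has already been configured via `rclone config`
rclone sync /mnt/backups b2:my-backup-bucket --transfers 8 --log-file /var/log/rclone-backup.log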

As you can see, I take my data very seriously. Some data has three or four destinations. I’m ready to relaunch a computer image or FreeNAS box, or bring my entire cPanel instance or an individual website back from the dead.

In fact, I recently had a botched release of improvements to this very website. I was able to bring the site back up in less than 30 minutes because I had the infrastructure and documentation in place to recover.

Disaster Recovery Plan and Exercises

Just having a backup isn’t enough; knowing how to restore those backups is just as important.

In the case of a2 Hosting, anyone with recent backups of their website could have found a new host, restored their backups, and pointed their DNS at the new service. After DNS propagation, a website could return to full operation within 24 hours at the latest.
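
Checking whether new DNS records have propagated is a one-liner with dig against a public resolver (the domain here is a placeholder):

dig +short example.com A @8.8.8.8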

At my day job, I’ve participated in Disaster Recovery Exercises where I help validate whether or not applications I use can perform critical tasks after the recovery begins. It’s a boring exercise, but I now see how important it really is.

My recommendation is to have a test environment that is nearly identical to your site’s live environment, break it somehow, and then restore the site from a backup. You might even want to see whether you can find a new vendor and restore your site there. Having that knowledge will help you sleep better at night.

Planning for the Future

Despite this situation, I’m still sticking with a2 Hosting and DigitalOcean for the immediate future. a2 Hosting has been a great partner, and I’ve had only a few support tickets with them since I started. I know that if I were on the other side of the table, I’d be furious. Companies make mistakes, and no company is infallible.

The moral of this story is that companies as large as these two should have had their customers’ data backed up to a separate location (although customers should be responsible for their own data) and a plan in place to return their services to functionality more quickly.

You don’t have to be a2 Hosting or DigitalOcean, or their users. You now have the knowledge to be better.

Filed Under: Website Administration Tagged With: a2 hosting, amazon s3, backblaze, backup plans, backups, digitalocean, disaster recovery, dropbox
