resilvering …

The last Ubuntu kernel update seemingly kicked two hard disks out of a ZFS raidz – sigh. With ZFS on Linux this poses an issue:

Two hard drives that previously were part of the ZFS pool named “storagepool” were assigned completely different device names by Linux. So /dev/sdd became /dev/sdf, and so on.

ZFS uses a specific metadata structure, the label, to encode information about each hard drive and its relationship to storage pools. When Linux reassigned names to the hard drives, some things apparently got shaken up in ZFS’ internal structures and mappings.
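As an aside: that label can be inspected with zdb. A minimal sketch, assuming one of the renamed drives now shows up as /dev/sdf (the device and partition names here are examples, not taken from this machine):

# dump the ZFS labels of a device; they name the pool and GUIDs
# the device claims to belong to
zdb -l /dev/sdf1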

The solution was these steps (a command sketch follows the list):

  • export the ZFS storage pool (= taking it offline for access / turning it off)
  • use the zpool functionality “labelclear” to clear the ZFS label off the hard drives that had become “unavailable” to the storage pool
  • import the ZFS storage pool again (= taking it back online for access)
  • use the replace functionality of zpool to replace the old drive name with the new drive name
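Roughly, those steps translate into the commands below. This is a sketch rather than a transcript: the pool name “storagepool” is the one from this post, but /dev/sdd (old name) and /dev/sdf (new name) are stand-ins.

# 1. take the pool offline
zpool export storagepool

# 2. clear the stale ZFS label from the drive that went “unavailable”
#    (-f forces the clear even if the label looks active)
zpool labelclear -f /dev/sdf

# 3. bring the pool back online
zpool import storagepool

# 4. point ZFS from the old device name to the new one
zpool replace storagepool /dev/sdd /dev/sdf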

After poking around for about 2 hours the above strategy got the storage pool to start rebuilding (resilvering, in ZFS speak). Well – learning something every day.
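The progress of the resilver can be watched with zpool status; the “scan:” line reports the percentage done and an estimated time remaining:

# shows pool health plus resilver progress and ETA
zpool status storagepool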

4+ hours to go.

Bonus: I was not immediately informed of the DEGRADED state of the storage pool. That needs to change. A simple command is now run by cron every hour.

zpool status -x | grep state: | tr --delete 'state:' | mosquitto_pub -t house/stappenbach/server/poppyseeds/zpool -l
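For completeness, a sketch of the matching crontab entry; the hourly schedule is as described above, but the exact line is an assumption, not copied from the machine:

# m h dom mon dow  command (runs at minute 0 of every hour)
0 * * * * zpool status -x | grep state: | tr --delete 'state:' | mosquitto_pub -t house/stappenbach/server/poppyseeds/zpool -l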

This pushes the ZFS storage pool state to MQTT, where it gets picked up by a small Node-RED flow.
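To verify that the messages actually arrive before wiring up the flow, subscribing to the topic works; a minimal sketch, assuming the broker runs locally (otherwise add -h <host>):

# print every message published to the topic (-v prefixes each payload with its topic)
mosquitto_sub -t house/stappenbach/server/poppyseeds/zpool -v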