Warning
These are my very short notes about using ZFS. Be sure to check the manpages zfs(8) and zpool(8) and consult additional documentation in case anything is unclear. Commands may need to be adapted to your system etc.
Installation
For unattended processes it is recommended to set up zfs-dkms first and only install zfsutils-linux afterwards, because otherwise the utils may want to start some services while the modules are not loaded yet. All of the packages are in the Debian contrib repository.
# In case it works
aptitude install zfs-dkms zfs-zed zfsutils-linux
# Safe for automation (to be verified/from memory)
apt-get install zfs-dkms
modprobe zfs
apt-get install zfsutils-linux
On live systems and other systems which are not using ZFS yet, issue
modprobe zfs
before using any of the ZFS commands.
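To verify that the module is loaded and that the userland tools can talk to it, a check like the following can be used (zfs version requires OpenZFS 0.8 or newer):
# check that the kernel module is present
lsmod | grep zfs
# print userland and kernel module versions
zfs version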
Introspection
At any time, consult the status of ZFS as follows:
zpool status # list pools and report degraded info
zpool list # list pools short
zpool iostat 10 # display I/O stats every 10 sec.
zfs list # list file systems on pools
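For a quick scripted health check, the -x option is useful; it only reports pools that have problems:
# reports only pools with problems (or that all pools are healthy)
zpool status -x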
Creating ZFS Pools and Volumes
- The recommended mode of operation for pools is to have the whole device node as part of the pool.
- It is sufficient to give the basename of the device node.
- It is advisable to choose ID-based names from /dev/disk/by-id over the potentially changing /dev/sda etc. The by-id approach often allows WWN-based identification and a longer string with the manufacturer name. It is advisable to choose the one that also appears on the disk’s label (see the listing below).
- For root-on-ZFS refer to https://openzfs.github.io/openzfs-docs/Getting%20Started/Debian/Debian%20Buster%20Root%20on%20ZFS.html. Summary: There are special options that need to be used in case of root-on-ZFS. GRUB can be used w/o a separate non-ZFS partition. The regular installer is not convenient to run; use a live system + debootstrap or a live system + the installer running inside the live system.
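To find suitable names before creating a pool, a listing along these lines may help (filtering out the -part entries is an assumption about the by-id naming scheme):
# list stable device names, hiding partition entries
ls -l /dev/disk/by-id/ | grep -v -- -part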
Create a Pool
# zpool create ... <pool> mirror <dev0> <dev1>
zpool create -m none -o ashift=12 masysmahdd6t mirror \
ata-TOSHIBA_... ata-TOSHIBA_...
-m none
- Avoid creating a mountpoint for the pool as-is. It is recommended to create ZFS file systems inside the pool instead!
-o ashift=12
- Recommended setting for advanced format devices (4K physical sectors, 2^12 = 4096). The recommended value for Intel P4510 SSDs is also 12; a way to check the sector sizes is shown below. See https://wiki.lustre.org/Optimizing_Performance_of_SSDs_and_Advanced_Format_Drives and https://zfsonlinux.topicbox.com/groups/zfs-discuss/T9839aba9fccf2954-M602d366a6f37bcc2f2c6523b/zfs-optimisation-for-nvme-and-shared-hosting-scenario
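To double-check whether ashift=12 is appropriate, the reported sector sizes and the value of an existing pool can be queried as follows (column names as in util-linux lsblk):
# physical/logical sector sizes of the raw devices
lsblk -o NAME,PHY-SEC,LOG-SEC
# ashift actually used by an existing pool
zpool get ashift masysmahdd6t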
Create a File System
# zfs create ... <pool>/<name>
zfs create -o reservation=300g -o quota=400g -o mountpoint=/data \
masysmahdd6t/data
-o reservation=300g
- Attempt to guarantee that this FS can hold 300 GiB (?) of data.
-o quota=400g
- Cap the size of this FS. This is what df -h will display as “free”.
-o mountpoint=/data
- Mount this ZFS. Note: /etc/fstab is not used and the mountpoint directory is created if it does not already exist.
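The properties can be inspected and adjusted later with zfs get/zfs set; the 500g below is just an example value:
# show the properties configured above
zfs get reservation,quota,mountpoint masysmahdd6t/data
# raise the quota later if needed (example value)
zfs set quota=500g masysmahdd6t/data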
ZFS for Swap?
Using a ZFS volume for swap is not directly advised against by the manpage, but may result in system lockups, see https://github.com/openzfs/zfs/issues/7734.
The current idea is to use mdadm for swap, which loses the ability to use whole devices rather than partitions, but may otherwise be the most practical variant.
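A minimal sketch of the mdadm swap idea, assuming two dedicated swap partitions (device names are placeholders, not verified on a real system):
# RAID1 over two swap partitions (placeholders sdX2/sdY2)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX2 /dev/sdY2
mkswap /dev/md0
swapon /dev/md0
# persist via /etc/mdadm/mdadm.conf and an fstab entry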
Regular Maintenance
Run ZFS scrubs regularly (comparable to an mdadm RAID resync?).
# zpool scrub <pool>
zpool scrub masysmahdd6t
Follow the progress with zpool status.
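To run scrubs regularly, a cron entry along the following lines may be used; note that Debian’s zfsutils-linux may already ship a periodic scrub job, so check before duplicating it:
# /etc/cron.d/masysma-zfs-scrub (example: 03:00 on the 1st of each month)
0 3 1 * * root /sbin/zpool scrub masysmahdd6t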
Move ZFS to Other System
Before removing the disks from the previous server:
# zpool export <pool>
zpool export masysmahdd6t
After inserting the disks into the new server:
# zpool import ... [pool]
zpool import -d /dev/disk/by-id masysmahdd6t
If the pool name is left out, the command displays the pools available for importing.
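If the pool was not cleanly exported on the old system, the import may refuse to proceed; in that case -f can force it (use with care):
# list importable pools without importing anything
zpool import -d /dev/disk/by-id
# force the import if the pool was not exported cleanly
zpool import -d /dev/disk/by-id -f masysmahdd6t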
Replace a Failed Disk
(!) WARNING: This has not been verified yet (!)
Quoted command from https://www.unixsheikh.com/articles/battle-testing-data-integrity-verification-with-zfs-btrfs-and-mdadm-dm-integrity.html:
# zpool replace <pool-name> <failed-disk-id> <new-disk-id>
zpool replace pool1 ata-ST31000340NS_9QJ0EQ1V ata-ST31000340NS_9QJ0DVN2
As per the zpool replace manpage, one could also replace a device “by itself” after failure:
zpool replace pool1 sda sda
This may also be the command to use in case of “failure” after an intermittent drive connection issue?
Note that the ZFS term for resync in case of new/replaced disks is “resilver”.
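For the intermittent-connection case it may be enough to bring the device back online and clear the error counters instead of replacing it; this is likewise unverified and only sketched from the zpool manpage:
# bring a temporarily missing disk back online
zpool online pool1 ata-ST31000340NS_9QJ0EQ1V
# reset the error counters once the resilver has finished
zpool clear pool1
# watch the resilver progress
zpool status -v pool1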
Docker+ZFS Write Amplification
See also https://www.reddit.com/r/homelab/comments/h0n2h9/any_idea_why_docker_zfs_is_causing_a_lot_of_write/. The adjustment below may enhance the lifetime of mostly-idle SSDs at the expense of losing more unwritten data in case of power failure.
# check ongoing I/O on raw devices
iostat -k 10
# Enlarge zfs_txg_timeout (beware power outages!)
# Default is 5; other users use up to 60 seconds.
echo 20 > /sys/module/zfs/parameters/zfs_txg_timeout
Making the change persistent:
# /etc/modprobe.d/masysma-zfs.conf
# Enlarge zfs_txg_timeout from 5 to 20sec to reduce idle SSD writes
options zfs zfs_txg_timeout=20
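After reloading the module (or writing the sysfs file directly), the active value can be confirmed:
# confirm the currently active value
cat /sys/module/zfs/parameters/zfs_txg_timeout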