
ZFS has a reputation for being the filesystem that serious homelab people use when they stop trusting RAID and start caring about data integrity. That reputation is mostly earned. But it also comes with a learning curve that catches people off guard, because ZFS doesn’t behave like ext4 or NTFS, and some of its defaults make perfect sense on a server with 256 GB of RAM and zero sense on a home NAS with 16 GB. The five mistakes covered here aren’t edge cases – they’re the same traps that show up repeatedly on r/homelab, the Proxmox forums, and TrueNAS community threads. If you’re setting up ZFS on Linux for the first time, or if your pool is already running and something feels off, this post is for you.
Mistake 1: Treating ZFS Like Hardware RAID
This is the foundational misunderstanding. People come from a RAID controller background, plug in four drives, and expect ZFS to work the same way. It doesn’t, and the differences matter.
Hardware RAID controllers typically have a battery-backed write cache. They also lie to the OS – the controller reports a write as complete before it’s actually on disk. ZFS, by design, needs to talk directly to the drives. When a hardware RAID controller sits in the middle, ZFS can’t verify that a write actually landed where it thinks it did. This breaks the entire integrity guarantee ZFS is built around.
The correct setup is to pass drives directly to ZFS using an HBA (Host Bus Adapter) in IT mode, not RAID mode. Cards like the LSI 9207-8i flashed to IT mode are the standard homelab solution. You want the OS to see individual drives, not a RAID volume presented by a controller.
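Once the card is flashed, sanity-check that the OS sees raw disks rather than a single controller volume – each physical drive should show up with its own model and serial number:
lsblk -o NAME,MODEL,SERIAL,SIZE,TYPE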
A related mistake is using drives with unreliable power-loss protection. Consumer drives vary a lot here. If a drive claims its cache has been flushed when it hasn’t, an unexpected power cut can corrupt data ZFS believed was safely committed. A UPS on your NAS is not optional if you care about integrity. A cheap CyberPower 850VA unit is around $120 CAD and is genuinely worth it.
Mistake 2: Building the Wrong Pool Layout From the Start
ZFS pool topology is permanent in ways that matter. You can add vdevs to a pool, and you can attach extra disks to an existing mirror vdev, but you can’t change the width of an existing RAIDZ vdev without destroying and recreating it (RAIDZ expansion has only arrived in very recent OpenZFS releases). Getting this wrong on day one is expensive in time, if not in hardware.
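Mirrors are the exception worth knowing about – widening one is a single command. For example, turning a two-way mirror into a three-way mirror (the device paths here are placeholders):
zpool attach tank /dev/disk/by-id/ata-EXISTING_DISK /dev/disk/by-id/ata-NEW_DISK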
RAIDZ1, RAIDZ2, or Mirrors?
RAIDZ1 (roughly equivalent to RAID-5) tolerates one drive failure per vdev. RAIDZ2 tolerates two. Mirrors are simpler: every drive in a mirror holds a full copy. For a home NAS with four drives, two common, sensible layouts are:
- 2x mirror vdevs (2-wide mirrors, striped): Good random read/write performance, tolerates one failure per mirror pair. You lose 50% of raw capacity to redundancy.
- 1x RAIDZ2 vdev (4-wide): Tolerates two simultaneous failures. Better capacity efficiency, but random write performance is lower. Rebuilds are slower.
For spinning disks in a home NAS doing mostly sequential workloads (media, backups), RAIDZ2 with four or more drives is a reasonable choice. For a VM datastore where random IOPS matter, mirrors win. What we found surprising when testing both configurations is how significant the IOPS difference is even on a 2.5GbE home network – the network isn’t always the bottleneck.
To create a RAIDZ2 pool on four drives (assuming drives are at /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde):
zpool create -o ashift=12 tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde
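The equivalent striped-mirror layout on the same four drives, for comparison:
zpool create -o ashift=12 tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde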
Use disk IDs instead of /dev/sdX labels in production – those labels can shift after a reboot:
zpool create -o ashift=12 tank raidz2 \
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-XXXXXXXX \
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-YYYYYYYY \
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-ZZZZZZZZ \
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-AAAAAAAA
Set ashift Correctly at Creation
The ashift property sets the minimum block size for the pool. Modern drives – both 4K-native HDDs and NVMe SSDs – should use ashift=12 (which means 2^12 = 4096 byte sectors). Some drives lie and report 512-byte sectors for compatibility reasons. If you create a pool with ashift=9 on a drive that’s physically 4K, every write causes a read-modify-write cycle internally, which kills performance and drive longevity.
Check your drive’s physical sector size before creating the pool:
smartctl -i /dev/sdb | grep -i sector
When in doubt, use ashift=12. There is no performance penalty for using 4K blocks on a 512-byte drive. There is a significant penalty for doing it the other way around. And you cannot change ashift after pool creation.
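On recent OpenZFS releases you can confirm what an existing pool was created with (older releases need zdb -C tank | grep ashift instead):
zpool get ashift tank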
Mistake 3: Not Tuning ARC for a Home System
ARC – the Adaptive Replacement Cache – is ZFS’s in-memory read cache. It is genuinely excellent. It is also, by default, allowed to consume up to half your system RAM on Linux (the default zfs_arc_max of 0 means 50% of physical memory). On a dedicated server with 128 GB, that’s fine. On a home NAS running Debian with 16 GB that also handles a few containers, handing 8 GB to ARC by default is too aggressive.
Check your current ARC size in bytes:
cat /proc/spl/kstat/zfs/arcstats | grep -E "^c |^size"
The c line is the target ARC size in bytes, and size is the current usage. To cap ARC at, say, 4 GB on a 16 GB system, add this to /etc/modprobe.d/zfs.conf:
options zfs zfs_arc_max=4294967296
That value is 4 × 1024³ = 4,294,967,296 bytes. You also want to set a minimum to prevent ARC from shrinking below something useful under memory pressure:
options zfs zfs_arc_min=1073741824
That’s 1 GB minimum. After editing, run update-initramfs -u on Debian/Ubuntu systems and reboot. On systems using dracut, run dracut --force instead.
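If you want to test a value before baking it into the modprobe config, the parameter is also writable at runtime as root (ARC shrinks gradually rather than instantly):
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max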
Our reading suggests a reasonable starting point for a 16 GB home NAS is an ARC max of 4-6 GB, leaving the rest for OS, containers, and application cache. Monitor with:
arc_summary
If you don’t have arc_summary, it ships with the zfsutils-linux package on Debian/Ubuntu, or you can grab it from the OpenZFS project. The hit ratio is the key number – anything above 80% for your workload means ARC is working well. A ratio below 50% on a stable workload suggests you might need more RAM, not more ARC tuning.
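You can also compute the ratio yourself from the raw counters – the hit ratio is hits / (hits + misses):
grep -E "^(hits|misses) " /proc/spl/kstat/zfs/arcstats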
Mistake 4: Skipping Scrubs and Ignoring the Scrub Schedule
ZFS stores checksums for every block of data. A scrub reads every block, verifies the checksum, and fixes any corruption it can (using redundant copies from mirrors or RAIDZ parity). This is how ZFS catches silent data corruption – bit rot on old drives, cosmic ray flips, whatever the cause. But the scrub only helps you if you actually run it.
On a fresh ZFS install on Debian 12 or Ubuntu 22.04, a monthly scrub cron job is usually included, but it’s worth verifying rather than assuming. Check:
cat /etc/cron.d/zfsutils-linux
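Recent zfsutils-linux versions also ship optional systemd timer templates (zfs-scrub-monthly@.timer and zfs-scrub-weekly@.timer) as an alternative to the cron job. To see whether any are active, and to enable one for a pool named tank:
systemctl list-timers '*zfs*'
systemctl enable --now zfs-scrub-monthly@tank.timer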
The default on many systems runs scrubs on the second Sunday of each month. That’s acceptable. For a home NAS with spinning disks that are powered down some of the time, you may want to confirm the schedule is actually firing. Run a manual scrub to establish a baseline:
zpool scrub tank
Check status while it’s running:
zpool status tank
A four-drive RAIDZ2 pool with 4 TB drives will typically take 4-8 hours to scrub, depending on how full it is – scrubs only read allocated blocks. That’s normal. What’s not normal is errors in the output – even one checksum error on a drive that’s less than two years old is worth investigating. Pull SMART data immediately:
smartctl -a /dev/sdb
Pay attention to Reallocated_Sector_Ct, Current_Pending_Sector, and Offline_Uncorrectable. Non-zero values on any of these are warning signs. ZFS will tell you which drive had the error via zpool status.
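To pull just the three attributes that matter here (exact attribute names vary slightly by vendor):
smartctl -A /dev/sdb | grep -E "Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable"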
Mistake 5: Not Using Datasets Properly (and Forgetting Snapshots)
One of ZFS’s most practical features for a home NAS is datasets with per-dataset properties, combined with snapshots that are nearly instant and space-efficient. A lot of beginners create one pool and dump everything into it as a flat directory structure. This gives up most of what makes ZFS useful day-to-day.
Datasets Are Your Friend
Create separate datasets for different categories of data:
zfs create tank/media
zfs create tank/backups
zfs create tank/documents
zfs create tank/vm-storage
Now you can set compression per dataset. LZ4 compression is essentially free on modern CPUs and often saves 20-40% on text, logs, and database files:
zfs set compression=lz4 tank/documents
zfs set compression=lz4 tank/vm-storage
Don’t bother compressing already-compressed data:
zfs set compression=off tank/media
You can set different snapshot schedules, quotas, and share settings per dataset. This is much cleaner than trying to manage everything at the pool level.
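For example, capping the backups dataset and confirming the properties took (the 500G figure is just illustrative):
zfs set quota=500G tank/backups
zfs get compression,compressratio,quota tank/backups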
Snapshots: Use Them, Automate Them
A ZFS snapshot is a point-in-time, read-only copy of a dataset. Creating one takes milliseconds and uses almost no space until data changes. For a home NAS, automated snapshots are cheap insurance against accidental deletion – which is far more common than drive failure in practice.
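Before automating anything, take one by hand to see how cheap it is (the name after @ is arbitrary):
zfs snapshot tank/documents@before-reorg
zfs list -t snapshot tank/documents
If you ever need to revert wholesale, zfs rollback tank/documents@before-reorg restores the dataset to that moment – but be aware that rollback discards everything written since.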
The sanoid tool is the standard automation layer for ZFS snapshots on Linux. Install it on Debian/Ubuntu:
apt install sanoid
Configure it at /etc/sanoid/sanoid.conf:
[tank/documents]
use_template = production
[tank/vm-storage]
use_template = production
[template_production]
frequently = 0
hourly = 24
daily = 30
monthly = 3
yearly = 0
autosnap = yes
autoprune = yes
This keeps 24 hourly snapshots, 30 daily snapshots, and 3 monthly snapshots. Enable the sanoid timer:
systemctl enable --now sanoid.timer
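Give it an hour, then confirm snapshots are actually appearing:
zfs list -t snapshot -o name,creation -s creation tank/documents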
Snapshots are visible under the dataset’s mountpoint at /tank/documents/.zfs/snapshot/ – users on a Samba share can browse them directly if you set snapdir=visible (zfs set snapdir=visible tank/documents). That alone has saved a lot of people from a bad afternoon.
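Recovering a deleted file is then just a copy – no special tooling required (the snapshot and file names below are illustrative; sanoid names its snapshots autosnap_DATE_TIME_daily and so on):
cp /tank/documents/.zfs/snapshot/autosnap_2025-01-15_00:00:02_daily/report.odt /tank/documents/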
One More Thing: Data Residency and Canadian Context
If you’re running a home NAS in Canada and storing any personal data for clients or family members, it’s worth knowing that PIPEDA (the Personal Information Protection and Electronic Documents Act) applies even to small-scale data custodians in some contexts. Running your own ZFS NAS in Calgary rather than pushing everything to a US cloud provider keeps your data under Canadian jurisdiction. The Office of the Privacy Commissioner has readable guidance on this.
Self-hosting with proper ZFS redundancy, regular scrubs, and off-site backup replication (using syncoid, which pairs with sanoid) is a genuinely reasonable privacy-conscious alternative to cloud storage for non-enterprise use. It’s not zero effort, but none of this is magic – it’s just configuration you do once and then maintain.
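A minimal off-site replication run with syncoid looks like this (the user, host, and pool names are placeholders – syncoid handles the incremental zfs send/receive plumbing over SSH for you):
syncoid tank/documents backupuser@offsite-host:backuppool/documents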
ZFS on Linux rewards the people who take the time to understand what it’s actually doing, and it punishes assumptions borrowed from other filesystems – take the half-hour to read the OpenZFS documentation before you create your first pool, and most of these mistakes become avoidable.
– Auburn AI editorial, Calgary AB
Related Auburn AI Products
Building a homelab or self-hosting content site? Auburn AI has practical kits:
- 500 Homelab and Self-Hosting Blog Titles ($27)
- Auburn AI Monitoring Stack ($37) – 6 production PowerShell scripts
- Podcast Automation Kit ($37)
- Browse all Auburn AI products
