
Backing Up a Self-Hosted Stack Without Cloud Vendor Lock-In (My Restic + Local NAS Setup)
Last February, a ZFS pool on my main Proxmox node threw a fit after a failed drive replacement. I had backups. I thought I had backups. Turns out I had a Proxmox Backup Server job that had been silently failing for three weeks because a datastore ran out of space and I had never set up proper alerting. That incident forced me to rebuild my entire backup strategy from scratch, and what I landed on is a Restic-based setup that I’ve now been running reliably for about fourteen months.
Why I Stopped Trusting Single-Vendor Backup Solutions
Before the February incident I was doing what a lot of homelab people do: relying on whatever backup tooling came with the hypervisor. Proxmox Backup Server is genuinely good software, but I was using it as both the backup tool and the backup destination on the same physical node. That is not a backup. That is a copy with extra steps.
I also had a brief period of using a cloud-hosted backup service, which I won’t name, but it was running me about $18 CAD per month for the storage tier I needed. Fine, but the restore process involved their proprietary client, their servers, and their uptime. When I needed a file quickly at 2 a.m. during the incident, their web portal was down for maintenance. I’m in Calgary. My internet is Shaw — now Rogers, after the merger — and while the speeds are decent, I was also staring down the reality that a full restore of my stack over residential internet would take the better part of a day even without throttling.
Restic solved most of these problems. It’s open-source, the repository format is documented, and it can target almost any backend: local filesystem, SFTP, S3-compatible object storage, Backblaze B2, and more. If Restic itself disappeared tomorrow, there are community forks and the format is readable enough that you could theoretically write your own tooling. That’s the kind of insurance I want.
The Actual Hardware Setup
The NAS
My primary backup target is a TrueNAS Scale box sitting on a separate shelf in the same rack. It’s a cheap mini-ITX build I put together about two years ago: a used Xeon E3 processor, 16GB of ECC RAM, and four 4TB Seagate IronWolf drives in a RAIDZ1 configuration. With one drive’s worth of capacity going to parity, usable storage comes out to roughly 10.9TiB. I’m currently using about 4.2TB of that for backup repositories across all my services.
The TrueNAS box is on a separate UPS from my main compute. This matters. If a power spike takes out the compute nodes, I don’t want it taking out the backup destination at the same time.
Remote Replication
For offsite, I replicate the Restic repositories to a VPS I rent from a Canadian provider — specifically one with a datacenter in Vancouver — for about $22 CAD per month for the storage tier I use. I chose a Canadian provider deliberately. Between PIPEDA considerations and the fact that some of my smart home data includes things like presence detection logs and camera archives, I’d rather that data stay under Canadian jurisdiction. Your mileage may vary on how much this matters to you, but it was a conscious decision for me.
The replication from TrueNAS to the VPS runs as a nightly rclone job, not as a second Restic job. This is an important distinction I’ll come back to.
How the Restic Jobs Are Structured
What Gets Backed Up
I’m running about 22 services across three Proxmox nodes. The backup scope breaks down roughly like this:
- Container volumes: Anything in `/var/lib/docker/volumes` on my Docker hosts, plus bind-mount paths I’ve documented in a flat text file I keep updated religiously.
- Config directories: All my Ansible playbooks, compose files, and custom configs live in a Git repo, but I also back that up with Restic, because relying solely on Git for recovery assumes the Git host is still reachable.
- Database dumps: I don’t back up live database files directly. MariaDB and PostgreSQL don’t react well to being copied mid-write. I run pre-backup hooks that dump to a staging directory, Restic backs up the dump, and the staging directory gets cleared afterward (there’s a sketch of this hook right after the list).
- Home Assistant configuration: This one is its own Restic job because it’s precious and I’ve spent an embarrassing number of hours tuning automations.
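For the MariaDB side, the hook is only a few lines. Here’s a sketch with placeholder paths; dumping to a temp file and renaming it afterward means a failed dump never overwrites a good one:

```bash
#!/usr/bin/env bash
# Pre-backup hook sketch: dump MariaDB so Restic backs up a
# consistent file instead of live database files.
set -euo pipefail

STAGING_DIR="/srv/backup-staging/mariadb"   # placeholder path
mkdir -p "$STAGING_DIR"

# --single-transaction gets a consistent InnoDB snapshot without locking.
# Dump to a temp file first so a failed dump never leaves a partial file
# sitting where Restic will pick it up.
mysqldump --single-transaction --all-databases \
  > "$STAGING_DIR/all-databases.sql.tmp"
mv "$STAGING_DIR/all-databases.sql.tmp" "$STAGING_DIR/all-databases.sql"
```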
The Restic Command Structure
Each job is a shell script called by a systemd timer. I deliberately avoided restic wrappers like autorestic for a long time because I wanted to understand exactly what was happening. I’ve recently started using autorestic on one of my secondary nodes just to evaluate it, but my primary jobs are still plain shell scripts.
A simplified version of my main backup script looks like this:
Set the `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE` environment variables, run any pre-backup hooks (database dumps), call `restic backup` with the paths and appropriate excludes, then call `restic forget` with a retention policy, and finally run `restic check` at a lower frequency to verify repository integrity.
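In script form, that sequence is short. A minimal sketch; the repository path, password file location, and backup paths here are placeholders, not my real values:

```bash
#!/usr/bin/env bash
# Simplified sketch of the nightly backup job; all paths are placeholders.
set -euo pipefail

export RESTIC_REPOSITORY="/mnt/nas/restic/main"       # NAS mount point
export RESTIC_PASSWORD_FILE="/root/.restic-password"  # root-only readable

# Pre-backup hooks (database dumps) run first; set -e aborts on failure.
/usr/local/bin/dump-databases.sh

restic backup \
  /var/lib/docker/volumes \
  /srv/configs \
  --exclude-file=/root/restic-excludes.txt

# Retention, as described below: 7 daily, 4 weekly, 6 monthly.
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune

# `restic check` runs from a separate, less frequent systemd timer.
```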
My retention policy is: keep 7 daily, 4 weekly, 6 monthly snapshots. This is not magic — it’s just what felt right for my use case. I can recover a file from yesterday or from six months ago, which covers probably 95% of the scenarios I’d actually face.
Encryption and the Password File
Restic encrypts everything at rest by default. The password file lives on the source host in a root-only-readable location and is also stored offline in a KeePassXC database that I keep on an encrypted USB drive in a fireproof box. I cannot stress this enough: if you lose the Restic password, the repository is gone. Doesn’t matter that you have the files. You’ve got encrypted noise.
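Setting this up is two steps: create the password file with tight permissions, then initialize the repository against it. A sketch with placeholder paths:

```bash
# Create the password file root-only before initializing the repository.
install -m 600 -o root -g root /dev/null /root/.restic-password
# (paste the passphrase in with an editor, not via shell history)

restic init \
  --repo /mnt/nas/restic/main \
  --password-file /root/.restic-password
```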
Why rclone for Remote Sync Instead of a Second Restic Remote
This tripped me up for a while. The natural instinct is to configure Restic to write to two repositories simultaneously: one local, one remote. You can do this, but it means two separate repositories with two separate states. If you run `restic forget` on one, the other doesn’t know about it. You’d need to keep both in sync manually, which creates operational overhead and potential inconsistency.
My approach: Restic writes to the local NAS repository. That repository is the source of truth. Then rclone syncs the entire repository directory from TrueNAS to the VPS nightly. The VPS has a valid, complete Restic repository at all times, encrypted, that I can restore from directly if the local NAS is gone. I’ve tested this. It works.
The rclone sync runs after the Restic job completes. The order matters. You don’t want rclone syncing a repository mid-write.
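Enforcing that order is easy when the sync lives in its own script fired after the backup job finishes. A sketch; the rclone remote name and dataset paths are placeholders:

```bash
#!/usr/bin/env bash
# Nightly offsite sync: mirror the finished Restic repository to the VPS.
set -euo pipefail

# `rclone sync` makes the destination match the source exactly, so
# anything restic forget/prune removed locally disappears offsite too.
rclone sync /mnt/tank/restic/main vps:/backups/restic/main \
  --transfers 4 --checkers 8 \
  --log-file /var/log/rclone-restic.log
```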
The Restore Incident: What Actually Broke
About eight months into running this setup, I had to do a real restore. Not a test — a real one. My Home Assistant VM got corrupted after an ill-advised in-place OS upgrade attempt. I needed to recover about three weeks’ worth of configuration changes that weren’t in Git yet because I’d been lazy about committing.
Here’s what broke, or at least didn’t go smoothly:
Problem One: I Forgot the Exclude Paths
When I set up the Home Assistant backup job, I excluded the `.storage` directory inside the config folder because it can be large and regenerates automatically. What I forgot is that some integrations store state in `.storage` that doesn’t regenerate; it accumulates. I lost about two weeks of energy monitoring history. Not catastrophic, but annoying. I’ve since revised the exclude list and documented the reasoning inline in the backup script.
Problem Two: The Database Dump Script Had a Silent Failure Mode
My MariaDB pre-backup hook was running `mysqldump` and writing to the staging directory. What I hadn’t tested was what happens when the database is briefly unavailable, like when it’s restarting after an update. The dump script exited with a non-zero code, but my wrapper script wasn’t checking exit codes correctly. Restic backed up an empty staging directory for four days before I caught it during a routine check. I now fail the entire backup job loudly if any pre-backup hook exits non-zero, and I have a Gotify notification set up for failed systemd timer jobs.
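The shell side of that fix is just strict options plus an ERR trap. A minimal sketch, with the Gotify host and token as placeholders (my actual notification hangs off the systemd unit’s failure state, but the idea is the same):

```bash
#!/usr/bin/env bash
set -Eeuo pipefail   # -E so the ERR trap also fires inside functions

notify_failure() {
  # Gotify's /message endpoint takes an application token and form fields.
  curl -s "https://gotify.example.lan/message?token=APP_TOKEN" \
    -F "title=Backup failed on $(hostname)" \
    -F "message=A pre-backup hook or restic exited non-zero" \
    -F "priority=8" || true
}
trap notify_failure ERR

/usr/local/bin/dump-databases.sh   # any non-zero exit now aborts the job
restic backup /srv/backup-staging
```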
Problem Three: Restore Speed Over the Local Network
This was less of a problem and more of a calibration issue. Restoring 12GB of Home Assistant configs and media from the NAS over my 10GbE LAN took about four minutes. Fine. But I hadn’t mentally mapped out the full restore sequence — what order services need to come back up, what depends on what. I now maintain a one-page restore runbook in my Obsidian vault that lists the sequence and any gotchas. It doesn’t need to be elaborate. It just needs to exist when your brain is fried at 11 p.m.
The Honest Tradeoffs
This setup is not without its compromises. A few things I’d want anyone reading this to understand before copying it wholesale:
- The NAS is a single point of failure for local backups. RAIDZ1 protects me from one drive failure, but a fire, flood, or catastrophic controller failure would take out all local backups. The offsite VPS replication is the answer to that, but there’s a 24-hour lag on it. Something I backed up at 6 a.m. today might not be offsite until tomorrow morning.
- Restic is not fast on large datasets. The initial backup of anything over 100GB is slow. Incrementals are fast after that, but be prepared to let the first run go overnight. On my largest dataset, the first Restic backup ran for about six hours.
- Managing multiple Restic repositories gets complicated. I have seven separate repositories right now. Keeping track of which password file maps to which repo, which retention policy applies where — it’s manageable but it requires documentation discipline. If I were starting fresh today I’d probably consolidate more aggressively.
- This doesn’t replace VM-level snapshots for everything. For rapid rollback of a whole Proxmox VM, I still use PBS. Restic handles the data I actually care about long-term. They’re complementary, not competing.
The Restore Test Cadence
I do a partial restore test every three months. I pick one service at random, restore its most recent snapshot to a throwaway LXC container, verify the files look correct, and delete the container. Takes maybe thirty minutes. Once a year I do a full test of the remote VPS repository: actually connecting to the VPS, running `restic restore`, and verifying a meaningful chunk of data. This is the test that would have caught my PBS silent failure from February if I’d been doing it before that incident.
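The quarterly test itself is nothing fancy; roughly this, with the repository path and restore target as placeholders:

```bash
# Restore the latest snapshot of one service into a throwaway path.
export RESTIC_REPOSITORY="/mnt/nas/restic/homeassistant"
export RESTIC_PASSWORD_FILE="/root/.restic-password"

restic snapshots --latest 1              # confirm what "latest" really is
restic restore latest --target /tmp/restore-test

# Spot-check a few files before deleting the test container.
ls -lR /tmp/restore-test | head
```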
Log the results somewhere. Even just a note in a text file with the date and what you tested. Future you will want to know when the last test was.
If you’re running a self-hosted stack without a tested restore path, the first thing to do this weekend is pick your most important service and actually restore it to a test environment. Not because something is about to go wrong — but because you want to find out what’s already wrong before it matters.
