Backing up TrueNAS to a second remote server with Tailscale

It’s been quite some time since I had to abandon CrashPlan Pro for backups after I found out that the limit of “unlimited” is 10TB per system. I would have simply sharded my backups across multiple “systems,” but I have individual datasets that exceed that, so it would not have worked for me. I looked at cloud backups, but the volume of data I have (70TB and slowly growing) means the pricing, even for Amazon S3 Glacier Deep Archive, would be hundreds of dollars a month. The obvious solution is to build my own backup server and put it in a remote location. Someone asked for a walkthrough of what I did, so I decided to write one up.

Bandwidth considerations

This is a backup solution designed for people with exceptionally good upstream bandwidth and a remote location with comparable downstream bandwidth. I’m currently sitting on gigabit fiber, and the remote location is 1000M/100M on the same ISP in the same city. Before I could get gigabit upload, the backup time would have been prohibitively long; when I was still slumming on 20M upload with the cable company, the initial backup of much less data took literally months to complete. It also helps that I don’t have any data caps, nor has anyone from my ISP asked why I use so much bandwidth. My advice: this only works if your upload is fast enough that a full backup finishes in less time than your snapshots are retained for, your incremental backups can finish overnight so they don’t impact your regular usage, and there are no data caps to contend with.
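
For a rough sense of scale (ignoring protocol overhead and assuming you can actually saturate the link): 70TB is about 560,000 gigabits, so a full copy at a sustained 1Gbps takes roughly 560,000 seconds, a bit under a week. At 20Mbps, the same transfer works out to around 7,800 hours, the better part of a year. That’s the difference between a backup you can actually finish and one you can’t.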

Evaluating cost

Even with appropriate Internet connections on both sides, this solution only makes sense for large backups. As mentioned before, I’m at 70TB and growing. On Backblaze, that would be around $420/mo. A properly configured Amazon S3 Glacier Deep Archive setup may be cheaper, but AWS is infamous for billing shock. You’re also going to end up writing your own tools to avoid the two-step backup problem (copying to standard S3 and then changing the storage class), accidentally hitting the penalty for removing files less than 180 days old, and so on. You’d probably also have to rent one of their Snowball devices for the initial backup, which is another cost. Yes, you may be able to do it cheaper, but you will spend a lot of your time, and you might not find the cost of your time to be worth it. None of this accounts for the extreme cost of restoration, a sticker shock you might not be prepared for in the event of a total loss.

I’m lucky that I can leave my backup box at a friend’s house. Not everyone is willing to let you park a server in their living room to make noise and chew $15-20/mo in power. If your remote option is a colocation facility, be prepared to spend $100/mo or more just to park the box there. This is in addition to a server (~$300) and hard drives (~$1200 for 12x 10TB HGST SAS), plus some boot disks (~$50 for a pair of SSDs). In my situation, $1600 for a server plus $20/mo in power means I break even on Backblaze in around 4 months, and around 18 months compared to cloud storage from Azure or AWS. Also, restoration is basically “free”. Always do the math before you move forward.

Picking backup server hardware

I opted for a Supermicro X9 server with twelve 3.5″ drive bays in a 2U form factor. They’re cheap to acquire, and one or more of your homelab buddies may be eager to just give you one to get it out of the way. You can probably save some power with a newer system like a first-gen Epyc or Xeon Scalable, but the payback period is exceptionally long. The E5-2600 v1/v2 platform is definitely long in the tooth, but you don’t need a lot of CPU (or RAM) just for storing data. I replaced the CPUs that came with mine with E5-2630Ls, a 60W part that will sip power. I stuck with the stock 64GB RAM, but 32GB would be more than sufficient. You might even be able to drop to 16GB, but you’re likely to see some performance drops. For drives, I picked up cheap 10TB HGST SAS drives off of eBay, as they are well regarded for being performant but inexpensive. Prices are doing weird things as of this writing, but you could often find drives in the $6-7/TB range when I set this up. I’m using the onboard gigabit NICs, as both sides of the backup have gigabit fiber (on the same ISP, so transfers stay on-net and speeds are consistent). If you’re doing multigig, grab one of the excellent Mellanox ConnectX-3 cards going for $20 or so. A mirrored pair of SATA SSDs serves as the boot pool for some resiliency. This box doesn’t have hot-swap bays dedicated for them, so I just have them sitting loose in the case.

I’m ordinarily a fanatic for RAID-Z2 ZFS volumes. A backup server doesn’t need the same level of resiliency that the primary does, though, and I think it’s worth trading some redundancy for more usable space at the cost of a more urgent need to replace a disk should one fail. I configured the pool with 4-disk RAID-Z1 VDEVs in a single pool. This means you get 75% of your raw storage as usable space, but still have some fault tolerance for when (not if) drives get to the bad side of that bathtub curve. It’s also easy to upgrade one VDEV at a time, resilvering as you go, when you’re ready for new drives.

I’m going to assume at this point that you know how to make the VDEVs and pool in TrueNAS SCALE. If not, there are many excellent tutorials on doing so.
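
For reference, here’s roughly what that topology looks like from the shell. The pool name and device names below are just examples, and on TrueNAS you should let the GUI create the pool for you; this is only to illustrate the layout and give you a way to double-check it afterwards.

# Three 4-disk RAID-Z1 VDEVs in one pool (pool and device names are examples)
zpool create backup \
  raidz1 sda sdb sdc sdd \
  raidz1 sde sdf sdg sdh \
  raidz1 sdi sdj sdk sdl

# Verify the topology and health
zpool status backup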

Configuring backups

These backups were originally configured on TrueNAS CORE, but I’m now running TrueNAS SCALE. If you’re still on CORE, you should migrate since it’s not going to get any future feature updates and will eventually go EOL. (I don’t like it either. FreeBSD is bae.) As of this writing, the systems are running TrueNAS SCALE 24.04.

First, configure a user account for backups on the server to be backed up (referred to going forward as Prod). Make sure you grant it membership, under Auxiliary Groups, in the groups that can read the datasets you need backed up, and provide it an SSH key. For security reasons, I would also recommend disabling SSH password login for this account so that it’s forced to use key-based authentication. On the backup server (referred to going forward as Backup), go to Credentials, Backup Credentials, and add the key for the user you just created under SSH Keypairs. Once that’s done, add an entry for that user under SSH Connections. You’ll want to fill in the Host field with the FQDN or IP address of Prod. (I’ll cover making sure DNS works across your Tailnet later.)
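
If you prefer to generate the keypair yourself rather than letting the UI do it, something like this works; the key filename, user name, and hostname are just examples:

# Generate a dedicated, passphrase-less keypair for unattended replication
ssh-keygen -t ed25519 -N "" -C backup-replication -f ~/.ssh/truenas-backup

# Paste the contents of ~/.ssh/truenas-backup.pub into the backup user's SSH key field on Prod,
# then confirm key-based login works before going any further
ssh -i ~/.ssh/truenas-backup backupuser@prod.example.lan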

For backups to work, you need to be using snapshots. The default is to keep snapshots for up to two weeks, though for larger backup sets I would recommend setting snapshot retention to be slightly longer than the amount of time it takes to do a full backup from scratch. If you do not, the initial (or a new full) backup will fail as the first snapshots it copied expire. You can view your snapshot configuration under Data Protection, Periodic Snapshot Tasks. You should have a snapshot task for each pool on Prod. I recommend the default Naming Schema of auto-%Y%m%d.%H%M-2w (auto-YYYYMMDD.HHMM-2w). You can adjust the Snapshot Lifetime for longer retention if needed. Hourly snapshots are highly recommended. You can reduce the number of snapshots by only taking them during hours the system is likely to be writing new data, but a 24/7 system is probably better served by snapshots all day long. Make sure the Recursive box is checked so that all datasets in the pool are being snapshotted.
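
You can sanity-check that snapshots are actually being taken with the expected names from a shell on Prod; the pool name here is an example:

# Show the ten most recent automatic snapshots under the pool
zfs list -t snapshot -r -o name,creation -s creation tank | grep auto- | tail -n 10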

With the user setup done and snapshots in place, it’s time to configure the backup itself. You’ll want both Prod and Backup on the same local network at first so you can validate that the backup works as expected. On Backup, go to Data Protection, Replication Tasks and add a new one. Pick the SSH connection you set up earlier, set the Direction to PULL, and (VERY VERY VERY important for performance) set the Transport to SSH+NETCAT. With this transport, SSH is used to set up the connection, but the bulk data moves over a plain netcat stream with Backup pulling from Prod, which avoids SSH’s encryption overhead and is dramatically faster. (Once the box is remote, that traffic rides inside the Tailscale WireGuard tunnel, so it’s still encrypted in transit.) Netcat Active Side should also be set to LOCAL.

In the Source field, pick the pools or datasets you want to replicate from Prod to Backup. Check Recursive so that it gets everything, as well as Include Dataset Properties. Under Include snapshots with the name, specify the same naming schema you used when setting up the snapshots. If you do not, it won’t match any snapshots and nothing will be backed up. The Destination field can be set to the one large pool you created when setting up Backup. For extra protection, you can choose SET for Destination Dataset Read-only Policy, which ensures that once replication is done, the backup is read-only and cannot be tampered with on the Backup side. I would also recommend checking the Replication from scratch option so that if the backup process determines there are missing snapshots and it can’t sync the diffs, it’ll start over from scratch. You can also configure a custom snapshot retention policy on Backup if you want the ability to restore from a specific snapshot on Prod.
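
Conceptually, each replication run boils down to an incremental, recursive ZFS send from Prod that Backup receives locally, something like the sketch below. This is heavily simplified (TrueNAS’s replication engine handles the netcat transport, resume tokens, and snapshot bookkeeping for you), and every name in it is an example:

# What a PULL replication run does in spirit: send everything between two
# auto snapshots from Prod and receive it into the backup pool
ssh backupuser@prod "zfs send -R -I tank@auto-20250101.0000-2w tank@auto-20250108.0000-2w" \
  | zfs recv -F backup/tank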

At this point, you should have the backup configured, and now is the time to run it and let the initial backup complete. This will also let you catch anything that isn’t working as expected. Better to find that out now than after you’ve dropped the backup box off somewhere else!
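
While the initial run is chugging along, a couple of quick ways to keep an eye on it from a shell on Backup (pool name is an example):

# Watch incoming throughput per VDEV, refreshing every 5 seconds
zpool iostat -v backup 5

# See how much has landed so far
zfs list -o name,used,available -r backup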

Setting up your Tailnet

Tailscale is very awesome. It’s basically just some wrappers around WireGuard with a centralized management server, but it makes setup very easy. It’s free for up to 100 nodes (as of this writing), and you can run your own coordination server on a VPS with Headscale if you want. This is the secret sauce that makes a remote system look like it’s sitting on your local network, and the Tailscale application available in TrueNAS SCALE is a breeze to set up. There’s also a plugin for OPNsense 25.1 or higher (which I use) so you can enable it at the router level. I use this setup so that I can access my LAN from anywhere with my laptop.

First, you need to set up a Tailscale node on your local network. You can generate a key in your Tailscale admin console under Settings, Personal Settings, Keys, Generate auth key. Make sure it’s NOT Ephemeral, since ephemeral devices get removed from the Tailnet after they go offline for a while.

If you’re going the OPNsense route like me, make sure you’re on 25.1 or higher. You can install the plugin from System, Firmware, Plugins; the plugin name is os-tailscale. Once installed, you can configure it from VPN, Tailscale. Enter the key under Authentication and hit Apply. Under Settings, check the options for Accept DNS and Accept Subnet Routes (the latter lets the router reach subnets advertised by your other Tailscale nodes; it’s the Advertised Routes setting below that exposes your LAN to your clients). It’s up to you whether to enable Use Exit Node, which optionally routes all traffic over Tailscale out through your router instead of the default split tunnel. Click Apply, and then go to Advertised Routes. This is where you specify which local subnets Tailscale clients are allowed to reach, so make sure the one with Prod on it is added if you have more than one. Once you add it and click Apply, you need to approve the route in your Tailscale admin console.

The TrueNAS SCALE route is about as easy. Navigate to Apps and click Discover Apps, search for Tailscale, and install it. In the configuration, you’ll have more or less the same options as on OPNsense, just in slightly different places. Fill in your Auth Key and set the other options as appropriate. If you’re setting it up on Prod rather than on OPNsense (or another device), make sure you set Advertise Routes and approve the route in the admin console. On Backup, you can skip that step.
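
Under the hood, both the OPNsense plugin and the TrueNAS app are just feeding settings to the Tailscale daemon, so the options map to the same flags you’d pass tailscale up on any other box. The subnet and key below are placeholders:

# Roughly equivalent CLI invocation on the node that exposes your LAN (Prod or OPNsense)
tailscale up --authkey=tskey-auth-XXXXXXXXXXXX --advertise-routes=192.168.1.0/24

# On Backup, you only need to join the Tailnet and accept routes advertised by other nodes
tailscale up --authkey=tskey-auth-XXXXXXXXXXXX --accept-routes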

Once you’ve set up the nodes, you can validate that both are present and online in the Tailscale admin console. You’re done!
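
You can also confirm connectivity from a shell before relocating anything; the hostname here assumes MagicDNS is enabled and is just an example:

# Both nodes should show as online
tailscale status

# Verify there's a direct WireGuard path to Prod rather than a relayed one
tailscale ping prod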

Wrapping up

Now that you have Tailscale configured, you should be able to move Backup to its remote location and run the replication task without issue! Always make sure you have email alerting enabled on your server so that you’ll know about drive failures or power loss and can stay on top of them. You also get the added bonus of easy LAN access from anywhere. Any questions, just ask in the comments!
