My Backup Strategy

Details about the backup system for all of my data.


Regularly backing up all the data I care about is very important to me. This article outlines my strategy to make sure I never lose essential data.

Motivation

Backups should be as automatic as possible. This ensures laziness and forgetfulness won't interfere with the regularity.

All software used to create and store the backups should be free and open source so I'm not depending on the survival of a company.

Backups need to be tested to ensure they are correct and happening regularly. Multiple copies of the backups should exist, including at least one offsite to protect against my building burning down.

Backups should also be incremental when possible (rather than mirror copies) so an accidental deletion isn't propagated into the backups, making the file irrecoverable.

Strategy

The key is to have one central location that all files, projects, and data are cloned to, and then to back that directory up to multiple locations.

I have one backup folder /mnt/backup on my media server at home that serves as the destination for all my backup sources. All scheduled automatic backups write to their own subfolder inside of it.

This backup folder is then synced to encrypted 2.5" 1 TB hard drives which I rotate between my bag, offsite, and my parents' house.

[Diagram of my setup: servers and computers on the left, pointing to my home server in the middle, pointing to external hard drives on the right]

Backup Sources

I use the tool rdiff-backup extensively because it allows me to take incremental backups locally or over SSH. It behaves much like rsync and needs no configuration file.

Email

I have every email since 2010 backed up continuously in case my email provider disappears.

I use offlineimap to sync my mail to the directory ~/email on my media server as a Maildir. Since offlineimap is only a syncing tool, the emails need to be copied elsewhere to be backed up. I run rdiff-backup from a weekly cron job:

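(I'll explain what backup_check.txt does below. The --remove-older-than 12B line prunes old increments, keeping roughly the 12 most recent backup sessions.)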

*/15 * * * * offlineimap > /var/log/offlineimap.log 2>&1
00 12 * * 1 date -Iseconds > /home/email/email/backup_check.txt

20 12 * * 1 rdiff-backup /home/email/email /mnt/backup/local/email/
40 12 * * 1 rdiff-backup --remove-older-than 12B --force /mnt/backup/local/email/

Here's my .offlineimaprc for reference:

[general]
accounts = main
[Account main]
localrepository = Local
remoterepository = Remote
[Repository Local]
type = Maildir
localfolders = ~/email
[Repository Remote]
type = IMAP
readonly = True
folderfilter = lambda foldername: foldername not in ['Trash', 'Spam', 'Drafts']
remotehost = example.com
remoteuser = mail@example.com
remotepass = supersecret
sslcacertfile = /etc/ssl/certs/ca-certificates.crt
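
The folderfilter value is a Python expression that offlineimap calls once per folder name, and a folder is synced only when it returns True. A quick illustration of how the filter above behaves, using made-up folder names:

```python
# The same filter expression used in the .offlineimaprc above.
folderfilter = lambda foldername: foldername not in ['Trash', 'Spam', 'Drafts']

# Hypothetical folder names, for illustration only.
folders = ['INBOX', 'Trash', 'Archive', 'Spam', 'Drafts']

# Keep only the folders the filter accepts.
synced = [name for name in folders if folderfilter(name)]
print(synced)  # ['INBOX', 'Archive']
```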

Notes

I use Standard Notes to take notes and wrote the tool standardnotes-fs to mount my notes as a file system to view and edit them as plain text files.

I take weekly backups of the mounted file system on my media server with cron:

00 12 * * 1 date -Iseconds > /home/notes/notes/backup_check.txt
15 12 * * 1 rdiff-backup /home/notes/notes /mnt/backup/local/notes/

Nextcloud

I self-host a Nextcloud instance to store all my personal documents (non-code projects, tax forms, spreadsheets, etc.). Since it's only syncing software, the files need to be copied elsewhere to be backed up.

I take weekly backups of the Nextcloud data folder with cron:

00 12 * * 1 rdiff-backup /var/www/nextcloud/data/tanner/files /mnt/backup/local/nextcloud/
30 12 * * 1 rdiff-backup --remove-older-than 12B --force /mnt/backup/local/nextcloud/

Gitea

I self-host a Gitea instance to store all my git repositories for code-based projects. My home folder is also a git repo so I can easily sync my config files and password database between servers and machines.

I take weekly backups of the Gitea data folder with cron:

00 12 * * 1 date -Iseconds > /home/gitea/gitea/data/backup_check.txt
10 12 * * 1 rdiff-backup --exclude '**data/indexers' --exclude '**data/sessions' /home/gitea/gitea/data /mnt/backup/local/gitea/
35 12 * * 1 rdiff-backup --remove-older-than 12B --force /mnt/backup/local/gitea/

Telegram

Telegram Messenger is my main app for communication. My parents, most of my friends, and friend groups are on there so I don't want to lose those messages in case Telegram disappears or my account gets banned.

Telegram includes a data export feature, but it can't be automated. Instead I run the deprecated software telegram-export hourly with cron:

(It saves the messages to an SQLite database.)

0 * * * * bash -c 'timeout 50m /home/tanner/opt/telegram-export/env/bin/python -m telegram_export' > /var/log/telegramexport.log 2>&1

It likes to hang, so timeout kills it if it's still running after 50 minutes. Hasn't corrupted the database yet.
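
Since the export can be killed mid-run, it may be worth checking the database rather than trusting it. A minimal sketch using SQLite's built-in integrity check from Python's standard library; the function name and the database path are assumptions, since the location of the export file isn't given here:

```python
import sqlite3


def sqlite_is_healthy(db_path):
    """Return True if SQLite's integrity check reports no problems."""
    try:
        # Open read-only so the check can never modify the export.
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    except sqlite3.DatabaseError:
        # The file is missing or cannot be opened.
        return False
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        # Raised when the file is not a readable SQLite database.
        return False
    finally:
        conn.close()
    # A healthy database yields exactly one row containing "ok".
    return rows == [("ok",)]
```

Calling sqlite_is_healthy("/path/to/export.db") after each run (path hypothetical) would flag a damaged file before it gets copied into the backups.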

Phone

Signal Messenger automatically exports a copy of my text messages database, and Aegis allows me to export an encrypted JSON file of my two-factor authentication codes.

I mount my phone's internal storage as a file system on my desktop using adbfs-rootless. I then rsync the files over to my media server:

$ ./adbfs ~/mntphone
$ time rsync -Wav \
  --exclude '*cache' --exclude nobackup \
  --exclude '*thumb*' --exclude 'Telegram *' \
  --exclude 'collection.media' \
  --exclude 'org.thunderdog.challegram' \
  --exclude '.trashed-*' --exclude '.pending-*' \
  ~/mntphone/storage/emulated/0/ \
  localmediaserver:/mnt/backup/files/phone/

Unfortunately this is a manual process because I need to plug my phone in each time. Ideally it would happen automatically while I'm asleep and the phone is charging.

Miscellaneous Files

The directory /mnt/backup/files is a repository for any kind of files I want to keep forever: my phone data, old archives, computer files, Minecraft worlds, files from previous jobs, and so on.

All the files will be included in the 1 TB hard drive backup rotations.

Web Services

Web services that I run, like t0 Services and QotNews, are backed up daily, weekly, and monthly depending on how frequently the data changes.

I run rdiff-backup on the remote server with cron:

00 14 * * * date -Iseconds > /home/tanner/tbot/t0txt/data/backup_check.txt

04 14 * * * rdiff-backup /home/tanner/tbot/t0txt/data tbotbak@remotebackup::/mnt/backup/remote/tbotbak/daily/t0txt/
14 14 * * * rdiff-backup --remove-older-than 12B --force tbotbak@remotebackup::/mnt/backup/remote/tbotbak/daily/t0txt/

24 14 * * 1 rdiff-backup /home/tanner/tbot/t0txt/data tbotbak@remotebackup::/mnt/backup/remote/tbotbak/weekly/t0txt/
34 14 * * 1 rdiff-backup --remove-older-than 12B --force tbotbak@remotebackup::/mnt/backup/remote/tbotbak/weekly/t0txt/

44 14 1 * * rdiff-backup /home/tanner/tbot/t0txt/data tbotbak@remotebackup::/mnt/backup/remote/tbotbak/monthly/t0txt/
55 14 1 * * rdiff-backup --remove-older-than 12B --force tbotbak@remotebackup::/mnt/backup/remote/tbotbak/monthly/t0txt/

The tbotbak user has write access to the /mnt/backup/remote/tbotbak directory only. For security, it has its own passwordless SSH key that's only permitted to run the rdiff-backup --server command.

Protospace

I run a lot of services for Protospace, my city's makerspace.

Spaceport, the member portal I wrote, creates an archive that I download daily:

40 10 * * * wget --content-disposition \
  --header="Authorization: secretkeygoeshere" \
  --directory-prefix /mnt/backup/remote/portalbak/ \
  --no-verbose --append-output=/var/log/portalbackup.log \
  https://api.my.protospace.ca/backup/

The website and wiki that I sysadmin both get backed up weekly:

0 12 * * 1 mysqldump --all-databases > /var/www/dump.sql
15 12 * * 1 date -Iseconds > /var/www/backup_check.txt
20 12 * * 1 rdiff-backup /var/www pshostbak@remotebackup::/mnt/backup/remote/pshostbak/weekly/www/

The Protospace Minecraft server I run gets backed up daily:

00 15 * * * date -Iseconds > /home/tanner/minecraft/backup_check.txt
00 15 * * * rdiff-backup --exclude '**CoreProtect' --exclude '**dynmap' /home/tanner/minecraft psminebak@remotebackup::/mnt/backup/remote/psminebak/
30 15 * * * rdiff-backup --remove-older-than 12B --force psminebak@remotebackup::/mnt/backup/remote/psminebak/

I also back up our Google Drive with rclone:

45 12 * * 1 rclone copy -v protospace: /mnt/backup/files/protospace/google-drive/

Backup Copies

My backup folder /mnt/backup now looks like this:

/mnt/backup/
├── files
│   ├── docs
│   ├── phone
│   ├── protospace
│   ├── telegram
│   ├── usbsticks
│   └── ... and so on
├── local
│   ├── email
│   ├── gitea
│   ├── nextcloud
│   └── notes
└── remote
    ├── portalbak
    ├── pshostbak
    ├── psminebak
    ├── tbotbak
    └── telebak

This directory tree is the master backup and I make a copy of the entire tree every Saturday to a hard drive.

The directory is copied over with the following script:

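#!/bin/bash

# Unlock and mount the encrypted external drive. Stop if either step fails
# so nothing gets written to the bare mount point.
cryptsetup luksOpen /dev/sdf external || exit 1
mount /dev/mapper/external /mnt/external || { cryptsetup luksClose external; exit 1; }

# Mirror local/ and remote/, deletions included.
time rsync -av --delete /mnt/backup/local/ /mnt/external/backup/local/
time rsync -av --delete /mnt/backup/remote/ /mnt/external/backup/remote/
# Take an incremental backup of files/.
time rdiff-backup --force -v5 /mnt/backup/files/ /mnt/external/backup/files/

# Check the backup_check.txt timestamps.
python3 /home/tanner/scripts/checkbackup.py

# Unmount and lock the drive again.
umount /mnt/external
cryptsetup luksClose external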

I wrote a Python script, checkbackup.py, that goes through each backup and compares the timestamp in its backup_check.txt file to the current time. This confirms that cron ran, the backups were taken, and they were transferred over correctly.
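
The script itself isn't shown here, so the following is only a minimal sketch of the idea, not the actual checkbackup.py. The function name, the maximum ages, and the choice of which paths to check are all assumptions; the paths are simply derived from the backup destinations above.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def check_backups(expected, now=None):
    """Return a list of problems found in the given backup_check.txt files.

    `expected` maps each file path to the maximum age (a timedelta) allowed.
    """
    now = now or datetime.now(timezone.utc)
    problems = []
    for path, max_age in expected.items():
        check_file = Path(path)
        if not check_file.is_file():
            problems.append(f"{path}: missing")
            continue
        try:
            # `date -Iseconds` writes e.g. 2024-01-01T12:00:00-07:00
            stamp = datetime.fromisoformat(check_file.read_text().strip())
        except ValueError:
            problems.append(f"{path}: unreadable timestamp")
            continue
        if stamp.tzinfo is None:
            stamp = stamp.astimezone()  # treat a naive timestamp as local time
        age = now - stamp
        if age > max_age:
            problems.append(f"{path}: too old ({age})")
    return problems


if __name__ == "__main__":
    # Hypothetical limits: a bit more than one backup interval each.
    expected = {
        "/mnt/external/backup/local/email/backup_check.txt": timedelta(days=8),
        "/mnt/external/backup/remote/psminebak/backup_check.txt": timedelta(days=2),
    }
    for problem in check_backups(expected):
        print(problem)
```

Listing the expected files explicitly, rather than searching for them, means a backup that silently stopped producing its backup_check.txt is reported as missing instead of being skipped.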

Rotating Hard Drives

I rotate through 2.5" 1 TB hard drives each Saturday when I do a backup. They are quite cheap at $65 CAD each so I can have a bunch floating around.

I keep one connected to the server, one in my bag, one offsite, one at my mother's house, and one at my dad's house. Every Saturday I run the script above to take a copy and then swap the drive with the one in my bag. It then gets swapped when I visit my offsite location. Same for when I visit my parents (I go back home about twice per year). This means that all hard drives eventually get rotated through with new data and don't sit too long unpowered.

The drives are all encrypted with full-disk LUKS encryption using a password I'm unlikely to forget.

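I run the checksumming btrfs file system on them with the dup profile, which stores two copies of all data and metadata on the same drive (similar in spirit to RAID-1), to protect against bitrot. This means I can only use 0.5 TB of storage for my backups, but the data is stored redundantly.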

Here's how I set up new hard drives to do this:

$ sudo cryptsetup luksFormat /dev/sdf
$ sudo cryptsetup luksOpen /dev/sdf external
$ sudo mkfs.btrfs -f -m dup -d dup /dev/mapper/external
$ sudo mount /dev/mapper/external /mnt/external/
$ sudo mkdir /mnt/external/backup
$ sudo chown -R tanner:tanner /mnt/external/backup
$ sudo umount /mnt/external
$ sudo cryptsetup luksClose external

Future Improvements

I'm working on a system to automatically back up all my home directories to my media server. I need this to grab Bash histories and code that's work-in-progress. I've been burned by not having this once when a server died.

I'd like to automate backing up my phone by connecting it to a Raspberry Pi when I go to sleep.

I need to get better at fully testing my backups by restoring them on a blank machine.
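
One possible building block for that kind of test, sketched under assumptions: after restoring a backup into a scratch directory with rdiff-backup, compare it against the source file by file. The function names are hypothetical, and files that changed on the source since the backup was taken will naturally show up as differing.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source, restored):
    """Return (relative path, reason) pairs for files that are missing
    from `restored` or whose contents differ from `source`."""
    source, restored = Path(source), Path(restored)
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        candidate = restored / relative
        if not candidate.is_file():
            problems.append((str(relative), "missing"))
        elif file_digest(src_file) != file_digest(candidate):
            problems.append((str(relative), "differs"))
    return problems
```

An empty list from compare_trees means every file in the source was found in the restored copy with identical contents.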