Easy local and remote backup of your home network

Updated on 2006-09-20 to add an exclude file and clear up some rsync confusion

I hate making backups by hand. It costs a lot of time and usually I have far better things to do. Long ago (in the Windows 98 era) I made backups to CD only when I was about to reinstall the OS, which was about once every 18 months, and backed up my code projects maybe twice as often. A lot has changed since those dark times, though. My single PC has expanded into a network with multiple desktops and a server, I have ditched Windows for a mix of Debian and Ubuntu, and I have a nice broadband link, just as my friends do. Finally a lazy git like me can set up a decent backup system that takes care of itself, leaving me time to do the "better" things (such as writing about it) :-)

There are already quite a few tutorials on the internet explaining various ways to back up your Linux system using built-in commands and a script of some sort, but I could not find one that suited me, so I decided to write another one: one that takes care of backing up my entire network.

A two-stage network backup

I'll quickly explain how my home network is set up. It's very simple and I think that many geeks will have similar networks. I have a Linksys wireless ADSL router that connects me to the internet. Behind it are some desktops, my laptop and a server. I want to make regular, fast backups from all the computers to my local server, and an occasional remote backup that copies all the local backups from my server to a server that a friend of mine is running. In effect, I am backing up over an untrusted network to an untrusted machine.

Stage 1 - rsync to the local server

Let's start with stage 1 and worry about stage 2 later. Stage 1 backups should run often and finish quickly. Incremental backups are usually the fastest, so I picked rsync for the job. rsync synchronizes two directories on local or remote machines and copies only the differences. It works over ssh and rsh, but I chose to use the rsync daemon over TCP on my server. It's very quick to configure and poses little extra risk, because all the traffic stays on my local network, shielded from the rest of the world. Read the rsyncd.conf manpage for detailed instructions on setting up the daemon; I'm not going to describe every little detail in this article.

I have created the directory /var/backups/rsync on the server. Each of my computers will back up to a subdirectory of that one, named after its hostname. E.g., my primary desktop is named tweety and will back up to /var/backups/rsync/tweety on the server. Each computer that is going to back up to the server should get its own entry (an rsync "module") in the /etc/rsyncd.conf file. Here is an example:

[desktop]
        comment = backup for desktop
        path = /var/backups/rsync/desktop
        use chroot = true
        hosts allow = desktop
        read only = false
        write only = false
 
[laptop]
        comment = backup for laptop
        path = /var/backups/rsync/laptop
        use chroot = true
        hosts allow = laptop
        read only = false
        write only = false

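Replace desktop and laptop in the rsyncd.conf example above with the actual hostnames of your clients (in the module name, the path and the hosts allow line), and create the matching directories on the server. The next step is enabling the rsync daemon through inetd (restart inetd afterwards so it picks up the change). Here's what I added to my /etc/inetd.conf: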

rsync           stream  tcp     nowait  root    /usr/bin/rsync rsync --daemon
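Because every client gets an identical module that differs only in its hostname, the stanzas can also be generated. Below is a minimal sketch, assuming the same /var/backups/rsync layout as above; the function name rsyncd_module and the example hostnames are hypothetical. It only prints the module text, so you can inspect it before appending it to /etc/rsyncd.conf.

```bash
#!/bin/bash
# Print an rsyncd.conf module for one client. The module name, the path and
# "hosts allow" all use the client's hostname, matching the entries above.
# rsyncd_module is a hypothetical helper; the base directory defaults to
# /var/backups/rsync and can be overridden with a second argument.
rsyncd_module() {
    local host="$1"
    local base="${2:-/var/backups/rsync}"
    cat <<EOF
[${host}]
        comment = backup for ${host}
        path = ${base}/${host}
        use chroot = true
        hosts allow = ${host}
        read only = false
        write only = false
EOF
}

# Example (as root on the server; "tweety" and "laptop" are placeholders):
#   for h in tweety laptop; do
#       mkdir -p "/var/backups/rsync/${h}"
#       rsyncd_module "${h}" >> /etc/rsyncd.conf
#   done
```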

The server side of stage 1 is ready. Now it's a matter of having the clients make the backups. My clients are not always running, so a cron job is not an option. Instead, I back up to the server every time I reboot or shut down my PC. To do that, you need to add entries to the /etc/rc0.d (shutdown) and /etc/rc6.d (reboot) directories. First off, here's the tiny script that does the backup. It simply backs up the entire home directory. I placed it in /etc/init.d/backup.sh:

#!/bin/bash
rsync -a --delete -v /home yourserver::${HOSTNAME}

Note that the above example contacts an rsync daemon directly over TCP. If you want to use SSH as the transport instead, replace the double colon (::) with a single colon (:) and give a path on the server rather than a module name, e.g. yourserver:/var/backups/rsync/${HOSTNAME}. With a single colon rsync runs over ssh and the modules in rsyncd.conf are not used.

If you look at the files in the rc0.d and rc6.d directories, you will see that they follow a naming scheme. Files starting with a K are run with the argument stop when the system enters that runlevel, in the order of the number that follows. On my system K00 was still unused, which makes the backup run before anything else is stopped. So, make the script executable and create symlinks to it:

chmod +x /etc/init.d/backup.sh
ln -s /etc/init.d/backup.sh /etc/rc0.d/K00backup
ln -s /etc/init.d/backup.sh /etc/rc6.d/K00backup
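Scripts linked as K* are called with the argument stop, and the rsync destination looks different for the daemon (::) and ssh (:) transports. A slightly more defensive client script could take both into account. This is a sketch under stated assumptions: the server name yourserver, the BACKUP_SERVER/BACKUP_TRANSPORT variables and the helper backup_dest are hypothetical placeholders, not part of the setup described above.

```bash
#!/bin/bash
# Client backup script for /etc/init.d/, run through the K00backup links.
# ASSUMPTIONS: "yourserver" and the BACKUP_TRANSPORT switch are placeholders.

BACKUP_SERVER="${BACKUP_SERVER:-yourserver}"
BACKUP_TRANSPORT="${BACKUP_TRANSPORT:-daemon}"   # "daemon" (::) or "ssh" (:)

# Print the rsync destination for a given client hostname.
backup_dest() {
    local host="$1"
    case "${BACKUP_TRANSPORT}" in
        daemon) echo "${BACKUP_SERVER}::${host}" ;;                    # rsyncd module name
        ssh)    echo "${BACKUP_SERVER}:/var/backups/rsync/${host}" ;;  # path on the server
        *)      return 1 ;;
    esac
}

case "$1" in
    stop)
        # Only back up on shutdown/reboot. --timeout aborts the transfer if
        # no data moves for 60 seconds, so a stalled server cannot block
        # the shutdown forever.
        dest="$(backup_dest "$(hostname)")" || exit 1
        rsync -a --delete --timeout=60 /home "${dest}"
        ;;
    *)
        # Nothing to do for start, restart, etc.
        ;;
esac
```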

The first backup will take a long time because all the files in the home directory have to be copied to the server. Subsequent backups are a lot faster, usually about 100 times faster on my system, depending on how many files you add or change. If you decide to download all 14 ISOs of Debian Sarge to your home directory, the next backup will take significantly longer :-)

Stage 2 - encrypt the backups and back up remotely

With the local backups sorted out, it's time to look at the remote backups. I want to encrypt the client backups with GPG for security, and encrypt each client separately so that I don't have to deal with one giant backup file if only one client system has been hosed. I'm going to use tar, gzip and gpg to create backup files from the rsync'ed directories and then use passwordless scp to copy them to the remote server. This backup will be scheduled with a cron job at some ungodly hour on Sunday night, when there is virtually no chance that my friend or I are making heavy use of our servers.

Setting up passwordless ssh/scp

Get an ssh account on the remote server and make sure that the remote ssh daemon allows public key authentication. Next, log in as root on your local server and generate an ssh key. Accept the default values and leave the passphrase empty when prompted; we want passwordless login for this:

ssh-keygen -t rsa

You now have two files in ~/.ssh: id_rsa (your private key) and id_rsa.pub (your public key). This would be a good time to back up the keys to a CD or a USB stick, or even print them out, and keep them in a safe place. Next, append the public key to the authorized keys on the remote server so you can log in without being prompted for a password (ssh-copy-id does the same thing if your system has it):

ssh remote_user@remote_host "mkdir -p .ssh && chmod 700 .ssh"
cat ~/.ssh/id_rsa.pub | ssh remote_user@remote_host "cat >> .ssh/authorized_keys && chmod 600 .ssh/authorized_keys"
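Because the weekly job runs from cron, nobody will be there to type a password if the key setup is wrong. One way to check it beforehand is ssh's BatchMode option, which makes ssh fail instead of prompting. A minimal sketch; the function name check_passwordless_ssh is hypothetical and remote_user@remote_host is the same placeholder as above.

```bash
#!/bin/bash
# Check that the backup host accepts key-based login without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password or passphrase,
# which is what would happen under cron anyway.
# check_passwordless_ssh is a hypothetical helper name.
check_passwordless_ssh() {
    local target="$1"
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "${target}" true; then
        echo "passwordless login to ${target} works"
    else
        echo "passwordless login to ${target} FAILED" >&2
        return 1
    fi
}

# Example (as root on the local server):
#   check_passwordless_ssh remote_user@remote_host
```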

Setting up GPG

Now we need to set up GPG in order to encrypt the backups. Encrypting only requires a public key. If you already have a GPG key, simply import your public key into root's keyring on the local server (gpg --import). If you don't have a GPG key, you need to create one:

gpg --gen-key

Choose the default values, fill out your information and pick a strong passphrase. If you are told that the system needs to gather more entropy (randomness), bang away at the keyboard or open another console and do some work. After your key pair has been created, back it up on secure media as well, just like the ssh key pair above. If you lose the ssh key pair it's merely a nuisance and you need to set it up again. If you lose the GPG keys (e.g. your house and all computers burn down), there is no way to decrypt your backups again!
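Decrypting needs the private key and its passphrase, which is why that key pair deserves a careful backup. A restore simply runs the weekly pipeline in reverse: gpg, then gzip, then tar. Below is a minimal sketch, assuming archives named like tweety.tar.gz.gpg as produced by the backup script in the next section; restore_backup is a hypothetical name, and the decrypt command is kept in an array so it can be swapped out. Since tar strips the leading slash, the files typically come back under var/backups/rsync/<hostname>/home/ inside the target directory.

```bash
#!/bin/bash
# Reverse of the weekly pipeline: gpg --decrypt | gzip -dc | tar -x.
# Needs the PRIVATE GPG key (and its passphrase) on the machine doing the restore.
decrypt=(gpg --decrypt)

# restore_backup ARCHIVE TARGET_DIR  (hypothetical helper)
# The body runs in a subshell so "set -o pipefail" stays local to it;
# the function fails if any stage of the pipeline fails.
restore_backup() (
    set -o pipefail
    mkdir -p "$2" && "${decrypt[@]}" < "$1" | gzip -dc | tar -xf - -C "$2"
)

# Example:
#   restore_backup /var/backups/out/tweety.tar.gz.gpg /tmp/restore
```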

The backup script

With the technicalities sorted out, all that is left is a script to do the actual backup. You can use my script below. For every directory in $local_rsync it runs tar, gzip and gpg and places the result in $local_out, skipping the files and patterns listed in $exclude_file. The exclude file keeps the size of the remote backup to a minimum; for example, I exclude ISO images, movies and other large media. The script then copies the files to the remote host and compares MD5 sums to make sure that the transfer was not corrupted (thanks to Dan Siemons for that idea). Finally it sends you an e-mail telling you whether any files failed to back up, and which ones. I find it very useful to be e-mailed regardless of success or failure: if I don't get an e-mail at all, I still know something went wrong. Save the script below as /etc/cron.weekly/backup and make it executable, so that it gets executed once a week. Leave off the .sh extension: Debian's run-parts skips file names containing a dot, and run-parts --test /etc/cron.weekly lists the scripts that would actually run.

#!/bin/bash

#
# Configuration
# Note that all directory paths should end in a trailing slash
#

# Path to the directory that receives the rsyncs from the clients
local_rsync="/var/backups/rsync/"

# The directory that holds the encrypted files
local_out="/var/backups/out/"

# The ID to which the files should be GPG encrypted
gpg_encrypt_to="DEADBEEF"

# Hostname of the remote backup server
remote_host="user@remote"

# Path on the remote server where the files will be copied to
remote_path="/home/user/backups/"

# Your e-mail address to let you know how the backup went
email="you@example.com"

# Path to your exclude file
exclude_file="/etc/cron.weekly/backup.exclude"

#
# No need to touch things below
#

# Make $? report a failure in any part of a pipeline, not just the last command
set -o pipefail

mkdir -p "${local_out}" || exit 1
cd "${local_rsync}" || exit 1
result=""

# Every subdirectory of $local_rsync holds the backup of one client
for backup_file in *
do
        [ -d "${backup_file}" ] || continue
        archive="${backup_file}.tar.gz.gpg"

        # Create the .tar.gz.gpg. Keep --exclude-from before the directory
        # name: newer GNU tar versions only apply it to names that follow it.
        # --batch and --trust-model always stop gpg from asking whether to
        # trust an imported public key, which it cannot do under cron.
        tar -cvp --sparse --exclude-from "${exclude_file}" -f - "${local_rsync}${backup_file}" \
        | gzip \
        | gpg --batch --trust-model always -e -r "${gpg_encrypt_to}" \
        > "${local_out}${archive}"
        if [ $? != 0 ]; then
                result="${result}Creating ${archive} failed\n"
                continue
        fi

        # Send the file to the remote server
        scp "${local_out}${archive}" "${remote_host}:${remote_path}${archive}"
        if [ $? != 0 ]; then
                result="${result}Transfer of ${archive} failed\n"
        fi

        # Check the MD5 sums
        sourceMD5=`md5sum "${local_out}${archive}" | awk '{print $1}'`
        if [ $? != 0 ]; then
                result="${result}Could not get local MD5 sum for ${archive}\n"
        fi

        destMD5=`ssh "${remote_host}" md5sum "${remote_path}${archive}" | awk '{print $1}'`
        if [ $? != 0 ]; then
                result="${result}Could not get remote MD5 sum for ${archive}\n"
        fi

        if [ "${sourceMD5}" != "${destMD5}" ]; then
                result="${result}MD5 sums for ${archive} did not match\n"
        fi

done

if [ "${result}" != "" ]; then
        # printf %b turns the \n sequences in $result into real newlines
        printf '%b' "${result}"
        printf '%b' "${result}" | mail -s "Backup failed" "${email}"
else
        echo "Backup successful"
        echo "Backup successful" | mail -s "Backup successful" "${email}"
fi

And the backup.exclude file:

*.mp3
*.iso
*.avi
*.mpg
*.wmv
*.zip
*.ies4linux*
*.java*
*.macromedia*
*.Trash*
*.pbuilder*
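Since a wrong pattern in the exclude file silently drops files from the remote backup, it can be worth checking what tar would actually archive. A minimal sketch, assuming a tar that supports --exclude-from (GNU tar does): the archive is written to stdout and thrown away, while the -v listing, which goes to stderr in that case, is kept. preview_backup is a hypothetical name. It reads all the data once, so on a large directory it takes a while.

```bash
#!/bin/bash
# List what tar would put into the archive for one client directory,
# honouring the exclude file, and discard the archive itself.
# preview_backup is a hypothetical helper. "2>&1 >/dev/null" keeps the
# verbose listing (stderr) and throws away the archive data (stdout).
preview_backup() {
    local dir="$1" exclude_file="$2"
    tar -c -v --exclude-from "${exclude_file}" -f - "${dir}" 2>&1 >/dev/null
}

# Example, run from /var/backups/rsync/:
#   preview_backup tweety /etc/cron.weekly/backup.exclude | grep -i '\.iso'
```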

Pros and cons

The method I described above is a fairly simple way to securely back up a small home network. Adding new clients is easy: just add the /etc/init.d/backup.sh file, symlink it into the proper rc directories, add an entry to rsyncd.conf and you're set. There are a few downsides as well, of course. The most important one is that this method copies the entire backup for all the clients to the remote server each time it runs. This is unavoidable because the backups are encrypted before they are sent over the internet, which rules out rsync-style incremental transfers. Another issue is that the weekly remote backup takes quite a bit of processing power and bandwidth. Creating the GPG files takes about 40 minutes on my dual PII and sending them takes another two to three hours. If you use your server for 24/7 services, then you should schedule the remote backups during the slow hours or find a way to throttle them.
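One way to throttle the weekly run is to lower its CPU and disk priority with nice and ionice (the idle I/O class, -c3) and to cap scp's bandwidth with its -l option, which takes a limit in Kbit/s. A minimal sketch; the throttled helper and the 512 Kbit/s figure are illustrative assumptions, not settings from my setup.

```bash
#!/bin/bash
# Run a command at the lowest CPU priority and, where ionice is available,
# in the "idle" I/O class, so the weekly tar/gzip/gpg run yields to other
# services on the server. throttled is a hypothetical helper name.
throttled() {
    if command -v ionice >/dev/null 2>&1; then
        # -t: carry on even if the I/O class cannot be set
        nice -n 19 ionice -c3 -t "$@"
    else
        nice -n 19 "$@"
    fi
}

# scp's -l option caps the bandwidth in Kbit/s; 512 is an arbitrary example.
scp_limit_kbit=512
```

In the backup script you would then prefix the pipeline stages, e.g. throttled tar ... | throttled gzip | throttled gpg ..., and call scp -l "${scp_limit_kbit}" ... for the transfer.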

All in all, I hope that this article has been of use to you. If you have any feedback then you can leave it in the comments section. Happy coding!

Comments

#1 Anil

Posted on 2006-07-27@13:39

What if I have a windows client in my network?

#2 Sander Marechal (http://www.jejik.com)

Posted on 2006-07-27@22:57

Hi Anil.

The client portion of this backup method is basically just a small script that invokes rsync, which has a Windows client as well (see http://www.itefix.no/cwrsync/). You can use the Windows task scheduler to run it every time you log off or reboot.

Lone Wolves

Web, game, and open source development
