Duplicity + Amazon S3 = incremental encrypted remote backup

Duplicity is a backup program that only backs up the files (and parts of files) that have been modified since the last backup. Built on FLOSS (rsync, GnuPG, tar, and rdiff), it allows efficient, locally encrypted, remote backups.

Amazon S3 is a web service that provides cheap, distributed, redundant, web-accessible storage. S3 currently charges only $0.15 per GB-month storage and $0.10 per GB upload. The API is based on HTTP requests such as GET, POST, PUT, and DELETE.
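To get a feel for those rates, here is a rough estimate as a shell function. It's only a sketch: the function name and the example sizes are made up, the two rates are the ones quoted above, and it ignores any other charges Amazon may apply.

```shell
# Rough monthly S3 cost using the rates quoted above.
# estimate_s3_cost is a made-up helper name, not part of any tool.
estimate_s3_cost() {
    stored_gb=$1     # GB kept in S3 for the month
    uploaded_gb=$2   # GB uploaded during the month
    awk -v s="$stored_gb" -v u="$uploaded_gb" \
        'BEGIN { printf "%.2f\n", s * 0.15 + u * 0.10 }'
}

# Example: 20 GB stored, 2 GB of incremental uploads
estimate_s3_cost 20 2   # prints 3.20
```

So a 20 GB backup with a couple of GB of changes per month would come to a bit over three dollars.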

The following is a description of how I made use of these to back up my laptop, which runs Ubuntu Feisty Fawn.

Installation

These packages were sufficiently up-to-date in the Ubuntu repositories, so they can be installed immediately:

  • python-dev: Python development files
  • librsync1: rsync remote-delta algorithm
  • librsync-dev: rsync development files

By the time you read this, the following packages and versions may or may not have made their way into the repositories, but as of now, they have to be installed from source. Builds character.

Grab the latest .tar.gz file from each of these sources, extract it (tar -xzf whatever.tar.gz), navigate into the folder, and run python setup.py install as root or using sudo.

  • Duplicity 0.4.3-rc7 or later. Currently 0.4.2 is the latest stable (but doesn't have S3 capability), and 0.4.3-rc12 is the latest unstable.
  • Boto is a Python library that can talk to S3. Duplicity used to use BitBucket for this. Both are maintained by the same author.
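The extract-and-install steps can be wrapped up roughly like this. It's a sketch, not an official install procedure: both function names are mine, and it assumes the archive unpacks into a single top-level folder that contains setup.py.

```shell
# Hypothetical helpers; the names are mine, not from duplicity or boto.

# Print the top-level folder stored inside a .tar.gz archive.
tarball_topdir() {
    tar -tzf "$1" | head -n 1 | cut -d/ -f1
}

# Extract the archive and run its setup.py as root via sudo.
install_from_tarball() {
    topdir=$(tarball_topdir "$1") || return 1
    tar -xzf "$1" || return 1
    ( cd "$topdir" && sudo python setup.py install )
}

# Usage (file name is a placeholder):
#   install_from_tarball whatever.tar.gz
```

Run it once for the duplicity tarball and once for the boto tarball.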

Sign up for S3

Sign up for an Amazon Web Services account, then add the Simple Storage Service. This will involve giving them your credit card number, to be charged monthly.

Make sure to generate and write down your Secret Access Key, along with your Access Key ID. You'll need those for any application that interacts with S3.
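Since these keys end up in environment variables (boto reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), one option is to keep them in a private file and load them from there. This is a sketch under my own assumptions: the function name, the file location, and the mode-600 check are illustrative, and stat -c is the GNU coreutils form found on Ubuntu.

```shell
# Sketch: load AWS keys from a private file instead of hard-coding them.
# The function name and file layout are assumptions for illustration.
load_aws_credentials() {
    cred_file=$1
    # Refuse files that other users can read (stat -c is GNU coreutils).
    mode=$(stat -c %a "$cred_file") || return 1
    if [ "$mode" != "600" ]; then
        echo "refusing $cred_file: mode is $mode, expected 600" >&2
        return 1
    fi
    # The file is expected to contain two lines such as:
    #   AWS_ACCESS_KEY_ID=myAccessKeyID
    #   AWS_SECRET_ACCESS_KEY=mySecretAccessKey
    . "$cred_file"
    export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
}

# Usage (path is a placeholder):
#   load_aws_credentials "$HOME/.s3-credentials"
```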

Encryption keys

You should encrypt your files so that they are safe from prying eyes in transit and in storage. Signing them lets you detect any alteration in storage or transit.

Decide on a GPG key to use for encryption and signing. (The one I use is 3BBF4E12, my main key.) Make sure your encryption/signing key is in your GPG keyring. (You can use separate keys for encryption and signing, but I haven't in this case.)
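A quick way to check the keyring before the first backup is sketched below. The helper name is made up; --list-keys and --list-secret-keys are standard gpg options. Substitute your own key ID for mine.

```shell
# Sketch: confirm that both halves of a key are in the local keyring.
# have_gpg_key is a made-up helper name; the gpg options are standard.
have_gpg_key() {
    key_id=$1
    # Public key is needed for encryption.
    gpg --list-keys "$key_id" >/dev/null 2>&1 || return 1
    # Secret key is needed for signing (and for restoring later).
    gpg --list-secret-keys "$key_id" >/dev/null 2>&1 || return 1
}

# Usage (substitute your own key ID):
#   have_gpg_key 3BBF4E12 && echo "key is ready"
```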

Optional: Visual S3 interface

JetS3t's Cockpit is a decent tool for viewing and managing your S3 buckets and objects. There is an applet version, but I use the standalone application instead.

Unzip the download to a permanent location, and find the bin/cockpit.sh file. Make it executable (chmod u+x cockpit.sh). Edit it to set the environment variables properly:

JETS3T_HOME=/home/myusername/programs/JetS3t
JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun/jre

When you run cockpit.sh, you will be presented with a somewhat complicated login screen. The simplest is the Direct Login tab. Local Folder login stores authentication info locally in a password-protected file.

Learn and configure

After scanning over man duplicity and playing with commands like duplicity /home/myusername/junk file:///home/myusername/junkBackup, create a shell script to run duplicity backups automatically. Here's what mine looks like:

export AWS_ACCESS_KEY_ID=myAccessKeyID
export AWS_SECRET_ACCESS_KEY=mySecretAccessKey

duplicity --encrypt-key=myEncryptionKeyFingerprint --sign-key=mySigningKeyFingerprint --exclude=/sys --exclude=/dev --exclude=/proc --exclude=/tmp --include=/mnt/media --exclude=/mnt --exclude=/media / s3+http://myUniqueBucketName

unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY

You'll generally want to exclude /proc and /tmp from backups, since they contain volatile runtime data that changes constantly and gets wiped every time you shut down. /dev is full of file representations of your hardware, and /sys... I don't know, it screws up my backups, and has something to do with driver-daemon communication. I exclude /mnt and /media because I don't want to back up my external drives, and I include /mnt/media because that's my photos, music, and video partition. Order matters here: duplicity applies the first selection option that matches a file, so --include=/mnt/media has to come before --exclude=/mnt. Duplicity's selection options (similar to rdiff-backup's) also support fancy wildcards/globbing and a same-filesystem directive (--exclude-other-filesystems).
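As I read the man page, duplicity decides each file's fate by the first --include or --exclude option that matches it, and backs the file up if nothing matches. The toy function below models only that rule so you can see why option order matters. It is not duplicity's code: the function name and the +/- rule notation are mine, and it handles plain path prefixes only, no globbing.

```shell
# Simplified model of first-match-wins selection; this is NOT duplicity's
# code, and it only handles plain path prefixes (no globbing).
# Rules are given in order as "+/path" (include) or "-/path" (exclude).
selection_for() {
    path=$1; shift
    for rule in "$@"; do
        prefix=${rule#?}             # rule without its leading +/- sign
        case "$path" in
            "$prefix"|"$prefix"/*)   # path is the prefix or lies below it
                case "$rule" in
                    +*) echo include ;;
                    -*) echo exclude ;;
                esac
                return ;;
        esac
    done
    echo include                     # no rule matched: file is backed up
}

selection_for /mnt/media/photos +/mnt/media -/mnt   # include
selection_for /mnt/media/photos -/mnt +/mnt/media   # exclude
selection_for /mnt/usbdisk/file +/mnt/media -/mnt   # exclude
```

The second call shows the trap: with the exclude listed first, everything under /mnt is dropped before the include is ever consulted.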

Notes on duplicity

  • The bucket name must be unique among all S3 buckets owned by all S3 users. It does not need to exist before use.
  • Duplicity supports a number of protocols: file, ftp, scp, and s3+http are the most relevant here.
  • Switch the file path and the URL, and you're doing a restore.
  • When using cryptographic signing, duplicity will ask you to type in your key's passphrase twice at the command line. Somewhat annoying if yours is 30-odd characters.
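For the restore case mentioned above, the commands would look roughly like this. Treat it as a sketch: the bucket name and paths are placeholders, DRY_RUN and run() are my own scaffolding so that nothing executes unless you ask for it, and you should check man duplicity for your version before relying on --file-to-restore.

```shell
# Sketch of a restore: same command with source and destination swapped.
# DRY_RUN and run() are my own scaffolding; by default nothing is executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Full restore into an empty folder (bucket and paths are placeholders).
run duplicity s3+http://myUniqueBucketName /home/myusername/restored

# Single file or folder, given relative to the backed-up root.
run duplicity --file-to-restore home/myusername/junk \
    s3+http://myUniqueBucketName /home/myusername/junk-restored
```

Set DRY_RUN=0 to actually run the commands. The AWS keys need to be exported and the secret GPG key available, just as for a backup.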

Responses: 6 so far

  1. randy says:

    You can export the PASSPHRASE environment variable in your script and avoid typing it in on the command line. GPG will read that.

  2. Tim McCormack says:

    @randy: That's... not quite optimal from a security standpoint. I suppose it's fine if you generate a key *just* for encrypting your S3 backup, and the only reason you are encrypting is to keep the data safe from rogue employees at Amazon.

  3. randy says:

    It would also keep it private from anyone sniffing the network traffic. And this key is solely for the backup.

    I suppose one could blowfish the passphrase, store the encrypted string in a configuration file, write a small C/C++ application with the blowfish encryption/decryption key hard-coded in the source, and call that from the shell script.

    export PASSPHRASE=$(/usr/local/bin/gpg-passphrase-fetcher)

    Heck, you might as well write a C/C++ (or any other compiled language) wrapper rather than use the shell.

    I've implemented other security measures to keep my server private and safe. If someone breaks in and gets access to my gpg passphrase located in one root-owned file, my data sitting on S3 is the least of my worries.

  4. David says:

    Getting the following message:

    No signatures found, switching to full backup.
    Traceback (most recent call last):
      File "/usr/bin/duplicity", line 425, in <module>
        if __name__ == "__main__": with_tempdir(main)
      File "/usr/bin/duplicity", line 421, in with_tempdir
        fn()
      File "/usr/bin/duplicity", line 414, in main
        if not sig_chain: full_backup(col_stats)
      File "/usr/bin/duplicity", line 150, in full_backup
        bytes_written = write_multivol("full", tarblock_iter, globals.backend)
      File "/usr/bin/duplicity", line 94, in write_multivol
        backend.put(tdp, dest_filename)
      File "/usr/lib/python2.5/site-packages/duplicity/backends.py", line 724, in put
        self.bucket = self.conn.create_bucket(self.bucket_name)
      File "/usr/lib/python2.5/site-packages/boto/s3/connection.py", line 103, in create_bucket
        raise S3ResponseError(response.status, response.reason, body)
    boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden

    InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.
    D87F724E05EC9FADID18Z1BSNQWF5XCTYWC3R2CuO/qHmT4X1SluELB6qD9K7mZnacGbRDP0ou8btTqu3vKk63EMkkjzbjdoR+FUnX

    This happens even though I am using the correct ID and Secret Key.

    Any ideas?

  5. Tim McCormack says:

    @David: I Googled "InvalidAccessKeyId" and found a couple of threads that indicate that it is a sporadic error on Amazon's side.

  6. brian baggett dot com » Ubuntu Roundup: backups, encryption, and what’s new says:

    [...] Incremental (and encrypted) backups for your filesystem to Amazon’s S3 with duplicity. Read more here and here. [...]