Duplicity + Amazon S3 = incremental encrypted remote backup

Update: I haven't really been using this, since the bandwidth required is a bit... excessive. I think I'll stick to duplicity + external hard drive.

Duplicity is a backup program that, after an initial full backup, only backs up the files (and parts of files) that have been modified since the last backup. Built on FLOSS (librsync, GnuPG, tar, and rdiff), it allows efficient, locally encrypted, remote backups.

Amazon S3 is a web service that provides cheap, distributed, redundant, web-accessible storage. S3 currently charges only $0.15 per GB-month of storage and $0.10 per GB uploaded. The API is based on HTTP requests such as GET, POST, PUT, and DELETE.
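
To get a feel for what those rates mean in practice, here is a rough cost sketch that uses only the two prices quoted above. The function name s3_cost_estimate is made up for illustration, and the sketch ignores any download or per-request charges, since none are quoted here.

```shell
# Hypothetical helper: estimate S3 cost for a backup of a given size in GB,
# using only the two rates quoted in the text (assumed fixed here).
s3_cost_estimate() {
    awk -v gb="$1" 'BEGIN {
        upload  = gb * 0.10   # one-time cost to upload the data
        storage = gb * 0.15   # recurring cost per month of storage
        printf "upload: $%.2f, storage per month: $%.2f\n", upload, storage
    }'
}
```

For example, s3_cost_estimate 50 prints "upload: $5.00, storage per month: $7.50".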

The following is a description of how I made use of these to back up my laptop, which runs Ubuntu Feisty Fawn.

Installation

These packages are in the Ubuntu repositories:

  • duplicity: Remote encrypted incremental backup
  • python-boto: Allow Python to talk to S3

(Edit: Packages now available in repository, so no building necessary.)
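
On a system with these repositories, installation might look like the sketch below. The helper names are hypothetical; the package names are the two listed above, and whether your release still ships python-boto under that name is an assumption.

```shell
# Hypothetical helper: succeed if the shell can find a command by this name.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Install the two packages named above, but only if duplicity is missing.
install_backup_tools() {
    if have duplicity; then
        echo "duplicity already installed"
    else
        sudo apt-get install duplicity python-boto
    fi
}
```

Nothing happens until you call install_backup_tools yourself.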

Sign up for S3

Sign up for an Amazon Web Services account, then add the Simple Storage Service. This will involve giving them your credit card number, to be charged monthly.

Make sure to generate and write down your Secret Access Key, along with your Access Key ID. You'll need both for any application that interacts with S3.
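
Since every S3 tool needs both values, a script can check up front that they are set before doing any work. This is a minimal sketch: the function name is hypothetical, and the variable names are the ones the backup script further down exports.

```shell
# Hypothetical pre-flight check: fail early if either credential is missing.
require_aws_credentials() {
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set" >&2
        return 1
    fi
}
```

A script could then start with: require_aws_credentials || exit 1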

Encryption keys

You should encrypt your files so that they are safe from prying eyes in transit and in storage. Signing them protects the files from alteration in storage or transit.

Decide on a GPG key to use for encryption and signing. (The one I use is 3BBF4E12, my main key.) Make sure your encryption/signing key is in your GPG keyring. (You can use separate keys for encryption and signing, but I haven't in this case.)
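
Before relying on a key, it's worth confirming that both halves are in the keyring: encrypting needs the public key, while signing (and later decrypting a restore) needs the secret key. A sketch, with a hypothetical function name:

```shell
# Hypothetical check: succeed only if gpg has both the public and the
# secret key for the given key ID.
key_in_keyring() {
    gpg --list-keys "$1" >/dev/null 2>&1 &&
        gpg --list-secret-keys "$1" >/dev/null 2>&1
}
```

For example: key_in_keyring 3BBF4E12 || echo "key missing from keyring"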

Optional: Visual S3 interface

JetS3t's Cockpit is a decent tool for viewing and managing your S3 buckets and objects. There is an applet version, but I use the standalone application instead.

Unzip the download to a permanent location, and find the bin/cockpit.sh file. Make it executable (chmod u+x cockpit.sh), then edit it to set the environment variables properly:

JETS3T_HOME=/home/myusername/programs/JetS3t
JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun/jre

When you run cockpit.sh, you will be presented with a somewhat complicated login screen. The simplest option is the Direct Login tab. The Local Folder login stores your authentication info locally in a password-protected file.

Learn and configure

After scanning over man duplicity and playing with commands like duplicity /home/myusername/junk file:///home/myusername/junkBackup, create a shell script to run duplicity backups automatically. Here's what mine looks like:

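# S3 credentials, read by duplicity's S3 (boto) backend
export AWS_ACCESS_KEY_ID=myAccessKeyID
export AWS_SECRET_ACCESS_KEY=mySecretAccessKey

# Selection options are applied in order and the first match wins,
# so --include=/mnt/media has to come before --exclude=/mnt
duplicity --encrypt-key=myEncryptionKeyFingerprint --sign-key=mySigningKeyFingerprint --include=/mnt/media --exclude=/sys --exclude=/dev --exclude=/proc --exclude=/tmp --exclude=/mnt --exclude=/media / s3+http://myUniqueBucketName

# Clear the credentials from the environment once the backup finishes
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY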

You'll generally want to exclude /proc and /tmp from backups, since they contain constantly changing, volatile runtime data that gets wiped every time you shut down. /dev is full of file representations of your hardware, and /sys... I don't know, it screws up my backups, and has something to do with driver-daemon communication. I exclude /mnt and /media because I don't want to back up my external drives, but I keep /mnt/media because that's my photos, music, and video partition. Duplicity goes by the first selection option that matches a file, so the --include for /mnt/media has to appear before the --exclude for /mnt. Duplicity's file selection options are similar to rsync's, including fancy wildcards/globbing and a same-filesystem directive (--exclude-other-filesystems).
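
If you want to see how include and exclude ordering behaves before pointing duplicity at S3, a throwaway local experiment is one way to do it. The sketch below uses a hypothetical function name and made-up directory names, and it assumes a duplicity version that has the --no-encryption option and the list-current-files command.

```shell
# Hypothetical experiment: back up a throwaway tree to a local file:// target
# and list what was stored.
selection_demo() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src/mnt/media" "$work/src/mnt/usbdisk"
    echo keep > "$work/src/mnt/media/photo.txt"
    echo skip > "$work/src/mnt/usbdisk/junk.txt"

    # The include comes first, so it wins for everything under mnt/media;
    # the exclude then drops the rest of mnt.
    duplicity --no-encryption \
        --include="$work/src/mnt/media" \
        --exclude="$work/src/mnt" \
        "$work/src" "file://$work/backup"

    duplicity list-current-files --no-encryption "file://$work/backup"
    rm -rf "$work"
}
```

If the first-match rule works the way the man page describes, photo.txt should show up in the listing and junk.txt should not. Swapping the two selection options is a quick way to see the difference.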

Notes on duplicity

  • The bucket name must be unique among all S3 buckets owned by all S3 users. It does not need to exist before use.
  • Duplicity supports a number of protocols: file, ftp, scp, and s3+http are the most relevant here.
  • Switch the file path and the URL, and you're doing a restore.
  • When using cryptographic signing, duplicity will ask you to type in your key's passphrase twice at the command line. Somewhat annoying if yours is 30-odd characters.
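
The restore note above might look like this in practice. This is only a sketch with hypothetical function names. The bucket URL is the placeholder from the backup script, and --file-to-restore is duplicity's option for pulling out a single path. The same AWS environment variables need to be exported first, and GPG needs your secret key to decrypt.

```shell
BACKUP_URL=s3+http://myUniqueBucketName   # placeholder bucket from the backup script

# Restore the whole backup into a directory: same command as a backup,
# with the URL first and the local path second.
restore_all() {
    duplicity "$BACKUP_URL" "$1"
}

# Restore a single file or directory. The path is relative to the backup
# root, so it has no leading slash.
restore_one() {
    duplicity --file-to-restore "$1" "$BACKUP_URL" "$2"
}
```

For example: restore_one home/myusername/notes.txt /tmp/notes.txt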

Responses: 11 so far

  1. randy says:

    You can export the PASSPHRASE environment variable in your script and avoid typing it in on the command line. Duplicity will read that and pass it to GPG.

  2. Tim McCormack says:

    @randy: That's... not quite optimal from a security standpoint. I suppose it's fine if you generate a key *just* for encrypting your S3 backup, and the only reason you are encrypting is to keep the data safe from rogue employees at Amazon.

  3. randy says:

    It would also keep it private from anyone sniffing the network traffic. And this key is solely for the backup.

    I suppose one could encrypt the passphrase with Blowfish, store the encrypted string in a configuration file, write a small C/C++ application with the Blowfish encryption/decryption key hard-coded in the source, and call that from the shell script.

    export PASSPHRASE=$(/usr/local/bin/gpg-passphrase-fetcher)

    Heck, you might as well write a C/C++ (or any other compiled language) wrapper rather than use the shell.

    I've implemented other security measures to keep my server private and safe. If someone breaks in and gets access to my GPG passphrase located in one root-owned file, my data sitting on S3 is the least of my worries.

  4. David says:

    Getting the following message:
    No signatures found, switching to full backup.
    Traceback (most recent call last):
      File "/usr/bin/duplicity", line 425, in <module>
        if __name__ == "__main__": with_tempdir(main)
      File "/usr/bin/duplicity", line 421, in with_tempdir
        fn()
      File "/usr/bin/duplicity", line 414, in main
        if not sig_chain: full_backup(col_stats)
      File "/usr/bin/duplicity", line 150, in full_backup
        bytes_written = write_multivol("full", tarblock_iter, globals.backend)
      File "/usr/bin/duplicity", line 94, in write_multivol
        backend.put(tdp, dest_filename)
      File "/usr/lib/python2.5/site-packages/duplicity/backends.py", line 724, in put
        self.bucket = self.conn.create_bucket(self.bucket_name)
      File "/usr/lib/python2.5/site-packages/boto/s3/connection.py", line 103, in create_bucket
        raise S3ResponseError(response.status, response.reason, body)
    boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden

    InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records. D87F724E05EC9FADID18Z1BSNQWF5XCTYWC3R2CuO/qHmT4X1SluELB6qD9K7mZnacGbRDP0ou8btTqu3vKk63EMkkjzbjdoR+FUnX

    This happens even though I am using the correct ID and Secret Key.

    Any ideas?

  5. Tim McCormack says:

    @David: I Googled "InvalidAccessKeyId" and found a couple of threads that indicate that it is a sporadic error on Amazon's side.

  6. brian baggett dot com » Ubuntu Roundup: backups, encryption, and what’s new says:

    [...] Incremental (and encrypted) backups for your filesystem to Amazon’s S3 with duplicity. Read more here and here. [...]

  7. David Soergel says:

    I've written a similar tool that you may want to check out: http://dev.davidsoergel.com/trac/s3napback/

  8. Kearney says:

    Is there any way to restore and overwrite everything in a directory?

    So, for instance, I have a /tmp/important/ directory with dir1, and dir2 in it.

    I delete dir2. I want to restore. Currently, I have to delete /tmp/important/ and then issue the duplicity restore.

    Is there a way to force a restore?

  9. Sorin Pohontu says:

    You can take a look at this article about backing up a Plesk server to Amazon S3: http://sandbox.frontline.ro/2008/10/31/plesk-backup-to-amazon-s3-testing-phase
    As of today, I think JungleDisk is the most reliable solution for mounting an S3 bucket.

    I'll update my blog with some info about JungleDisk implementation.

  10. Suave’s Blog » Blog Archive » Automatic server file backup to S3 says:

    [...] There is also another approach; anyone interested can give it a try [...]

  11. Encrypted remote backup with Duplicity — The Chronicles of Omega says:

    [...] written by wladimir, on May 25, 2010 3:36:00 PM. I've been using Duplicity for a while now. Duplicity is a bandwidth-efficient tool to do encrypted incremental remote backups via SSH, rsync, webdav and many others, for example Amazon S3. [...]