Scalable backup solutions with Rackspace Cloud Files

It’s been a while since I’ve blogged about code, as I’ve been busy learning the joys of running a business. However, I recently had to come up with a backup solution for all our code. This turned into quite a project and I’m really happy with the result!

Background

I don’t like doing backups but obviously you gotta have them! My tech company Built By Giants has several gigs of code and several MySQL and MSSQL databases on different Rackspace Cloud servers. We need to keep this stuff backed up.

In particular we need a backup solution that is:

  • Painless
  • Infinitely scalable
  • Unix + Windows
  • Affordable
  • No physical media
  • On-demand access to files
  • Can be rescheduled at lower frequencies once a project moves from development to support + maintenance
  • Painless!!!

Planning

Our codebase is running off SVN (this solution could be easily modified for Git or Mercurial). The server is a Fedora-based cloud server. We need to back up several different repositories, each on its own schedule.

I chose Rackspace Cloud Files for online storage. Amazon S3 actually seems like a better service; however, we’re already on Rackspace Cloud and Files still suits our purposes brilliantly. At $0.15/GB/month you can’t go wrong for the price!

To back up a repository, we first need to export the whole codebase (including revision history) into a dump file. Then we need to push this file up to Cloud Files. (Incremental backups would be far more efficient, but I haven’t come up with a solution for this yet…)
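
One possible direction for the incremental backups mentioned above, offered as a rough sketch and not as part of the setup in this post: svnadmin dump accepts --incremental together with a revision range (-r LOWER:UPPER), and svnlook youngest prints the newest revision number of a repository. The function names, the idea of remembering the last backed-up revision, and the file naming scheme below are all assumptions made for illustration.

```python
def incremental_dump_command(repo_path, last_backed_up, youngest):
    """Return the svnadmin argv that dumps revisions after last_backed_up.

    last_backed_up is the newest revision already stored remotely
    (-1 if nothing has been backed up yet). Returns None when the
    repository has no new revisions.
    """
    if youngest <= last_backed_up:
        return None
    first = last_backed_up + 1
    return ['svnadmin', 'dump', repo_path,
            '-r', '%d:%d' % (first, youngest), '--incremental']


def incremental_dump_name(repo_name, last_backed_up, youngest):
    """Hypothetical naming scheme that records the revision range."""
    return '%s_r%d-r%d.dump' % (repo_name, last_backed_up + 1, youngest)
```

In a real script the command would be run with its output redirected to the dump file, and the last backed-up revision would have to be stored somewhere between runs. Restoring would mean loading every dump in revision order with svnadmin load, so a missing file in the middle breaks the chain.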

Backing up MySQL or MSSQL is pretty much the same thing. Dump the database, then push to Cloud Files.

Process

Our SVN codebase is running on Unix so we’ll use cron for the scheduling.

We’ll write a simple bash script to export the SVN repository to a dump file.

We also need a method to push the dump file to Rackspace Cloud Files. There is already a good solution online for pushing files with Duplicity; however, I just want to push the raw files. Fortunately, Rackspace Cloud Files provides APIs, so I’ll write a simple Python script for this.

SVN Backup

Let’s get started! First, install the Cloud Files Python API.

git clone https://github.com/rackspace/python-cloudfiles.git
cd python-cloudfiles
python setup.py install

Create the backup script. I usually place this file in my /etc/cron.daily folder for daily backups. Make a different backup script for each repo. Let’s say we’re working with a repo called “myproject”:

/etc/cron.daily/svn-myproject-backup.sh

#!/bin/bash

# Script Variables
export DATE=$(date "+%Y%m%d")
export SVNREPO='myproject'

# Create temp folder for dump files (-p: no error if it already exists)
mkdir -p /backup/${SVNREPO}
cd /backup/${SVNREPO}

# Dump SVN repo
svnadmin dump /var/www/svn/${SVNREPO} > /backup/${SVNREPO}/${SVNREPO}_${DATE}.dump

# Now call python script to upload to Cloud Files
/backup/cloudfiles-backup.py --path=/backup/${SVNREPO}/ --file=${SVNREPO}_${DATE}.dump

# Finally delete the local dump file
rm -f /backup/${SVNREPO}/${SVNREPO}_${DATE}.dump
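
One thing the script above does not do is check whether the dump succeeded before uploading it, so a failed svnadmin run could push an empty or truncated file. Here is a minimal sketch of that check in Python; run_dump is a hypothetical helper name, not something from the scripts in this post.

```python
import subprocess


def run_dump(argv, out_path):
    """Run a dump command and write its stdout to out_path.

    Returns True only if the command exited with status 0, so the
    caller can skip the upload when the dump failed.
    """
    with open(out_path, 'wb') as out:
        status = subprocess.call(argv, stdout=out)
    return status == 0
```

A caller would pass something like ['svnadmin', 'dump', '/var/www/svn/myproject'] as argv and only go on to the upload step when run_dump returns True. In the bash script itself, testing the exit status of svnadmin before calling the upload script would achieve the same thing.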

Now, create the Python script that handles the file upload.

/backup/cloudfiles-backup.py

#!/usr/bin/python

import cloudfiles
import sys

# Connection variables

rsc_username = '' # your rackspace cloud username
rsc_apikey = '' # your rackspace cloud API key
rsc_container = '' # your rackspace cloud files container

# Get filename and path from the --file=... and --path=... arguments

for arg in sys.argv[1:]:
    if arg.startswith('--file='):
        filename = arg[len('--file='):]
    elif arg.startswith('--path='):
        path = arg[len('--path='):]

# Open Rackspace Cloud connection and access the container

conn = cloudfiles.get_connection(rsc_username, rsc_apikey)
cont = conn.get_container(rsc_container)

# Upload the file

obj  = cont.create_object(filename)
obj.load_from_filename(path + filename)
print("File Uploaded!")
# The container is left private on purpose: making it public would expose
# the backups to anyone with the CDN URL.
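
The hand-rolled argument loop in the script above works, but it fails with a confusing NameError if either option is missing. As an alternative sketch, assuming Python 2.7 or later where argparse ships in the standard library, the same two options could be parsed like this. The function names parse_args and local_path are mine, not part of the original script.

```python
import argparse
import os


def parse_args(argv):
    """Parse the --path and --file options passed by the cron scripts."""
    parser = argparse.ArgumentParser(
        description='Upload one dump file to Cloud Files')
    parser.add_argument('--path', required=True,
                        help='folder holding the dump file')
    parser.add_argument('--file', required=True, dest='filename',
                        help='name of the dump file')
    return parser.parse_args(argv)


def local_path(args):
    """Join folder and file name, with or without a trailing slash."""
    return os.path.join(args.path, args.filename)
```

In the upload script this would be called as args = parse_args(sys.argv[1:]), with args.filename used as the object name and local_path(args) passed to load_from_filename. A missing option then produces a usage message instead of a traceback.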

And of course, make both files executable. The upload script contains your API key, so keep access restricted to the owner:

chmod 700 /etc/cron.daily/svn-myproject-backup.sh /backup/cloudfiles-backup.py

MySQL Backup

Now we need to export and back up our MySQL databases. Fortunately, this is very similar to the process above; we just need a new cron script.

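I’ll make a new cron script for each MySQL database backup. That way we can schedule each one differently. Following the above example, let’s say we’re working with the ‘myproject’ database. As with the SVN script, the file needs to be executable, and since it holds the database password it should be accessible only to its owner (chmod 700):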

/etc/cron.daily/sql-myproject-backup.sh

#!/bin/bash

# Script variables
export DATE=$(date "+%Y%m%d")
export MYSQLDB='myproject'
export MYSQLUSER='your mysql user goes here'
export MYSQLPASS='your mysql pass goes here'

# Create temp folder for dump files (-p: no error if it already exists)
mkdir -p /backup/${MYSQLDB}
cd /backup/${MYSQLDB}

# Export MySQL database
mysqldump --user="${MYSQLUSER}" --password="${MYSQLPASS}" "${MYSQLDB}" > /backup/${MYSQLDB}/${MYSQLDB}_${DATE}.sql

# Now call python script to upload to Cloud Files
/backup/cloudfiles-backup.py --path=/backup/${MYSQLDB}/ --file=${MYSQLDB}_${DATE}.sql

# Finally delete the dump file
rm -f /backup/${MYSQLDB}/${MYSQLDB}_${DATE}.sql
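
The SVN and MySQL scripts differ only in the dump command and the file extension, so the shared pattern could be captured in one place. The sketch below is just an illustration of that idea: dump_plan is a hypothetical name, the repository folder and the mysqldump options are copied from the scripts in this post, and nothing here actually runs a dump.

```python
import datetime


def dump_plan(kind, name, today=None, user=None, password=None):
    """Return (argv, filename) for dumping one SVN repo or MySQL database."""
    today = today or datetime.date.today()
    stamp = today.strftime('%Y%m%d')  # same format as date "+%Y%m%d"
    if kind == 'svn':
        argv = ['svnadmin', 'dump', '/var/www/svn/' + name]
        return argv, '%s_%s.dump' % (name, stamp)
    if kind == 'mysql':
        argv = ['mysqldump', '--user=' + user,
                '--password=' + password, name]
        return argv, '%s_%s.sql' % (name, stamp)
    raise ValueError('unknown backup kind: %r' % (kind,))
```

A single driver could then loop over a list of (kind, name) pairs, run each command with its output written to the returned filename, and hand the file to the upload script. Separate cron entries per project are still useful when each one needs its own schedule.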

That’s all folks!

This solution isn’t perfect for everyone, but I’m hoping you can use it as a starting point. It should be fairly simple to tinker with these scripts to customize them to your needs.

I still need to include PowerShell scripts for pushing MSSQL backups. This is an entirely different can of worms and I’ll add these scripts later.

If you can suggest any ways to improve this process (e.g. incremental backups), please leave comments below!

