Scalable backup solutions with Rackspace Cloud Files
It’s been a while since I’ve blogged about code, as I’ve recently been learning the joys of running a business. However, I recently had to come up with a backup solution for all our code. This turned into quite a project and I’m really happy with the result!
Background
I don’t like doing backups but obviously you gotta have them! My tech company Built By Giants has several gigs of code and several MySQL and MSSQL databases on different Rackspace Cloud servers. We need to keep this stuff backed up.
In particular we need a backup solution that is:
Painless
Infinitely scalable
Unix + Windows
Affordable
No physical media
On-demand access to files
Can be rescheduled at lower frequencies once a project moves from development to support + maintenance
Painless!!!
Planning
Our codebase is running off SVN (this solution could be easily modified for Git or Mercurial). The server is a Fedora-based cloud server. We need to back up several different repositories, each on its own schedule.
I chose Rackspace Cloud Files for online storage. Amazon S3 actually seems like a better service, but we’re already on Rackspace Cloud and Files still suits our purposes brilliantly. At $0.15/GB/month you can’t go wrong for the price!
To back up a repository, we first need to export the whole codebase (including revision history) into a dump file. Then we need to push this file up to Cloud Files. (Incremental backups would be far more efficient, but I haven’t come up with a solution for this yet…)
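On the incremental point, here is one possible starting place rather than a finished solution. svnadmin dump accepts a revision range (-r LOWER:UPPER) together with --incremental, and svnlook youngest reports the newest revision, so a script could remember the last revision it dumped and only export what came after. The function names and the state file below are my own assumptions for illustration, and the sketch is written for Python 3:

```python
import os
import subprocess


def next_revision_range(last_dumped, youngest):
    """Return (lower, upper) for the next incremental dump, or None if up to date."""
    if youngest <= last_dumped:
        return None
    return (last_dumped + 1, youngest)


def incremental_dump_command(repo_path, lower, upper):
    """Build the svnadmin command that dumps only revisions lower..upper."""
    return ["svnadmin", "dump", repo_path,
            "-r", "%d:%d" % (lower, upper), "--incremental"]


def youngest_revision(repo_path):
    """Ask Subversion for the newest revision number in the repository."""
    out = subprocess.check_output(["svnlook", "youngest", repo_path])
    return int(out.decode().strip())


def read_last_dumped(state_file):
    """Read the last backed-up revision (hypothetical state file); -1 means none yet."""
    if not os.path.exists(state_file):
        return -1
    with open(state_file) as f:
        return int(f.read().strip())
```

The catch with this approach is the restore: every incremental dump file has to be kept and loaded back in revision order with svnadmin load, so losing one file breaks the chain. A periodic full dump alongside the incrementals would probably still be wise.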
Backing up MySQL or MSSQL is pretty much the same thing. Dump the database, then push to Cloud Files.
Process
Our SVN codebase is running on Unix so we’ll use cron for the scheduling.
We’ll write a simple bash script to export the SVN repository to a dump file.
We also need a method to push the dump file to Rackspace Cloud Files. There is already a good solution online for pushing files with Duplicity, but I just want to push the raw files. Fortunately Rackspace Cloud Files provides APIs, so I’ll write a simple Python script for this.
SVN Backup
Let’s get started! First, install the Cloud Files Python API.
git clone https://github.com/rackspace/python-cloudfiles.git
cd python-cloudfiles
python setup.py install
Create the backup script. I usually place this file in my /etc/cron.daily folder for daily backups. Make a different backup script for each repo. Let’s say we’re working with a repo called “myproject”:
/etc/cron.daily/svn-myproject-backup.sh
#!/bin/bash
# Stop if any command fails, so a failed dump is never uploaded
set -e
# Script variables
DATE=$(date "+%Y%m%d")
SVNREPO='myproject'
# Create temp folder for dump files (-p: no error if it already exists)
mkdir -p "/backup/${SVNREPO}"
# Dump SVN repo
svnadmin dump "/var/www/svn/${SVNREPO}" > "/backup/${SVNREPO}/${SVNREPO}_${DATE}.dump"
# Now call Python script to upload to Cloud Files
/backup/cloudfiles-backup.py --path="/backup/${SVNREPO}/" --file="${SVNREPO}_${DATE}.dump"
# Finally delete the local dump file
rm -f "/backup/${SVNREPO}/${SVNREPO}_${DATE}.dump"
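Since storage is billed per GB, it may be worth compressing the dump before it is uploaded. This is an optional extra and not part of the scripts in this post. Here is a minimal Python 3 sketch using only the standard library; compress_dump is a hypothetical helper name, and how much space it saves will depend on what is in the repository:

```python
import gzip
import shutil


def compress_dump(dump_path):
    """Gzip dump_path into dump_path + '.gz' and return the new path."""
    gz_path = dump_path + ".gz"
    # Stream the file in chunks so large dumps are never held in memory
    with open(dump_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

If you go this way, the upload step would need to be pointed at the .gz file, and the cleanup step would need to remove both files.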
Now, create the Python script that handles the file upload.
/backup/cloudfiles-backup.py
#!/usr/bin/python
import cloudfiles
import sys
# Connection variables
rsc_username = ''  # your Rackspace Cloud username
rsc_apikey = ''  # your Rackspace Cloud API key
rsc_container = ''  # your Rackspace Cloud Files container
# Get the filename and path from the command-line arguments
for arg in sys.argv:
    if arg.startswith('--file='):
        filename = arg[7:]
    if arg.startswith('--path='):
        path = arg[7:]
# Open Rackspace Cloud connection and access the container
conn = cloudfiles.get_connection(rsc_username, rsc_apikey)
cont = conn.get_container(rsc_container)
# Upload the file
obj = cont.create_object(filename)
obj.load_from_filename(path + filename)
print("File Uploaded!")
# Make the container public if it isn't already
# (drop these two lines if your backups should stay private)
if cont.is_public == False:
    cont.make_public()
And of course, make both of these files executable, for example by running chmod +x on each.
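The argument handling in the upload script is deliberately bare-bones: if --file or --path is left out, it only fails later with a NameError. If you want friendlier errors, the standard argparse module is one option. This is a sketch of how that part might look, not what the script above uses; parse_args is just a name I picked:

```python
import argparse


def parse_args(argv=None):
    """Parse --path and --file; argparse exits with a usage message if either is missing."""
    parser = argparse.ArgumentParser(description="Upload one file to Cloud Files")
    parser.add_argument("--path", required=True, help="folder containing the file")
    parser.add_argument("--file", required=True, help="name of the file to upload")
    return parser.parse_args(argv)
```

Called with no arguments, parse_args() reads sys.argv, so the cron scripts could keep passing --path=... and --file=... exactly as they do now, and the values would be available as args.path and args.file.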
MySQL Backup
Now we need to export and back up our MySQL databases. Fortunately this is very similar to the process above, except that we need a new cron script.
I’ll make a new cron script for each MySQL database backup. That way we can schedule each one differently. Following the above example, let’s say we’re working with the ‘myproject’ database:
/etc/cron.daily/sql-myproject-backup.sh
#!/bin/bash
# Stop if any command fails, so a failed dump is never uploaded
set -e
# Script variables
DATE=$(date "+%Y%m%d")
MYSQLDB='myproject'
MYSQLUSER='your mysql user goes here'
MYSQLPASS='your mysql pass goes here'
# Create temp folder for dump files (-p: no error if it already exists)
mkdir -p "/backup/${MYSQLDB}"
# Export MySQL database
mysqldump --user="${MYSQLUSER}" --password="${MYSQLPASS}" "${MYSQLDB}" > "/backup/${MYSQLDB}/${MYSQLDB}_${DATE}.sql"
# Now call Python script to upload to Cloud Files
/backup/cloudfiles-backup.py --path="/backup/${MYSQLDB}/" --file="${MYSQLDB}_${DATE}.sql"
# Finally delete the local dump file
rm -f "/backup/${MYSQLDB}/${MYSQLDB}_${DATE}.sql"
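One thing to be aware of with the script above: a password passed with --password= on the command line can show up in the process list while mysqldump is running. MySQL client programs also accept --defaults-extra-file, which reads credentials from an option file and has to be the first option given. Below is a rough Python 3 sketch of that idea; both function names are made up for illustration, and it assumes the password contains no double quotes or backslashes, which would need escaping:

```python
import os
import tempfile


def write_mysql_option_file(user, password, directory=None):
    """Write a [client] option file readable only by its owner; return its path."""
    # mkstemp creates the file with owner-only permissions (0600)
    fd, path = tempfile.mkstemp(suffix=".cnf", dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write('[client]\nuser=%s\npassword="%s"\n' % (user, password))
    return path


def mysqldump_command(option_file, database):
    """Build a mysqldump command; --defaults-extra-file must come first."""
    return ["mysqldump", "--defaults-extra-file=" + option_file, database]
```

The option file could just as well be created once by hand and left in place; the point is only that the password stays out of the command line.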
That’s all folks!
This solution isn’t perfect for everyone, but I’m hoping you can use it as a starting point. It should be fairly simple to tinker with these scripts to customize them to your needs.
I still need to include PowerShell scripts for pushing MSSQL backups. This is an entirely different can of worms and I’ll add these scripts later.
If you can suggest any ways to improve this process (e.g. incremental backups), please leave comments below!