It has been quite a while since I spent some time administering my personal websites. My sites are hosted using GoDaddy's shared host, which isn't as bad as some of the reviews make it out to be. The big thing that I have been putting off is implementing a reliable and automated backup system. My previous strategy for backups was to simply dump the databases and copy down all of the files once a month, if I remembered. It would not be easy to replace the content of my websites if it were to be lost.
Monday, July 26, 2010
The first step on developing my backup strategy was to clean up my content and current installs. I deleted some web applications and code that I was playing with, but no longer used. These applications were some things that I was playing around with, but never did anything with. Once I did that I made a backup of everything by hand and then upgraded all of my web apps to the latest version. I was now ready to develop my automated system.
The first step was to get a local backup on the web server itself. This was done through shell access to the server using SSH and developing a shell script that performed all of the necessary steps. The two things that need to be backed up were the databases and the actual files. The MySQL databases are simple to backup using the mysqldump command and compressing the output and storing it to a file. The files can be backed up using a simple tar command which can also compress the files down to a reasonable size.
Once all of my database and files were compressed and organized, I took them all and packaged them up in another tar which was my final backup, a single file. This script was set to run as a cron job and the automated backup process was half way complete. The only thing left to do was to find a way to transfer the backup off-site.
My first though was to copy the backup to my personal Linux server. I eventually found a way to automate this process using scp and was happy with the results. It would have been fairly simple to automate this process, but it just didn't seem to be the solution I was looking for. The solution I went with was to store the backups on Amazon S3 using s3-bash. S3 provided very cheap storage and was easily accessed using open source tools that made the process of transferring files very painless. My estimates place the total cost of backups that will be stored on S3 at less than $0.40 a month!
Deciding to use a paid service meant that it would not be logical to store all of my backups indefinitely, and I needed to come up with a plan on how long to keep each backup. I also needed some way to delete backups after they were no longer needed. The solution I came up with was extremely simple. The backup script would run every night and generate and transfer the complete backup, about 45 MB, to the S3 servers. My plan was to keep the backup created on the first of each month for a year, this way I'll avoid data loss if something went wrong that was a long term problem. Additionally, I would keep a backup for each day of the week helping me to avoid loss of data in the short term. After 12 months of operation I would have a total of 19 backup files that would continue to be replaced as time went on. The old backups would not need to be deleted, because by uploading a file with the same key (or file name) it over writes the older version, thereby deleting the old backup.
My backup script has only been running for a few days, but I am very pleased with the results. I still want to do some testing to insure that by backups are comprehensive, but initial inspection shows reveals no problems. This set it and forget it approach is exactly what I was hoping to implement.
Posted by Jared Hatfield at 11:47 PM