TIP: Automating Website Backups – Part IIb (Reducing Backup Size Contd.)

Last time, I told you how to reduce the backup size. This is a short note that takes the approaches discussed there a little further: telling tar to back up only those files that have changed. For this, Linux has a command, “find”, which I am very fond of. You can give it the option “-newer” followed by a filename, and it will return the names of the files that were modified more recently than that file.


find ~ -type f -newer ~/backups/backup_x.tgz > files.txt

So, the command above finds all the files that have changed since you took the backup “backup_x.tgz” and stores their names and paths in “files.txt”. The “-type f” option makes sure that only regular files are listed, not directories. This matters because when tar is given a directory name, it archives that whole directory recursively and then adds the individual files from the list again, so the archive fills up with duplicate copies.
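A quick way to see “-newer” and “-type f” in action is a throwaway directory where file ages are set with “touch -d” (GNU coreutils assumed). The names site/, old.html, new.html and backup_x.tgz are made up for this sketch; backup_x.tgz is just an empty file standing in for the last backup. The second half shows what goes wrong when a directory slips into the list.

```shell
#!/bin/bash
# Demo in a throwaway directory; nothing under ~ is touched.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/site"

echo old > "$demo/site/old.html"
touch -d '2 days ago' "$demo/site/old.html"   # unchanged since before the last backup
touch -d '1 day ago' "$demo/backup_x.tgz"     # empty stand-in for the last backup
echo new > "$demo/site/new.html"              # changed after the last backup

# Only site/new.html is newer than the reference file.
find "$demo/site" -type f -newer "$demo/backup_x.tgz" > "$demo/files.txt"
cat "$demo/files.txt"

# Without -type f, the "site" directory is listed too (its mtime changed when
# new.html was created). tar archives it recursively and then adds new.html
# again from the list, so new.html ends up in the archive twice.
find "$demo/site" -newer "$demo/backup_x.tgz" > "$demo/with_dirs.txt"
tar -czf "$demo/dup.tgz" -T "$demo/with_dirs.txt" 2>/dev/null
tar -tzf "$demo/dup.tgz" | grep -c 'new.html'   # prints 2
```

The duplicate is harmless on extraction (the second copy just overwrites the first), but it wastes exactly the space this whole exercise is trying to save.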

Now, all we have to do is give this “files.txt” to tar as an input to tell it which files to archive.

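tar --ignore-failed-read -cvzpf ~/backups/backup_$date.tgz -T files.txt

The “-T files.txt” option makes this happen. Apart from that, there are two options here that were not in our last part:

  • p – Tells tar to preserve file permissions. (tar records permissions when creating an archive either way; this flag really takes effect when you extract it.)
  • --ignore-failed-read – Tells tar not to treat files it cannot read as a failure: it prints a warning, skips them, and still exits with a success status. Note that GNU tar's single-letter “G” does not do this; it is short for “--incremental”.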
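Here is a small sketch of the “-T” step, run in a throwaway directory with made-up file names. It assumes GNU tar, including its long option --ignore-failed-read, and shows that tar archives exactly the paths in the list and nothing else.

```shell
#!/bin/bash
# Demo in a throwaway directory; file names are made up for illustration.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

mkdir -p site/img
echo hi > site/index.html
echo png > site/img/logo.png

# One path per line, exactly the format find writes to files.txt
find site -type f > files.txt

# -T reads the list of files to archive from files.txt
tar --ignore-failed-read -czpf backup.tgz -T files.txt

# List the archive: just the two files, no separate directory entries
tar -tzf backup.tgz
```

Because only regular files are listed, tar stores no directory entries; the directories are recreated from the file paths when you extract.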

Apart from this, you can also look at the “-mtime -x” option of find (note the minus sign), which lists files modified in the past x days; without the minus, “-mtime x” matches files modified x whole days ago. There are other similar options for find; look at “man find” and take your pick. tar has similar options of its own (such as GNU tar's “--newer”), but I have had a lot of weird issues using them, so I'd recommend sticking with this two-step process of “find” followed by “tar”.
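The difference between “-mtime -x”, “-mtime x” and “-mtime +x” is easy to check with two files whose ages are set by “touch -d” (GNU find and touch assumed; the file names are made up). find counts age in whole 24-hour periods, rounding down.

```shell
#!/bin/bash
# Demo in a throwaway directory (GNU find and touch assumed).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

touch -d '5 days ago'   "$demo/old.html"
touch -d '10 hours ago' "$demo/recent.html"

# -mtime -2: modified less than 2 days (48 hours) ago -> recent.html only
find "$demo" -type f -mtime -2

# -mtime 2 (no minus): age is 2 whole days (48 to 72 hours) -> nothing here
find "$demo" -type f -mtime 2

# -mtime +2: age is more than 2 whole days (72 hours or more) -> old.html only
find "$demo" -type f -mtime +2
```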

Now, the commands and options above can be combined in countless ways to strike your own balance between disk space and ease of use. Below is a sample script that makes a full backup on the first day of the week (Sunday, which “date +%w” reports as 0) and an incremental backup on each of the remaining 6 days. You'll save a lot of space (potentially more than 5 times), but you'll need the full backup plus every incremental taken since then, up to all 7 backup files, to make a full restore. (I'm listing just the backup part; you can add the “mutt” command for e-mailing yourself, as described in Part I.)

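#!/bin/bash
# Day of the week: 0 = Sunday, 1 = Monday, ... 6 = Saturday
day=$(date +%w)
prev=$((day - 1))

if [ "$day" -eq 0 ] || [ ! -e ~/backups/backup_$prev.tgz ]; then
    # Sunday (or no backup from yesterday): full backup of the site
    tar --ignore-failed-read -cvzpf ~/backups/backup_$day.tgz ~/public_html
else
    # Other days: only the files changed since yesterday's backup
    find ~/public_html -type f -newer ~/backups/backup_$prev.tgz > ~/backups/files.txt
    tar --ignore-failed-read -cvzpf ~/backups/backup_$day.tgz -T ~/backups/files.txt
fi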
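Restoring means extracting the full backup first and then each later incremental in order. Below is a minimal sketch of a hypothetical helper, restore_week, assuming the backup_0.tgz ... backup_6.tgz layout produced by the script above, with the full backup taken on Sunday. Because the files rotate weekly, the archives for later weekdays may still be left over from last week, so the sketch skips any archive that is not newer than Sunday's full backup.

```shell
#!/bin/bash
# restore_week BACKUP_DIR TARGET_DIR  (hypothetical helper, not part of the original post)
restore_week() {
    local dir=$1 target=$2
    local full="$dir/backup_0.tgz"
    local day f
    mkdir -p "$target"

    # Start from Sunday's full backup...
    tar -xzpf "$full" -C "$target" || return 1

    # ...then replay this week's incrementals in day order. Archives from last
    # week are older than the full backup and are skipped, so they cannot
    # overwrite newer versions of a file.
    for day in 1 2 3 4 5 6; do
        f="$dir/backup_$day.tgz"
        if [ -e "$f" ] && [ "$f" -nt "$full" ]; then
            tar -xzpf "$f" -C "$target" || return 1
        fi
    done
}

# Usage: restore_week ~/backups ~/restore
```

Two caveats: tar strips the leading “/” when archiving, so the site comes back under TARGET/home/you/public_html rather than in place; and since find only reports changed files, files you deleted during the week reappear after a restore.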

That’s it for today. Lemme know if you have any doubts, or if you would like to see any other questions answered in this series. The question that will be answered next time is:

Q2: Cron? Using tar and writing the script file is enough command line for me. Isn't there an easier way?


Comments on TIP: Automating Website Backups – Part IIb (Reducing Backup Size Contd.)

  • Can you explain this point:
    “only filenames are listed … because tar creates a lot of issues when presented with directory names”

    What about files with the same name in different directories? Do they not get backed up?

  • Vince,
    1. When tar is given a folder name, it creates a recursive tarball of that folder (all the files and folders inside it), and then it keeps adding the individual files that find gave it. So the final tarball bloats up with a lot of duplicate copies. It won't cause much trouble when unpacking, though: the files just get overwritten again and again, and you still end up with a sane result.
    2. Files with the same name in different directories do get backed up, because find doesn't list just the names; it lists the complete paths.
