Backups

Revision as of 18:31, 29 June 2007

NPG backups run from the dedicated backup server lentil, which has four hot-swappable drive bays that generally contain SATA drives.

To put in a new (fresh) drive:

  1. Locate the oldest disk.
  2. Make sure it is not mounted.
  3. Open the appropriate drive door and slide the old drive out.
  4. Put the new drive in (there are 4 screws holding the drive in place).
  5. Slide it back in. Take note of which Linux device it registers as: /dev/sdb, /dev/sdc, /dev/sdd, or /dev/sde (see the shell sketch after this list).
    NOTE: The order does NOT correspond with the slots, and this order can change after a reboot!
  6. Run /usr/local/bin/format_archive_disk.pl <disk no> <device>
    e.g. /usr/local/bin/format_archive_disk.pl 29 /dev/sde
  7. Check that the drive is available: ls /mnt/npg-daily/29
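
A minimal shell sketch of steps 5-7, assuming the new drive registered as /dev/sde and is being formatted as archive disk 29 (both are just examples; check dmesg and fdisk -l yourself, since the device name can change):

#!/bin/bash
# Sketch only: find the device the new drive registered as, make sure it is
# not already mounted, format it as an archive disk, then verify the mount.

dmesg | tail -n 20                  # the kernel log shows which sdX the drive got
fdisk -l                            # cross-check the list of detected disks

DEV=/dev/sde                        # example; substitute what dmesg reports
DISK_NO=29                          # example archive disk number

if mount | grep -q "$DEV"; then
	echo "$DEV is already mounted -- unmount it first"
	exit 1
fi

/usr/local/bin/format_archive_disk.pl "$DISK_NO" "$DEV"

ls "/mnt/npg-daily/$DISK_NO"        # the freshly formatted disk should show up here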

Current Backup System

Newer backups hard-link files that haven't changed between backup sessions, so an unchanged file only takes up space once across snapshots. Seems like a good idea, but we need to learn exactly how our setup does it. For old backups in the new format, consolidation works by putting all the data from each backup session into one place, overwriting with the newest data (see the sketch below). Nobody is going to look for a specific version of a file they had in 2004 that only existed for three days, so this method is relatively safe in terms of data retention.
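
A rough sketch of that consolidation idea, assuming dated snapshot directories on an archive disk (the paths below are only examples): replay the snapshots oldest-to-newest into one tree, so the final tree ends up holding the newest copy of every file.

#!/bin/bash
# Sketch: merge dated snapshot directories into a single consolidated tree.

SRC=/mnt/npg-daily/29            # example disk holding dated snapshots
OUT=/mnt/tmp2/consolidated       # example destination directory (an assumption)

mkdir -p "$OUT"
for snap in $( ls -1 "$SRC" | sort ); do	# oldest first, newest last
	rsync -a "$SRC/$snap/" "$OUT/"		# newer data overwrites older
done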

The script that does the backing-up is /usr/local/bin/rsync_backup.pl and the script that periodically runs it and sends out a notification e-mail is /etc/cron.daily/rsync_backup. rsync_backup.pl determines what disk to put the backup onto, etc.
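
We haven't copied the cron script's contents here, but a wrapper of that sort typically looks something like the following hypothetical sketch (the log path and recipient are made up; the real /etc/cron.daily/rsync_backup may well differ):

#!/bin/bash
# Hypothetical sketch of a cron.daily wrapper: run the backup script,
# keep its output, and mail out a notification.

LOG=/var/log/rsync_backup.log        # assumed log location
RECIPIENT=root                       # assumed notification address

/usr/local/bin/rsync_backup.pl > "$LOG" 2>&1
STATUS=$?

mail -s "lentil backup finished (exit $STATUS)" "$RECIPIENT" < "$LOG"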

Here is a nice little guide on incremental, hard-linked backups via rsync: http://www.mikerubel.org/computers/rsync_snapshots/. He sets up some nice little tricks with NFS mounts so that users can access their stuff read-only for any backup that's actually stored. We should do this.
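
The core of that guide's hard-link trick is rsync's --link-dest option. A minimal sketch, assuming dated snapshot directories on an archive disk (the source path and dates are just examples, not necessarily what rsync_backup.pl does):

#!/bin/bash
# Sketch: copy the source into a new dated snapshot, hard-linking any file
# that is identical to the previous snapshot instead of storing it again.

SRC=einstein:/home/          # example source host and path (an assumption)
DEST=/mnt/npg-daily/29       # example archive disk, as in the steps above

# 2007-06-28 is the previous snapshot; 2007-06-29 is the one being created.
rsync -a --delete --link-dest="$DEST/2007-06-28" "$SRC" "$DEST/2007-06-29/"

Unchanged files then appear in every dated directory but only consume disk space once, which is presumably how dozens of daily snapshots fit on a single disk.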

On lentil, the pre-RHEL5 perl binary was obliterated at 0:16:00 on 2007-6-21. This date is BEFORE we even started looking at perl stuff. Its file size was 0 bytes. A quick fix was to overwrite the 0-byte lentil perl binary with a copy of improv's perl binary. Using rpm to force-reinstall perl-5.8.5 from yum's cache restored the correct version.
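
For the record, a force reinstall from yum's cache looks roughly like this (the cache path and package file name are assumptions and depend on the repo and architecture):

# Find the cached copy of the perl package (path varies by repo and arch).
ls /var/cache/yum/*/packages/perl-5.8.5*.rpm

# Reinstall the same version on top of the damaged files.
rpm -Uvh --force /var/cache/yum/base/packages/perl-5.8.5-*.rpm

# Confirm the binary is back and no longer 0 bytes.
ls -l /usr/bin/perl && /usr/bin/perl -v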

Legacy Backups

The really old amanda-style backups are tar'ed, gzip'ed, and have weird header info. Looking at the head of one of these files gives instructions for extraction. A script was written to extract and consolidate these backups; consolidating shrinks hundreds of gigs down to tens of gigs, and compressing the result shrinks it further. Very handy for OLD files we're not going to look at ever again.

Amanda backup consolidator

#!/bin/bash
# This script was designed to extract data from the old tape-style backups
# and put the data in an up-to-date (according to the backups) directory
# tree. We can then, either manually or with a different script, tar that into
# a comprehensive backup. This should be quite a bit more space-efficient than
# incrementals.
# -----------------------------
# My first attempt at a "smart" bash script, and one that takes input.
# Probably not the best way to do it, but it works!
# ~Matt Minuti
if [ -z "$1" ]
then
	echo "Syntax: amandaextract.sh [string to search for]"
	echo "This script searches /mnt/tmp for files containing the"
	echo "given string, and does the appropriate extraction"
	exit 1
fi

if [ ! -d "/mnt/tmp2/$1" ]	# Test to see if the destination directory exists
then
	mkdir "/mnt/tmp2/$1"	# If it doesn't, make it!
fi

NPG=$( ls /mnt/tmp/ )		# Set where to look for backup tree
for i in $NPG; do		# Cycle through the folders in order to...
	cd "/mnt/tmp/$i/data" || continue
	FILES=$( ls )		# Get a listing of files
	for j in $( echo "$FILES" | grep "\.$1" ) ; do
		echo "Extracting file $j from $( pwd ) to /mnt/tmp2/$1"
		# Skip the 32k amanda header, then gunzip and untar into place.
		dd if="$j" bs=32k skip=1 | /usr/bin/gzip -dc | /bin/tar -xf - -C "/mnt/tmp2/$1"
	done	# The above for statement takes each matching file and extracts
done		# it to the desired location.

An example of how I've been using it: I have an amanda backup drive mounted on /mnt/tmp and an empty drive (or one with enough space) on /mnt/tmp2. Running amandaextract.sh einstein goes into each folder in /mnt/tmp, looks for anything with a name containing "einstein," and extracts it into /mnt/tmp2/einstein. The effect is that the oldest backups are extracted first and then overwritten with newer and newer partial backups, ending with the most up-to-date state of that amanda set. I then tar and bzip2 the resultant folder structure to save even more space.
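
That final compression step is just an ordinary tar with bzip2 (the output file name is only an example):

# Bundle and compress the consolidated einstein tree into one archive.
tar -cjf /mnt/tmp2/einstein-consolidated.tar.bz2 -C /mnt/tmp2 einstein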