Difference between revisions of "Backups"

From Nuclear Physics Group Documentation Pages

Latest revision as of 14:17, 13 July 2021

Current Backup System

NPG backups run from the dedicated backup server, Lentil.

Backups occur daily (usually around 2am). Each run begins by copying the previous day's backup, using hard links and preserving permissions. Once the copy is complete, rsync pulls the actual changes, additions, and deletions from each machine and updates the day's backup directory to reflect them.
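The hard-link copy step described above can be sketched as follows. This is a minimal illustration, not the contents of rsync_backup.py: the function name make_snapshot and the module name "home" in the commented rsync line are assumptions, and cp -al requires GNU cp.

```shell
# make_snapshot PREV_DIR TODAY_DIR
# Hypothetical helper: create today's snapshot as a hard-linked copy of
# the previous day's snapshot. Unchanged files then cost no extra data
# blocks, only directory entries.
make_snapshot() {
    prev=$1
    today=$2
    if [ ! -d "$prev" ]; then
        echo "previous snapshot $prev not found" >&2
        return 1
    fi
    if [ -e "$today" ]; then
        echo "snapshot $today already exists" >&2
        return 1
    fi
    # -a preserves permissions, ownership and timestamps;
    # -l makes hard links instead of copying file data (GNU cp).
    cp -al "$prev" "$today"
}

# The rsync step would then update the new snapshot, for example
# (the module name "home" is an assumption; taro is a host named on this page):
#   rsync -a --delete -e "ssh -i /root/.ssh/rsync_id_rsa" taro::home/ "$today/taro/"
```

By default rsync writes a changed file to a temporary file and renames it into place, so the hard link to the previous day's copy is broken rather than the old copy being overwritten.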

The backup script is found at /usr/local/bin/rsync_backup.py. It is run by the daily cron script /etc/cron.daily/rsync_backup, which sends a notification e-mail containing the output of the backup script to all admins. Client machines must have Lentil's public SSH key, and Lentil must have the appropriate automount configuration.

rsync documentation

Adding a new drive

To put in a new (fresh) drive:

  1. Locate the oldest disk.
  2. Make sure it is not mounted.
  3. Open the appropriate drive door and slide out the drive.
  4. Put the new drive in (there are 4 screws holding the drive in place).
  5. Slide it back in. Note which Linux drive it registers as: /dev/sdb or /dev/sdc or /dev/sdd or /dev/sde
    NOTE: The order does NOT correspond with the slots, and this order can change after a reboot!
  6. Run /usr/local/bin/format_archive_disk.pl <disk no> <device>
    E.g.: /usr/local/bin/format_archive_disk.pl 29 /dev/sde
    See also rsync_snapshots
  7. Check that the drive is available: ls /mnt/npg-daily/29
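
Step 2 above (making sure the disk is not mounted) can be checked with a small helper before pulling the drive. This is a sketch under assumptions: the function name device_is_mounted is made up for illustration, and it simply matches device-name prefixes in /proc/mounts, so /dev/sde also matches /dev/sde1.

```shell
# device_is_mounted DEVICE [MOUNTS_FILE]
# Hypothetical helper: succeeds (exit 0) if DEVICE, or any partition
# whose name starts with DEVICE, appears in the mount table.
# MOUNTS_FILE defaults to /proc/mounts.
device_is_mounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    awk -v dev="$dev" '
        index($1, dev) == 1 { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$mounts"
}

# Example use:
#   if device_is_mounted /dev/sde; then
#       echo "/dev/sde is still mounted, unmount it first" >&2
#   fi
```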

With VERY LARGE BACKUP DRIVES

In this case you can run out of inodes. To check, run "df --inodes". If it shows 100%, you are in trouble: the drive holds too many small files, i.e. too many hard links, which is what we run into on our system. See: All out of inodes. There is no solution without reformatting the drive. If this happens you can start deleting old backups, but the problem will only reappear later. Instead, the drive should have been formatted with the -i option to get more inode space. So rotate to the next drive, and format better next time.
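
The inode check above can be turned into an early warning rather than waiting for 100%. The following is a sketch only: the function name full_inode_mounts and the 90% threshold are assumptions, and it expects the usual six-column layout of GNU df (mount points containing spaces are not handled).

```shell
# full_inode_mounts THRESHOLD
# Hypothetical helper: reads `df --inodes` output on stdin and prints
# every mount point whose inode usage is at or above THRESHOLD percent.
# Filesystems that report "-" for IUse% are skipped.
full_inode_mounts() {
    awk -v limit="$1" '
        NR > 1 {
            pct = $5
            sub(/%$/, "", pct)
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6
        }
    '
}

# Example use (-P keeps each filesystem on a single line):
#   df -P --inodes | full_inode_mounts 90
```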

Troubleshooting

The backup script /usr/local/bin/rsync_backup.py can be launched with a "--debug" flag, which runs the script without actually doing anything. Instead, it prints the commands it would normally perform, which is extremely helpful when tracking down any issues that may arise.

Since the backup script uses rsync in a special way over ssh, you need to make sure that ssh isn't blocked. Check the target machine to make sure you can connect to it from lentil. Lentil has special access permissions for running only the backup script. Try running:

rsync -e "ssh -v -T -x -i /root/.ssh/rsync_id_rsa -e none" taro::

and look for hints in the diagnostics. It is possible, for instance, that the denyhost system is blocking lentil.

Client Configuration

An important aspect of the current backup system is that it requires ssh to get through to the node you want to back up, because rsync uses ssh to pull data from the node. This requires a special ssh setup on each node. Each node has a file /etc/rsync-backup.conf that controls what is backed up for that node. The backup system executes the remote command rsync --server --daemon --config=/etc/rsync-backup.conf . on the node. In the ssh configuration file (/root/.ssh/authorized_keys), the node has a special line that allows Lentil to execute this command. Don't forget to have the file /etc/rsync-backup.conf on each machine, with something meaningful in it. /root/debug_rsync is also needed.
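
To make the two client-side files more concrete, the sketch below prints an example of each. Everything specific in it is an assumption for illustration: the function names, the module name "home", the path /home, and the set of restriction options on the key line. The actual files on the NPG nodes may differ. The configuration uses standard rsyncd.conf parameters, and the key line uses the standard authorized_keys command="..." option to force the one command Lentil is allowed to run.

```shell
# print_rsync_backup_conf
# Hypothetical helper: prints an example /etc/rsync-backup.conf.
# The module name [home] and its path are assumptions.
print_rsync_backup_conf() {
    cat <<'EOF'
# /etc/rsync-backup.conf (example only)
uid = root
gid = root
read only = yes
use chroot = no

[home]
    path = /home
    comment = home directories
EOF
}

# print_authorized_keys_line PUBLIC_KEY
# Hypothetical helper: prints a /root/.ssh/authorized_keys entry that
# restricts PUBLIC_KEY to the rsync daemon command quoted on this page.
print_authorized_keys_line() {
    pubkey=$1
    printf 'command="rsync --server --daemon --config=/etc/rsync-backup.conf .",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding %s\n' "$pubkey"
}

# Example use (the key file name is a guess based on the private key
# path used in the Troubleshooting section):
#   print_authorized_keys_line "$(cat rsync_id_rsa.pub)" >> /root/.ssh/authorized_keys
```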

Excluding Directory Content

You can exclude content in any directory in the /net/home directory tree by adding a file .rsync-filter with content like:

 + ./README
 - *

This should exclude everything in the directory except the README file.

Important to note

  • Do NOT use disks smaller than 450 GB for backup, since they will not fit even one copy of what needs to be backed up.
  • The link /mnt/npg-daily-current must exist and point to an actual drive.
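
The second point can be verified with a quick check, sketched below. The function name check_current_link is an assumption, and the check only confirms that the link resolves to an existing directory; it does not prove that a drive is actually mounted there.

```shell
# check_current_link [LINK]
# Hypothetical helper: verifies that LINK (default /mnt/npg-daily-current)
# is a symbolic link that resolves to an existing directory, and prints
# its target on success.
check_current_link() {
    link=${1:-/mnt/npg-daily-current}
    if [ ! -L "$link" ]; then
        echo "$link is missing or not a symbolic link" >&2
        return 1
    fi
    if [ ! -d "$link" ]; then
        echo "$link does not resolve to a directory" >&2
        return 1
    fi
    echo "$link -> $(readlink "$link")"
}

# Example use:
#   check_current_link || echo "fix the link before the next backup run" >&2
```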

Issues/Improvements

Here is a nice little guide on incremental, hardlinked backups via rsync. He sets up some nice little tricks with NFS mounts so that users can access their stuff read-only for any backup that's actually stored. We should do this.

On lentil, pre-HDD-change, perl was obliterated at 0:16:00 on 2007-06-21. This date is BEFORE we even started looking at perl stuff. Its filesize was 0 bytes. A quick fix was to overwrite the 0-byte lentil perl binary with a copy of improv's perl binary. Using rpm to force a reinstall of perl-5.8.5 from yum's cache restored the correct version. The cause was later found to be the drive going bad. 2008-05-23: Since this has happened again, I've tar'd up the backup script, /etc/ssh/, and the automount configs and saved them to einstein:/root/lentil.tar.

Legacy Backups

The really old amanda-style backups are tar'ed and gzip'ed, and have weird header info. Looking at the head of one gives instructions for extraction. A script was written to extract and consolidate these backups. It shrinks hundreds of gigs down to tens of gigs, and zipping the result shrinks it further. Very handy for OLD files we're not going to look at ever again.

Amanda backup consolidator

#!/bin/bash
# This script was designed to extract data from the old tape-style backups
# and put the data in an up-to-date (according to the backups) directory
# tree. We can then, either manually or with a different script, tar that
# into a comprehensive backup. This should be quite a bit more
# space-efficient than incrementals.
# -----------------------------
# My first attempt at a "smart" bash script, and one that takes input.
# Probably not the best way to do it, but it works!
# ~Matt Minuti
if [ -z "$1" ]
then
	echo "Syntax: amandaextract.sh [string to search for]"
	echo "This script searches /mnt/tmp for files containing the"
	echo "given string, and does the appropriate extraction"
	exit 1
fi

# Test to see if destination directory exists
if [ -d "/mnt/tmp2/$1" ]
then
	echo "Directory /mnt/tmp2/$1 already exists."
else
	mkdir "/mnt/tmp2/$1" || exit 1	# If it doesn't, make it!
fi

# Cycle through the folders of the backup tree (in sorted order) in order to...
for i in /mnt/tmp/*/; do
	cd "${i}data/" || continue	# Skip folders that have no data directory
	# Take each file whose name contains ".<string>" (matched literally)
	# and extract it to the desired location.
	for j in *".$1"*; do
		[ -e "$j" ] || continue	# No matching files in this folder
		echo "Extracting file $j from $( pwd ) to /mnt/tmp2/$1"
		# Skip the first 32k block (the header), then gunzip and untar
		dd if="$j" bs=32k skip=1 | /usr/bin/gzip -dc | /bin/tar -xf - -C "/mnt/tmp2/$1"
	done
done

As an example of how I've been using it: I have an amanda backup drive on /mnt/tmp, and an empty drive (or one with enough space) on /mnt/tmp2. Running amandaextract.sh einstein will go into each folder in /mnt/tmp, look for anything with a name containing "einstein", and do the appropriate extraction into /mnt/tmp2/einstein. The effect is to extract the oldest backups first and then overwrite them with newer and newer partial backups, ending with the most up-to-date backup in that amanda set. I then tar and bzip the resulting folder structure to save even more space.

Emergency Backup

An easy way to make a backup of, say, lentil when its root drive is dying is to use the program dd_rescue from a rescue disc to copy the drive contents to another drive or to an image file. The backup can then be mounted as a loopback device to access its contents.

This came in handy in the case of a certain computer, say, lentil. We forgot to copy the autofs scripts and ssh keys, but it wasn't a big deal since we just mounted the drive image and bam! Everything was nice.
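
When the image is a copy of a whole disk rather than of a single partition, a loopback mount needs the byte offset of the partition inside the image. The helper below is a sketch of that arithmetic only; the function name loop_offset, the image name lentil.img, and the mount point are assumptions, and the start sector must be read from the partition table of the actual image.

```shell
# loop_offset START_SECTOR [SECTOR_SIZE]
# Hypothetical helper: prints the byte offset of a partition inside a
# whole-disk image, given its start sector. SECTOR_SIZE defaults to 512.
loop_offset() {
    start=$1
    size=${2:-512}
    echo $(( start * size ))
}

# Example use (image name and mount point are made up):
#   fdisk -l lentil.img     # read the partition's start sector
#   mount -o loop,ro,offset="$(loop_offset 2048)" lentil.img /mnt/image
```

Mounting read-only is a deliberate choice here, since the image is the only surviving copy of the dying drive.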