Sysadmin Checklist
From Nuclear Physics Group Documentation Pages
These are the most basic things you need to do, every day if you can.
- Check the backups:
- Did the backup run? (You should have gotten an email.)
- Did all the machines, especially Gourd, Einstein, Roentgen, get backed up completely?
- Do we need to insert another disk and file an old one?
- Check the mail system:
- Is einstein up, and can port 25 be reached? (E.g. run nc -z -w5 einstein.unh.edu 25; echo $? and check that echo prints 0. Some versions of nc also report "succeeded" themselves.)
- Is fail2ban or denyhosts running properly?
- Check /var/log/maillog and /var/log/messages on einstein. Anything odd, any errors, any break-in attempts that may have been successful?
- This is where Splunk can be really helpful. Unfortunately, nobody has been maintaining Splunk.
- Don't kill yourself looking at the logs, but please do check them at times for oddities. Make a log of the oddities that are actually normal so others know what to look for. Link that log here.
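The port 25 check above can be wrapped in a small function so that the result does not depend on what a particular nc version prints. This is only a sketch: the function name check_port and the OK/FAIL wording are made up for illustration, and it assumes an nc that supports -z and -w as in the command above.

```shell
#!/bin/sh
# check_port HOST PORT (hypothetical helper, not an existing script here).
# Reports whether a TCP port accepts connections, using only nc's exit
# status (0 = connection made) and ignoring any text nc itself prints.
check_port() {
    host=$1
    port=$2
    if nc -z -w5 "$host" "$port" >/dev/null 2>&1; then
        echo "OK: $host:$port reachable"
    else
        echo "FAIL: $host:$port not reachable"
        return 1
    fi
}
```

Calling check_port einstein.unh.edu 25 then gives a one-line answer, and the non-zero return value on failure makes it easy to use from cron or another script.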
- Check the VMs:
- Are the VMs running on Gourd?
- Is there sufficient disk space for / on all machines?
- What are the CPU usage levels? Are they normal, or is there a machine using too many resources (CPU or memory)?
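For the disk space question, a rough sketch of a check on / could look like the following. The names use_pct and check_root_disk and the 90% default limit are assumptions chosen for illustration, not an established policy. It relies on the portable df -P output format, where the fifth column is the percentage used.

```shell
#!/bin/sh
# use_pct: read `df -P` output on stdin and print the Capacity column
# (percent used, without the % sign) of the last filesystem line.
use_pct() {
    awk 'NR > 1 { gsub(/%/, "", $5); print $5 }' | tail -n 1
}

# check_root_disk [LIMIT]: warn when / is at or above LIMIT percent full.
# The default of 90 is an arbitrary example value.
check_root_disk() {
    limit=${1:-90}
    pct=$(df -P / | use_pct)
    if [ "$pct" -ge "$limit" ]; then
        echo "WARN: / is ${pct}% full"
        return 1
    fi
    echo "OK: / is ${pct}% full"
}
```

To cover all machines, something like ssh HOST 'df -P /' | use_pct could be run per host, assuming ssh access is already set up.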
- Check LDAP:
- Does the LDAP server (einstein) return required information quickly?
- Check the DNS server(s):
- Does "nslookup taro" return 10.0.0.247, and does "nslookup taro.unh.edu 10.0.0.253" return quickly with 132.177.88.86? If it takes too long, named is not working properly.
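The two lookups above can be scripted roughly as follows. This is a sketch under assumptions: the names answer_addrs and check_dns are invented here, the 2-second "slow" limit is an arbitrary example, and the parsing assumes the usual nslookup layout where the answer's Address line follows a Name: line.

```shell
#!/bin/sh
# answer_addrs: read nslookup output on stdin and print only the addresses
# from the answer section (Address lines that come after a Name: line),
# skipping the Address line that describes the DNS server itself.
answer_addrs() {
    awk '/^Name:/ { seen = 1; next } seen && /^Address/ { print $2 }'
}

# check_dns NAME EXPECTED [SERVER]: compare the first answer to EXPECTED.
# A lookup slower than 2 seconds (example value) is flagged on stderr.
check_dns() {
    name=$1
    expected=$2
    start=$(date +%s)
    got=$(nslookup "$name" ${3:+"$3"} 2>/dev/null | answer_addrs | head -n 1)
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -gt 2 ]; then
        echo "SLOW: lookup of $name took ${elapsed}s" >&2
    fi
    if [ "$got" = "$expected" ]; then
        echo "OK: $name -> $got"
    else
        echo "FAIL: $name -> $got (expected $expected)"
        return 1
    fi
}
```

With this, check_dns taro 10.0.0.247 and check_dns taro.unh.edu 132.177.88.86 10.0.0.253 correspond to the two manual checks.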
- Check the web server:
- Is it running, is it reachable, is it getting attacked?
- Server room:
- Occasionally go to the server room. Is the air conditioning working? Any beeping noises? Are all machines powered? No water on the floor, no fire?