Sysadmin Checklist

From Nuclear Physics Group Documentation Pages

Latest revision as of 21:20, 26 September 2014

These are the most basic things you need to do, every day if you can.

  1. Check the backups:
    1. Did backup run? (you should have gotten an email)
    2. Did all the machines, especially Gourd, Einstein, Roentgen, get backed up completely?
    3. Do we need to insert another disk and file away an old one?
  2. Check the mail system:
    1. Is einstein up, and can port 25 be reached? (e.g. "nc -z -w5 einstein.unh.edu 25; echo $?" should print 0)
    2. Are fail2ban or denyhosts running properly?
    3. Check /var/log/maillog and /var/log/messages on einstein. Is there anything odd: errors, or break-in attempts that may have been successful?
      1. This is where spunk can be really helpful; unfortunately, nobody has been maintaining spunk.
      2. Don't wear yourself out reading the logs, but do check them periodically for oddities. Keep a log of the oddities that are actually normal so others know what to look for, and link that log here.
  3. Check the VMs:
    1. Are the VMs running on Gourd?
    2. Is there sufficient disk space for / on all machines?
    3. What are the CPU usage levels? Are they normal, or is a machine using too many resources (CPU or memory)?
  4. Check LDAP:
    1. Does the LDAP server (einstein) return required information quickly?
  5. Check the DNS server(s):
    1. Does "nslookup taro" return 10.0.0.247, and does "nslookup taro.unh.edu 10.0.0.253" quickly return 132.177.88.86? If a lookup takes too long, named is not working properly.
  6. Check the web server:
    1. Is it running? Is it reachable? Is it getting attacked?
  7. Server room:
    1. Occasionally go to the server room. Is the air conditioning working? Are there any beeping noises? Are all machines powered? Check that there is no water on the floor and no fire.
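For the backup checks in item 1, the email remains the primary signal. The sketch below is a secondary check that assumes the backups are written as files under a readable directory. The path in the example and the function name are hypothetical, and the 26-hour default is an assumed window for a daily job. It succeeds only if something under that directory was modified recently:

```bash
# Succeed if any regular file under DIR was modified within the last HOURS hours.
# The directory layout and the 26-hour default are assumptions; adjust them
# to wherever the real backup job writes its output.
backup_is_recent() {
    local dir="$1" hours="${2:-26}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 2; }
    # -mmin -N matches files modified less than N minutes ago;
    # -quit stops at the first match, so large trees are not fully walked.
    [ -n "$(find "$dir" -type f -mmin "-$((hours * 60))" -print -quit 2>/dev/null)" ]
}

# Example (hypothetical path):
# backup_is_recent /backup/gourd || echo "gourd backup looks stale"
```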
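The port test in item 2.1 only shows that something accepts connections on port 25. The sketch below goes one step further and checks that the server sends an SMTP "220" greeting. It assumes bash (for /dev/tcp) and coreutils timeout are available, and the function name is hypothetical:

```bash
# Succeed if HOST:PORT accepts a TCP connection within SECS seconds and
# greets with an SMTP "220" banner. Relies on bash's /dev/tcp pseudo-device.
smtp_banner_ok() {
    local host="$1" port="${2:-25}" secs="${3:-10}" banner
    # The inner bash opens fd 3 to the server and reads only the first line.
    banner=$(timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1" && head -n 1 <&3' \
        "$host" "$port" 2>/dev/null)
    [[ $banner == 220* ]]
}

# Example:
# smtp_banner_ok einstein.unh.edu || echo "no SMTP greeting from einstein"
```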
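For the log review in item 2.3, counting failed logins per client address makes repeated break-in attempts stand out. This sketch assumes einstein runs Postfix with SASL authentication, which logs failures as "warning: unknown[203.0.113.5]: SASL LOGIN authentication failed: ...". Other mail servers word this differently, so treat the pattern as an assumption to adjust. The function name is hypothetical:

```bash
# Read a Postfix maillog on stdin and print "count address" for every client
# that failed SASL authentication, most frequent first.
count_auth_failures() {
    # Keep only the bracketed client address from SASL failure warnings.
    sed -n 's/.*\[\([0-9a-fA-F.:]*\)\]: SASL [^ ]* authentication failed.*/\1/p' |
        sort | uniq -c | sort -rn
}

# Example, run on einstein:
# count_auth_failures < /var/log/maillog | head
```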
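For item 3.1, assuming Gourd is a libvirt/KVM host (the checklist does not say which hypervisor it runs), the running guests can be compared against the list of guests that should be up. The function name and the VM names in the example are placeholders:

```bash
# Read the names of running VMs on stdin (one per line) and print each
# expected name, given as arguments, that is not among them.
# Returns 1 if any expected VM is missing, 0 otherwise.
missing_vms() {
    local running name status=0
    running=$(cat)
    for name in "$@"; do
        # -x: whole-line match, -F: fixed string, so names are not regexes.
        if ! grep -qxF -- "$name" <<<"$running"; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# Example (placeholder VM names):
# ssh gourd virsh list --name --state-running | missing_vms vm1 vm2 vm3
```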
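For item 3.2, the sketch below flags filesystems above a usage threshold in POSIX "df -P" output. The 90% threshold, the function name, the host names in the example, and password-less ssh access are all assumptions. Mount points containing spaces would break the column split:

```bash
# Read `df -P` output on stdin and print "mountpoint usage%" for every
# filesystem whose Capacity column is at or above THRESHOLD percent.
# With -P, usage is column 5 and the mount point is column 6.
disk_over_threshold() {
    local threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)           # "95%" -> "95"
        if (pct + 0 >= t + 0) print $6, $5
    }'
}

# Example:
# for h in gourd einstein roentgen; do
#     echo "== $h"; ssh "$h" df -P / | disk_over_threshold 90
# done
```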
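For item 3.3, comparing the 1-minute load average with the core count gives a rough first indicator of CPU pressure. On Linux the load average also counts processes waiting on disk I/O, so treat it as a hint rather than a measurement. Memory is not covered here; "free -m" gives a quick view. A minimal sketch with a hypothetical function name:

```bash
# Succeed (return 0) if LOAD is above CORES. With no arguments it reads the
# local 1-minute load from /proc/loadavg and the core count from nproc (Linux).
load_exceeds_cores() {
    local load="${1:-$(cut -d ' ' -f 1 /proc/loadavg)}"
    local cores="${2:-$(nproc)}"
    # "+ 0" forces a numeric comparison in awk.
    awk -v l="$load" -v c="$cores" 'BEGIN { exit !(l + 0 > c + 0) }'
}

# Example for a remote machine:
# l=$(ssh gourd "cut -d ' ' -f 1 /proc/loadavg"); c=$(ssh gourd nproc)
# load_exceeds_cores "$l" "$c" && echo "gourd load is above its core count"
```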
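For item 4, the sketch below fails if the LDAP server does not answer a query for its root DSE within a few seconds. It assumes anonymous binds are allowed and that OpenLDAP's ldapsearch is installed. The 5-second limit and the function name are arbitrary choices:

```bash
# Succeed if an anonymous root DSE search against HOST completes within
# SECS seconds. Output is discarded; only the exit status matters.
ldap_responds() {
    local host="$1" secs="${2:-5}"
    timeout "$secs" ldapsearch -x -LLL -H "ldap://$host" -s base -b "" namingContexts \
        >/dev/null 2>&1
}

# Example:
# ldap_responds einstein.unh.edu || echo "LDAP on einstein is slow or down"
```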
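Item 5 can also be checked with dig instead of nslookup. The sketch below uses the names and addresses from the checklist. The 2-second timeout standing in for "quickly" and the function name are assumptions. +search applies the resolver's search domains, so a short name like taro is expanded roughly as nslookup does:

```bash
# Succeed if resolving NAME (optionally via SERVER) returns EXPECTED as one of
# the answers within a 2-second, single-try query.
dns_check() {
    local name="$1" expected="$2" server="${3:-}"
    local args=(+short +search +time=2 +tries=1)
    if [ -n "$server" ]; then
        args+=("@$server")
    fi
    if dig "${args[@]}" "$name" 2>/dev/null | grep -qxF -- "$expected"; then
        return 0
    fi
    echo "lookup of $name${server:+ via $server} did not return $expected" >&2
    return 1
}

# Values from the checklist:
# dns_check taro 10.0.0.247
# dns_check taro.unh.edu 132.177.88.86 10.0.0.253
```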
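For item 6, the sketch below is a minimal reachability check with curl that treats any 2xx or 3xx status as healthy. The web server's URL is not given on this page, so the one in the example is a placeholder, and the function name is hypothetical. Spotting attacks still means reading the server's logs:

```bash
# Succeed if URL answers with an HTTP 2xx or 3xx status within SECS seconds.
# On connection failure curl's %{http_code} is "000", which counts as down.
web_ok() {
    local url="$1" secs="${2:-10}" code
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "$secs" "$url")
    case "$code" in
        2??|3??) return 0 ;;
        *) echo "$url: HTTP status ${code:-none}" >&2; return 1 ;;
    esac
}

# Example (placeholder URL):
# web_ok http://www.example.edu/ || echo "web server check failed"
```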