Difference between revisions of "Sysadmin Todo List"

From Nuclear Physics Group Documentation Pages

Revision as of 14:37, 18 June 2007

Important

  • Printer queue for Copier: Konica Minolta Bizhub 750. IP=pita.unh.edu Seems like we need info from the Konica guy to get it set up on Red Hat and OS X. The installation documentation for the driver doesn't mention things like the passcode, because those are machine-specific. Katie says that if he doesn't come on Monday, she'll make an inquiry.
  • Printer in 323 is not hooked up to a dead network port. Actually managed to ping it. Can't print though.
  • Make and install power supply mount for taro
  • Figure out what network devices on jalapeno and tomato are doing what
  • Look into monitoring that is already installed: RAIDs etc.
  • Look into getting computers to pull scheduled updates from rhn when they check in.
  • Update Tomato to RHEL5 and check all services einstein currently provides. Then switch einstein <-> tomato, and then upgrade what was originally einstein. Look into making an einstein, tomato failsafe setup.
  • Get on NPG mailing list.
  • I set up "splunk" on einstein (production 2.2.3 version) and taro (beta 3 v2). I like the beta's functionality better, but it has a memory leak. Look for an update to the beta that fixes this and install it. (See: www.splunk.com/base/forum:SplunkGeneral) While this sounds like it could only be indirectly related to our issue, it does sound close enough and is the only official word on splunk's memory usage that I could find (http://www.splunk.com/doc/latest/releasenotes/KnownIssues):
    When forwarding cooked data you may see the memory usage spike and kill the splunkd process. This should be fixed for beta 3.
    So, waiting for the next beta or later sounds like the best bet. I'm wary of running beta software on einstein, anyhow.
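
Until a fixed beta shows up, it may help to log how much memory splunkd is actually using on taro. The following is only a rough sketch, assuming a Linux /proc filesystem; the helper names (parse_vmrss_kb, rss_report) are made up for illustration and are not part of splunk.

```python
import os
import re


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_report(name="splunkd", proc_root="/proc"):
    """Return a sorted list of (pid, rss_kb) for processes named `name`.

    `proc_root` is a parameter only so the function can be pointed at a
    fake directory tree for testing (an assumption of this sketch).
    """
    results = []
    if not os.path.isdir(proc_root):
        return results
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # process exited or is not readable
        name_match = re.search(r"^Name:\s+(.+)$", text, re.MULTILINE)
        if name_match and name_match.group(1).strip() == name:
            rss = parse_vmrss_kb(text)
            if rss is not None:
                results.append((int(entry), rss))
    return sorted(results)


if __name__ == "__main__":
    for pid, rss_kb in rss_report():
        print("splunkd pid %d: %d kB resident" % (pid, rss_kb))
```

Run from cron every few minutes with the output appended to a file, this would give a rough picture of whether the resident size keeps climbing.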

Ongoing

  • Clean up 202
    • Figure out what's worth keeping
    • Figure out what doesn't belong here
  • Take a look at spamassassin and improve its performance if possible.
  • Test unknown equipment:
    • UPS
  • Eventually come up with a plan to deal with pauli2's kernel issue
  • Maybe we should netboot all the machines. As Steve pointed out, we've already got roaming profiles; the next step from that is full netbooting. Something to think about.

Completed

  • Get ahold of a multimeter so we can test supplies and cables. Got a tiny portable one.
  • Order new power supply for Taro. Ordered from Newegg 5/24/2007.
  • Weed out unnecessary cables; we don't need a full box of ata33 and another of ata66. Consolidated to one box.
  • Installed new power supply for Taro.
  • Get printer (Myriad) working.
  • Set up skype.
  • Fix sound on improv so we can use skype and music. Sound was set to ALSA; setting it to OSS fixed it. Should have worked either way, though.
  • Check out 250gb sata drive found in old maxtor box. Clicks/buzzes when powered up.
  • Look into upgrades/patches for all our systems. Scheduled critical updates to be applied to all machines but einstein. If they don't break the other machines, like they didn't break the first few, I'll apply them to einstein too.
  • Started download of fedora 7 for the black computer. 520KB/s downstream? Wow.
  • Get Pauli computers up. I think they're up. They're plugged in and networked.
  • Find out what the black computer's name is! We called it blackbody.
  • "blackbody" is currently online. For some reason, grub wasn't properly installed in the MBR by the installer.
  • Consolidate backups to get a drive for gourd. Wrote a little script to get the amanda backup stuff into a usable state, taking up WAY less space.
  • Submitted DNS request for blackbody. blackbody.unh.edu is now online
  • Label hard disks. Labeled all the ones we could identify!
  • Labeled machines' farm Ethernet ports
  • Made RHEL5 server discs.
  • Replaced failed drive in Gourd - 251gb maxtor sata. Apparently the WD drives are 250, maxtor are 251.
  • Repair local network connection on Gourd.
  • Repair LDAP on Gourd (probably caused by net connection). Replacing the drive fixed every gourd problem!! Seems to have been related to the lack of space on the RAID when a disk was missing. If not that, IIAM (It is a mystery).
  • New combo set on door for 202.
  • Tested old PSUs, network cables, and fans.
  • All machines now report to rhn. None of them pull scheduled updates though. Client-side issue?
  • Documentation for networking - we have no idea which config files are actually doing anything on some machines. Pretty much figured out, but improv and ennui still aren't reachable via farm IPs. But considering the success of getting blackbody set up from scratch, it seems that we know enough to maintain the machines that are actually part of the farm.
  • Make network devices eth0 and eth0.2 on all machines self-documenting by giving them aliases "farm" and "unh". jalapeno and tomato remain.
  • Figure out why pauli nodes don't like us. (Low-priority!) Aside from pauli2's kernel issue, this is taken care of.
  • Learned how to use the netboot stuff on tomato. Why aren't we using this for the paulis anymore?
  • Set up the nuclear.unh.edu web pages to serve up the very old but working pages instead of the new but broken XML ones.