Completed in January 2008

From Nuclear Physics Group Documentation Pages
  • Corn Virtualization issues resolved!
    • Subscription is now a "virtual subscription"
    • Corn now has 2 ethernets, one to farm one to unh, and resolves "einstein" to
    • All disks are now mountable.
  • Properly configure iptables on corn and Pumpkin
    • Copy /usr/local/bin/ from taro (requires the perl-LDAP RPM). Copy /etc/init.d/iptables-netgroups and make sure it starts in runlevels 3 and 5.
    • On pumpkin, make sure the guest OSes don't get blocked by pumpkin's iptables.
    • I edited the iptables-npg on pumpkin to be far more rational. The version on Roentgen has far too many ports open for a standard server.
  • Properly configure access restrictions to "farm", and allow root login only from einstein.
  • Lentil Backup issue resolved!
    • The cron job is mailing this message: "archive disk '/mnt/npg-daily-current' does not exist or is not a symlink at /usr/local/bin/ line 44, line 1." That link exists, though. I can see it and its contents as a regular user; why can't the script when run by cron? The e-mail may have been outdated … today's was a successful listing. It is fixed, hooray. We just need to fix the ssh keys for tomato and corn so they can be connected to. Lentil now knows all the ssh keys of everyone, made as human-readable as possible.
    • First of all: do NOT use disks smaller than 350 GB for backup! They will not even fit one copy of what needs to be backed up.
    • The link /mnt/npg-daily-current must exist and point to an actual drive.
    • Old entry: Lentil's not doing backups. I tried manually running the script Friday afternoon, and the email log looks like it was backing up and then stopped for no real reason. Checking the space on the drives (since the script seems unable to do so now), I found that npg-daily/28 is basically full and npg-daily/29 is an untouched 250 GB. Maybe an update screwed around with how the script checks free space, preventing it from knowing how to move to the next drive. It's probably not any update: lentil was working fine until "The Friday Taro Event". I manually made the new symbolic link from /mnt/npg-daily-current -> /mnt/npg-daily/29. Maybe this'll fix it? That seems to be a no. Lots of unable to make hard link errors, invalid cross-device link errors, and similar. It needs to know to copy the data it's backing up to the disk, since it's a new disk. I still think it's got something to do with that unable to statfs error.
  • New pumpkin network problems: It's possible to reach the farm subnet if pumpkin is booted without starting iptables. Double-check the configs. The problem was that iptables was getting its config from both /etc/iptables and /etc/iptables-npg. Since pepper doesn't have /etc/iptables, I just moved it to /etc/iptables.bak and voilà: everything works.
  • benfranklin is apparently up and running somewhere, because it's reporting drive issues too: Benfranklin is Dan's workstation, it's in the room next to Maurik's office. BenFranklin is a Pentium III "Coppermine" at 800MHz. I have ordered a replacement system already, so we can decommission the old BenFranklin. The new BF has arrived.
  • Try to pull as much data from Jim William's old drives as possible, if there's even anything on them. They seem dead. Maybe we can swap one board to the other drive and see if it works? What room is he in? His computer is working now (the ethernet devices will have to be changed to a non-farm setup once the machine is back in his office). The computer has been delivered, and he says everything's back. This leads me to believe that his data was not on those drives but in his home directory. Those drives can be junked now.
  • At some point, cacti stopped being able to monitor einstein. Update-related? There are no errors in cacti.log, but the status page for einstein just says "down". Cacti was set to use the wrong version of rrdtool.
  • Added Steve and Matt to the environmental mailing list. There seems to be a problem with more than 1 cc recipient, so to get around this the monitor sends to Tested and it works.
  • Install the right SNMP stuff on tomato so that it can be graphed
  • Service snmpd won't start on okra ("Starting snmpd: /usr/sbin/snmpd: error while loading shared libraries: cannot enable executable stack as shared object requires: Permission denied", supposedly SELinux-related). It was SELinux. Fixed!
  • Taro has become unstable again when running multi-processor. Try another power supply. If that is not it, give up? Put the new one in, passed memtest, and booted SMP. Let's see how it handles itself! -- Note: The /data disk on Taro is still read-only!
  • Lentil has a dead disk ("hde0", probably IDE) in its RAID1. It needs to be replaced. Had a spare Seagate sitting around of just the right size.
  • Heiseinberg dropped pauli off today. Says it's his power supply. Very low priority. Gave it jalapeno's old power supply and got rid of its broken fans, and it seems to work fine.
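
The backup-disk requirements from the Lentil entries above (the /mnt/npg-daily-current link must exist and point to an actual drive, the drive must be at least 350 GB, and hard links fail across devices) could be checked before a run with something like the following. This is only a sketch, not the actual backup script: the function names `check_backup_target` and `same_device` and the `require_mount` parameter are made up for illustration, and treating "an actual drive" as "a mount point" is an assumption.

```python
import os
import shutil

# 350 GB floor taken from the note above; decimal gigabytes are an assumption.
MIN_BACKUP_BYTES = 350 * 10**9


def check_backup_target(link_path, min_bytes=MIN_BACKUP_BYTES, require_mount=True):
    """Return a list of problems with the backup symlink; an empty list means OK."""
    if not os.path.islink(link_path):
        return [f"{link_path} does not exist or is not a symlink"]

    # Resolve the link to the directory it finally points at.
    target = os.path.realpath(link_path)
    if not os.path.isdir(target):
        return [f"{link_path} points to {target}, which is not a directory"]

    problems = []
    # Assumption: "an actual drive" means the target is a mount point.
    if require_mount and not os.path.ismount(target):
        problems.append(f"{target} is not a mount point")

    # Total size of the filesystem holding the target, in bytes.
    total = shutil.disk_usage(target).total
    if total < min_bytes:
        problems.append(f"{target} holds {total} bytes, below the {min_bytes} minimum")
    return problems


def same_device(path_a, path_b):
    """Hard links only work within one filesystem, so compare device IDs."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    for problem in check_backup_target("/mnt/npg-daily-current"):
        print(problem)
```

If `same_device` returns False for the previous and the new backup disk, a hard-link-based backup would need to make a full copy onto the new disk first, which matches the cross-device link errors seen after switching from npg-daily/28 to npg-daily/29.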