Difference between revisions of "Pumpkin"

From Nuclear Physics Group Documentation Pages
Revision as of 18:21, 30 May 2012

Pumpkin is our new 8-CPU, 24-disk monster machine. It is really, really nice. Pumpkin used to have Xen running, but now it is just running the standard 64-bit version of RHEL 5.3.


Hardware Details

  • Microway Quote # MWYQ9518-03
  • Sales contact: Eliot Eshelman
  • Microway 5U 4-Way Opteron Server with up to 24 Drives
  • 5U Storage Chassis with 24 SAS/SATA-II Hot-Swap Drive Bays with SATA Multilane Backplane (I think it is a Chenbro case)
  • 1350 Watt Hot Swap Redundant Power Supply
  • Microway Navion-T (TM) Quad Opteron Motherboard (Tyan S4985):
    • Four sockets for Socket F 8000 series processors
    • Nvidia nForce Pro 2200 + 2050
    • Four banks of memory (16 DIMM slots)
    • Supports up to 64GB of DDR2-667 memory
    • Two x16 PCI Express slots, two x4 PCI Express slots
    • One 32-bit PCI expansion slot
    • Integrated dual Marvell 88E1111 GbE ports
    • Integrated Intel 82541Pl GbE Port
    • SIS/Xabre Integrated Graphics 16MB
    • Integrated SATA-2 Controller (8 ports)
  • 4x AMD Dual Core Socket F Opteron 8222 3.0 GHz, 1 MB Cache / core, 95 watts
  • 8x 2GB DDR2 667 MHz ECC/Registered Memory
  • 16x 750 GB Seagate Barracuda ES Nearline SATA/300 ST3750640NS 16MB Cache, 3Gb/s, NCQ, 7200rpm, 1.2 million hours MTBF
  • Areca ARC-1280 24 port SATA II Raid - PCI Express x8
  • Areca ARC-6120 Battery Backup
  • 6x Mini-SAS to ML backplane Cable .5M - 4 SATA Drives
  • Pioneer DVR-112 Dual Layer DVD/CD writer Internal (Black) 18x write DVD-R/+R, 10x write Dual Layer DVD-R/+R
  • Tyan M3291 IPMI card (REMOVED)

Network Configuration

  • IP Address Farm: 10.0.0.243 (eth0)

ifcfg-farm

 DEVICE=eth0
 BOOTPROTO=none
 ONBOOT=yes
 HWADDR=00:e0:81:79:50:bd
 TYPE=Ethernet
 USERCTL=no
 IPV6INIT=no
 PEERDNS=yes
 NETMASK=255.255.255.0
 IPADDR=10.0.0.243
  • IP Address UNH: 132.177.88.228 (eth1)

ifcfg-unh

 DEVICE=eth1
 BOOTPROTO=none
 ONBOOT=yes
 DHCP_HOSTNAME=pumpkin.unh.edu
 TYPE=Ethernet
 IPADDR=132.177.88.228
 NETMASK=255.255.252.0
 GATEWAY=132.177.88.1
 USERCTL=no
 IPV6INIT=no
 PEERDNS=yes
 HWADDR=00:e0:81:79:50:bf
  • IP Address RAID: 10.0.0.99 (http://10.0.0.99)

Not sure about this
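After editing the ifcfg files above, the interfaces need to be brought up with the new settings. A minimal sketch for this RHEL 5 era system (the file locations and commands are the usual Red Hat ones, not taken from this page):

```shell
# ifcfg-farm and ifcfg-unh live in the network-scripts directory,
# named after the devices they configure
ls /etc/sysconfig/network-scripts/ifcfg-eth0 \
   /etc/sysconfig/network-scripts/ifcfg-eth1
service network restart           # re-reads the ifcfg files
ifconfig eth0 | grep 'inet addr'  # expect 10.0.0.243
ifconfig eth1 | grep 'inet addr'  # expect 132.177.88.228
```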

Software and Services

IPTables

Pumpkin uses the standard iptables configuration. Note that eth0 and eth1 are both farm ports and should be configured accordingly.

Splunk

Pumpkin is the master Splunk node and stores all of the Splunk data in /data1/splunk. The Splunk web interface is at https://pumpkin.unh.edu:8000 (if you're connected via the Farm), or you can forward port 8000 over SSH.
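From outside the Farm, the port forwarding mentioned above can be done like this (a sketch; the username is a placeholder):

```shell
# Forward local port 8000 to the Splunk web port on pumpkin,
# then browse to https://localhost:8000
ssh -L 8000:localhost:8000 username@pumpkin.unh.edu
```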

NFS

Pumpkin shares two data stores (/data1 and /data2) over NFS. They can be accessed at /net/data/pumpkin from any machine.

/etc/exports

/data1	@servers(rw,sync) @npg_clients(rw,sync) \
	10.0.0.0/24(rw,no_root_squash,sync)
/data2	@servers(rw,sync) @npg_clients(rw,sync) \
	10.0.0.0/24(rw,no_root_squash,sync)
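After changing /etc/exports, the export list can be re-read and checked without restarting NFS. A sketch of the usual commands (the client-side mount point is an example, not from this page):

```shell
exportfs -ra              # re-export everything in /etc/exports
showmount -e pumpkin      # run from a client: list pumpkin's exports
# manual test mount from a client:
mkdir -p /mnt/pumpkin-data1
mount -t nfs pumpkin:/data1 /mnt/pumpkin-data1
```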

RAID

The RAID is currently split. This allows for much easier maintenance and, in the future, possible upgrades.

Disks 1 to 11
RAID Set 0, which holds the RAID Volumes: System (300GB, RAID6, SCSI:0.0.0), System1 (300GB, RAID6, SCSI:0.0.1), and Data1 (6833GB, RAID5, SCSI:0.0.2)
Disks 12 to 22
RAID Set 1, which holds the RAID Volume: Data2 (7499GB, RAID5, SCSI:0.0.3)
Disks 23 and 24
Passthrough (single disks) at SCSI:0.0.6 and SCSI:0.0.7. These can be used as spares, as backup, or to expand the other RAID sets later on. Currently they appear as /dev/sde* and /dev/sdf*; /dev/sde and /dev/sdf are currently used for Virtual Systems.

The RAID card can be monitored at http://10.0.0.99/. Log in as "admin" with the password described on the RAID page.

  • To use this card with Linux you need a driver: arcmsr. This must be part of the initrd for the kernel, or you cannot boot from the RAID. You can also install from the CDs if you have a driver floppy; the installer will then add the arcmsr driver to the initrd for you. You will always need this driver!
  • The kernel module can be built from the sources located on /dev/sdf in /usr/src/kernels/Acera_RAID. Just run make.
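The two bullets above can be sketched as one procedure. This assumes RHEL 5's mkinitrd, which supports --with= for forcing an extra module into the initrd; the scsi_hostadapter alias line is the usual Red Hat convention, not taken from this page:

```shell
cd /usr/src/kernels/Acera_RAID
make                                   # builds arcmsr.ko
# make the module load at boot and rebuild the initrd with it included
echo 'alias scsi_hostadapter arcmsr' >> /etc/modprobe.conf
mkinitrd -f --with=arcmsr /boot/initrd-$(uname -r).img $(uname -r)
```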

There exists a temporary drive which holds a RHEL5 distro and the original RHEL4 distro from the manufacturer. It is currently disconnected from pumpkin.


Removing Xen

1. Install the latest non-xen kernel. You may have to run yum search kernel to find the exact name of a non-xen kernel package.

2. Remove the Xen kernel and virtualization packages with these commands; make sure to run the second one as well.

 yum groupremove Virtualization
 yum remove kernel-xenU

3. Reboot into the normal non-xen kernel, then proceed with upgrades using the yum update command.
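Before and after the reboot in step 3, it is worth confirming which kernel GRUB will actually boot (a sketch; the paths are the RHEL 5 defaults):

```shell
# check that the "default" entry points at a non-xen "title" line
grep -E '^default|^title' /boot/grub/grub.conf
# after rebooting:
uname -r    # the running kernel version should not end in "xen"
```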

To Do

  • There appears to be a bug in the driver (forcedeth) for the Nvidia Ethernet devices (eth0 and eth2). It seems to occasionally cause the network interface to freeze up during large NFS transfers. The bug report suggests that it can be fixed by adding this option to the modprobe.conf file:
options forcedeth max_interrupt_work=10

The value given seems to vary from one discussion to the next. Some say they still get errors at 10 and 15, but not at 20. We should investigate this potential fix to see if it solves the issues with the network interfaces freezing up. For now the farm interface is set to the device with the Intel chipset (eth1) since most large data transfers happen over the back end. There is also a second farm interface set up (eth0 at 10.0.0.242) in case one goes down. This should mitigate the problem in the short term.
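If we try the workaround, applying it would look roughly like this (the value 20 follows the reports above; reloading the module takes the interface down, so do this from the console, not over the network):

```shell
echo 'options forcedeth max_interrupt_work=20' >> /etc/modprobe.conf
# from the console: reload the driver so the option takes effect
ifdown eth0 && rmmod forcedeth && modprobe forcedeth && ifup eth0
```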

  • There must be other things....

Done

  • Pumpkin's iptables seemed messed up after this morning's (1/8/2008) GRUB trouble. With the old config (pepper's), iptables wouldn't let anything in at all, it seemed (specifically things like pingbacks, LDAP…). I've copied roentgen's /etc/sysconfig/iptables-npg to pumpkin for now, and everything seems to be working as usual. Previously it had a copy of pepper's, and pepper works, so I wonder what the real problem was. This was the wrong iptables! I fixed it with a new set.
  • Sane iptables using LDAP.
  • Set up Ethernet.
  • Set up RAID volumes.
  • Set up partitions and create file systems.
  • Move the system to the System drive and remove the current temp drive.
  • Set up mount points for the data drives.
  • Set up LDAP so users can log in.
  • Set up exports so other systems can see the drives. There were issues with the firewall, so I modeled the firewall after taro's. Seems to be working: I can successfully ls /net/data/pumpkin1 and ls /net/data/pumpkin2 on einstein.
  • Set up autofs so that it can see other drives. What other drives? It's working for einstein:/home. Other drives, such as the data drives.
  • Set up smartd so we will know when a disk is going bad. This can be done inside the RAID card, which can send SNMP and e-mail notifications, but it still needs to be done. E-mail seems to be set up; let's see if we get any through npg-admins.
  • Restrict access (/etc/security/access.conf)
  • Set up sudo on both pumpkin and corn.
  • Add the new systems to the lentil backup script. They're on there; lentil just needs the right SSH keys to rsync them.
  • Set up SNMP for cacti monitoring.
  • Set up sensors.