Difference between revisions of "Pumpkin"

From Nuclear Physics Group Documentation Pages
 
(40 intermediate revisions by 6 users not shown)
= Pumpkin =
+
Pumpkin is a large 24-disk storage system. It runs CentOS 7.
Pumpkin is our new 8 CPU 24 disk monster machine. It is really, really nice.
+
=Hardware =
 +
* 5U Storage Chassis with 24 SAS/SATA-II Hot-Swap Drive Bays with SATA Multilane Backplane (I think it is a Chenbro case)
 +
* 1350 Watt Hot Swap Redundant Power Supply
 +
* [[Areca]] ARC-1280 24 port SATA II Raid - PCI Express x8 -- Address: 10.0.0.199
 +
* Areca ARC-6120 Battery Backup
 +
* 6x Mini-SAS to ML backplane Cable .5M - 4 SATA Drives
 +
* Pioneer DVR-112 Dual Layer DVD/CD writer Internal (Black) 18x write DVD-R/+R, 10x write Dual Layer DVD-R/+R
  
== Basic Setup ==  
+
== Old Hardware ==
We run Xen on this so that it has two RHEL5 personalities: Pumpkin, 64-bit, and Corn, 32-bit.
+
Pumpkin is our new 8 CPU 24 disk monster machine. It is really, really nice. Pumpkin runs Xen. It is 64-bit CentOS 7.
* [http://www.cl.cam.ac.uk/research/srg/netos/xen/readmes/user/user.html Xen project documentation]
+
[[Image:pumpkins.jpg|thumb|200px|Pumpkins]]
* [http://www.redhat.com/rhel/virtualization/ RHEL virtualization front page]
+
=== Old Hardware Details ===
* These may also come in handy:  
+
* Microway Quote # MWYQ9518-03  purchased 10/22/2007 for $18260.
** [http://www.linuxtopia.org/online_books/rhel5/rhel5_xen_virtualization/rhel5_virt-install-wizard.html Creating a VM, by Linuxtopia]
+
* Sales contact: Eliot Eshelman
** [http://www.redhat.com/docs/manuals/enterprise/RHEL-5-manual/en-US/RHEL510/Virtualization_Guide/index.html RHEL Virtualization Guide]
+
* Microway 5U 4-Way Opteron Server with up to 24 Drives
The RAID is currently split. This allows for much easier maintenance and, in the future, possible upgrades.
+
* 5U Storage Chassis with 24 SAS/SATA-II Hot-Swap Drive Bays with SATA Multilane Backplane (I think it is a Chenbro case)
; Disk 1 to 11 : RAID Set 0, which holds the RAID Volumes: System (300GB, RAID6, SCSI:0.0.0), System1(300GB, RAID6, SCSI:0.0.1), Data1 (6833GB, RAID5, SCSI:0.0.2)
+
* 1350 Watt Hot Swap Redundant Power Supply
; Disk 11 to 22 : RAID Set 1, which holds the RAID Volume: Data2 (7499GB, RAID5, SCSI:0.0.3)
+
* Microway Navion-T (TM) Quad Opteron Motherboard ([http://www.tyan.com/product_board_detail.aspx?pid=271 Tyan S4985]):
; Disk 23 and 24 : Passthrough (single disks) at SCSI:0.0.6 and SCSI:0.0.7. These can be used as spares, as backup, or to expand the other RAID sets later on. Currently they are seen as /dev/sde* and /dev/sdf*. /dev/sdf1 and /dev/sdf2 hold the old RHEL4 install the system came with.
+
** Four sockets for Socket F 8000 series processors
The RAID card can be monitored at http://10.0.0.99/ login as "admin" with a password that is the same as the door combo.
+
** Nvidia nForce Pro 2200 + 2050
* To use this card with Linux you need a driver: arcmsr. This '''must be part of the initrd''' for the kernel, else you cannot boot from the RAID. You can also install from the CDs, if you have a driver floppy. It will then add the arcmsr driver into the initrd for you. You will still '''always need to have this driver!'''
+
** Four banks of memory (16 DIMM slots)  
* The kernel module can be built from the sources located on /dev/sdf in ''/usr/src/kernels/Acera_RAID''. Just run make.
+
** Supports up to 64GB of DDR2-667 memory
There exists a temporary drive which holds a RHEL5 distro and the original RHEL4 distro from the manufacturer. It is currently disconnected from pumpkin. This drive was mirrored to /dev/sdf*, and /dev/sde{1,2} and also has an old RHEL5 distro. We can try to use these as temporary drives for cloning other systems.
+
** Two x16 PCI Express, Two x4 PCI Express,  
 +
** One PCI 32 bit expansion slots
 +
** Integrated dual Marvell 88E1111 GbE ports
 +
** Integrated Intel 82541Pl GbE Port
 +
** SIS/Xabre Integrated Graphics 16MB
 +
** Integrated SATA-2 Controller (8 ports)  
 +
* 4x AMD Dual Core Socket F Opteron 8222 3.0 GHz, 1 MB Cache / core, 95 watts
 +
* 8x 2GB DDR2 667 MHz ECC/Registered  Memory
 +
*16x 750 GB Seagate Barracuda ES Nearline SATA/300  ST3750640NS 16MB Cache, 3Gb/s, NCQ, 7200rpm, 1.2 million hours MTBF
 +
* [[Areca]] ARC-1280 24 port SATA II Raid - PCI Express x8
 +
* Areca ARC-6120 Battery Backup
 +
* 6x Mini-SAS to ML backplane Cable .5M - 4 SATA Drives
 +
* Pioneer DVR-112 Dual Layer DVD/CD writer Internal (Black) 18x write DVD-R/+R, 10x write Dual Layer DVD-R/+R
 +
* [http://www.tyan.com/product_accessories_spec.aspx?pid=5 Tyan M3291] IPMI card (REMOVED)
  
== Virtual Host: Corn ==
+
= Network Configuration =
We run a '''32-bit''' personality "corn" using the Xen virtualization on pumpkin's /dev/sdb. Corn is a para-virtualized RHEL5 system, with pumpkin as the master host, or "domain0". It is a fully separate system (that could be booted as the main system with a few modifications to config files. Hint: don't do that!). This means that any system stuff installed on Pumpkin needs to be installed on corn separately.
 
  
Subscription Issue: A virtual host needs to be setup special. See the [http://kbase.redhat.com/faq/FAQ_103_10754.shtm RedHat documentation]. Both host and guest need rhn-virtualization-common and rhn-virtualization-host installed. This is now fixed.
+
The network card eth2 no longer works and is unused (this needs verification):
 +
    4: wlp6s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
 +
    link/ether a4:c4:94:1f:6b:cf brd ff:ff:ff:ff:ff:ff
  
The virtual host needs to have both ethernets bridged. According to [http://wiki.xensource.com/xenwiki/XenNetworking Xen wiki], this is done by modifying the ''/etc/xen/scripts/network-bridge'' script, which is now network-bridge-two which calls the original twice. For the host, create two interfaces: first the one to xenbr1 and then the one to xenbr0, so that the first one ends up being eth1 and the second one eth0. Yes, it seems backwards, but it now works. The key is to have the lines
+
*IP Address Farm: 10.0.0.243 (farm)
alias eth0 xennet
+
*IP Address UNH: 132.177.88.228 (unh)
alias eth1 xennet
+
*IP Address RAID: 10.0.0.99 [http://10.0.0.99]
in the /etc/modprobe.conf file. This is now working.
 
  
== Creating a new Virtual Host from a previous installed disk. ==
+
=Software and Services=
It seems one can do the following to create a clone of a physical system as a virtual host. <font color="red">'''This still needs to be tested better!'''</font>
 
  
* Stick the disk with the operating system on it in slot 23 or 24.
+
==IPTables==
* Create a new fully virtualized host:
 
** In virtual machine manager, click create new.
 
** Choose a name, as in: VHost23_sde
 
** Choose for a fully virtualized host, this allows more flexibility for the kernels etc.
 
** Choose to install from the image at /data1/. This is a RHEL5 image. Choose the operating system version.
 
** Choose the disk: /dev/sde or /dev/sdf
 
** Set ethernet to xenbr1
 
** Set memory (probably 1024) and cpus (probably 2)
 
** Save config (click finish)
 
* You can now boot the virtual system. (It will do this automatically.) When prompted '''Do not install a new system''' (idiot!) instead type "linux rescue".
 
* When the rescue boots, it will look for installed operating systems.
 
* From the rescue console you can figure out what the hardware signature is. Now some files on the operating system need to be adapted to the new (temporary) hardware signature. You could also do this ahead of time by mounting the disks on pumpkin and modifying the files there.
 
Backup all files you modify to a *_Physical version, so you can undo this before sticking it back into the physical system. Keep track of your changes on the wiki!
 
  
Hard disk will be /dev/hda*  ==> Modify grub.conf (or use LABEL=ROOT and label your partition), with LVM systems you should be OK.
+
Pumpkin uses the standard [[iptables]] configuration.
                              ++> A problem with labels is that they don't "stick". It seems that if you made the label while the
+
 
                              ++> disk was mounted on the host, it is not seen while mounted on the guest. You need to do these
+
==Splunk==
                              ++> things from the guest operating system. This is also true for installing (re-installing) grub.
+
 
+
Pumpkin is the master [[Splunk]] node and stores all of the splunk data in /data1/splunk. If you want to access the Splunk web interface it is at https://pumpkin.unh.edu:8000 (if you're connected via the Farm), or you can forward port 8000 over SSH.
                              ==> Modify /etc/fstab (probably change /dev/sda* to /dev/hda*)
+
 
 +
==[[NFS]]==
 +
 
 +
Pumpkin shares two data stores (/data and /scratch) over [[NFS]]. They can be accessed at /net/data/pumpkin and /net/data/scratch from any machine.
 +
 
 +
= RAID =
 +
The RAID is currently split.
 +
; Disk 1 to 17 @ 4TB
 +
* Disk 17 is hot spare
 +
* RAID Set #00
 +
* Volume: data lun(0/0/0) is RAID6 = 56 TB
 +
* Translates to /dev/sda1 mounted on /data
 +
; Disk 19 to 24 @ 750 GB
 +
* RAID Set #01
 +
* Volume: scratch lun(0/0/1) is RAID0 = 4.5 TB
 +
* Translates to /dev/sdb1 mounted on /scratch
 +
; Disk18 is N/A 
 +
 
 +
The RAID card can be monitored at http://10.0.0.99/. Log in as "admin"; the password is described on the [[RAID]] page.
  
Ethernet                      ==> Modify /etc/modules.conf or /etc/modprobe.conf and alias eth0 8139cp (REALTEK 8139cp driver) '''Needs test'''
+
== Software RAID ==
                              ==> Same for eth1
+
There are 2 internal drives in Pumpkin forming a software RAID.
* Exit the console, or shutdown the machine. Now add another ethernet card to the config, hooking this up to xenbr0
+
Note these are *not identical* partitions on the drives, so the RAID does not look symmetrical. Blame the installer.
* Restart your virtual system. '''Make sure all VM disks are unmounted from pumpkin'''
 
  
If you need to ''fix'' things, or poke around, and want to boot from the iso image again, change the '''disk''' line in the /etc/xen configuration file to:
+
Model: ATA WDC WD7500AAKS-0 (scsi)
  boot="d"
+
Disk /dev/sdc: 750GB
  disk = [ 'phy:/dev/sdf,hda,w','file:/data1/rhel-5.1.-server-i386-dvd.iso,hdc:cdrom,r']    
+
Sector size (logical/physical): 512B/512B
 +
Partition Table: gpt
 +
Disk Flags:
 +
 +
Number  Start  End    Size    File system  Name                  Flags
 +
  1      1049kB  79.7MB  78.6MB  fat16        EFI System Partition boot
 +
  2      79.7MB 300GB  300GB                                      raid
 +
  3      300GB  550GB  250GB                                      raid
 +
  4      550GB  650GB  100GB                                      raid
 +
  5      650GB  718GB  67.5GB                                    raid
 +
  6      718GB  750GB  32.0GB                                    lvm
 +
  7      750GB  750GB  251MB  ext4                              raid
 +
 +
Model: ATA WDC WD7500AAKS-0 (scsi)
 +
Disk /dev/sdd: 750GB
 +
Sector size (logical/physical): 512B/512B
 +
Partition Table: gpt
 +
Disk Flags:
 +
 +
Number  Start  End    Size    File system  Name  Flags
 +
  1     1049kB  300GB  300GB                      raid
 +
  2      300GB  550GB  250GB                      raid
 +
  3      550GB  650GB  100GB                      raid
 +
  4      650GB  718GB  67.5GB                    raid
 +
  5      718GB  750GB  32.0GB                    lvm
 +
  6      750GB  750GB  251MB   ext4              raid
 
   
 
   
Then run "xm create hvmconfig_file" to load your changes and boot the new config. It will boot from the cdrom image.
 
This seems to work! Remember to change ''boot="d"'' back to ''boot="c"'' to boot from your disk.
 
  
 +
Personalities : [raid1]
 +
md123 : active raid1 sdc5[0] sdd4[1]
 +
      65853440 blocks super 1.2 [2/2] [UU]
 +
      bitmap: 0/1 pages [0KB], 65536KB chunk
 +
 +
md124 : active raid1 sdc4[0] sdd3[1]
 +
      97655808 blocks super 1.2 [2/2] [UU]
 +
      bitmap: 0/1 pages [0KB], 65536KB chunk
 +
 
 +
md125 : active raid1 sdc2[0] sdd1[1]
 +
      292968448 blocks super 1.2 [2/2] [UU]
 +
      bitmap: 2/3 pages [8KB], 65536KB chunk
 +
 +
md126 : active raid1 sdc3[0] sdd2[1]
 +
      244140032 blocks super 1.2 [2/2] [UU]
 +
      bitmap: 0/2 pages [0KB], 65536KB chunk
 +
 +
md127 : active raid1 sdc7[0] sdd6[1]
 +
      244672 blocks super 1.0 [2/2] [UU]
 +
      bitmap: 0/1 pages [0KB], 65536KB chunk
  
For the "FullVirt24_sdf" system I ran into a difficult problem: the initrd was no good. There was no way to "repair" this, since it needs a booted system to create a new initrd. I guess I could get an initrd from elsewhere and put that on /boot with a matching kernel. Instead I decided to reinstall RHEL4, calling this system "Landau".
+
Filesystem      Size  Used Avail Use% Mounted on
 +
/dev/md123      62G  85M  59G  1% /kvm
 +
/dev/md124      92G  527M  87G  1% /var
 +
/dev/md125      275G  24G  238G  9% /
 +
/dev/md126      230G  4.7G  213G  3% /usr
 +
/dev/md127      228M  224M    0 100% /boot
 +
/dev/sdc1        75M  9.4M  66M  13% /boot/efi
 +
/dev/sda1        51T  30T  22T  58% /data
 +
/dev/sdb1      4.1T  89M  4.1T  1% /scratch
  
== To Do ==
+
= To Do =
* Setup SNMP for cacti monitoring.
 
* Add the new systems to the lentil backup script
 
* There must be other things....
 
* Setup sensors so that we can monitor the system. '''Will have to wait for a kernel that supports it'''
 
  
== Done ==
+
= Done =
* Sane iptables using ldap.
 
* Setup ethernet.
 
* Setup RAID volumes.
 
* Setup partitions and create file systems.
 
* Move the system to System drive and remove the current temp drive.
 
* Setup mount points for the data drives.
 
* Setup LDAP for users to log in.
 
* Setup [[Exports]], so other systems can see the drives. '''There were issues with firewall, so I modeled the firewall after taro's.''' Seems to be working, I can successfully <code>ls /net/data/pumpkin1</code> and <code>ls /net/data/pumpkin2</code> on einstein.
 
* Setup autofs so that it can see other drives. '''What other drives? It's working for einstein:/home''' Other drives such as data drives
 
* Setup [[smartd]] so we will know when a disk is going bad. '''This can be done inside the RAID card''' using a system to send SNMP and EMAIL. but it needs to be done. '''E-mail seems to be set up, let's see if we get any through npg-admins'''
 
* Restrict access  (/etc/security/access.conf)
 
* Setup sudo on both pumpkin and corn.
 

Latest revision as of 20:34, 3 January 2018

Pumpkin is a large 24-disk storage system. It runs CentOS 7.

Hardware

  • 5U Storage Chassis with 24 SAS/SATA-II Hot-Swap Drive Bays with SATA Multilane Backplane (I think it is a Chenbro case)
  • 1350 Watt Hot Swap Redundant Power Supply
  • Areca ARC-1280 24 port SATA II Raid - PCI Express x8 -- Address: 10.0.0.199
  • Areca ARC-6120 Battery Backup
  • 6x Mini-SAS to ML backplane Cable .5M - 4 SATA Drives
  • Pioneer DVR-112 Dual Layer DVD/CD writer Internal (Black) 18x write DVD-R/+R, 10x write Dual Layer DVD-R/+R

Old Hardware

Pumpkin is our new 8 CPU 24 disk monster machine. It is really, really nice. Pumpkin runs Xen. It is 64-bit CentOS 7.

Pumpkins

Old Hardware Details

  • Microway Quote # MWYQ9518-03 purchased 10/22/2007 for $18260.
  • Sales contact: Eliot Eshelman
  • Microway 5U 4-Way Opteron Server with up to 24 Drives
  • 5U Storage Chassis with 24 SAS/SATA-II Hot-Swap Drive Bays with SATA Multilane Backplane (I think it is a Chenbro case)
  • 1350 Watt Hot Swap Redundant Power Supply
  • Microway Navion-T (TM) Quad Opteron Motherboard (Tyan S4985):
    • Four sockets for Socket F 8000 series processors
    • Nvidia nForce Pro 2200 + 2050
    • Four banks of memory (16 DIMM slots)
    • Supports up to 64GB of DDR2-667 memory
    • Two x16 PCI Express, Two x4 PCI Express,
    • One PCI 32 bit expansion slots
    • Integrated dual Marvell 88E1111 GbE ports
    • Integrated Intel 82541Pl GbE Port
    • SIS/Xabre Integrated Graphics 16MB
    • Integrated SATA-2 Controller (8 ports)
  • 4x AMD Dual Core Socket F Opteron 8222 3.0 GHz, 1 MB Cache / core, 95 watts
  • 8x 2GB DDR2 667 MHz ECC/Registered Memory
  • 16x 750 GB Seagate Barracuda ES Nearline SATA/300 ST3750640NS 16MB Cache, 3Gb/s, NCQ, 7200rpm, 1.2 million hours MTBF
  • Areca ARC-1280 24 port SATA II Raid - PCI Express x8
  • Areca ARC-6120 Battery Backup
  • 6x Mini-SAS to ML backplane Cable .5M - 4 SATA Drives
  • Pioneer DVR-112 Dual Layer DVD/CD writer Internal (Black) 18x write DVD-R/+R, 10x write Dual Layer DVD-R/+R
  • Tyan M3291 IPMI card (REMOVED)

Network Configuration

The network card eth2 no longer works and is unused (this needs verification):

   4: wlp6s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
   link/ether a4:c4:94:1f:6b:cf brd ff:ff:ff:ff:ff:ff
  • IP Address Farm: 10.0.0.243 (farm)
  • IP Address UNH: 132.177.88.228 (unh)
  • IP Address RAID: 10.0.0.99 (http://10.0.0.99)

Software and Services

IPTables

Pumpkin uses the standard iptables configuration.

Splunk

Pumpkin is the master Splunk node and stores all of the splunk data in /data1/splunk. If you want to access the Splunk web interface it is at https://pumpkin.unh.edu:8000 (if you're connected via the Farm), or you can forward port 8000 over SSH.
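The SSH port forward mentioned above could be set up roughly as follows. This is only a sketch: the login host `user@gateway.example.edu` is a hypothetical placeholder for whichever Farm machine you can reach over SSH, and the helper name is made up for illustration. It builds the command rather than running it.

```python
import shlex


def splunk_tunnel_command(login_host, local_port=8000):
    """Build an ssh command that forwards local_port to Splunk on pumpkin.

    login_host is an assumption: any machine on the Farm network that
    accepts your SSH login and can reach pumpkin.unh.edu:8000.
    """
    # -N: do not run a remote command, only forward ports
    # -L: local_port on this machine -> pumpkin.unh.edu:8000 via login_host
    return ["ssh", "-N", "-L", f"{local_port}:pumpkin.unh.edu:8000", login_host]


# Hypothetical login host, shown only to illustrate the command shape.
cmd = splunk_tunnel_command("user@gateway.example.edu")
print(shlex.join(cmd))
```

With the tunnel running, the web interface should be reachable at https://localhost:8000 on the local machine; expect a certificate warning if the certificate names pumpkin.unh.edu.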

NFS

Pumpkin shares two data stores (/data and /scratch) over NFS. They can be accessed at /net/data/pumpkin and /net/data/scratch from any machine.

RAID

The RAID is currently split.

Disk 1 to 17 @ 4TB
  • Disk 17 is hot spare
  • RAID Set #00
  • Volume: data lun(0/0/0) is RAID6 = 56 TB
  • Translates to /dev/sda1 mounted on /data
Disk 19 to 24 @ 750 GB
  • RAID Set #01
  • Volume: scratch lun(0/0/1) is RAID0 = 4.5 TB
  • Translates to /dev/sdb1 mounted on /scratch
Disk18 is N/A
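
The volume sizes listed above follow from the disk counts and RAID levels. The sketch below reproduces the arithmetic, assuming the data set uses disks 1 to 16 (with disk 17 held back as the hot spare) and the scratch set uses all six disks 19 to 24. The function names are made up for illustration.

```python
def usable_tb(disks, disk_tb, level):
    """Usable capacity in TB of a RAID set, ignoring filesystem overhead."""
    # Number of disks' worth of capacity consumed by parity per RAID level.
    parity_disks = {0: 0, 5: 1, 6: 2}[level]
    if disks <= parity_disks:
        raise ValueError("not enough disks for this RAID level")
    return (disks - parity_disks) * disk_tb


def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


# data: 16 disks of 4 TB in RAID6 (assumes disk 17, the hot spare, is not counted)
data_tb = usable_tb(16, 4.0, 6)
# scratch: 6 disks of 750 GB in RAID0
scratch_tb = usable_tb(6, 0.75, 0)

print(data_tb, scratch_tb)
print(round(tb_to_tib(data_tb), 1), round(tb_to_tib(scratch_tb), 1))
```

The conversion to binary units is why a 56 TB volume shows up as about 51T in `df -h` and a 4.5 TB volume as about 4.1T.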

The RAID card can be monitored at http://10.0.0.99/. Log in as "admin"; the password is described on the RAID page.

Software RAID

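There are 2 internal drives in Pumpkin forming a software RAID. Note that the partitions on the two drives are *not identical*: /dev/sdc has an extra EFI System Partition as partition 1, so each RAID partition on sdc is paired with the partition numbered one lower on sdd (sdc2 with sdd1, and so on). As a result the RAID does not look symmetrical. Blame the installer.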

Model: ATA WDC WD7500AAKS-0 (scsi)
Disk /dev/sdc: 750GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name                  Flags
 1      1049kB  79.7MB  78.6MB  fat16        EFI System Partition  boot
 2      79.7MB  300GB   300GB                                      raid
 3      300GB   550GB   250GB                                      raid
 4      550GB   650GB   100GB                                      raid
 5      650GB   718GB   67.5GB                                     raid
 6      718GB   750GB   32.0GB                                     lvm
 7      750GB   750GB   251MB   ext4                               raid

Model: ATA WDC WD7500AAKS-0 (scsi)
Disk /dev/sdd: 750GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End    Size    File system  Name  Flags
 1      1049kB  300GB  300GB                      raid
 2      300GB   550GB  250GB                      raid
 3      550GB   650GB  100GB                      raid
 4      650GB   718GB  67.5GB                     raid
 5      718GB   750GB  32.0GB                     lvm
 6      750GB   750GB  251MB   ext4               raid

Personalities : [raid1]
md123 : active raid1 sdc5[0] sdd4[1]
      65853440 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md124 : active raid1 sdc4[0] sdd3[1]
      97655808 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
 
md125 : active raid1 sdc2[0] sdd1[1]
      292968448 blocks super 1.2 [2/2] [UU]
      bitmap: 2/3 pages [8KB], 65536KB chunk 

md126 : active raid1 sdc3[0] sdd2[1]
      244140032 blocks super 1.2 [2/2] [UU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

md127 : active raid1 sdc7[0] sdd6[1]
      244672 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
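
In the /proc/mdstat listing above, `[UU]` means both mirror members are up; an underscore in that field (for example `[U_]`) marks a missing or failed member. A minimal sketch of a check for degraded arrays follows. The function name and the shortened sample text are illustrative only.

```python
import re

# Shortened sample in the same format as the listing above.
SAMPLE = """\
Personalities : [raid1]
md123 : active raid1 sdc5[0] sdd4[1]
      65853440 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sdc7[0] sdd6[1]
      244672 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose member status field contains '_'."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status field looks like [UU] or [U_] at the end of the blocks line.
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


print(degraded_arrays(SAMPLE))
```

On the machine itself the same function can be fed the live file with `degraded_arrays(open("/proc/mdstat").read())`.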

Filesystem      Size  Used Avail Use% Mounted on
/dev/md123       62G   85M   59G   1% /kvm
/dev/md124       92G  527M   87G   1% /var
/dev/md125      275G   24G  238G   9% /
/dev/md126      230G  4.7G  213G   3% /usr
/dev/md127      228M  224M     0 100% /boot
/dev/sdc1        75M  9.4M   66M  13% /boot/efi
/dev/sda1        51T   30T   22T  58% /data
/dev/sdb1       4.1T   89M  4.1T   1% /scratch
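
The listing above shows /boot at 100% use. A small check like the following could flag filesystems that are close to full. It is a sketch only: the 90% threshold and the function names are arbitrary choices, and the percentage is computed the way `df` does it (used divided by used plus available).

```python
import shutil


def percent_used(path):
    """Percent of space used on the filesystem holding path, df-style."""
    usage = shutil.disk_usage(path)
    # df reports used / (used + available to unprivileged users).
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0
    return 100.0 * usage.used / denominator


def nearly_full(paths, threshold=90.0):
    """Return the paths whose filesystems are at or above threshold percent."""
    return [p for p in paths if percent_used(p) >= threshold]


# On pumpkin one might pass the mount points from the listing above, e.g.
# ["/", "/boot", "/var", "/usr", "/kvm", "/data", "/scratch"].
print(nearly_full(["."]))
```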

To Do

Done