Difference between revisions of "RAID"

From Nuclear Physics Group Documentation Pages
 
(40 intermediate revisions by 3 users not shown)
Line 3: Line 3:
 
! Hostname !! RAID Controller !! OS !! User Manual !! Web Interface Address
 
! Hostname !! RAID Controller !! OS !! User Manual !! Web Interface Address
 
|-
 
|-
| Taro.unh.edu || Areca Technology Corp. ARC-1231 12-Port PCI-Express || RHEL 5 || [http://nuclear.unh.edu/wiki/images/7/72/areca_arc-1xxx_SATA_Manual_V3.5.pdf Areca ARC-1xxx] || [http://10.0.0.97/ http://10.0.0.97/]
+
| Taro.unh.edu || Areca Technology Corp. ARC-1231 12-Port PCI-Express || RHEL 5 || [http://nuclear.unh.edu/wiki/pdfs/raid/areca_arc-1xxx_SATA_Manual_V3.5.pdf Areca ARC-1xxx] || [http://10.0.0.97/ http://10.0.0.97/]
 
|-
 
|-
| Pumpkin.unh.edu || Areca Technology Corp. ARC-1231 12-Port PCI-Express || RHEL 5 || [http://nuclear.unh.edu/wiki/images/7/72/areca_arc-1xxx_SATA_Manual_V3.5.pdf Areca ARC-1xxx] || [http://10.0.0.99/ http://10.0.0.99/]
+
| Pumpkin.unh.edu || Areca Technology Corp. ARC-1280ML 24-Port PCI-Express || RHEL 6 || [http://nuclear.unh.edu/wiki/pdfs/raid/areca_arc-1xxx_SATA_Manual_V3.5.pdf Areca ARC-1xxx] || [http://10.0.0.99/ http://10.0.0.99/]
 
|-
 
|-
| Gourd.unh.edu || Areca Technology Corp. ARC-1680 8-port PCIe SAS RAID Adapter || RHEL 5 || [http://nuclear.unh.edu/wiki/images/7/72/areca_arc-1680_manual.pdf Areca ARC-168x] || [http://10.0.0.152/ http://10.0.0.152]
+
| Gourd.unh.edu || Areca Technology Corp. ARC-1680 8-port PCIe SAS RAID Adapter || RHEL 5 || [http://nuclear.unh.edu/wiki/pdfs/raid/areca_arc-1680_manual.pdf Areca ARC-168x] || [http://10.0.0.152/ http://10.0.0.152]
 +
|-
 +
| Endeavour.unh.edu || Areca Technology Corp. ARC-1280 24-Port PCI-Express || RHEL 5 || || [http://10.0.0.199 http://10.0.0.199]
 
|-
 
|-
 
| Tomato.unh.edu ||  3ware Inc 9000-series || RHEL 3.4 || - || -
 
| Tomato.unh.edu ||  3ware Inc 9000-series || RHEL 3.4 || - || -
Line 20: Line 22:
 
|-
 
|-
 
|}
 
|}
 +
 +
== Steps to move Mail, Home, KVM RAID ==
 +
* See [[Move Mail RAID]]
  
 
== New RAID cards: ARECA ==
 
== New RAID cards: ARECA ==
Line 28: Line 33:
 
*  [http://10.0.0.97/ Taro 10.0.0.97]
 
*  [http://10.0.0.97/ Taro 10.0.0.97]
 
* [http://10.0.0.152/ Gourd 10.0.0.152]
 
* [http://10.0.0.152/ Gourd 10.0.0.152]
 +
* [http://10.0.0.199/ Endeavour 10.0.0.199]
 
You log in as "admin" with the standard root password missing the prefix part.
 
You log in as "admin" with the standard root password missing the prefix part.
 +
 +
=== Areca CLI ===
 +
The Areca cards can be accessed through the command line as root.  The command areca_cli64 is installed on all machines with an Areca card.  The [http://nuclear.unh.edu/wiki/pdfs/raid/areca_CLIManual.pdf Areca CLI Manual] explains in full detail how to use the CLI interface.  Here are a few commands that are quick and easy.
 +
 +
*areca_cli64 hw info - To monitor the Areca hardware
 +
*areca_cli64 disk info - To view status of all the drives
 +
*areca_cli64 disk smart drv=<drive number> - To view the smart data on that drive
 +
*areca_cli64 disk sttest drv=<drive number> - To run a smart self-test on a specified drive
 +
 +
=== E-mail Alerts ===
 +
 +
All Areca cards should be configured to send out e-mail alerts about failed drives or other problems. Here's what you need to do to configure this feature:
 +
 +
*Login to the Areca web interface and select '''System Controls -> Alert By Mail Config'''
 +
*Enter the FARM IP address for [[Einstein]] in the '''SMTP Server IP Address''' field.
 +
*You need to enter an e-mail account and password for the Areca card to use for sending e-mail. Currently I (Adam) have them configured to use my account, but I plan to create an e-mail-only user account for this purpose.
 +
*Enter a name in the '''MailTo Name''' field, accompanied by the address for that person in the '''Mail Address''' field.
 +
*Set the Event notification configuration to '''Urgent Error Notification'''. This setting will limit e-mail alerts to only the most serious of problems. For some reason a user logging into the web interface is considered a "Serious Error", and using that setting will result in e-mail alerts every time someone logs into the web interface, which is annoying.
  
 
=== Working with Areca RAID devices ===
 
=== Working with Areca RAID devices ===
Line 41: Line 65:
 
2. The drive should now be visible in the Areca web interface. Expand the "Physical Drives" folder in the left column and then select "Create a Pass-Through disk".<br/>
 
2. The drive should now be visible in the Areca web interface. Expand the "Physical Drives" folder in the left column and then select "Create a Pass-Through disk".<br/>
 
3. Select the disk you want to create as a pass-through disk and then check the confirmation box before clicking the Submit button.<br/>
 
3. Select the disk you want to create as a pass-through disk and then check the confirmation box before clicking the Submit button.<br/>
 +
 +
====RAID Sets and Volumes====
 +
 +
Setting up a hardware RAID on an Areca card is a two step process. First you have to add the drives you want to use to a RAID set. You need to make sure you add the correct number of drives for the RAID level you want to use, but beyond that all you're doing at this point is making a group of drives you can create a volume on. You'll choose the RAID level when you create the volume. To create a RAID Set:
 +
 +
#Click "Create a RAID Set" under RAID Set Functions
 +
#Select the drives you want to add to the RAID Set. You can only add Free drives to a new RAID set.
 +
#Confirm the operation and click Submit
 +
 +
You now have a RAID set, but in order to use it as a drive you need to create a Volume set. This is where you will select the level of RAID to use. The Areca card will only give you the options for the RAID levels that are possible on a given RAID set, so for example if you only have two drives you can't create a RAID 5 or 6, which requires at least 3 or 4 drives, respectively. The Areca cards support volumes of different RAID levels on the same set of drives. The important settings to bear in mind are the Volume name, which you should set to something more useful than "Volume Set #00004", the RAID level, and the SCSI Channel/ID/LUN. Unless you have a reason to change them you can accept the defaults on everything else. To create a Volume Set:
 +
 +
#Click "Create a Volume Set"
 +
#Select the RAID set you want to create your Volume on and click Submit
 +
#Choose the settings you want to apply to the Volume Set, confirm the operation and click Submit
 +
 +
If you select background initialization you will be able to access the Volume immediately, but if you choose foreground initialization you have to wait for initialization to finish before using the volume. Depending on the size of the volume it may take a long time to initialize (two hours for a 500 GB RAID volume, in my experience), so you might want to go and make yourself a cup of coffee.
  
 
==== Adding/removing scsi devices on the fly====
 
==== Adding/removing scsi devices on the fly====
  
There is a way to add and remove scsi devices without rebooting by using the [http://tldp.org/HOWTO/SCSI-2.4-HOWTO/mlproc.html proc interface]. The values for Channel, SCSI ID and Lun can be obtained from the Areca card's web interface under Information/RAID Set Hierarchy.<br/>
+
When you create a RAID volume on the Areca card the system won't see it until you reboot. There is a way to add and remove scsi devices without rebooting by using the [http://tldp.org/HOWTO/SCSI-2.4-HOWTO/mlproc.html proc interface].  
  
This is the command I used to add the first pass-through drive I created. Note that for this to work you have to be logged in as root (sudo won't work):  
+
You should have the values for Channel, SCSI ID and LUN from when you added the drive, but if not they're listed next to the volume name on the Areca web management interface. Note that the following command takes four values, but you only get three from the Areca card. The values from the Areca card should be the last three, and the first value should be zero. Use this command as root (sudo won't work) to add a scsi device:
  
 
  root@gourd# echo "scsi add-single-device 0 0 0 1" > /proc/scsi/scsi
 
  root@gourd# echo "scsi add-single-device 0 0 0 1" > /proc/scsi/scsi
  
Once you've run this command your drive should now appear in /dev/sd* with the other scsi devices. To remove this device you would use a similar command:
+
Once you've run this command your volume should appear in /dev/sd* with the other scsi devices. To remove this device you would use a similar command:
  
 
  root@gourd# echo "scsi remove-single-device 0 0 0 1" > /proc/scsi/scsi
 
  root@gourd# echo "scsi remove-single-device 0 0 0 1" > /proc/scsi/scsi
Line 58: Line 98:
 
'''DO NOT ATTEMPT THIS WITHOUT TOP SUPERVISOR PRESENT'''
 
'''DO NOT ATTEMPT THIS WITHOUT TOP SUPERVISOR PRESENT'''
  
[http://www.hardforum.com/showthread.php?t=1346321 some unchecked advice in a forum]
+
Really, this shouldn't even be here. Contact Maurik immediately. Don't even think about hosing the RAID drive.
 +
 
 +
===Alert by Mail===
 +
 
 +
The Areca cards in [[Gourd]], [[Taro]], and [[Pumpkin]] are configured to send e-mail alerts of any urgent errors or events. There are several categories of errors: urgent, serious and warning. It isn't clear how these errors are categorized, but I did discover that logging into the web interface is considered a "Serious" event. In order to keep our mailing list from filling up with notifications every time someone logs into a RAID card's web interface, I've set the cards to only send out notifications on "Urgent" events.
 +
 
 +
In order to be able to send e-mail the RAID cards need to have a login and password to the mail server. I've given them my account info for the time being, but I intend to create a special account just for the Areca cards to be able to send mail.
 +
 
 +
== Software RAID ==
 +
 
 +
RAID volumes in Linux are created using partitions, so the first step in creating a software RAID is creating the partitions you want to add to the RAID. These partitions must all be the same size. Once you have the partitions created you can use the mdadm tool to create a new RAID device. The command should look something like this:
 +
 
 +
mdadm --create /dev/mdX --level=<num> --raid-devices=<num> <partition list>
 +
 
 +
Here mdX is the name of the RAID device you want to create, --level is the RAID level you wish to use, --raid-devices specifies how many partitions will be added to the RAID array, and the remaining arguments are the list of partitions you want to use for the RAID array. To create a RAID 1 array you would use something like the following example:
 +
 
 +
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
 +
 
 +
The file /proc/mdstat will display the status of the RAID device initialization. Depending on the size of the array it may take a while to finish initializing. You still should be able to format, mount and use the RAID device before initialization has finished, though the performance may not be optimal.
 +
 
 +
To fail a drive in a RAID array, use the following command:
 +
 
 +
mdadm /dev/mdX -f <failed device>
 +
 
 +
To add a new device to a RAID array, use this command:
 +
 
 +
mdadm /dev/mdX -a <new device>
 +
 
 +
=== Rebuilding a Software RAID ===
 +
Should /proc/mdstat show that a problem exists with one of the RAID arrays, you can rebuild it by performing the following steps ([https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Introduction_To_System_Administration/s3-storage-raid-day2day-add.html 5.9.9.2. Rebuilding a RAID array]):
 +
*# Remove the disk from the raid array.
 +
*#* mdadm --manage /dev/md0 -r /dev/sdc3
 +
*# Remove the disk from the system.
 +
*# Replace the removed disk, then use fdisk to partition the replacement disk.
 +
*# Add the new disk back to the RAID array.
 +
*#* Note: In order to get the system (Gourd) to recognize the new drive you must go into the Areca card and delete the newly added drive as a hot spare and then add the drive as a passthrough under the menu Physical Drive.
 +
*#* Note: To make sure that the partitions match precisely, run fdisk on the working drive in the RAID array and use the cylinder numbers to create the partitions on the new drive just added.
 +
*#* mdadm --manage /dev/md0 -a /dev/sdc3
 +
*# To restore the disk, perform a "software fail" on the previous spare slice:
 +
*#* mdadm --manage --set-faulty /dev/md0 /dev/sdc3
 +
*# The system will now attempt to rebuild the array on the replaced disk. Use the following command to monitor status:
 +
*#* watch -n 1 cat /proc/mdstat
 +
*# When the array is finished rebuilding, remove and then re-add the software-failed disk back to the array.
 +
*#* mdadm --manage /dev/md0 -r /dev/sdc3
 +
*#* mdadm --manage /dev/md0 -a /dev/sdc3
 +
*# Check the array.
 +
*#* mdadm --detail /dev/md0
  
 
== Old RAID cards ==
 
== Old RAID cards ==
Line 92: Line 178:
  
 
Contains a RAID with 8 Maxtor drives (251GB, Model number 7Y250M0)
 
Contains a RAID with 8 Maxtor drives (251GB, Model number 7Y250M0)
 +
 +
== RAID Drives ==
 +
{| border="1"
 +
|+ Pumpkin Drives
 +
|- style="height: 50px;"
 +
| width="250pt" | Ch 1
 +
ST3750640NS
 +
| width="250pt" | Ch 2
 +
ST3750640NS
 +
| width="250pt" | Ch 3
 +
ST3750640NS
 +
| width="250pt" | Ch 4
 +
ST3750640NS
 +
|- style="height: 50px;"
 +
| Ch 5
 +
ST3750640NS
 +
| Ch 6  (HS)(02/20/12)
 +
ST3750640AS
 +
| Ch 7
 +
ST3750640NS
 +
| Ch 8
 +
ST3750640NS
 +
|- style="height: 50px;"
 +
| Ch 9
 +
ST3750640NS
 +
| Ch 10 
 +
ST3750640NS
 +
| Ch 11  (02/17/12)
 +
ST3750640AS
 +
| Ch 12 
 +
ST3750640NS
 +
|- style="height: 50px;"
 +
| Ch 13 
 +
ST3750640NS
 +
| Ch 14 
 +
ST3750640NS
 +
| Ch 15
 +
ST3750640NS
 +
| Ch 16
 +
ST3750640NS
 +
|- style="height: 50px;"
 +
| Ch 17
 +
ST3750640NS
 +
| Ch 18
 +
ST3750640NS
 +
| Ch 19
 +
ST3750640NS
 +
| Ch 20
 +
ST3750640NS
 +
|- style="height: 50px;"
 +
| Ch 21
 +
ST3750640NS
 +
| Ch 22
 +
ST3750640NS
 +
| Ch 23
 +
WD7500AAKS-00RBA0
 +
| Ch 24  (HS)
 +
WD7500AAKS-00RBA0
 +
|}
 +
{| border="1"
 +
|+ Endeavour Drives
 +
|- style="height: 50px;"
 +
| width="250pt" | Ch 1  (12/19/11)
 +
ST31000340NS
 +
| width="250pt" | Ch 2 
 +
ST31000340NS
 +
| width="250pt" | Ch 3 
 +
ST31000340NS
 +
| width="250pt" | Ch 4  (11/12/13)
 +
ST31000524AS
 +
|- style="height: 50px;"
 +
| Ch 5 
 +
ST31000528AS
 +
| Ch 6 
 +
ST31000340NS
 +
| Ch 7 
 +
ST31000340NS
 +
| Ch 8 
 +
ST31000340NS
 +
|- style="height: 50px;"
 +
| Ch 9  (03/24/12)
 +
ST31000524AS
 +
| Ch 10  (07/22/11)
 +
ST31000526SV
 +
| Ch 11 
 +
ST31000340NS
 +
| Ch 12 
 +
ST31000340NS
 +
|- style="height: 50px;"
 +
| Ch 13 
 +
ST31000340NS
 +
| Ch 14 
 +
ST31000340NS
 +
| Ch 15 
 +
ST31000340NS
 +
| Ch 16 
 +
ST31000340NS
 +
|- style="height: 50px;"
 +
| Ch 17 
 +
ST31000340NS
 +
| Ch 18 
 +
ST31000340NS
 +
| Ch 19 
 +
ST31000340NS
 +
| Ch 20 
 +
ST31000340NS
 +
|- style="height: 50px;"
 +
| Ch 21 
 +
ST31000340NS
 +
| Ch 22 
 +
ST31000340NS
 +
| Ch 23 
 +
ST31000340NS
 +
| Ch 24 
 +
ST31000340NS
 +
|}
 +
{| border="1"
 +
|+ Gourd Drives
 +
|- style="height: 50px;"
 +
| width="250pt" | Ch 1               
 +
WD7500AAKS-00RBA0
 +
| width="250pt" | Ch 2               
 +
WD7500AAKS-00RBA0
 +
| width="250pt" | Ch 3               
 +
WD7500AAKS-00RBA0
 +
| width="250pt" | Ch 4 
 +
WD7500AAKS-00RBA0 (07/01/13)
 +
|- style="height: 50px;"
 +
| Ch 5
 +
N.A.
 +
| Ch 6
 +
N.A.
 +
| Ch 7
 +
N.A.
 +
| Ch 8
 +
WD7500AAKS-00RBA0
 +
|}
 +
{| border="1"
 +
|+ Taro Drives
 +
| width="250pt" | Ch 1  (HS)()
 +
Empty
 +
|-
 +
| Ch 2 
 +
WD1002FBYS-01A6B0
 +
|-
 +
| Ch 3 
 +
WD1002FBYS-01A6B0
 +
|-
 +
| Ch 4 
 +
WD1002FBYS-01A6B0
 +
|-
 +
| Ch 5
 +
WD1002FBYS-01A6B0
 +
|-
 +
| Ch 6 
 +
WD1002FBYS-01A6B0
 +
|-
 +
| Ch 7  (09/12/11)
 +
ST31000340NS
 +
|-
 +
| Ch 8  (HS)
 +
WD1002FBYS-01A6B0
 +
|}
 +
{| border="1"
 +
|+ Tomato Drives
 +
|- style="height: 50px;"
 +
| width="200pt" | Drive 1
 +
| width="200pt" | Drive 2
 +
| width="200pt" | Drive 3
 +
| width="200pt" | Drive 4
 +
|- style="height: 50px;"
 +
| Drive 5
 +
| Drive 6
 +
| Drive 7
 +
| Drive 8
 +
|}
 +
 +
== Drive Life Expectancy ==
 +
This is a list of drives we have in our RAID configuration and their life expectancy.  The pdfs where this information is found are located at:
 +
  [http://nuclear.unh.edu/wiki/pdfs/raid/Seagate_ST31000526SV_Manual_102387.pdf Seagate ST31000526SV]
 +
  [http://nuclear.unh.edu/wiki/pdfs/raid/Seagate_ST3750640AS_100402371k.pdf Seagate ST3750640AS]
 +
  [http://nuclear.unh.edu/wiki/pdfs/raid/Seagate_ST3750640NS_100424667b.pdf Seagate ST3750640NS]
 +
  [http://nuclear.unh.edu/wiki/pdfs/raid/WDC_WD7500AAKS-00RBA0_2879-701277.pdf WDC WD7500AAKS]
 +
=== Pumpkin ===
 +
ST3750640NS
 +
  8,760 power-on-hours per year.
 +
  250 average motor start/stop cycles per year.
 +
ST3750640AS
 +
  2400 power-on-hours per year.
 +
  10,000 average motor start/stop cycles per year.
 +
WDC WD7500AAKS-00RBA0
 +
  Start/stop cycles 50,000
 +
=== Endeavour ===
 +
ST31000340NS
 +
  xxx
 +
ST31000524AS
 +
  xxx
 +
ST31000526SV
 +
  MTBF 1,000,000 hours
 +
  Start / Stop Cycles 50,000
 +
  Non-Recoverable Errors 1 per 10^14

Latest revision as of 18:50, 24 May 2016

RAID Controllers

Hostname | RAID Controller | OS | User Manual | Web Interface Address
Taro.unh.edu | Areca Technology Corp. ARC-1231 12-Port PCI-Express | RHEL 5 | Areca ARC-1xxx | http://10.0.0.97/
Pumpkin.unh.edu | Areca Technology Corp. ARC-1280ML 24-Port PCI-Express | RHEL 6 | Areca ARC-1xxx | http://10.0.0.99/
Gourd.unh.edu | Areca Technology Corp. ARC-1680 8-port PCIe SAS RAID Adapter | RHEL 5 | Areca ARC-168x | http://10.0.0.152
Endeavour.unh.edu | Areca Technology Corp. ARC-1280 24-Port PCI-Express | RHEL 5 | | http://10.0.0.199
Tomato.unh.edu | 3ware Inc 9000-series | RHEL 3.4 | - | -
Old Gourd | Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) | RHEL 3.4 | - | -
? | 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID | - | - | -
Pepper.unh.edu | Silicon Image, Inc. SiI 3114 | RHEL 3.4 | - | -
Old Einstein | Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X | RHEL 5.3 | - | -

Steps to move Mail, Home, KVM RAID

  • See Move Mail RAID

New RAID cards: ARECA

The Areca cards in Pumpkin, Taro and Gourd (the new Einstein hardware) can all be accessed with a web browser. The interfaces are all on the backend network:

  • Pumpkin 10.0.0.99
  • Taro 10.0.0.97
  • Gourd 10.0.0.152
  • Endeavour 10.0.0.199

You log in as "admin" with the standard root password missing the prefix part.

Areca CLI

The Areca cards can be accessed through the command line as root. The command areca_cli64 is installed on all machines with an Areca card. The Areca CLI Manual explains in full detail how to use the CLI interface. Here are a few commands that are quick and easy.

  • areca_cli64 hw info - To monitor the Areca hardware
  • areca_cli64 disk info - To view status of all the drives
  • areca_cli64 disk smart drv=<drive number> - To view the smart data on that drive
  • areca_cli64 disk sttest drv=<drive number> - To run a smart self-test on a specified drive
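The per-drive commands above take a drive number, so checking every bay means repeating them once per drive. Here is a minimal sketch, assuming the drive numbers run from 1 up to the number of bays on the card. The helper name areca_smart_cmds is made up for illustration and is not part of the Areca CLI. It only prints the commands so they can be reviewed before being run as root.

```shell
# Hypothetical helper: print one "disk smart" command per drive bay.
# Assumes bays are numbered 1..N; adjust if your card numbers them differently.
areca_smart_cmds() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        echo "areca_cli64 disk smart drv=$i"
        i=$((i + 1))
    done
}

# Review the generated commands for an 8-bay card.
areca_smart_cmds 8
```

Once the list looks right, it can be piped to a root shell (for example areca_smart_cmds 8 | sh) to collect the SMART data for every bay in one pass.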

E-mail Alerts

All Areca cards should be configured to send out e-mail alerts about failed drives or other problems. Here's what you need to do to configure this feature:

  • Login to the Areca web interface and select System Controls -> Alert By Mail Config
  • Enter the FARM IP address for Einstein in the SMTP Server IP Address field.
  • You need to enter an e-mail account and password for the Areca card to use for sending e-mail. Currently I (Adam) have them configured to use my account, but I plan to create an e-mail-only user account for this purpose.
  • Enter a name in the MailTo Name field, accompanied by the address for that person in the Mail Address field.
  • Set the Event notification configuration to Urgent Error Notification. This setting will limit e-mail alerts to only the most serious of problems. For some reason a user logging into the web interface is considered a "Serious Error", and using that setting will result in e-mail alerts every time someone logs into the web interface, which is annoying.

Working with Areca RAID devices

These are my notes from testing out the Areca card in Gourd, and should serve as brief howtos for various features of the RAID cards.

Pass-through drives

Pass-through drives are not controlled by the RAID card. They function as an independent scsi device plugged directly into the system, and are not part of a RAID set. Gourd currently has two pass-through disks set up in a software RAID. These are the steps I took to add a new pass-through device on Gourd:

1. Insert the drive into the drive bay
2. The drive should now be visible in the Areca web interface. Expand the "Physical Drives" folder in the left column and then select "Create a Pass-Through disk".
3. Select the disk you want to create as a pass-through disk and then check the confirmation box before clicking the Submit button.

RAID Sets and Volumes

Setting up a hardware RAID on an Areca card is a two step process. First you have to add the drives you want to use to a RAID set. You need to make sure you add the correct number of drives for the RAID level you want to use, but beyond that all you're doing at this point is making a group of drives you can create a volume on. You'll choose the RAID level when you create the volume. To create a RAID Set:

  1. Click "Create a RAID Set" under RAID Set Functions
  2. Select the drives you want to add to the RAID Set. You can only add Free drives to a new RAID set.
  3. Confirm the operation and click Submit

You now have a RAID set, but in order to use it as a drive you need to create a Volume set. This is where you will select the level of RAID to use. The Areca card will only give you the options for the RAID levels that are possible on a given RAID set, so for example if you only have two drives you can't create a RAID 5 or 6, which requires at least 3 or 4 drives, respectively. The Areca cards support volumes of different RAID levels on the same set of drives. The important settings to bear in mind are the Volume name, which you should set to something more useful than "Volume Set #00004", the RAID level, and the SCSI Channel/ID/LUN. Unless you have a reason to change them you can accept the defaults on everything else. To create a Volume Set:

  1. Click "Create a Volume Set"
  2. Select the RAID set you want to create your Volume on and click Submit
  3. Choose the settings you want to apply to the Volume Set, confirm the operation and click Submit

If you select background initialization you will be able to access the Volume immediately, but if you choose foreground initialization you have to wait for initialization to finish before using the volume. Depending on the size of the volume it may take a long time to initialize (two hours for a 500 GB RAID volume, in my experience), so you might want to go and make yourself a cup of coffee.

Adding/removing scsi devices on the fly

When you create a RAID volume on the Areca card the system won't see it until you reboot. There is a way to add and remove scsi devices without rebooting by using the proc interface.

You should have the values for Channel, SCSI ID and LUN from when you added the drive, but if not they're listed next to the volume name on the Areca web management interface. Note that the following command takes four values, but you only get three from the Areca card. The values from the Areca card should be the last three, and the first value should be zero. Use this command as root (sudo won't work) to add a scsi device:

root@gourd# echo "scsi add-single-device 0 0 0 1" > /proc/scsi/scsi

Once you've run this command your volume should appear in /dev/sd* with the other scsi devices. To remove this device you would use a similar command:

root@gourd# echo "scsi remove-single-device 0 0 0 1" > /proc/scsi/scsi
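The add and remove commands above differ only in the action word, and both need a leading zero in front of the three values from the Areca card. A small sketch of that rule follows. The helper name scsi_proc_cmd is hypothetical, and the leading 0 (the host number) is taken from the text above rather than checked against every machine.

```shell
# Hypothetical helper: build the string that gets written to /proc/scsi/scsi.
# Arguments: action (add|remove), then Channel, SCSI ID and LUN from the Areca
# web interface. The first value (host) is assumed to be 0, as in the text above.
scsi_proc_cmd() {
    action=$1 channel=$2 id=$3 lun=$4
    case "$action" in
        add|remove) ;;
        *) echo "usage: scsi_proc_cmd add|remove CHANNEL ID LUN" >&2; return 1 ;;
    esac
    echo "scsi ${action}-single-device 0 $channel $id $lun"
}

# Print the command string for Channel 0, ID 0, LUN 1.
scsi_proc_cmd add 0 0 1
```

As root, the output can be redirected straight into the proc file: scsi_proc_cmd add 0 0 1 > /proc/scsi/scsi. The reason a plain sudo in front of echo does not work is that the redirection is opened by the calling, unprivileged shell. Piping the string through sudo tee /proc/scsi/scsi is a common workaround, though it has not been tried on these machines.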

Possible rescue techniques

DO NOT ATTEMPT THIS WITHOUT TOP SUPERVISOR PRESENT

Really, this shouldn't even be here. Contact Maurik immediately. Don't even think about hosing the RAID drive.

Alert by Mail

The Areca cards in Gourd, Taro, and Pumpkin are configured to send e-mail alerts of any urgent errors or events. There are several categories of errors: urgent, serious and warning. It isn't clear how these errors are categorized, but I did discover that logging into the web interface is considered a "Serious" event. In order to keep our mailing list from filling up with notifications every time someone logs into a RAID card's web interface, I've set the cards to only send out notifications on "Urgent" events.

In order to be able to send e-mail the RAID cards need to have a login and password to the mail server. I've given them my account info for the time being, but I intend to create a special account just for the Areca cards to be able to send mail.

Software RAID

RAID volumes in Linux are created using partitions, so the first step in creating a software RAID is creating the partitions you want to add to the RAID. These partitions must all be the same size. Once you have the partitions created you can use the mdadm tool to create a new RAID device. The command should look something like this:

mdadm --create /dev/mdX --level=<num> --raid-devices=<num> <partition list>

Here mdX is the name of the RAID device you want to create, --level is the RAID level you wish to use, --raid-devices specifies how many partitions will be added to the RAID array, and the remaining arguments are the list of partitions you want to use for the RAID array. To create a RAID 1 array you would use something like the following example:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
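One easy mistake with the command above is a --raid-devices count that does not match the number of partitions listed. The sketch below derives the count from the partition list instead. The helper name mdadm_create_cmd is made up for illustration, and it prints the command rather than running it, so nothing is touched until you copy the line into a root shell.

```shell
# Hypothetical helper: assemble an mdadm --create command line.
# --raid-devices is derived from the number of partitions listed, so the two
# cannot disagree. The command is printed, not executed.
mdadm_create_cmd() {
    if [ "$#" -lt 3 ]; then
        echo "usage: mdadm_create_cmd /dev/mdX LEVEL PARTITION..." >&2
        return 1
    fi
    md=$1 level=$2
    shift 2
    echo "mdadm --create $md --level=$level --raid-devices=$# $*"
}

# Reproduces the RAID 1 example above.
mdadm_create_cmd /dev/md0 1 /dev/sda2 /dev/sdb2
```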

The file /proc/mdstat will display the status of the RAID device initialization. Depending on the size of the array it may take a while to finish initializing. You still should be able to format, mount and use the RAID device before initialization has finished, though the performance may not be optimal.
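Reading /proc/mdstat by eye works, but a quick filter makes a failed member harder to miss. This is a rough sketch that assumes the usual mdstat layout, where each array's status line ends in a member map such as [UU] (all members up) or [_U] (one member missing). The helper name degraded_arrays is hypothetical.

```shell
# Hypothetical helper: list md arrays whose member map shows a missing device.
# Reads /proc/mdstat-formatted text on stdin; a map like [U_] or [_U] means
# one member is down, [UU] means all members are up.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }                 # remember the current array
        /\[[U_]+\]/   { if ($NF ~ /_/) print dev } # member map contains "_"
    '
}

# Check the live system when the file is available.
if [ -r /proc/mdstat ]; then
    degraded_arrays < /proc/mdstat
fi
```

No output means every array reported all of its members as up.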

To fail a drive in a RAID array, use the following command:

mdadm /dev/mdX -f <failed device>

To add a new device to a RAID array, use this command:

mdadm /dev/mdX -a <new device>

Rebuilding a Software RAID

Should /proc/mdstat show that a problem exists with one of the RAID arrays, you can rebuild it by performing the following steps (5.9.9.2. Rebuilding a RAID array):

    1. Remove the disk from the raid array.
      • mdadm --manage /dev/md0 -r /dev/sdc3
    2. Remove the disk from the system.
    3. Replace the removed disk, then use fdisk to partition the replacement disk.
    4. Add the new disk back to the RAID array.
      • Note: In order to get the system (Gourd) to recognize the new drive you must go into the Areca card and delete the newly added drive as a hot spare and then add the drive as a passthrough under the menu Physical Drive.
      • Note: To make sure that the partitions match precisely, run fdisk on the working drive in the RAID array and use the cylinder numbers to create the partitions on the new drive just added.
      • mdadm --manage /dev/md0 -a /dev/sdc3
    5. To restore the disk, perform a "software fail" on the previous spare slice:
      • mdadm --manage --set-faulty /dev/md0 /dev/sdc3
    6. The system will now attempt to rebuild the array on the replaced disk. Use the following command to monitor status:
      • watch -n 1 cat /proc/mdstat
    7. When the array is finished rebuilding, remove and then re-add the software-failed disk back to the array.
      • mdadm --manage /dev/md0 -r /dev/sdc3
      • mdadm --manage /dev/md0 -a /dev/sdc3
    8. Check the array.
      • mdadm --detail /dev/md0
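Steps 6 through 8 above involve waiting for the rebuild to finish before touching the array again. Instead of watching the screen, the wait can be scripted. This is a sketch only: it assumes mdstat reports a running rebuild with a line containing "recovery =" or "resync =", and it uses /dev/md0 because that is the array in the steps above. Both function names are hypothetical.

```shell
# Hypothetical helper: succeed while mdstat-formatted text on stdin shows a
# rebuild (reported as "recovery" or "resync") still running.
rebuild_in_progress() {
    grep -Eq '(recovery|resync) *='
}

# Poll once a minute until the rebuild is done, then show the array details.
# Assumes /dev/md0, the array used in the steps above; must be run as root.
wait_for_rebuild() {
    while rebuild_in_progress < /proc/mdstat; do
        sleep 60
    done
    mdadm --detail /dev/md0
}
```

Calling wait_for_rebuild after the software fail in step 5 blocks until the rebuild line disappears from /proc/mdstat, at which point the remove and re-add in step 7 can go ahead.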

Old RAID cards

The documentation for the controller should be available in /usr/local/doc/3dm2. There should be a daemon running; start it with "/etc/init.d/3dm2 start". With the daemon running, the device can be checked and manipulated using a web browser on the local machine pointing to http://localhost:888/. Log in as administrator with the root password.

More recently (2006) we no longer run this daemon; instead the RAIDs can be queried and controlled with tw_cli, in /usr/local/bin. Type tw_cli help for help. You must be root to run this program.

Examples (must be root):

tw_cli help info
tw_cli info c0     # info for card 0
tw_cli info c0 u0    # info for unit 0, Tells you it is RAID-5, Status OK, size, Stripe size
tw_cli info c0 p0  # info on disk0 on card0, size, serial number.
tw_cli info c0 p0 model # model number of disk (Maxtor 6B200S0)

Be totally wicked careful with any of the other commands PLEASE

TOMATO:

Contains a RAID with 12 Maxtor DiamondMax 10 drives (300GB, model number: 6B300S0). Data Sheet

Size: 300 GB
Spin: 7200 RPM
Buffer:  16 MB
Seek: <9 ms
Latency: 4.17 ms
Current/Power - not specified. 1.2 Amp/ 15 Watt is a good guess

OLD GOURD:

Contains a RAID with 8 Maxtor drives (251GB, Model number 7Y250M0)

RAID Drives

Pumpkin Drives

Ch 1: ST3750640NS | Ch 2: ST3750640NS | Ch 3: ST3750640NS | Ch 4: ST3750640NS
Ch 5: ST3750640NS | Ch 6 (HS)(02/20/12): ST3750640AS | Ch 7: ST3750640NS | Ch 8: ST3750640NS
Ch 9: ST3750640NS | Ch 10: ST3750640NS | Ch 11 (02/17/12): ST3750640AS | Ch 12: ST3750640NS
Ch 13: ST3750640NS | Ch 14: ST3750640NS | Ch 15: ST3750640NS | Ch 16: ST3750640NS
Ch 17: ST3750640NS | Ch 18: ST3750640NS | Ch 19: ST3750640NS | Ch 20: ST3750640NS
Ch 21: ST3750640NS | Ch 22: ST3750640NS | Ch 23: WD7500AAKS-00RBA0 | Ch 24 (HS): WD7500AAKS-00RBA0

Endeavour Drives

Ch 1 (12/19/11): ST31000340NS | Ch 2: ST31000340NS | Ch 3: ST31000340NS | Ch 4 (11/12/13): ST31000524AS
Ch 5: ST31000528AS | Ch 6: ST31000340NS | Ch 7: ST31000340NS | Ch 8: ST31000340NS
Ch 9 (03/24/12): ST31000524AS | Ch 10 (07/22/11): ST31000526SV | Ch 11: ST31000340NS | Ch 12: ST31000340NS
Ch 13: ST31000340NS | Ch 14: ST31000340NS | Ch 15: ST31000340NS | Ch 16: ST31000340NS
Ch 17: ST31000340NS | Ch 18: ST31000340NS | Ch 19: ST31000340NS | Ch 20: ST31000340NS
Ch 21: ST31000340NS | Ch 22: ST31000340NS | Ch 23: ST31000340NS | Ch 24: ST31000340NS

Gourd Drives

Ch 1: WD7500AAKS-00RBA0 | Ch 2: WD7500AAKS-00RBA0 | Ch 3: WD7500AAKS-00RBA0 | Ch 4: WD7500AAKS-00RBA0 (07/01/13)
Ch 5: N.A. | Ch 6: N.A. | Ch 7: N.A. | Ch 8: WD7500AAKS-00RBA0

Taro Drives

Ch 1 (HS)(): Empty
Ch 2: WD1002FBYS-01A6B0
Ch 3: WD1002FBYS-01A6B0
Ch 4: WD1002FBYS-01A6B0
Ch 5: WD1002FBYS-01A6B0
Ch 6: WD1002FBYS-01A6B0
Ch 7 (09/12/11): ST31000340NS
Ch 8 (HS): WD1002FBYS-01A6B0

Tomato Drives
Drive 1 Drive 2 Drive 3 Drive 4
Drive 5 Drive 6 Drive 7 Drive 8

Drive Life Expectancy

This is a list of drives we have in our RAID configuration and their life expectancy. The pdfs where this information is found are located at:

 Seagate ST31000526SV
 Seagate ST3750640AS
 Seagate ST3750640NS
 WDC WD7500AAKS

Pumpkin

ST3750640NS

 8,760 power-on-hours per year.
 250 average motor start/stop cycles per year.

ST3750640AS

 2400 power-on-hours per year.
 10,000 average motor start/stop cycles per year.

WDC WD7500AAKS-00RBA0

 Start/stop cycles 50,000

Endeavour

ST31000340NS

 xxx

ST31000524AS

 xxx

ST31000526SV

 MTBF 1,000,000 hours
 Start / Stop Cycles 50,000
 Non-Recoverable Errors 1 per 10^14
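
An MTBF figure is easier to reason about when turned into a rough yearly failure rate. The sketch below does that arithmetic under two assumptions that are not from the datasheets: the drive runs continuously (8,760 power-on hours per year, the same figure listed for the ST3750640NS) and the failure rate is constant over the drive's life. Treat the result as a ballpark estimate, not a vendor-stated number. The helper name afr_percent is made up for illustration.

```shell
# Rough annualized failure rate (percent) from a datasheet MTBF figure.
# Assumes continuous operation (8,760 power-on hours per year) and a constant
# failure rate; this is an approximation, not a vendor-stated number.
afr_percent() {
    awk -v mtbf="$1" 'BEGIN { printf "%.2f\n", 8760 / mtbf * 100 }'
}

# ST31000526SV: MTBF 1,000,000 hours.
afr_percent 1000000
```

For the 1,000,000 hour MTBF listed above this prints 0.88, so a little under 1% per drive per year under those assumptions.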