RAID
RAID Controllers
Hostname | RAID Controller | OS | User Manual | Web Interface Address |
---|---|---|---|---|
Taro.unh.edu | Areca Technology Corp. ARC-1231 12-Port PCI-Express | RHEL 5 | Areca ARC-1xxx | http://10.0.0.97/ |
Pumpkin.unh.edu | Areca Technology Corp. ARC-1280ML 24-Port PCI-Express | RHEL 6 | Areca ARC-1xxx | http://10.0.0.99/ |
Gourd.unh.edu | Areca Technology Corp. ARC-1680 8-Port PCIe SAS RAID Adapter | RHEL 5 | Areca ARC-168x | http://10.0.0.152 |
Endeavour.unh.edu | Areca Technology Corp. ARC-1280 24-Port PCI-Express | RHEL 5 | - | http://10.0.0.199 |
Tomato.unh.edu | 3ware Inc 9000-series | RHEL 3.4 | - | - |
Old Gourd | Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) | RHEL 3.4 | - | - |
? | 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID | - | - | - |
Pepper.unh.edu | Silicon Image, Inc. SiI 3114 | RHEL 3.4 | - | - |
Old Einstein | Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X | RHEL 5.3 | - | - |
Steps to move Mail, Home, KVM RAID
- See Move Mail RAID
New RAID cards: ARECA
The Areca cards in Pumpkin, Taro, Gourd (the new Einstein hardware) and Endeavour can all be accessed with a web browser. The interfaces are all on the backend network:

- Pumpkin 10.0.0.99
- Taro 10.0.0.97
- Gourd 10.0.0.152
- Endeavour 10.0.0.199
You log in as "admin", using the standard root password without the prefix part.
Areca CLI
The Areca cards can also be managed from the command line as root. The areca_cli64 command is installed on every machine with an Areca card, and the Areca CLI Manual explains the CLI in full detail. Here are a few quick and useful commands; a short example follows the list.
- areca_cli64 hw info - monitor the Areca hardware
- areca_cli64 disk info - view the status of all drives
- areca_cli64 disk smart drv=<drive number> - view the SMART data for that drive
- areca_cli64 disk sttest drv=<drive number> - run a SMART self-test on the specified drive
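For example, to check a single drive (drive number 3 below is only a placeholder; use the number reported by areca_cli64 disk info):

areca_cli64 disk info          # note the drive number of the disk you care about
areca_cli64 disk smart drv=3   # read its SMART data (3 is a placeholder)
areca_cli64 disk sttest drv=3  # start a SMART self-test on the same drive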
E-mail Alerts
All Areca cards should be configured to send out e-mail alerts about failed drives or other problems. Here's what you need to do to configure this feature:
- Log in to the Areca web interface and select System Controls -> Alert By Mail Config
- Enter the FARM IP address for Einstein in the SMTP Server IP Address field.
- You need to enter an e-mail account and password for the Areca card to use for sending e-mail. Currently I (Adam) have them configured to use my account, but I plan to create an e-mail-only user account for this purpose.
- Enter a name in the MailTo Name field, accompanied by the address for that person in the Mail Address field.
- Set the Event notification configuration to Urgent Error Notification. This limits e-mail alerts to only the most serious problems. For some reason a user logging into the web interface is considered a "Serious Error", so the "Serious" notification level would send an e-mail alert every time someone logs into the web interface, which is annoying.
Working with Areca RAID devices
These are my notes from testing the Areca card in Gourd; they should serve as brief how-tos for various features of the RAID cards.
Pass-through drives
Pass-through drives are not controlled by the RAID card; they function as independent SCSI devices plugged directly into the system and are not part of a RAID set. Gourd currently has two pass-through disks set up in a software RAID. These are the steps I took to add a new pass-through device on Gourd:
1. Insert the drive into the drive bay
2. The drive should now be visible in the Areca web interface. Expand the "Physical Drives" folder in the left column and then select "Create a Pass-Through disk".
3. Select the disk you want to create as a pass-through disk and then check the confirmation box before clicking the Submit button.
RAID Sets and Volumes
Setting up a hardware RAID on an Areca card is a two-step process. First you add the drives you want to use to a RAID set. Make sure you add the correct number of drives for the RAID level you intend to use; beyond that, all you're doing at this point is making a group of drives you can create a volume on. You choose the RAID level when you create the volume. To create a RAID Set:
- Click "Create a RAID Set" under RAID Set Functions
- Select the drives you want to add to the RAID Set. You can only add Free drives to a new RAID set.
- Confirm the operation and click Submit
You now have a RAID set, but to use it as a drive you need to create a Volume Set. This is where you select the RAID level. The Areca card only offers the RAID levels that are possible on the given RAID set; for example, with only two drives you can't create a RAID 5 or RAID 6, which require at least 3 or 4 drives, respectively. The Areca cards do support volumes of different RAID levels on the same set of drives. The important settings to bear in mind are the Volume name (set it to something more useful than "Volume Set #00004"), the RAID level, and the SCSI Channel/ID/LUN. Unless you have a reason to change them, you can accept the defaults for everything else. To create a Volume Set:
- Click "Create a Volume Set"
- Select the RAID set you want to create your Volume on and click Submit
- Choose the settings you want to apply to the Volume Set, confirm the operation and click Submit
If you select background initialization you can access the Volume immediately; with foreground initialization you have to wait for initialization to finish before using the volume. Depending on the size of the volume it may take a long time to initialize (about two hours for a 500 GB RAID volume, in my experience), so you might want to go make yourself a cup of coffee.
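If you want to double-check the result from the command line, the CLI manual also documents read-only RAID-set and volume-set queries; this is a sketch assuming the rsf and vsf subcommands behave as described there:

areca_cli64 rsf info   # list the RAID sets known to the controller
areca_cli64 vsf info   # list the volume sets, their RAID level and state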
Adding/removing SCSI devices on the fly
When you create a RAID volume on the Areca card, the system won't see it until you reboot. However, you can add and remove SCSI devices without rebooting by using the proc interface.

You should have the values for Channel, SCSI ID and LUN from when you added the volume; if not, they're listed next to the volume name in the Areca web management interface. Note that the following command takes four values but the Areca card only gives you three: the Areca values are the last three, and the first value should be zero. Use this command as root (sudo won't work) to add a SCSI device:
root@gourd# echo "scsi add-single-device 0 0 0 1" > /proc/scsi/scsi
Once you've run this command your volume should appear in /dev/sd* with the other SCSI devices. To remove the device, use a similar command:
root@gourd# echo "scsi remove-single-device 0 0 0 1" > /proc/scsi/scsi
Possible rescue techniques
DO NOT ATTEMPT THIS WITHOUT TOP SUPERVISOR PRESENT
Really, this shouldn't even be here. Contact Maurik immediately. Don't even think about hosing the RAID drive.
Alert by Mail
The Areca cards in Gourd, Taro, and Pumpkin are configured to send e-mail alerts for any urgent errors or events. There are several categories of errors: urgent, serious and warning. It isn't clear how these errors are categorized, but I did discover that logging into the web interface is considered a "Serious" event. To keep our mailing list from filling up with notifications every time someone logs into a RAID card's web interface, I've set the cards to only send out notifications on "Urgent" events.
In order to be able to send e-mail the RAID cards need to have a login and password to the mail server. I've given them my account info for the time being, but I intend to create a special account just for the Areca cards to be able to send mail.
Software RAID
RAID volumes in Linux are created using partitions, so the first step in creating a software RAID is creating the partitions you want to add to the RAID. These partitions must all be the same size. Once you have the partitions created you can use the mdadm tool to create a new RAID device. The command should look something like this:
mdadm --create /dev/mdX --level=<num> --raid-devices=<num> <partition list>
Here mdX is the RAID device you want to create, --level is the RAID level you wish to use, --raid-devices specifies how many partitions will be added to the array, and the remaining arguments are the partitions to use. To create a RAID 1 array you would use something like the following example:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
The file /proc/mdstat displays the status of the RAID device initialization. Depending on the size of the array it may take a while to finish initializing. You should still be able to format, mount and use the RAID device before initialization has finished, though performance may not be optimal.
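A few generic mdadm housekeeping commands (the /etc/mdadm.conf path is the usual location on RHEL; adjust if your layout differs):

cat /proc/mdstat                          # overall status and rebuild/initialization progress
mdadm --detail /dev/md0                   # detailed state of a single array
mdadm --detail --scan >> /etc/mdadm.conf  # record the array so it assembles consistently at boot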
To fail a drive in a RAID array, use the following command:
mdadm /dev/mdX -f <failed device>
To add a new device to a RAID array, use this command:
mdadm /dev/mdX -a <new device>
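Putting -f, -r and -a together, a hypothetical replacement of a failed member might look like this (device names are placeholders, not taken from our systems):

mdadm /dev/md0 -f /dev/sdb2   # mark the partition as failed
mdadm /dev/md0 -r /dev/sdb2   # remove the failed partition from the array
# physically replace the drive and recreate an identical partition, then:
mdadm /dev/md0 -a /dev/sdb2   # add it back; the array rebuilds onto it automatically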
Rebuilding a Software RAID
Should /proc/mdstat show that a problem exists with one of the RAID arrays, you can rebuild it by performing the following steps (5.9.9.2. Rebuilding a RAID array):
1. Remove the disk from the RAID array:
   - mdadm --manage /dev/md0 -r /dev/sdc3
2. Remove the disk from the system.
3. Using fdisk, replace the removed disk and partition the replacement disk.
4. Add the new disk back to the RAID array:
   - Note: to get the system (Gourd) to recognize the new drive you must go into the Areca card, delete the newly added drive as a hot spare, and then add the drive as a pass-through under the Physical Drive menu.
   - Note: to make sure the partitions match precisely, run fdisk on the working drive in the RAID array and use its cylinder numbers to create the partitions on the newly added drive (an sfdisk alternative is sketched after this list).
   - mdadm --manage /dev/md0 -a /dev/sdc3
5. To restore the disk, perform a "software fail" on the previous spare slice:
   - mdadm --manage --set-faulty /dev/md0 /dev/sdc3
6. The system will now attempt to rebuild the array on the replaced disk. Use the following command to monitor its status:
   - watch -n 1 cat /proc/mdstat
7. When the array has finished rebuilding, remove and then re-add the software-failed disk:
   - mdadm --manage /dev/md0 -r /dev/sdc3
   - mdadm --manage /dev/md0 -a /dev/sdc3
8. Check the array:
   - mdadm --detail /dev/md0
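As an alternative to matching cylinder numbers by hand in fdisk, the partition table can be copied from a healthy member with sfdisk. This is a generic sketch, not part of the original procedure; /dev/sdb is assumed to be the surviving drive and /dev/sdc the replacement:

sfdisk -d /dev/sdb > /tmp/sdb.layout   # dump the partition layout of the healthy drive
sfdisk /dev/sdc < /tmp/sdb.layout      # write the same layout to the replacement drive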
Old RAID cards
The documentation for the controller should be available in /usr/local/doc/3dm2. There should be a daemon running; start it with "/etc/init.d/3dm2 start". With the daemon running, the device can be checked and manipulated using a web browser on the local machine pointing to http://localhost:888/. Log in as administrator with the root password.
More recently (2006) we no longer run this daemon; instead the RAIDs can be queried and controlled with tw_cli, in /usr/local/bin. Type tw_cli help for help. You must be root to run this program.
Examples (must be root):
tw_cli help
tw_cli info
tw_cli info c0           # info for card 0
tw_cli info c0 u0        # info for unit 0: tells you it is RAID-5, status OK, size, stripe size
tw_cli info c0 p0        # info on disk 0 on card 0: size, serial number
tw_cli info c0 p0 model  # model number of disk (Maxtor 6B200S0)
Be totally wicked careful with any of the other commands PLEASE
TOMATO:
Contains a RAID with 12 Maxtor Diamond Max 10 drives (300GB, model number: 6B300S0). Data Sheet
Size: 300 GB
Spin: 7200 RPM
Buffer: 16 MB
Seek: <9 ms
Latency: 4.17 ms
Current/Power: not specified; 1.2 Amp / 15 Watt is a good guess
OLD GOURD:
Contains a RAID with 8 Maxtor drives (251GB, Model number 7Y250M0)
RAID Drives
Pumpkin Drives
Ch 1: ST3750640NS | Ch 2: ST3750640NS | Ch 3: ST3750640NS | Ch 4: ST3750640NS |
Ch 5: ST3750640NS | Ch 6 (HS)(02/20/12): ST3750640AS | Ch 7: ST3750640NS | Ch 8: ST3750640NS |
Ch 9: ST3750640NS | Ch 10: ST3750640NS | Ch 11 (02/17/12): ST3750640AS | Ch 12: ST3750640NS |
Ch 13: ST3750640NS | Ch 14: ST3750640NS | Ch 15: ST3750640NS | Ch 16: ST3750640NS |
Ch 17: ST3750640NS | Ch 18: ST3750640NS | Ch 19: ST3750640NS | Ch 20: ST3750640NS |
Ch 21: ST3750640NS | Ch 22: ST3750640NS | Ch 23: WD7500AAKS-00RBA0 | Ch 24 (HS): WD7500AAKS-00RBA0 |

Endeavour Drives
Ch 1 (12/19/11): ST31000340NS | Ch 2: ST31000340NS | Ch 3: ST31000340NS | Ch 4 (11/12/13): ST31000524AS |
Ch 5: ST31000528AS | Ch 6: ST31000340NS | Ch 7: ST31000340NS | Ch 8: ST31000340NS |
Ch 9 (03/24/12): ST31000524AS | Ch 10 (07/22/11): ST31000526SV | Ch 11: ST31000340NS | Ch 12: ST31000340NS |
Ch 13: ST31000340NS | Ch 14: ST31000340NS | Ch 15: ST31000340NS | Ch 16: ST31000340NS |
Ch 17: ST31000340NS | Ch 18: ST31000340NS | Ch 19: ST31000340NS | Ch 20: ST31000340NS |
Ch 21: ST31000340NS | Ch 22: ST31000340NS | Ch 23: ST31000340NS | Ch 24: ST31000340NS |

Gourd Drives
Ch 1: WD7500AAKS-00RBA0 | Ch 2: WD7500AAKS-00RBA0 | Ch 3: WD7500AAKS-00RBA0 | Ch 4: WD7500AAKS-00RBA0 (07/01/13) |
Ch 5: N.A. | Ch 6: N.A. | Ch 7: N.A. | Ch 8: WD7500AAKS-00RBA0 |

Taro Drives
Ch 1 (HS): Empty | Ch 2: WD1002FBYS-01A6B0 | Ch 3: WD1002FBYS-01A6B0 | Ch 4: WD1002FBYS-01A6B0 |
Ch 5: WD1002FBYS-01A6B0 | Ch 6: WD1002FBYS-01A6B0 | Ch 7 (09/12/11): ST31000340NS | Ch 8 (HS): WD1002FBYS-01A6B0 |

Tomato Drives
Drive 1 | Drive 2 | Drive 3 | Drive 4 |
Drive 5 | Drive 6 | Drive 7 | Drive 8 |
Drive Life Expectancy
This is a list of drives we have in our RAID configurations and their life expectancy. The PDFs where this information was found are located at:

- Seagate ST31000526SV
- Seagate ST3750640AS
- Seagate ST3750640NS
- WDC WD7500AAKS
Pumpkin
ST3750640NS
8,760 power-on-hours per year. 250 average motor start/stop cycles per year.
ST3750640AS
2400 power-on-hours per year. 10,000 average motor start/stop cycles per year.
WDC WD7500AAKS-00RBA0
Start/stop cycles 50,000
Endeavour
ST31000340NS
xxx
ST31000524AS
xxx
ST31000526SV
MTBF: 1,000,000 hours
Start/stop cycles: 50,000
Non-recoverable errors: 1 per 10^14