
From Nuclear Physics Group Documentation Pages
Revision as of 17:12, 23 April 2009

Endeavour

Here are the notes on this system.
Notes on configuration status, changes, and the to-do list are at the bottom.


System Usage

This section explains some of the special uses of this system.

OpenPBS = Torque = Portable Batch System

PBS is a system for scheduling compute jobs onto nodes, aka "workload management software", that was first created by NASA in the '90s. We ran this early version on our farm back then. It is very sophisticated and thus not trivial to configure. Some things are already set up.

The company supporting the old open source version is PBS Gridworks (http://www.pbsgridworks.com/Default.aspx), which seems to be a division of "Altair". They haven't touched their free open version since 2001.
Altair provides no manuals for OpenPBS, only for PBS Pro. To get them, create a username and password at the PBS Pro User Area (https://secure.altair.com/UserArea/); you can then reach the Documentation (https://secure.altair.com/UserArea/docs.php). Do not expect a one-to-one correspondence between the OpenPBS and PBS Pro versions (for example, you don't need a FLEX license for the open one).

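Newer development of OpenPBS continues under the name Torque, which is what is installed on our systems. See Cluster Resources (http://www.clusterresources.com/) and go to the Torque Resource Manager page (http://www.clusterresources.com/pages/products/torque-resource-manager.php), which includes documentation.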

Commands

pbsnodes
This gives a quick overview of all the known nodes, whether they are up, and, if they are, what their status is.
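The sketch below shows one rough way to read that overview from a script. It is a minimal Python sketch that turns `pbsnodes -a` output into a per-node state summary. It assumes Torque's usual output layout: a node name at column 0, indented `key = value` attribute lines, and a blank line between nodes. The function names are hypothetical helpers, not part of Torque.

```python
import subprocess
from typing import Dict, Optional


def query_pbsnodes() -> str:
    """Run `pbsnodes -a` and return its standard output as text."""
    result = subprocess.run(
        ["pbsnodes", "-a"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_pbsnodes(text: str) -> Dict[str, Dict[str, str]]:
    """Parse pbsnodes output into {node_name: {attribute: value}}.

    Assumes the Torque layout described above (unindented node name,
    indented "key = value" lines, blank line between node records).
    """
    nodes: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current node record
            continue
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = line.strip()
            nodes[current] = {}
        elif current is not None and "=" in line:
            # Split on the first '=' only: values such as "status"
            # contain '=' characters themselves.
            key, value = line.split("=", 1)
            nodes[current][key.strip()] = value.strip()
    return nodes


def summarize(nodes: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map each node name to its 'state' attribute ('unknown' if absent)."""
    return {name: attrs.get("state", "unknown") for name, attrs in nodes.items()}
```

On the head node, `summarize(parse_pbsnodes(query_pbsnodes()))` would return a mapping from each node name to its reported state (for example `free` or `down`).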

Initial setup

  • Set the UNH IP address (endeavour.unh.edu) on eth1
  • I switched the IP address on eth0 from 10.0.0.1 to 10.0.0.100, since 10.0.0.1 is the usual gateway address and we want to bridge the two back-end networks.
    • This requires ALL "hosts" files on the nodes to be modified
    • Also, the /root/.shosts, /root/.rhosts, and /etc/ssh/ssh_known_hosts files need to be copied from node2 to node*
  • Set the root password according to the standard scheme.
  • Set up the LDAP client side
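Moving the head node's back-end address means the same hosts-file edit has to be made on every node. Below is a minimal Python sketch of that edit. It assumes each node's hosts file lists the head node's back-end address as 10.0.0.1 in the first column. The helper names are hypothetical, and copying the result out to the nodes is not covered.

```python
OLD_ADDR = "10.0.0.1"    # previous eth0 address of the head node
NEW_ADDR = "10.0.0.100"  # new eth0 address of the head node


def rewrite_hosts(text: str, old: str = OLD_ADDR, new: str = NEW_ADDR) -> str:
    """Return hosts-file text with entries for `old` re-pointed to `new`.

    Only the first (address) column is compared, and it must match exactly,
    so entries such as 10.0.0.12 and commented-out lines are left unchanged.
    """
    out_lines = []
    for line in text.splitlines(keepends=True):
        fields = line.split("#", 1)[0].split()  # ignore trailing comments
        if fields and fields[0] == old:
            # The address is the first token on the line, so the first
            # occurrence of `old` is exactly that token; hostnames and
            # spacing are kept as they were.
            line = line.replace(old, new, 1)
        out_lines.append(line)
    return "".join(out_lines)


def rewrite_hosts_file(path: str = "/etc/hosts") -> None:
    """Apply rewrite_hosts to a hosts file in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(rewrite_hosts(text))
```

Comparing whole fields, rather than doing a plain text substitution, matters here because 10.0.0.1 is a prefix of other back-end addresses such as 10.0.0.12.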


TO DO

  1. Lots.
  2. Integrate back-end link with our farm.

Long Term To Do

Possible long term tasks if manpower is available.

  1. Replicate LDAP server.
  2. Replicate home directories for selected users (this may be too tricky, really)