If you use ROCKS to deploy a cluster, you still need to do some manual configuration. What I personally did was:
1. Disable hyperthreading in the BIOS. HT is not helpful for compute-intensive tasks; it's better suited for I/O-intensive applications.
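To double-check that HT is really off after the reboot, you can compare the sibling and core counts in /proc/cpuinfo (just a sanity check, nothing Torque-specific):
# grep -m1 siblings /proc/cpuinfo
# grep -m1 'cpu cores' /proc/cpuinfo
With HT disabled the two numbers are equal; with HT enabled, siblings is twice the core count.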
2. Add compute nodes to /opt/torque/server_priv/nodes in the following format:
compute-1-2 np=8
Where compute-1-2 is the node's hostname and np is the number of processors (cores).
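If you have more than a handful of nodes, you can generate these entries instead of typing them. A rough sketch, assuming all compute nodes have 8 cores and that 'rocks list host compute' prints a header line followed by one host per line with a trailing colon:
# rocks list host compute | awk 'NR>1 { sub(/:$/, "", $1); print $1 " np=8" }' >> /opt/torque/server_priv/nodes
Double-check the result before feeding it to Torque.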
3. Add master.local to /etc/hosts.equiv (for Torque).
4. Install the InfiniBand stack. Most of the IB RPMs are already installed, but I also added:
libibumad, opensm-libs, opensm, ibutils, infiniband-diags, ibutils-libs, libibmad, libmlx4, swig
I downloaded them with
# yumdownloader package-name
then put them into /share/apps and installed them with
# rocks run host compute "rpm -Uvh /share/apps/*.rpm"
Then you need to set the openibd service to start on boot on all compute nodes, and the opensmd service (the OpenSM subnet manager) on any ONE of the compute nodes, as shown below.
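One hedged way to do that from the frontend, assuming the stock OFED service names (openibd, opensmd) and a node chosen arbitrarily for the subnet manager:
# rocks run host compute "chkconfig openibd on"
# ssh compute-0-0 "chkconfig opensmd on; service opensmd start"
compute-0-0 here is just an example; any single node running OpenSM is enough for the fabric.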
5. Then you need to set up queues. There is a 'default' queue, which is fine for a simple setup, but I have nodes of two types and it's handy to have separate queues. qmgr is the binary that controls Torque queues. I made a simple qmgr.in text file containing:
create queue srail queue_type=execution
set queue srail started=true
set queue srail enabled=true
set queue srail resources_default.neednodes=srail
create queue mrail queue_type=execution
set queue mrail started=true
set queue mrail enabled=true
set queue mrail resources_default.neednodes=mrail
Then I fed this file to qmgr:
# qmgr < qmgr.in
To check the qmgr configuration, run:
# qmgr -c 'p s'
Finally, change the node records in /opt/torque/server_priv/nodes from
compute-1-2 np=8
to
compute-1-2 np=8 mrail
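After restarting pbs_server (it reads server_priv/nodes at startup), you can check that the property was picked up and submit against a specific queue. A hedged example, assuming a job script job.sh:
# pbsnodes compute-1-2
# qsub -q mrail -l nodes=2:ppn=8 job.sh
The resources_default.neednodes setting on the mrail queue then matches only nodes carrying the mrail property.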
6. ROCKS don’t support LVM. It installs everything on first hard drive it finds. If you have two hard drives just mount second drive to /state/partition1/home and add home partition to /etc/exportfs.