Archive for the ‘HPC’ Category

OSCAR uninstall

February 3, 2011

Recently I needed to install CentOS on several new compute nodes. Since I had an old version of OSCAR already installed, I first had to get rid of it. Effectively, this means removing the following directories and files:

  • /opt/oscar
  • /tftpboot
  • /usr/lib/perl5/vendor_perl/5.8.8/OSCAR
  • /usr/lib/perl5/site_perl/OSCAR
  • /usr/share/oscar
  • /etc/profile.d/ and /etc/profile.d/oscar_home.rsh
  • /etc/oscar
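The cleanup can be sketched as a small shell function. This is only a sketch under a few assumptions: the PREFIX and DRY_RUN variables and the remove_oscar name are made up for illustration, and the /etc/profile.d/ entries are left out because they are better removed by hand. By default it only prints what it would delete.

```shell
#!/bin/bash
# Sketch: remove the leftover OSCAR directories listed above.
# PREFIX (hypothetical) lets you try it against a scratch tree first.
# DRY_RUN (hypothetical) defaults to 1, which only prints the targets.
PREFIX="${PREFIX:-}"
DRY_RUN="${DRY_RUN:-1}"

OSCAR_DIRS="/opt/oscar
/tftpboot
/usr/lib/perl5/vendor_perl/5.8.8/OSCAR
/usr/lib/perl5/site_perl/OSCAR
/usr/share/oscar
/etc/oscar"

remove_oscar() {
    for d in $OSCAR_DIRS; do
        # Skip paths that do not exist on this node.
        [ -e "$PREFIX$d" ] || continue
        if [ "$DRY_RUN" = "1" ]; then
            echo "would remove $PREFIX$d"
        else
            rm -rf -- "$PREFIX$d" && echo "removed $PREFIX$d"
        fi
    done
}
```

Source the file and call remove_oscar to see the list, then set DRY_RUN=0 and call it again as root to actually delete.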

Benchmarking InfiniBand

February 2, 2011

As I mentioned in my previous post, “Activating InfiniBand stack in Linux”, the perftest package has simple tests for benchmarking IB bandwidth and latency. Here are my results for the default ib_write_bw and ib_write_lat tests. The results of the write, read and send tests don’t differ much, so I’m posting only the write results.

                    RDMA_Write BW Test
Number of qp's running 1
Connection type : RC
Each Qp will post up to 100 messages each time
Inline data is used up to 0 bytes message
  local address:  LID 0x04, QPN 0x18004a, PSN 0xcf8a2e
RKey 0x2c042529 VAddr 0x002af439bf2000
  remote address: LID 0x01, QPN 0x12004a, PSN 0xb446fe,
RKey 0x440428db VAddr 0x002b46ea9b5000
Mtu : 2048
 #bytes #iterations    BW peak[MB/sec]    BW average[MB/sec]
  65536        5000            1350.34               1350.27

                    RDMA_Write Latency Test
Inline data is used up to 400 bytes message
Connection type : RC
   local address: LID 0x04 QPN 0x16004a PSN 0x5d05e8
RKey 0x2a042529 VAddr 0x00000017f88002
  remote address: LID 0x01 QPN 0x10004a PSN 0xb8cade
RKey 0x420428db VAddr 0x00000000ae2002
Mtu : 2048
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]
      2        1000           1.16           6.93             1.22
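If you want to collect these numbers from several runs, the result row can be picked out with awk. A minimal sketch, assuming the output layout shown above (a single numeric result row whose last column is the average bandwidth or the typical latency); the function name is made up for illustration.

```shell
#!/bin/bash
# Sketch: print the last column of the numeric result row from
# ib_write_bw / ib_write_lat output read on stdin.
# For the bandwidth test this is "BW average", for the latency
# test it is "t_typical" (assuming the layout shown above).
perftest_last_column() {
    awk '$1 ~ /^[0-9]+$/ { print $NF }'
}
```

For example, on the client node: ib_write_bw node_name | perftest_last_column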

Activating InfiniBand stack in Linux

February 2, 2011

I did this on CentOS 5, so some steps may differ if you work with another flavor of Linux.

The major implementation of the InfiniBand stack is called OFED (OpenFabrics Enterprise Distribution). It’s a collaborative development of several vendors to standardize APIs. You can get the vanilla OFED version from the OpenFabrics Alliance site. Vendors also release their own tweaked versions of the original OFED, like Mellanox OFED.

The OFED stack provides several interfaces to the underlying hardware:

  1. The first and simplest way of working with InfiniBand is IPoIB. In this case the IP stack is put on top of IB. You don’t need to rewrite your applications, and you still get high throughput. On the other hand, you lose IB’s low latency and won’t be able to use the full IB throughput.
  2. The second way is Sockets Direct Protocol (SDP), which is designed to use IB RDMA capabilities and bypass the TCP/IP stack. SDP can be used transparently, without recompiling your application. It’s not as fast as the native IB API but is better than IPoIB.
  3. The third and hardest way is to use IB Verbs, which is the lowest-level API, User Direct Access Programming Library (uDAPL), which is based on IB Verbs, Message Passing Interface (MPI) or Unified Parallel C (UPC). Different versions of MPI and UPC can be based on either IB Verbs or uDAPL. I personally work with MPI and UPC, so I will describe their installation over InfiniBand.

To bring up your InfiniBand hardware you need to do the following things:

1. Install kernel-level and user-level HCA drivers. An interesting thing about IB is that the driver is split into two parts. The kernel part is usually already compiled as modules. You just need the openibd package, which runs as a service at startup and loads all the necessary kernel modules. The user-level part you have to install yourself. Since I’m working with a Mellanox ConnectX MT25418, I had to install the libmlx4 package.

2. The next thing is the Subnet Manager. The InfiniBand subnet manager (OpenSM) assigns Local IDentifiers (LIDs) to each port connected to the InfiniBand fabric and builds a routing table based on the assigned LIDs. The opensmd package performs the Subnet Manager role. You need to set this service up on any one IB node, and it will initialize the IB fabric at system startup.

3. Install the IB API. Depending on what software you are going to use, install the appropriate API. I use MPI and UPC parallel application compilers, which use IB Verbs. The libibverbs package is responsible for that.
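The three steps above can be sketched as a script. Treat it as a rough outline, not a tested recipe: the package and service names are the ones mentioned in this post and may be named differently in your repository, and the run helper, the DRY_RUN switch and the two function names are made up for illustration. With DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/bin/bash
# Sketch of the bring-up steps on a CentOS 5 node.
# DRY_RUN (hypothetical): 1 prints the commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_ib_node() {
    # Step 1: service that loads the kernel modules, plus the
    # user-level HCA driver (libmlx4 for Mellanox ConnectX).
    run yum -y install openibd libmlx4
    run chkconfig openibd on
    run service openibd start
    # Step 3: IB Verbs library used by MPI and UPC.
    run yum -y install libibverbs
}

setup_subnet_manager() {
    # Step 2: only on the one node that should run the subnet manager.
    run yum -y install opensmd
    run chkconfig opensmd on
    run service opensmd start
}
```

Call setup_ib_node on every node and setup_subnet_manager on exactly one of them; review the printed commands before switching to DRY_RUN=0.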

Additionally, you will probably want some diagnostic tools to check IB connectivity. I found them in the ibutils and infiniband-diags RPMs:

  • ibv_devinfo, ibstat, ibstatus show device information;
  • sminfo, ibnodes, ibhosts, ibswitches, ibnetdiscover, ibchecknet show fabric information.
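A quick way to use these tools in a script is to check whether a port has come up. A minimal sketch, assuming ibstat prints a "State: Active" line for each active port; the function name is made up for illustration.

```shell
#!/bin/bash
# Sketch: read `ibstat` output on stdin and succeed if at least one
# port reports "State: Active". The "Physical state:" lines are not
# matched because the pattern is anchored at the start of the line.
ib_port_active() {
    grep -q '^[[:space:]]*State:[[:space:]]*Active'
}
```

For example: ibstat | ib_port_active && echo "IB link is up"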

Use the perftest package for IB performance testing. Run ib_write_bw in server mode on one node by calling # ib_write_bw with no arguments, then run the client on the other node with # ib_write_bw node_name. Use ib_write_lat the same way for latency testing.

You will probably also need the libibcommon, libibumad and libibmad libraries, which are needed for opensm, infiniband-diags and ibutils to operate.


Ethernet vs. InfiniBand

January 18, 2010

Have you ever tried to compare the scalability of Ethernet and InfiniBand on an HPC cluster? I was shocked.

The benchmark is a solution of a three-dimensional partial differential equation using the fast Fourier transform. The comparison isn’t absolutely fair, because I used the gcc compiler for the Ethernet version and the Sun compiler for the InfiniBand one. Hence the difference between the versions on a small number of CPUs. But regardless of the fact that gcc is a bit faster, Ethernet shows no scalability at all! It’s unbelievable how slow Ethernet is.

suncc with UPC NPB

January 18, 2010

When I was trying to compile the NAS Parallel Benchmark for Unified Parallel C with the Sun C compiler from Sun Ceres Studio IDE 9.0 Linux_i386 2009/03/06, I got dozens of messages like:

"/opt/bupc-runtime-2.10.0/opt/include/upcr_atomic.h", line 782: warning: result of paste undefined and not portable: 32_ (E_PASTE_RESULT_NOT_TOKEN)
"/opt/bupc-runtime-2.10.0/opt/include/upcr_atomic.h", line 782: warning: result of paste undefined and not portable: 32_fetchadd (E_PASTE_RESULT_NOT_TOKEN)

I have no idea where they come from but you can get patch from here.

Creating your own ‘modules’ environment

January 18, 2010

When you’re working on a high performance computing cluster that has several compilers, parallel libraries, etc. and uses the modules package to manage the user environment, it’s handy to build your own environment with all the paths you need using this package.

If you try to create a simple shell script like:


module unload openmpi-gcc/1.3.2
module load openmpi-sun/1.3.2

It won’t work. It’ll just show you an error like this:

./ line 3: module: command not found
./ line 4: module: command not found

The reason the script complains lies in how child shells inherit their environment. When you start a script from your console, it inherits all environment variables from its parent. But it doesn’t inherit aliases and functions, because they are not part of the environment. As you have no doubt already guessed, module is a shell function. You see it in your login shell because when you log in, the system runs /etc/profile, which in turn runs the /etc/profile.d/*.sh scripts. The module function is defined in /etc/profile.d/
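The inheritance rule is easy to check for yourself. The sketch below uses a made-up variable and a made-up function (neither has anything to do with the modules package): the exported variable reaches the child shell, the function does not.

```shell
#!/bin/bash
# An exported variable is part of the environment; a function is not.
export DEMO_GREETING="hello"
demo_module() { echo "function is visible"; }

# Ask a child shell what it can see.
child_var=$(sh -c 'echo "$DEMO_GREETING"')
child_fn=$(sh -c 'type demo_module >/dev/null 2>&1 && echo found || echo missing')

echo "variable in child: $child_var"
echo "function in child: $child_fn"
```

The child reports the variable's value but cannot find demo_module, which is exactly what happens to module inside the script.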

So you can resolve this problem by sourcing either /etc/profile or /usr/local/Modules/3.2.6/init/bash in the first line of your environment configuration script.
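Put together, the environment script could look like the sketch below. The init path and the module names are the ones from this post; the MODULES_INIT override and the load_my_env function name are made up for illustration.

```shell
#!/bin/bash
# Sketch: define the `module` function first, then switch modules.
# MODULES_INIT (hypothetical override) defaults to the path from the post.
MODULES_INIT="${MODULES_INIT:-/usr/local/Modules/3.2.6/init/bash}"

load_my_env() {
    # Sourcing the init file defines `module` in the current shell.
    if [ -r "$MODULES_INIT" ]; then
        . "$MODULES_INIT"
    else
        echo "cannot read $MODULES_INIT" >&2
        return 1
    fi
    module unload openmpi-gcc/1.3.2
    module load openmpi-sun/1.3.2
}
```

Source this file from your shell and then call load_my_env; sourcing rather than executing it keeps the changed paths in the shell you are working in.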