Basic UPC compiler installation

October 8, 2012

There was a time when I worked heavily on a UPC-related project, and I ran into several issues installing the Berkeley UPC compiler. I don’t want that information to go to waste, so I will share it here in several posts. I worked with Berkeley UPC versions up to 2.14.0, so this post may already be obsolete for you.

Compilation

The Berkeley UPC compiler consists of a runtime and a translator (you can use the online translator if you want). They are installed separately. I used several flags at the configure stage that I’d like to explain.

The first flag is --without-mpi-cc. UPC supports several underlying transports (conduits) for exchanging messages between threads. The most basic is udp; I worked primarily with ibv (InfiniBand). UPC also builds the mpi transport by default. It’s slow and it requires an MPI installation, so I never used it and preferred to disable it.

The --disable-aligned-segments flag is usually a must in Linux environments. Linux has a security feature that randomizes the virtual address space, which prevents UPC threads from using the same base memory address on all nodes. With the flag set, UPC no longer relies on identical base addresses, at the cost of some additional pointer arithmetic when dereferencing a UPC pointer-to-shared. So you either disable the Linux virtual address space randomization feature or use this flag.
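To see which of the two options applies to a node, you can check whether address space randomization is currently on. The sketch below assumes a Linux kernel that exposes the setting in /proc/sys/kernel/randomize_va_space (0 means off, 1 and 2 mean on); the function name aslr_status is made up for this example.

```shell
# Report the address space randomization setting stored in a procfs-style file.
# The optional argument exists only so the function can be tried on a test file.
aslr_status() {
    aslr_file="${1:-/proc/sys/kernel/randomize_va_space}"
    aslr_value="$(cat "$aslr_file")" || return 1
    case "$aslr_value" in
        0)   echo "disabled" ;;
        1|2) echo "enabled" ;;
        *)   echo "unknown" ;;
    esac
}

# Typical use on a compute node:
#   aslr_status
# To turn randomization off until the next reboot (as root):
#   sysctl -w kernel.randomize_va_space=0
```

If it prints "enabled" and you would rather not change the kernel setting, build with --disable-aligned-segments.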

It is stated that UPC can have issues with GCC 4.0.x through 4.2.x as a backend compiler: GCC can misoptimize a shared-local access so that it deterministically reads or writes an incorrect value. For this reason, with one of these GCC versions you cannot install UPC without passing the --enable-allow-gcc4 flag. I never had any issues with GCC myself, so in my experience it is safe to use.
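Put together, the three flags end up on one configure line. The helper below only prints that line so you can review it first. The install prefix /opt/berkeley_upc and the function name upc_configure_cmd are placeholders for this sketch, not something the UPC distribution prescribes.

```shell
# Hypothetical install prefix; adjust to your layout.
UPC_PREFIX="${UPC_PREFIX:-/opt/berkeley_upc}"

# Print the configure command with the three flags discussed above.
upc_configure_cmd() {
    echo "./configure --prefix=$UPC_PREFIX" \
         "--without-mpi-cc" \
         "--disable-aligned-segments" \
         "--enable-allow-gcc4"
}

# Run from the unpacked runtime source directory, for example:
#   eval "$(upc_configure_cmd)" && make && make install
```

Drop --enable-allow-gcc4 if your GCC is outside the 4.0.x to 4.2.x range, since the flag only matters there.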

Post-installation tasks

After the installation is complete, you need to point the UPC runtime to your locally installed translator. Otherwise it will try to use the online translator on the Berkeley web site. Under each UPC build subdirectory (opt, dbg, etc.), change the translator directive in etc/upcc.conf to:

translator = /opt/translator-installation-dir/targ
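If you have several build subdirectories, the edit can be scripted. This is a minimal sketch that assumes the directive sits at the start of a line in the form "translator = ..."; the function name set_translator and the /opt/berkeley_upc prefix in the example are made up for illustration.

```shell
# Point one upcc.conf at a local translator.
# $1: path to upcc.conf, $2: translator path
set_translator() {
    conf="$1"
    translator="$2"
    # Replace the whole "translator = ..." line. '|' is the sed delimiter
    # because the replacement contains slashes. A temporary file is used
    # instead of in-place editing to stay portable across sed versions.
    sed "s|^translator[[:space:]]*=.*|translator = $translator|" "$conf" \
        > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Example, assuming a hypothetical install prefix /opt/berkeley_upc:
#   for build in opt dbg; do
#       set_translator "/opt/berkeley_upc/$build/etc/upcc.conf" \
#                      /opt/translator-installation-dir/targ
#   done
```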

You need to correctly configure NFS and SSH on your nodes so that they can access and run your application binaries without a password. If you use a firewall, you need to open all the necessary ports. For me they were:

111 tcp, udp for portmapper
2049 tcp for nfs
892 tcp, udp for mountd
32803 tcp, 32769 udp for lockd
662 tcp, udp for statd

Since lockd uses dynamic ports by default, uncomment the static port configuration in /etc/sysconfig/nfs:

LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
MOUNTD_PORT=892
STATD_PORT=662
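With the ports pinned, the firewall rules can be generated from the same list. The sketch below assumes a plain iptables setup and only prints the commands so that you can review them before running anything as root. The function name nfs_firewall_rules is made up, and on a system with an earlier REJECT rule in the INPUT chain you would probably need -I instead of -A.

```shell
# Print one iptables ACCEPT rule per port/protocol pair listed above.
# Nothing is changed on the system; the rules are only written to stdout.
nfs_firewall_rules() {
    for entry in 111/tcp 111/udp 2049/tcp 892/tcp 892/udp \
                 32803/tcp 32769/udp 662/tcp 662/udp; do
        port="${entry%/*}"    # part before the slash
        proto="${entry#*/}"   # part after the slash
        echo "iptables -A INPUT -p $proto --dport $port -j ACCEPT"
    done
}

# Review the output first, then apply it as root, for example:
#   nfs_firewall_rules | sh
```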

SSH is also just a walk in the park:

# su - fred
> ssh-keygen -t rsa
> cp /home/fred/.ssh/id_rsa.pub /home/fred/.ssh/authorized_keys
> chmod 600 /home/fred/.ssh/authorized_keys
> chown fred:fred /home/fred/.ssh/authorized_keys
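Before launching anything with upcrun it is worth confirming that every node really accepts the key without a prompt. A minimal sketch, assuming the home directory (and therefore authorized_keys) is shared over NFS or has been copied to each node; the function name check_passwordless_ssh and the SSH override variable are made up for this example.

```shell
# Check that every node accepts key-based login without prompting.
# SSH can be overridden (for example with a wrapper script); it defaults to ssh.
check_passwordless_ssh() {
    status=0
    for node in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if "${SSH:-ssh}" -o BatchMode=yes "$node" true; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_passwordless_ssh node1 node2 node3 node4
```

The function returns non-zero if any node fails, so it can gate a job script.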

Usage example

> upcc --network=udp -o bin_file source_code.c
> UPC_NODES="node1 node2 node3 node4" upcrun -n 32 bin_file

You choose the conduit with the --network flag. The UPC_NODES environment variable lists the hosts that will run the code, and -n sets the number of threads.

Possible problems

You can encounter the following error when you run a UPC application:

*** FATAL ERROR: Got an xSocket while spawning slave process: connect() failed while creating a connect socket (111:Connection refused)
bash: line 1: 10535 Aborted './a.out' '__AMUDP_SLAVE_PROCESS__' 'node1:49655'

This can happen if you use a firewall and didn’t uncomment the static port configuration for the lockd daemon. It then uses a random port each time, which doesn’t match what you entered in the firewall configuration, so communication fails.

If you get an error which starts with:

Address node1_ip_address maps to node1, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
from function sendPacket
at /root/install/berkeley_upc-2.8.0/gasnet/other/amudp/amudp_reqrep.cpp:99
reason: Invalid argument

or

AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
from function sendPacket
at /root/install/berkeley_upc-2.8.0/gasnet/other/amudp/amudp_reqrep.cpp:99
reason: Invalid argument

then /etc/hosts is misconfigured. Don’t add the compute node’s hostname to the 127.0.0.1 line in /etc/hosts; it should appear only on the line with its real address. /etc/hosts on each node should look something like this:

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.0.0.1 node1
10.0.0.2 node2
10.0.0.3 node3
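A quick way to spot the misconfiguration on a node is to look for its hostname on the 127.0.0.1 line. The sketch below assumes a conventional whitespace-separated hosts file; the function name on_loopback_line is made up for this example.

```shell
# Succeed (exit 0) if the given hostname appears on the 127.0.0.1 line.
# $1: hostname, $2: hosts file (defaults to /etc/hosts)
on_loopback_line() {
    awk -v host="$1" '
        { sub(/#.*/, "") }              # ignore comments
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) if ($i == host) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}

# Example, run on each compute node:
#   if on_loopback_line "$(hostname)"; then
#       echo "remove $(hostname) from the 127.0.0.1 line in /etc/hosts"
#   fi
```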