Posts Tagged ‘runtime’

Advanced notes on Unified Parallel C installation

October 8, 2012

I have already described the basic Berkeley UPC compiler installation here. Now let’s go into more detail.

Backend Compilers

Basically, the UPC compiler is a translator from the UPC language to C. Once translation is done, a backend C compiler is invoked to actually compile the code. On Linux clusters GCC is used by default. If you have Intel, Sun, or another high-performance compiler installed, pass the CC and CXX variables at the UPC runtime configure step:

./configure CC=icc CXX=icpc --prefix=/opt/bupc-runtime-2.12.1-icc
./configure CC=suncc CXX=sunCC --prefix=/opt/bupc-runtime-2.10.0-suncc

Optional UPC builds

By default Berkeley UPC is installed in two configurations: dbg (a debug build with GASNet assertions enabled and debugging info compiled in) and opt (an optimized build for everyday use). You will see dbg and opt subdirectories in your UPC runtime build. You can also install additional builds of the runtime for other uses.

Berkeley UPC has an integrated tracing facility. If you run an application through upcrun with the -trace flag, tracing data is collected, and you can analyze it with the upc_trace utility. A tracing build is compiled with the opt_trace multiconf option:

./configure --prefix=/opt/bupc-runtime-2.12.1 --with-multiconf=+opt_trace

Berkeley UPC has integrated callbacks (an interface called GASP) for third-party instrumentation tools. Instrumentation lets developers of performance analysis tools gather all sorts of information about UPC program execution, such as which functions were called and with what arguments. If you want to develop your own UPC performance analysis tool, you can use this feature during development and instruct users to build the opt_inst version of UPC so that they can use your tool later.

./configure --prefix=/opt/bupc-runtime-2.12.1 --with-multiconf=+opt_inst

You can debug UPC applications with the dbg build. If you are a developer who uses an instrumented build of UPC and needs to debug it, build a dbg_inst version. There used to be a dbg_inst.patch (find the link below) that added dbg_inst functionality to UPC, but as far as I remember it has already been integrated into the compiler.

./configure --prefix=/opt/bupc-runtime-2.12.1 --with-multiconf=+dbg_inst

There was also another bug that broke dbg_inst in 2.12.1 (it was originally implemented in 2.10.0), with the following errors:

/root/install/berkeley_upc-2.12.1/gasnet/gasnet_trace.c: In function ‘gasneti_trace_finish’:
/root/install/berkeley_upc-2.12.1/gasnet/gasnet_trace.c:988: error: ‘gasneti_mallocreport_filename’ undeclared (first use in this function)
/root/install/berkeley_upc-2.12.1/gasnet/gasnet_trace.c:988: error: (Each undeclared identifier is reported only once
/root/install/berkeley_upc-2.12.1/gasnet/gasnet_trace.c:988: error: for each function it appears in.)

To resolve this issue, apply mallocreport.patch00 (find the link below). If you use a recent Berkeley UPC build, you won’t see this bug.

Block size

If you work with huge matrices and want to distribute them in large chunks of consecutive rows, you will run into the UPC block size limit. UPC packs the pointer-to-shared representation into one 64-bit integer. By default 34 bits are allocated for the memory address, 10 bits for the thread and 20 bits for the phase (which bounds the block size). 2^20 is only 1048576 elements, which is a very small number. You can redistribute the bits with the --with-sptr-packed-bits=phase,thread,addr configure option, but then you will end up with either a small address space or a small number of threads.

Another option is the --enable-sptr-struct configure flag, which changes the shared pointer representation from an integer to a struct. It increases the maximum block size to 2^31 - 1, which is 2147483647. This can still be too small if you take performance measurements and need to run your code on 1 thread, because then the whole matrix is one huge block. A 50000×50000 matrix (2.5 billion elements) already exceeds the limit.

If 2^31 - 1 is not enough, the last option is to use a row-distributed algorithm instead of a row-block-distributed one.
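
The numbers in this section are easy to double-check with shell arithmetic. The snippet below is only a sketch: the variable names are mine, and it assumes a shell with 64-bit integer arithmetic (any 64-bit Linux should do).

```shell
# Default packed pointer layout quoted above: 34 + 10 + 20 bits.
addr_bits=34
thread_bits=10
phase_bits=20

total_bits=$((addr_bits + thread_bits + phase_bits))   # must fit into 64 bits
max_block_packed=$((1 << phase_bits))                  # block size limit, packed pointers
max_block_struct=$(( (1 << 31) - 1 ))                  # block size limit with --enable-sptr-struct
matrix_elems=$((50000 * 50000))                        # one-thread run: whole matrix is one block

echo "total bits:           $total_bits"
echo "packed block limit:   $max_block_packed"
echo "struct block limit:   $max_block_struct"
echo "50000x50000 elements: $matrix_elems"

if [ "$matrix_elems" -gt "$max_block_struct" ]; then
    echo "the matrix does not fit into a single block"
fi
```

Changing phase_bits shows the trade-off directly: every bit you give to the phase has to be taken from the address or the thread field.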

POSIX shared memory problems with InfiniBand

UPC supports two types of intra-node shared memory communication between threads: POSIX shared memory and SYSV shared memory. POSIX is configured by default. If you try to register large amounts of shared memory with many PSHM processes using the --shared-heap option, you may see errors like these:

*** FATAL ERROR: Unexpected error Bad address (rc=1 errno=14) when registering the segment
NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the environment to generate a backtrace.
*** Caught a fatal signal: SIGABRT(6) on node 29/32

To solve this problem, reinstall the runtime with the following options:

./configure --prefix=/opt/bupc-runtime-2.12.1 --enable-pshm --disable-pshm-posix --enable-pshm-sysv

Bug when building translator

With some vendor-built GCC releases, such as Red Hat’s, older versions of the translator fail to compile with an error like:

/usr/bin/ld: ipl_summarize_util.o: relocation R_X86_64_PC32 against `Phi_To_Idx_Map' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: ld returned 1 exit status

This is bug number 2202 in the UPC Bugzilla and is described here. The solution and the patch are described in post 17. Find a copy of the patch below.

UPC I/O support for large files

UPC has a parallel I/O extension. In version 2.14.0 and earlier, UPC I/O by default supported files only up to 2GB in length. As a result, upc_all_fread_shared() returned -1 (“Invalid argument”) for data above the 2GB limit. To change the default from 31-bit (2^31) to 63-bit (2^63) file sizes, define the BUPC_IO_64 macro at the runtime configure step:

./configure CC="gcc -DBUPC_IO_64" CXX="g++ -DBUPC_IO_64" --prefix=/opt/bupc-runtime-2.12.1

Replace gcc and g++ with your own compiler.

SUN compiler issues

If you run into an error (I had it in version 2.10.0):

"/home/fred/install/berkeley_upc-2.10.0/upcr_profile.c", line 36: left operand must be modifiable lvalue: op "="
cc: acomp failed for /home/fred/install/berkeley_upc-2.10.0/upcr_globfiles.c

Apply the sun_const_field.patch00 patch (find the link below). Additional info can be found in the Berkeley UPC Bugzilla, bug number 2696.

Another problem (not an error, but an annoyance) shows up as numerous warnings throughout compilation:

"/home/fred/install/berkeley_upc-2.10.0/upcr_atomic.h", line 876: warning: result of paste undefined and not portable: 64_ (E_PASTE_RESULT_NOT_TOKEN)
"/home/fred/install/berkeley_upc-2.10.0/upcr_atomic.h", line 876: warning: result of paste undefined and not portable: 64_cswap (E_PASTE_RESULT_NOT_TOKEN)

To get rid of them, apply the not_token.patch00 patch (find the link below). It is described in the same bug 2696.

Links to patches

Unfortunately, WordPress doesn’t allow uploading .txt files for security reasons, and other formats such as .doc or .pdf break the lines. So I decided to give direct links where possible and to provide the contents of each patch as text converted to .jpg format in case a direct link breaks in the future. The drawback is that you will have to type it in yourself or OCR it.


Basic UPC compiler installation

October 8, 2012

There was a time when I worked heavily on a UPC-related project, and I had several issues with the installation of the Berkeley UPC compiler. I don’t want that information to go to waste, so I will share it here in several posts. I worked with Berkeley UPC versions up to 2.14.0, so this post may already be obsolete for you.

Compilation

The Berkeley UPC compiler consists of a runtime and a translator (you can use the online translator if you want). They are installed separately. I used several flags at the configure stage that I’d like to explain.

The first flag is --without-mpi-cc. UPC supports several underlying transports for exchanging messages between threads. The most basic is udp; I worked primarily with ibv (InfiniBand). UPC also installs the mpi transport by default. It’s slow and it requires an MPI installation, so I never used it and preferred to disable it.

The --disable-aligned-segments flag is usually a must in Linux environments. Linux has a security feature that randomizes the virtual address space, which prevents UPC threads from using the same base memory address on all nodes. The flag works around this at the cost of some additional pointer arithmetic in each dereference of a UPC pointer-to-shared. So you either disable the Linux virtual address space randomization feature or use this flag.
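
If you are not sure whether address space randomization is active on a node, you can look at the kernel.randomize_va_space setting. The snippet below is a small sketch of such a check; the describe_aslr helper is my own name, not part of UPC or Linux.

```shell
# Map a kernel.randomize_va_space value to a short description.
describe_aslr() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "partial" ;;   # stack, mmap and VDSO are randomized
        2) echo "full" ;;      # heap is randomized as well
        *) echo "unknown" ;;
    esac
}

aslr_file=/proc/sys/kernel/randomize_va_space
if [ -r "$aslr_file" ]; then
    echo "address space randomization is $(describe_aslr "$(cat "$aslr_file")")"
else
    echo "cannot read $aslr_file (probably not a Linux system)"
fi

# To switch it off until the next reboot, run as root:
#   sysctl -w kernel.randomize_va_space=0
```

Keep in mind that switching the feature off weakens security on the node, so the configure flag is usually the safer choice.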

It is stated that UPC can have issues with GCC 4.0.x through 4.2.x as a backend compiler: GCC can misoptimize a shared-local access such that it deterministically reads or writes an incorrect value. For that reason you cannot install UPC with these compilers without the --enable-allow-gcc4 flag. I never had any issues with GCC myself, so in my experience it is safe to use.

Post-installation tasks

After the installation is complete you need to point the UPC runtime to your locally installed translator. Otherwise it will try to use the online translator on the Berkeley web site. Under each UPC build subdirectory (opt, dbg, etc.) change the translator directive in etc/upcc.conf to:

translator = /opt/translator-installation-dir/targ
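
If you have several builds, editing every upcc.conf by hand gets tedious. Here is a rough sketch of how the change could be scripted; set_translator is a helper name I made up, and it assumes the directive sits on its own uncommented line in the "translator = ..." form shown above.

```shell
# Rewrite the uncommented "translator = ..." line in the given upcc.conf.
# Assumes the new path contains no '|' or '&' characters (they are special to sed here).
set_translator() {
    conf_file=$1
    translator_path=$2
    sed "s|^[[:space:]]*translator[[:space:]]*=.*|translator = $translator_path|" \
        "$conf_file" > "$conf_file.tmp" &&
        mv "$conf_file.tmp" "$conf_file"
}

# Example for the install prefix used in these posts (adjust to your own paths):
#   for build in opt dbg; do
#       set_translator "/opt/bupc-runtime-2.12.1/$build/etc/upcc.conf" \
#                      /opt/translator-installation-dir/targ
#   done
```

The function writes to a temporary file and then moves it into place, so it does not depend on the -i option, which behaves differently between sed implementations.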

You need to configure NFS and SSH correctly on your nodes, so that they can access and run your application binaries without a password. If you use a firewall, you need to open all the necessary ports. For me they were:

111 tcp, udp for portmapper
2049 tcp for nfs
892 tcp, udp for mountd
32803 tcp, 32769 udp for lockd
662 tcp, udp for statd

Since lockd uses dynamic ports by default, uncomment the static port configuration in /etc/sysconfig/nfs:

LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
MOUNTD_PORT=892
STATD_PORT=662
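
With the ports fixed, the firewall rules can be generated from the port list above. The sketch below only prints the commands instead of running them. It assumes plain iptables and the INPUT chain, which may not match your setup (you may need -I instead of -A if a REJECT rule comes first), and print_nfs_rules is just a name I picked.

```shell
# Print one iptables rule per protocol:port pair from the port list above.
print_nfs_rules() {
    for rule in tcp:111 udp:111 tcp:2049 tcp:892 udp:892 \
                tcp:32803 udp:32769 tcp:662 udp:662; do
        proto=${rule%%:*}   # part before the colon
        port=${rule##*:}    # part after the colon
        echo "iptables -A INPUT -p $proto --dport $port -j ACCEPT"
    done
}

print_nfs_rules
```

Review the printed rules first, then run them as root or paste them into your firewall configuration.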

SSH is also just a walk in the park:

# su - fred
> ssh-keygen -t rsa
> cp /home/fred/.ssh/id_rsa.pub /home/fred/.ssh/authorized_keys
> chmod 600 /home/fred/.ssh/authorized_keys
> chown fred:fred /home/fred/.ssh/authorized_keys

Usage example

> upcc --network=udp -o bin_file source_code.c
> UPC_NODES="node1 node2 node3 node4" upcrun -n 32 bin_file

You choose the conduit with the --network flag, the UPC_NODES environment variable lists the hosts that will run the code, and -n sets the number of threads.

Possible problems

You can encounter the following error when you run a UPC application:

*** FATAL ERROR: Got an xSocket while spawning slave process: connect() failed while creating a connect socket (111:Connection refused)
bash: line 1: 10535 Aborted './a.out' '__AMUDP_SLAVE_PROCESS__' 'node1:49655'

This can happen if you use a firewall and didn’t uncomment the static port configuration for the lockd daemon. It then picks a random port on each start, which doesn’t match what you entered in the firewall configuration, so communication fails.

If you get an error which starts with:

Address node1_ip_address maps to node1, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
from function sendPacket
at /root/install/berkeley_upc-2.8.0/gasnet/other/amudp/amudp_reqrep.cpp:99
reason: Invalid argument

or

AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
from function sendPacket
at /root/install/berkeley_upc-2.8.0/gasnet/other/amudp/amudp_reqrep.cpp:99
reason: Invalid argument

then your /etc/hosts is misconfigured. Don’t add the compute node’s hostname to the 127.0.0.1 line in /etc/hosts; the hostname should appear only on the line with the node’s real address. /etc/hosts on each node should look something like this:

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.0.0.1 node1
10.0.0.2 node2
10.0.0.3 node3
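
To check a node quickly for this misconfiguration, something like the following sketch could be used. The host_on_loopback function is a hypothetical helper of mine; it only reports whether the given hostname appears on a 127.0.0.1 or ::1 line of the given hosts file.

```shell
# Succeed (exit status 0) if the hostname is listed on a loopback line of the hosts file.
host_on_loopback() {
    hosts_file=$1
    host=$2
    awk -v host="$host" '
        $1 == "127.0.0.1" || $1 == "::1" {
            # Fields after the address are the names bound to it.
            for (i = 2; i <= NF; i++) if ($i == host) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}

# Example, to be run on each compute node:
#   if host_on_loopback /etc/hosts "$(hostname)"; then
#       echo "$(hostname) is bound to a loopback address, fix /etc/hosts"
#   fi
```

Comment lines are skipped automatically, because their first field is never a loopback address.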