
OpenMPI-4

Open MPI is a Message Passing Interface (MPI) library project combining technologies and resources from several other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI).


Note

There are three ways to use InfiniBand (IB) in OpenMPI; compile with all of them and select one at runtime, as sketched below:

- OpenIB: a very old InfiniBand (verbs) transport implemented in OpenMPI. It is no longer maintained and will be removed in OpenMPI-5, [see this](https://github.com/open-mpi/ompi/issues/11755)
- UCX: newer OpenMPI releases use UCX, but some applications may conflict with it (e.g., GPAW)
- Libfabric: [libfabric](https://github.com/ofiwg/libfabric) may be a reasonable choice now as a replacement for OpenIB
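
If the library is built with all three transports, selection at runtime is done via MCA parameters. A minimal sketch (component names as in OpenMPI 4.x; `./app` is a placeholder):

mpirun --mca pml ucx ./app                              # UCX
mpirun --mca pml ob1 --mca btl self,vader,openib ./app  # legacy verbs/openib
mpirun --mca pml cm --mca mtl ofi ./app                 # libfabric (OFI)
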
Note
  • Some applications require C++11, which is only supported by GCC 4.8 or newer and is not always available on the system; in that case a newer GCC must be installed before compiling OpenMPI.
  • Make sure to build OpenMPI with 64-bit support. To check whether the currently available OpenMPI supports 64-bit integers, run `ompi_info -a | grep 'Fort integer size'`. If the output is 8, it supports 64-bit; if 4, only 32-bit. Configuration for 64-bit support (see the combined sketch after this list):
  • For Intel compilers use: `FFLAGS=-i8 FCFLAGS=-i8 CFLAGS=-m64 CXXFLAGS=-m64`
  • For GNU compilers use: `FFLAGS="-m64 -fdefault-integer-8" FCFLAGS="-m64 -fdefault-integer-8" CFLAGS=-m64 CXXFLAGS=-m64`
  • Keep the source tree after compiling.
  • Consider using UCX.
  • Consider compiling your own PMIx.
  • Consider using an alternative linker:
  • lld linker:

    module load llvm/llvm-gcc10-lld                   # to use lld
    LDFLAGS="-fuse-ld=lld -lrt"
    

  • gold linker:

    module load tool_dev/binutils-2.32
    LDFLAGS="-fuse-ld=gold -lrt"
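
Putting the notes above together, a hedged configure sketch for a 64-bit GNU build using the gold linker (the install prefix is a placeholder):

module load tool_dev/binutils-2.32          # provides the gold linker
../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran \
CFLAGS=-m64 CXXFLAGS=-m64 \
FFLAGS="-m64 -fdefault-integer-8" FCFLAGS="-m64 -fdefault-integer-8" \
LDFLAGS="-fuse-ld=gold -lrt" \
--prefix=/path/to/openmpi-4.x-64bit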
    

Possible errors

  • OpenMPI-4 uses UCX by default (OpenMPI 4.0.3 → ucx-1.7 or older). Solution: compile your own UCX.
  • `No components were able to be opened in the pml framework. PML ucx cannot be selected.` This error may be due to there being no IB device; check it:
ssh com054
ibv_devinfo
  • `counter exceeded` errors may be solved by compiling OpenMPI with your own PMIx.

1. Download

See what's new in OpenMPI-4.

Download OpenMPI-4:
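
For example (the exact tarball URL is an assumption; adjust the version as needed):

wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.3rc1.tar.gz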

tar xvf openmpi-4.1.3rc1.tar.gz
cd openmpi-4.1.3rc1

2. Compiling OpenMPI + GCC

Separate installations are needed for: eagle, lion/leopard, cheetah, taycheon.
Installation options are listed in README.txt or via `./configure -h`:

  • Sun Grid: --with-sge
  • InfiniBand: --with-verbs
  • with KNEM: --with-knem=path
  • use UCX: --with-ucx=path
export myUCX=/uhome/p001cao/app/tool_dev/ucx-1.9
../configure...  --with-ucx=${myUCX}

USC1: (CentOS 6.5)

- Use the gold linker to avoid compile errors.
- UCX causes this error: `ib_md.c:329  UCX  ERROR ibv_reg_mr(address=0x145cb580, length=263504, access=0xf) failed: Resource temporarily unavailable`. So do not use UCX on this server.
module load tool_dev/binutils-2.36                       # gold linker; use it to avoid link errors
module load compiler/gcc-11.2
export myKNEM=/uhome/p001cao/app/tool_dev/knem-1.1.4

InfiniBand cluster

cd openmpi-4.1.1
mkdir build_eagle && cd build_eagle

../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran LDFLAGS="-fuse-ld=gold -lrt" \
--with-sge --without-ucx --with-verbs --with-knem=${myKNEM} \
--prefix=/uhome/p001cao/app/openmpi/4.1.1-gcc11.2-noUCX-eagle
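# build and install (same as shown for the lion build below)
make -j 20 && make install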

no InfiniBand cluster

cd openmpi-4.1.1
mkdir build_lion && cd build_lion
../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran LDFLAGS="-fuse-ld=gold -lrt" \
--with-sge --without-ucx --without-verbs --with-knem=${myKNEM} \
--prefix=/uhome/p001cao/app/openmpi/4.1.1-gcc11.2-noUCX-lion
make -j 20          # omit -j to see errors more clearly
make install

CANlab: (CentOS 5.8)

module load gcc/gcc-7.4.0

../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran \
--with-sge --without-verbs --without-ucx  \
--prefix=/home/thang/app/openmpi/4.0.2-gcc7.4.0

CAN-GPU: (Ubuntu-18)

- Install CUDA using GCC.
- CUDA 10 only supports up to GCC 8.
- binutils 2.22 or newer is needed to link CUDA.

Install CUDA

  • CLI install of CUDA
  • Download: wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_rhel6.run
  • Install (as root)
  • Disable the graphical target to update the NVIDIA driver:
systemctl isolate multi-user.target
modprobe -r nvidia-drm
module load compiler/gcc-7.4
sh cuda_10.2.89_440.33.01_rhel6.run --toolkitpath=/home/thang/app/cuda-10.2
  • After installing CUDA, start the graphical environment again:
systemctl start graphical.target
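
To verify the driver and toolkit afterwards (the nvcc path assumes the --toolkitpath used above):

nvidia-smi
/home/thang/app/cuda-10.2/bin/nvcc --version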

compile OpenMPI

cd openmpi-4.1.1
mkdir build && cd build

module load compiler/gcc-7.4   # cuda-10 only supports up to gcc-8
module load binutils-2.35

../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran \
--with-sge --without-ucx \
--with-cuda=/home/thang/app/cuda-10.2 \
--prefix=/home/thang/app/openmpi/4.1.1-gcc7.4-cuda
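
Then build/install and confirm that CUDA support was compiled in (standard ompi_info check; expect a line ending in :value:true):

make -j 8 && make install
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value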

3. Compiling OpenMPI + Intel

USC1: (CentOS 6.5)

InfiniBand cluster

cd openmpi-4.1.1
mkdir build_eagle && cd build_eagle
module load intel/compiler-xe19u5
module load compiler/gcc/9.1.0
# check: icpc -v
export PATH=/home1/p001cao/app/intel/xe19u5/compilers_and_libraries_2019.5.281/linux/bin/intel64:$PATH
export CC=icc; export CXX=icpc; export FORTRAN=ifort

../configure CC=icc CXX=icpc FC=ifort F77=ifort \
--with-sge --without-ucx --with-verbs --with-knem=${myKNEM} \
--prefix=/uhome/p001cao/app/openmpi/4.1.1-intelxe19u5-noUCX-eagle

USC2: (CentOS 6.9)

# use the lld linker (included in the Intel bin directory; requires GLIBC > 2.15)
module load compiler/gcc-10.1.0
module load intel/compiler-xe19u5       # lld
##
export PATH=/home1/p001cao/app/intel/xe19u5/compilers_and_libraries_2019.5.281/linux/bin/intel64:$PATH
export CC=icc; export CXX=icpc; export FORTRAN=ifort
export myUCX=/home1/p001cao/app/tool_dev/ucx-1.8-intel

../configure CC=icc CXX=icpc FC=ifort F77=ifort LDFLAGS="-fuse-ld=lld -lrt" \
--with-sge --without-verbs --with-ucx=${myUCX} \
--prefix=/home1/p001cao/app/openmpi/4.0.4-intelxe19u5

4. Make module file

In the directory /uhome/p001cao/local/share/lmodfiles/mpi, create the file "ompi4.1.1-gcc11.2-noUCX":

# for Tcl script use only
module load compiler/gcc-11.2
module load tool_dev/binutils-2.37

set     topdir          /uhome/p001cao/app/openmpi/4.1.1-gcc11.2-noUCX-eagle

prepend-path   PATH                $topdir/bin
prepend-path   LD_LIBRARY_PATH     $topdir/lib
prepend-path   INCLUDE             $topdir/include

prepend-path   PKG_CONFIG_PATH     $topdir/lib/pkgconfig          # this is required

Check:

module load ompi4.1.1-gcc11.2-noUCX
mpic++ -v
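
Optionally, confirm the expected components were built in (the grep patterns here are only examples):

ompi_info | grep -E 'gridengine|openib'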

USC2 (CentOS 6.9)

Note

  • Now, compile with all IB options and select among them via runtime parameters. (This did not work; UCX should be excluded.)
  • For how to build from source code, see here.
  • --with-verbs (default - auto detect)
  • ./autogen.pl is the same as ./autogen.sh
# cd /home1/p001cao/0SourceCode
# wget https://github.com/open-mpi/ompi/releases/download/v4.1.4/openmpi-4.1.4.tar.gz
# tar xvf openmpi-4.1.4.tar.gz
# cd openmpi-4.1.4
cd /home1/p001cao/0SourceCode
# wget https://github.com/open-mpi/ompi/releases/download/v4.1.5/openmpi-4.1.5.tar.gz
# git clone -b v4.1.x https://github.com/open-mpi/ompi.git  ompi-4.1.x
cd ompi-4.1.x
git pull origin v4.1.x

module load tooldev/autoconf-2.72c
module load tooldev/automake-1.16.5
module load tooldev/libtool-2.4.7
export ACLOCAL_PATH=/home1/p001cao/app/tooldev/libtool-2.4.7/share/aclocal

./autogen.pl

Using LLVM

Note
  • To use clang's libc++: `export CPPFLAGS="-nodefaultlibs -lc++ -lc++abi -lm -lc -lgcc_s -lgcc"`. But this might not be needed.
  • With FC=flang-new, to solve the error `unknown argument: '-soname'`, see this.
rm -rf build_llvm && mkdir build_llvm && cd build_llvm

module load compiler/llvm-17          # clang + lld
module load tooldev/ucx1.15-clang17

myLLVM=/home1/p001cao/app/compiler/llvm-17
export PATH=$myLLVM/bin:$PATH
export CC=clang CXX=clang++ FC=gfortran        # flang-new
export LDFLAGS="-fuse-ld=lld -lrt"
myUCX=/home1/p001cao/app/tooldev/ucx1.15-clang17
OFI=/home1/p001cao/app/tooldev/libfabric-1.19
KNEM=/home1/p001cao/app/tooldev/knem-1.1.4
myPREFIX=/home1/p001cao/app/mpi/openmpi4.1.x-clang17

../configure --with-sge --with-verbs --with-ucx=${myUCX} --with-knem=${KNEM} --with-ofi=${OFI} --prefix=${myPREFIX}

make  -j 16 && make install

Test:

mpicc ../examples/hello_c.c -o ../examples/hello_c.exe
mpirun -np 2 ../examples/hello_c.exe

module load mpi/ompi4.1.x-clang17
mpirun --version
ompi_info

Other options

export my_PMIX=/home1/p001cao/app/tool_dev/pmix-4.1.2
export my_libevent=/home1/p001cao/app/tool_dev/libevent-2.1.11       # require by PMIX
export my_hwloc=/home1/p001cao/app/tool_dev/hwloc-2.8.0

--with-pmix=${my_PMIX} --with-libevent=${my_libevent} --with-hwloc=${my_hwloc}
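
A minimal sketch of building PMIx itself against those libevent/hwloc installs (source location and version are assumptions):

cd /home1/p001cao/0SourceCode/tooldev
tar xzf pmix-4.1.2.tar.gz && cd pmix-4.1.2
./configure --with-libevent=${my_libevent} --with-hwloc=${my_hwloc} \
--prefix=/home1/p001cao/app/tool_dev/pmix-4.1.2
make -j 8 && make install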

LLVM no UCX

rm -rf build_noUCX && mkdir build_noUCX && cd build_noUCX

module load compiler/llvm-17          # clang + lld

myLLVM=/home1/p001cao/app/compiler/llvm-17
export PATH=$myLLVM/bin:$PATH
export CC=clang CXX=clang++ FC=gfortran        # flang-new
export LDFLAGS="-fuse-ld=lld -lrt"
OFI=/home1/p001cao/app/tooldev/libfabric-1.19
KNEM=/home1/p001cao/app/tooldev/knem-1.1.4
myPREFIX=/home1/p001cao/app/mpi/openmpi4.1.x-clang17-noUCX

../configure --with-sge --with-verbs --without-ucx --with-knem=${KNEM} --with-ofi=${OFI} --prefix=${myPREFIX}

make  -j 16 && make install

GCC 11

cd /home1/p001cao/0SourceCode
cd ompi-4.1.x
rm -rf build_ase && mkdir build_ase && cd build_ase

module load compiler/gcc-11
myGCC=/home1/p001cao/app/compiler/gcc-11
export PATH=$myGCC/bin:$PATH
export CFLAGS="-gdwarf-2 -gstrict-dwarf"
myUCX=/home1/p001cao/app/tooldev/ucx-1.15-gcc
myPREFIX=/home1/p001cao/app/mpi/openmpi4.1.x-gcc11

../configure --with-sge --without-verbs --with-ucx=${myUCX} --prefix=${myPREFIX}

make -j 16 && make install

Test

module load mpi/ompi4.1.x-gcc11
mpirun --version

GCC 9

cd /home1/p001cao/0SourceCode
cd ompi-4.1.5
rm -rf build_gcc && mkdir build_gcc && cd build_gcc

module load compiler/gcc-9.5
myGCC=/home2/app/compiler/gcc/9.5.0
export PATH=$myGCC/bin:$PATH
myUCX=/home1/p001cao/app/tooldev/ucx1.15-gcc9
myPREFIX=/home1/p001cao/app/openmpi/4.1.5-gcc9

../configure --with-sge --without-verbs --with-ucx=${myUCX} --prefix=${myPREFIX}

make -j 16 && make install

Some optional packages

1. libnuma-devel

https://github.com/numactl/numactl

cd /home1/p001cao/0SourceCode/tooldev
tar xzf numactl-2.0.13.tar.gz
cd numactl-2.0.13

module load tooldev/autoconf-2.72c
./autogen.sh

rm -rf build && mkdir build && cd build
../configure --prefix=/home1/p001cao/app/tooldev/numactl-2.0.13
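# build and install (standard autotools steps)
make -j 8 && make install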

2. libudev

NOTE: remove -Wpedantic in Makefile

cd /home1/p001cao/0SourceCode/tooldev
git clone https://github.com/illiliti/libudev-zero.git
cd libudev-zero

make PREFIX=/home1/p001cao/app/tooldev/libudev-zero install

3. openMPI/UCX: libfabric

If building directly from the libfabric git tree, run './autogen.sh' before the configure step.

# wget https://github.com/ofiwg/libfabric/releases/download/v1.19.0/libfabric-1.19.0.tar.bz2

cd /home1/p001cao/0SourceCode/tooldev
git clone -b main https://github.com/ofiwg/libfabric
cd libfabric
git pull origin main

module load tooldev/autoconf-2.72c
./autogen.sh
module load compiler/llvm-17

./configure --enable-ucx=no --prefix=/home1/p001cao/app/tooldev/libfabric-1.19
make -j 16 && make install

## module
prepend-path PKG_CONFIG_PATH $topdir/lib/pkgconfig

4. openMPI/UCX: KNEM

Don't use a new compiler.

https://knem.gitlabpages.inria.fr/

tar zxvf knem-1.1.4.tar.gz
cd knem-1.1.4
./configure --prefix=/home1/p001cao/app/tooldev/knem-1.1.4
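# build and install; loading the knem kernel module afterwards requires root
make && make install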

5. openMPI/UCX: XPMEM

https://github.com/hjelmn/xpmem/releases/tag/v2.6.3

https://github.com/hjelmn/xpmem/wiki/Installing-XPMEM → could not be installed: it requires Linux kernel 4.x

check: uname -a
tar zxvf xpmem-2.6.3.tar.gz
cd xpmem-2.6.3

./configure --prefix=/home1/p001cao/app/tooldev/xpmem-2.6.3