- OpenMPI-4
- Possible errors
- 1. Download
- 2. Compiling OpenMPI + GCC
- 3. Compiling OpenMPI + Intel
- 4. Make module file
- OpenMPI-5
- 2. Compiling OpenMPI + Clang
Open MPI is a Message Passing Interface (MPI) library project combining technologies and resources from several other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI).
There are three ways to use InfiniBand (IB) in OpenMPI. Compile with all of them, then use runtime settings to select one:
- OpenIB: a very old InfiniBand implementation in OpenMPI. OpenIB is not maintained and has been removed in OpenMPI-5 [see this](https://github.com/open-mpi/ompi/issues/11755)
- UCX: newer OpenMPI versions use UCX, but some applications may conflict with UCX (e.g., GPAW).
- Libfabric: [libfabric](https://github.com/ofiwg/libfabric) may now be a reasonable replacement for OpenIB.
- Some applications require C++11, which is only supported by GCC 4.8 or newer. If such a GCC is not available on the system, install a newer GCC before compiling OpenMPI.
- Make sure to build OpenMPI with 64-bit (8-byte) default Fortran integer support. To check whether the currently available OpenMPI supports it, run `ompi_info -a | grep 'Fort integer size'`. If the output is 8, it supports 64-bit integers; if the output is 4, it only supports 32-bit integers. Configuration for 64-bit support:
  - For Intel compilers use:
  - For GNU compilers use: `FFLAGS="-m64 -fdefault-integer-8" FCFLAGS="-m64 -fdefault-integer-8" CFLAGS=-m64 CXXFLAGS=-m64`
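The `ompi_info` check above can be wrapped in a small shell function. This is a sketch that assumes `ompi_info -a` prints a line of the form `Fort integer size: N`; the function name `fort_int_size` is hypothetical.

```shell
#!/bin/sh
# Print the default Fortran INTEGER size (in bytes) reported by ompi_info.
# Reads `ompi_info -a` output on stdin: 8 means 64-bit integers, 4 means 32-bit.
fort_int_size() {
  # ompi_info prints e.g. "          Fort integer size: 8"; keep only the number.
  grep 'Fort integer size' | head -n 1 | awk -F: '{ gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage on a machine with Open MPI loaded:
#   [ "$(ompi_info -a | fort_int_size)" = 8 ] && echo "64-bit Fortran integers"
```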
- Keep the source directory after compiling.
- Consider using UCX.
- Consider compiling your own PMIx.
- Consider using a different linker:
  - lld linker:
  - gold linker:
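A minimal sketch for choosing the linker flags, assuming `ld.lld` (LLVM) or `ld.gold` (binutils) is on `PATH` after loading the corresponding module. The flag strings are the ones used in the build recipes on this page; the function name `pick_linker_ldflags` is hypothetical. Note that GCC only accepts `-fuse-ld=lld` from GCC 9 on, so with an older GCC the gold branch is the one to use.

```shell
#!/bin/sh
# Choose LDFLAGS for OpenMPI's configure from the linkers found on PATH.
pick_linker_ldflags() {
  if command -v ld.lld >/dev/null 2>&1; then
    echo "-fuse-ld=lld -lrt"       # LLVM lld (e.g. from compiler/llvm-17)
  elif command -v ld.gold >/dev/null 2>&1; then
    echo "-fuse-ld=gold -lrt"      # binutils gold (e.g. from binutils-2.36)
  else
    echo "-lrt"                    # fall back to the default system linker
  fi
}

# Usage:
#   export LDFLAGS="$(pick_linker_ldflags)"
```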
Possible errors
- OpenMPI-4 uses UCX by default (OpenMPI 4.0.3 → UCX 1.7 or older). Solution: compile your own UCX.
- `No components were able to be opened in the pml framework.` / `PML ucx cannot be selected`: this error may be due to a missing IB device; check that one is present.
- `counter exceeded` errors may be solved by compiling OpenMPI with your own PMIx.
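To check for an IB device and to pick a transport at run time (as suggested at the top of this page), a sketch like the following can help. It assumes `ibv_devinfo` (from libibverbs) prints one `hca_id:` line per device, and uses Open MPI 4.x MCA component names (`ucx`, `ob1`, `openib`, `vader`, `cm`, `ofi`); in 4.x the openib BTL also needs `btl_openib_allow_ib` set before it runs over InfiniBand. Both function names and `./my_app` are hypothetical.

```shell
#!/bin/sh
# Count InfiniBand HCAs in `ibv_devinfo` output read on stdin.
# ibv_devinfo prints one "hca_id:" line per device; no match prints 0.
count_ib_devices() {
  grep -c 'hca_id:' || true
}

# Map a transport name to mpirun MCA options (Open MPI 4.x component names).
ompi_mca_args() {
  case "$1" in
    ucx)    printf '%s\n' "--mca pml ucx" ;;
    openib) printf '%s\n' "--mca pml ob1 --mca btl openib,vader,self --mca btl_openib_allow_ib true" ;;
    ofi)    printf '%s\n' "--mca pml cm --mca mtl ofi" ;;
    noib)   printf '%s\n' "--mca pml ob1 --mca btl tcp,vader,self" ;;
    *)      echo "unknown transport: $1" >&2; return 1 ;;
  esac
}

# Usage: avoid "PML ucx cannot be selected" on nodes without an IB device.
#   if [ "$(ibv_devinfo 2>/dev/null | count_ib_devices)" -eq 0 ]; then t=noib; else t=ucx; fi
#   mpirun $(ompi_mca_args "$t") -np 4 ./my_app
```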
1. Download
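A sketch of the download step, assuming the official release URL layout `https://download.open-mpi.org/release/open-mpi/v<major.minor>/openmpi-<version>.tar.gz`; the helper name `ompi_tarball_url` is hypothetical.

```shell
#!/bin/sh
# Build the download URL of an Open MPI release tarball from its version.
ompi_tarball_url() {
  local version="$1"              # e.g. 4.1.1
  local series="${version%.*}"    # 4.1.1 -> 4.1
  printf 'https://download.open-mpi.org/release/open-mpi/v%s/openmpi-%s.tar.gz\n' "$series" "$version"
}

# Usage:
#   wget "$(ompi_tarball_url 4.1.1)" && tar xzf openmpi-4.1.1.tar.gz
```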
2. Compiling OpenMPI + GCC
Separate installations are needed for: eagle, lion/leopard, cheetah, taycheon.
Installation options are listed in README.txt or `./configure -h`:
- Sun Grid: `--with-sge`
- InfiniBand: `--with-verbs`
- with KNEM: `--with-knem=${myKNEM}`
- use UCX: `--with-ucx=${myUCX}`
USC1: (CentOS 6.5)
- Use the gold linker to avoid link errors.
- UCX causes the error `ib_md.c:329 UCX ERROR ibv_reg_mr(address=0x145cb580, length=263504, access=0xf) failed: Resource temporarily unavailable`, so don't use UCX on this server.
```shell
module load tool_dev/binutils-2.36   # provides gold; needed to avoid link errors
module load compiler/gcc-11.2
export myKNEM=/uhome/p001cao/app/tool_dev/knem-1.1.4
```
InfiniBand cluster
```shell
cd openmpi-4.1.1
mkdir build_eagle && cd build_eagle
../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran LDFLAGS="-fuse-ld=gold -lrt" \
--with-sge --without-ucx --with-verbs --with-knem=${myKNEM} \
```
no InfiniBand cluster
```shell
cd openmpi-4.1.1
mkdir build_lion && cd build_lion
../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran LDFLAGS="-fuse-ld=gold -lrt" \
--with-sge --without-ucx --without-verbs --with-knem=${myKNEM} \
```
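After `make install`, one way to confirm which variant was built is to list the BTL components reported by `ompi_info`: the IB build (`--with-verbs`) should show `openib`, the non-IB build should not. A sketch, assuming `ompi_info` prints lines like `MCA btl: openib (MCA v2.1.0, ...)`; the function name `list_mca_components` is hypothetical.

```shell
#!/bin/sh
# List the components of one MCA framework from `ompi_info` output on stdin.
# $1 = framework name, e.g. btl or pml.
list_mca_components() {
  # Matches "  MCA btl: openib (MCA v2.1.0, ...)" and prints "openib".
  awk -v fw="$1" '$1 == "MCA" && $2 == fw ":" { print $3 }'
}

# Usage:
#   ompi_info | list_mca_components btl   # expect "openib" only in the IB build
```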
CANlab: (CentOS 5.8)
```shell
module load gcc/gcc-7.4.0
../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran \
--with-sge --without-verbs --without-ucx \
```
CAN-GPU: (Ubuntu 18)
- Install CUDA using GCC.
- CUDA 10 only supports GCC up to version 8.
- binutils 2.22 or newer is needed to link CUDA.
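The GCC limit above can be checked before running the CUDA installer. A sketch that only compares the major version against 8 (the limit stated above); `gcc_ok_for_cuda10` is a hypothetical name.

```shell
#!/bin/sh
# Succeed if a GCC version string is usable as the CUDA 10 host compiler.
gcc_ok_for_cuda10() {
  local major="${1%%.*}"    # "7.4.0" -> "7", "8" -> "8"
  [ "$major" -le 8 ]
}

# Usage:
#   gcc_ok_for_cuda10 "$(gcc -dumpversion)" || echo "load a GCC <= 8 first"
```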
Install CUDA
- Install CUDA from the command line.
- Download:
```shell
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_rhel6.run
```
- Install (using the root account):
- Disable the graphical target to update the NVIDIA driver.
```shell
module load compiler/gcc-7.4
sh cuda_10.2.89_440.33.01_rhel6.run --toolkitpath=/home/thang/app/cuda-10.2
```
- After installing CUDA, start the graphical environment again.
Compile OpenMPI
```shell
cd openmpi-4.1.1
mkdir build && cd build
module load compiler/gcc-7.4   # CUDA 10 only supports GCC up to 8
module load binutils-2.35
../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran \
--with-sge --without-ucx \
--with-cuda=/home/thang/app/cuda-10.2 \
```
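To confirm that the resulting build is CUDA-aware, the Open MPI FAQ suggests grepping `ompi_info --parsable --all` for `mpi_built_with_cuda_support:value`. A sketch that extracts the value (`true` or `false`); the function name `cuda_support` is hypothetical.

```shell
#!/bin/sh
# Print "true"/"false" from `ompi_info --parsable --all` output on stdin.
cuda_support() {
  # The line looks like: mca:mpi:base:param:mpi_built_with_cuda_support:value:true
  grep 'mpi_built_with_cuda_support:value' | head -n 1 | awk -F: '{ print $NF }'
}

# Usage:
#   ompi_info --parsable --all | cuda_support
```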
3. Compiling OpenMPI + Intel
USC1: (CentOS 6.5)
InfiniBand cluster
```shell
module load intel/compiler-xe19u5
module load compiler/gcc/9.1.0
# check: icpc -v
export PATH=/home1/p001cao/app/intel/xe19u5/compilers_and_libraries_2019.5.281/linux/bin/intel64:$PATH
export CC=icc CXX=icpc FC=ifort
../configure CC=icc CXX=icpc FC=ifort F77=ifort \
--with-sge --without-ucx --with-verbs --with-knem=${myKNEM} \
```
USC2: (CentOS 6.9)
```shell
# use the lld linker (included in the Intel bin directory; requires GLIBC > 2.15)
module load compiler/gcc-10.1.0
module load intel/compiler-xe19u5   # lld
export PATH=/home1/p001cao/app/intel/xe19u5/compilers_and_libraries_2019.5.281/linux/bin/intel64:$PATH
export CC=icc CXX=icpc FC=ifort
export myUCX=/home1/p001cao/app/tool_dev/ucx-1.8-intel
../configure CC=icc CXX=icpc FC=ifort F77=ifort LDFLAGS="-fuse-ld=lld -lrt" \
--with-sge --without-verbs --with-ucx=${myUCX} \
```
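A compiler missing from `PATH` (for example when the Intel `bin` directory was not exported) makes `configure` fail, so a quick check before each build can save a round trip. A sketch using the POSIX `command -v`; the function name `require_compilers` is hypothetical.

```shell
#!/bin/sh
# Fail early if any compiler named on the command line is not on PATH.
require_compilers() {
  local missing=0 c
  for c in "$@"; do
    if ! command -v "$c" >/dev/null 2>&1; then
      echo "missing compiler: $c" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_compilers icc icpc ifort && ../configure CC=icc CXX=icpc FC=ifort F77=ifort ...
```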
4. Make module file
In the directory `/uhome/p001cao/local/share/lmodfiles/mpi`, create the file `ompi4.1.1-gcc11.2-noUCX`:
```tcl
# for Tcl script use only
module load compiler/gcc-11.2
module load tool_dev/binutils-2.37
set topdir /uhome/p001cao/app/openmpi/4.1.1-gcc11.2-noUCX-eagle
prepend-path PATH $topdir/bin
prepend-path LD_LIBRARY_PATH $topdir/lib
prepend-path INCLUDE $topdir/include
prepend-path PKG_CONFIG_PATH $topdir/lib/pkgconfig ;# this is required
```
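When several builds need module files, a file like the one above can be generated from the install prefix. A sketch, assuming the same module names as above; it writes a `#%Module` first line (the conventional header of Tcl modulefiles). `write_ompi_modulefile` is hypothetical.

```shell
#!/bin/sh
# Write a Tcl modulefile for an Open MPI install prefix ($1) to file $2.
# The module names below are the ones used on this cluster.
write_ompi_modulefile() {
  local topdir="$1" out="$2"
  cat > "$out" <<EOF
#%Module
module load compiler/gcc-11.2
module load tool_dev/binutils-2.37
set topdir $topdir
prepend-path PATH \$topdir/bin
prepend-path LD_LIBRARY_PATH \$topdir/lib
prepend-path INCLUDE \$topdir/include
prepend-path PKG_CONFIG_PATH \$topdir/lib/pkgconfig
EOF
}

# Usage:
#   write_ompi_modulefile /uhome/p001cao/app/openmpi/4.1.1-gcc11.2-noUCX-eagle \
#     /uhome/p001cao/local/share/lmodfiles/mpi/ompi4.1.1-gcc11.2-noUCX
```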
USC2 (CentOS 6.9)
- Now compile with all IB options and select them with runtime parameters (this did not work; UCX should be excluded).
- For how to build from source code, see here.
- `./autogen.pl` (default: auto-detect) is the same as `./autogen.sh`.
```shell
# cd /home1/p001cao/0SourceCode
# wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.4.tar.gz
# tar xvf openmpi-4.1.4.tar.gz
# cd openmpi-4.1.4
cd /home1/p001cao/0SourceCode
# wget https://github.com/open-mpi/ompi/releases/download/v4.1.5/ompi-4.1.5.tar.gz
# git clone -b v4.1.x https://github.com/open-mpi/ompi.git ompi-4.1.x
cd ompi-4.1.x
git pull origin v4.1.x
module load tooldev/autoconf-2.72c
module load tooldev/automake-1.16.5
module load tooldev/libtool-2.4.7
export ACLOCAL_PATH=/home1/p001cao/app/tooldev/libtool-2.4.7/share/aclocal
```
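Instead of hard-coding the libtool path in `ACLOCAL_PATH`, the aclocal directory can be prepended from the libtool install prefix, so that `aclocal` (run by `./autogen.pl`) finds the module's `libtool.m4` while keeping any existing entries. A sketch; `add_aclocal_dir` is hypothetical, and deriving the prefix from `libtoolize` assumes the usual `<prefix>/bin` layout.

```shell
#!/bin/sh
# Prepend <prefix>/share/aclocal to ACLOCAL_PATH, keeping existing entries.
add_aclocal_dir() {
  local prefix="$1"
  ACLOCAL_PATH="$prefix/share/aclocal${ACLOCAL_PATH:+:$ACLOCAL_PATH}"
  export ACLOCAL_PATH
}

# Usage (prefix of whichever libtoolize is first on PATH):
#   add_aclocal_dir "$(dirname "$(dirname "$(command -v libtoolize)")")"
#   ./autogen.pl
```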
Using LLVM
- To use clang's libc++, see this link:
```shell
export CPPFLAGS="-nodefaultlibs -lc++ -lc++abi -lm -lc -lgcc_s -lgcc"
```
  But this might not be needed.
- To solve `error: unknown argument: '-soname'` → see this.
```shell
rm -rf build_llvm && mkdir build_llvm && cd build_llvm
module load compiler/llvm-17   # clang + lld
module load tooldev/ucx1.15-clang17
export PATH=$myLLVM/bin:$PATH
export CC=clang CXX=clang++ FC=gfortran   # flang-new
export LDFLAGS="-fuse-ld=lld -lrt"
../configure --with-sge --with-verbs --with-ucx=${myUCX} --with-knem=${KNEM} --with-ofi=${OFI} --prefix=${myPREFIX}
make -j 16 && make install
```
Other options
```shell
export my_PMIX=/home1/p001cao/app/tool_dev/pmix-4.1.2
export my_libevent=/home1/p001cao/app/tool_dev/libevent-2.1.11   # required by PMIx
export my_hwloc=/home1/p001cao/app/tool_dev/hwloc-2.8.0
```
Then add to the `../configure` line: `--with-pmix=${my_PMIX} --with-libevent=${my_libevent} --with-hwloc=${my_hwloc}`
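A sketch for passing the external packages above only when their install directories exist, so that a missing build falls back to Open MPI's default handling of that package instead of breaking `configure`. `with_if_dir` is a hypothetical helper.

```shell
#!/bin/sh
# Print "--with-<pkg>=<dir>" only if <dir> is set and is an existing directory.
with_if_dir() {
  # $1 = configure package name (pmix, libevent, hwloc), $2 = install prefix
  if [ -n "$2" ] && [ -d "$2" ]; then
    printf '%s\n' "--with-$1=$2"
  fi
}

# Usage:
#   extra="$(with_if_dir pmix "$my_PMIX") $(with_if_dir libevent "$my_libevent") $(with_if_dir hwloc "$my_hwloc")"
#   ../configure ... $extra
```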
Build without UCX:
```shell
rm -rf build_noUCX && mkdir build_noUCX && cd build_noUCX
module load compiler/llvm-17   # clang + lld
export PATH=$myLLVM/bin:$PATH
export CC=clang CXX=clang++ FC=gfortran   # flang-new
export LDFLAGS="-fuse-ld=lld -lrt"
../configure --with-sge --with-verbs --without-ucx --with-knem=${KNEM} --with-ofi=${OFI} --prefix=${myPREFIX}
make -j 16 && make install
```
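The two LLVM builds above differ only in the UCX option. A sketch that assembles the option list, assuming `KNEM`, `OFI`, `myPREFIX` and (optionally) `myUCX` are set by the loaded modules as in the commands above; `ompi_configure_opts` is hypothetical, and the unquoted expansion in the usage line assumes the paths contain no spaces.

```shell
#!/bin/sh
# Assemble configure options: UCX if $myUCX is an existing directory, else none.
ompi_configure_opts() {
  local opts="--with-sge --with-verbs"
  if [ -n "${myUCX:-}" ] && [ -d "$myUCX" ]; then
    opts="$opts --with-ucx=$myUCX"
  else
    opts="$opts --without-ucx"
  fi
  printf '%s\n' "$opts --with-knem=$KNEM --with-ofi=$OFI --prefix=$myPREFIX"
}

# Usage:
#   ../configure $(ompi_configure_opts) && make -j 16 && make install
```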
GCC 11
```shell
cd /home1/p001cao/0SourceCode
cd ompi-4.1.x
rm -rf build_ase && mkdir build_ase && cd build_ase
module load compiler/gcc-11
export PATH=$myGCC/bin:$PATH
export CFLAGS="-gdwarf-2 -gstrict-dwarf"
../configure --with-sge --without-verbs --with-ucx=${myUCX} --prefix=${myPREFIX}
make -j 16 && make install
```
GCC 9
```shell
cd /home1/p001cao/0SourceCode
cd ompi-4.1.5
rm -rf build_gcc && mkdir build_gcc && cd build_gcc
module load compiler/gcc-9.5
export PATH=$myGCC/bin:$PATH
../configure --with-sge --without-verbs --with-ucx=${myUCX} --prefix=${myPREFIX}
make -j 16 && make install
```
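To check which compiler an installed wrapper uses (e.g. `gcc` for this build vs `clang` for the LLVM build), Open MPI's wrapper compilers accept `--showme:command`, which prints the underlying compiler. A sketch; `wrapped_compiler_is` is hypothetical.

```shell
#!/bin/sh
# Succeed if the mpicc wrapper at $1 wraps a compiler named $2.
wrapped_compiler_is() {
  # `mpicc --showme:command` prints e.g. /opt/gcc-9.5/bin/gcc
  [ "$(basename "$("$1" --showme:command)")" = "$2" ]
}

# Usage:
#   wrapped_compiler_is "$myPREFIX/bin/mpicc" gcc || echo "unexpected compiler"
```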
Some optional packages
1. libnuma-devel
```shell
cd /home1/p001cao/0SourceCode/tooldev
tar xzf numactl-2.0.13.tar.gz
cd numactl-2.0.13
module load tooldev/autoconf-2.72c
rm -rf build && mkdir build && cd build
../configure --prefix=/home1/p001cao/app/tooldev/numactl-2.0.13
```
2. libudev
NOTE: remove `-Wpedantic` from the Makefile.
```shell
cd /home1/p001cao/0SourceCode/tooldev
git clone https://github.com/illiliti/libudev-zero.git
cd libudev-zero
make PREFIX=/home1/p001cao/app/tooldev/libudev-zero install
```
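The `-Wpedantic` edit can be scripted with GNU `sed -i` before running `make`. A sketch, assuming the flag appears literally in the Makefile; `strip_wpedantic` is hypothetical.

```shell
#!/bin/sh
# Remove every "-Wpedantic" (and the whitespace before it) from the file $1.
strip_wpedantic() {
  sed -i 's/[[:space:]]*-Wpedantic//g' "$1"
}

# Usage (inside libudev-zero/):
#   strip_wpedantic Makefile
#   make PREFIX=/home1/p001cao/app/tooldev/libudev-zero install
```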
3. openMPI/UCX: libfabric
If building directly from the libfabric git tree, run `./autogen.sh` before the configure step.
```shell
# wget https://github.com/ofiwg/libfabric/releases/download/v1.19.0/libfabric-1.19.0.tar.bz2
cd /home1/p001cao/0SourceCode/tooldev
git clone -b main https://github.com/ofiwg/libfabric
cd libfabric
git pull origin main
module load tooldev/autoconf-2.72c
module load compiler/llvm-17
./autogen.sh   # required when building from the git tree
./configure --enable-ucx=no --prefix=/home1/p001cao/app/tooldev/libfabric-1.19
make -j 16 && make install
```
Module file:
```tcl
prepend-path PKG_CONFIG_PATH $topdir/lib/pkgconfig
```
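After installing libfabric, `fi_info -l` lists the available providers. A sketch that extracts the provider names, assuming each provider is printed on its own unindented line ending in `:`, followed by indented version lines; `fi_providers` is hypothetical.

```shell
#!/bin/sh
# Print provider names from `fi_info -l` output read on stdin.
fi_providers() {
  # Provider lines start in column 1 and end with ":"; version lines are indented.
  awk '/^[^ \t].*:$/ { sub(/:$/, ""); print }'
}

# Usage:
#   fi_info -l | fi_providers   # e.g. check that "verbs" is listed
```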
4. openMPI/UCX: KNEM
Don't use a new compiler.
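KNEM is a kernel module that exposes the character device `/dev/knem`, so a quick check that it is loaded before building with `--with-knem` can be sketched as below. `knem_available` is hypothetical.

```shell
#!/bin/sh
# Succeed if the KNEM character device exists (default path /dev/knem).
knem_available() {
  [ -c "${1:-/dev/knem}" ]
}

# Usage:
#   knem_available || echo "KNEM kernel module is not loaded"
```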
5. openMPI/UCX: XPMEM
https://github.com/hjelmn/xpmem/wiki/Installing-XPMEM → cannot install: requires Linux kernel 4.x.
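The kernel requirement can be checked up front from `uname -r`. A sketch comparing only the major version (4, per the note above); `kernel_major_ge` is hypothetical.

```shell
#!/bin/sh
# Succeed if kernel release $1 (e.g. from `uname -r`) has major version >= $2.
kernel_major_ge() {
  local major="${1%%.*}"    # "2.6.32-754.el6.x86_64" -> "2"
  [ "$major" -ge "$2" ]
}

# Usage:
#   kernel_major_ge "$(uname -r)" 4 || echo "XPMEM needs Linux kernel 4.x or newer"
```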