Compile LAMMPS with FLARE
Anders Johansson
Compilation with CMake
Download LAMMPS, e.g.
git clone --depth=1 https://github.com/lammps/lammps.git
From the flare/lammps_plugins directory, run the install script:
cd path/to/flare/lammps_plugins
./install.sh /path/to/lammps
This copies the necessary files and slightly updates the LAMMPS CMake file.
Compile LAMMPS with CMake as usual. If you’re not using Kokkos, the Makefile system might also work.
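As a minimal sketch of a plain (non-Kokkos) CMake build, assuming an out-of-tree build directory and default settings (add whatever -DPKG_... options your workflow needs):
cd /path/to/lammps
mkdir build
cd build
cmake ../cmake
make -j<n_cpus>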
Note: The pair style uses the header-only Eigen library (https://gitlab.com/libeigen/eigen), so the compiler needs to be able to find the Eigen folder. If it is not easily available on your system, there are two workarounds:
1. Clone the eigen repository and run mv eigen/Eigen /path/to/lammps/src/ to place the headers in the LAMMPS source directory.
2. If you are using a LAMMPS version newer than 17Feb22, you can set -DPKG_MACHDYN=yes -DDOWNLOAD_EIGEN3=yes for cmake, and Eigen3 will be downloaded and compiled automatically.
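As a sketch, the second workaround folded into a full configure command (run from the build directory, combined with whatever other options you need):
cmake ../cmake -DPKG_MACHDYN=yes -DDOWNLOAD_EIGEN3=yes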
Compilation for GPU with Kokkos
Run the above steps, then follow LAMMPS's Kokkos compilation instructions (https://docs.lammps.org/Build_extras.html#kokkos). Typically:
cd lammps
mkdir build
cd build
cmake ../cmake -DPKG_KOKKOS=ON -DKokkos_ENABLE_CUDA=ON [-DKokkos_ARCH_VOLTA70=on if not auto-detected]
make -j<n_cpus>
If you really, really, really want to use the old Makefile system, you should be able to copy the files from kokkos/ into /path/to/lammps/src, do make yes-kokkos, and otherwise follow the LAMMPS Kokkos instructions. The KOKKOS_ARCH must be changed according to your GPU model: Volta70 is for V100, Pascal60 is for P100, etc.
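A minimal sketch of that Makefile route, assuming kokkos/ refers to the directory inside flare/lammps_plugins and the stock kokkos_cuda_mpi machine makefile shipped with LAMMPS; adjust KOKKOS_ARCH for your GPU:
cp -r /path/to/flare/lammps_plugins/kokkos/* /path/to/lammps/src/
cd /path/to/lammps/src
make yes-kokkos
make kokkos_cuda_mpi KOKKOS_DEVICES=Cuda KOKKOS_ARCH=Volta70 -j<n_cpus>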
Basic usage
For the model with B2 descriptors and the NormalizedDotProduct kernel, use the following in the LAMMPS input script:
newton on
pair_style flare
pair_coeff * * Si.txt
where Si.txt should be replaced by the name of your mapped model. Then run lmp -in in.script as usual.
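A minimal sketch of a complete input script (in.script); the data file name, temperature, and run settings below are placeholders for your own system, not part of FLARE:
# sketch: NVE MD with a mapped FLARE model (placeholders marked)
newton on
units metal
atom_style atomic
read_data silicon.data      # hypothetical structure file
pair_style flare
pair_coeff * * Si.txt       # your mapped model file
velocity all create 300.0 12345
fix 1 all nve
timestep 0.001
run 1000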
For the model with 2+3-body descriptors, the non-Kokkos LAMMPS version uses newton off, while the Kokkos version uses newton on.
Running on a GPU with Kokkos
See the LAMMPS documentation (https://docs.lammps.org/Speed_kokkos.html). In general, run
lmp -k on g 1 -sf kk -pk kokkos newton on neigh full -in in.script
When running with MPI, replace g 1 with the number of GPUs per node.
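For example, a sketch of a single-node run with 4 MPI ranks and 4 GPUs (the launcher and binary name are placeholders for your own setup):
mpirun -np 4 lmp -k on g 4 -sf kk -pk kokkos newton on neigh full -in in.script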
In order to run large systems, the atoms will be divided into batches to reduce memory usage. The batch size is controlled by the MAXMEM environment variable (in GB). If necessary, set this to an estimate of how much memory FLARE++ can use (i.e., the total GPU memory minus LAMMPS's memory for neighbor lists etc.). If you are memory-limited, you can set MAXMEM=1 or similar; otherwise leave it at a larger number for more parallelism. The default is 12 GB, which should work for most systems without affecting performance. MAXMEM is printed at the beginning of the simulation by every MPI process, so that you can verify that the environment variable has been set correctly on all nodes. Look at mpirun -x if this is not the case.
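A sketch, assuming an OpenMPI-style launcher where -x forwards an environment variable to all ranks:
export MAXMEM=8
mpirun -np 4 -x MAXMEM lmp -k on g 4 -sf kk -pk kokkos newton on neigh full -in in.script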
For MPI runs on CPU nodes, if you are running on multiple nodes of a cluster, you would typically launch one MPI task per node and set the number of threads equal to the number of cores (or hyperthreads) per node. A sample SLURM job script for 4 nodes, each with 48 cores, may look something like this:
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=48
mpirun -np $SLURM_NTASKS /path/to/lammps/src/lmp_kokkos_omp -k on t $SLURM_CPUS_PER_TASK -sf kk -package kokkos newton off neigh full -in in.lammps
When running on Knights Landing or other heavily hyperthreaded systems, you may want to try using more than one thread per CPU core.
For MPI runs on GPU nodes, if you are running on multiple nodes of a cluster, you would typically launch one MPI task per GPU. A sample SLURM job script for 4 nodes, each with 2 GPUs, may look something like this:
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-node=2
mpirun -np $SLURM_NTASKS /path/to/lammps/src/lmp_kokkos_cuda_mpi -k on g $SLURM_GPUS_PER_NODE -sf kk -package kokkos newton off neigh full -in in.lammps
Notes on Newton (only relevant with Kokkos)
There are defaults which will kick in if you don't specify anything in the input script and/or skip the -package kokkos newton ... neigh ... flag. You can try these at your own risk, but it is safest to specify everything. See also the LAMMPS documentation.
newton on will probably be faster if you have a 2-body potential; otherwise the alternatives should give roughly equal performance.
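A sketch of making the choice explicit in both places, here with newton on as used for the B2 model above:
# in the input script
newton on
# on the command line
lmp -k on g 1 -sf kk -pk kokkos newton on neigh full -in in.script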