How to Use HPX Applications with SLURM

Just like PBS (described in section Using PBS), SLURM is a job management system which is widely used on large supercomputing systems. Any HPX application can easily be run using SLURM. This section describes how this can be done.

The easiest way to run an HPX application using SLURM is to utilize the command line tool srun which interacts with the SLURM batch scheduling system.

srun -p <partition> -N <number-of-nodes> hpx-application <application-arguments>

Here, <partition> is one of the node partitions existing on the target machine (consult the machines documentation to get a list of existing partitions) and <number-of-nodes> is the number of compute nodes you want to use. By default, the HPX application is started with one locality per node and uses all available cores on a node. You can change the number of localities started per node (for example to account for NUMA effects) by specifying the -n option of srun. The number of cores per locality can be set by -c. The <application-arguments> are any application specific arguments which need to passed on to the application.

	Note
	There is no need to use any of the HPX command line options related to the number of localities, number of threads, or related to networking ports. All of this information is automatically extracted from the SLURM environment by the HPX startup code.

Important

	Important
The srun documentation explicitly states: "If `-c` is specified without `-n`, as many tasks will be allocated per node as possible while satisfying the `-c` restriction. For instance on a cluster with 8 CPUs per node, a job request for 4 nodes and 3 CPUs per task may be allocated 3 or 6 CPUs per node (1 or 2 tasks per node) depending upon resource consumption by other jobs." For this reason, we suggest to always specify `-n <number-of-instances>`, even if `<number-of-instances>` is equal to one (`1`).

The srun documentation explicitly states: "If -c is specified without -n, as many tasks will be allocated per node as possible while satisfying the -c restriction. For instance on a cluster with 8 CPUs per node, a job request for 4 nodes and 3 CPUs per task may be allocated 3 or 6 CPUs per node (1 or 2 tasks per node) depending upon resource consumption by other jobs." For this reason, we suggest to always specify -n <number-of-instances>, even if <number-of-instances> is equal to one (1).

Interactive Shells

To get an interactive development shell on one of the nodes you can issue the following command:

srun -p <node-type> -N <number-of-nodes> --pty /bin/bash -l

After the shell has been opened, you can run your HPX application. By default, it uses all available cores. Note that if you requested one node, you don't need to do srun again. However, if you requested more than one nodes, and want to run your distributed application, you can use srun again to start up the distributed HPX application. It will use the resources that have been requested for the interactive shell.

Scheduling Batch Jobs

The above mentioned method of running HPX applications is fine for development purposes. The disadvantage that comes with srun is that it only returns once the application is finished. This might not be appropriate for longer running applications (for example benchmarks or larger scale simulations). In order to cope with that limitation you can use the sbatch command.

The sbatch command expects a script that it can run once the requested resources are available. In order to request resources you need to add #SBATCH comments in your script or provide the necessary parameters to sbatch directly. The parameters are the same as with srun. The commands you need to execute are the same you would need to start your application as if you were in an interactive shell.