Flags, TRES and GRES in SLURM? What do they do?

In SLURM, flags, TRES (Trackable RESources), and GRES (Generic Resource) are used to specify, manage, and account for job resource requirements. Flags and GRES let users request specific hardware resources and configure job behavior more precisely, while TRES is how SLURM tracks the resources that jobs are allocated and use.

Flags: Flags in SLURM are command-line options that modify the behavior of the sbatch command, which is used for job submission (most of the same options are also accepted by srun and salloc). Flags set job properties and requirements such as the number of nodes and tasks, memory, walltime, output file names, and the job name. Long-form flags start with a double dash (--) followed by the flag name and its value; many also have a single-dash short form, such as -N for --nodes. For example:

   sbatch --nodes=2 --ntasks-per-node=8 --time=02:00:00 my_job_script.sh

In this example, --nodes=2 specifies that the job should run on two compute nodes, --ntasks-per-node=8 requests 8 tasks on each node (16 tasks in total, each with one CPU core by default), and --time=02:00:00 sets the maximum runtime for the job to 2 hours.
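The same flags can also be written inside the job script as #SBATCH directives, which sbatch reads from the top of the file. The following is a minimal sketch of that form, not a site-specific template: the job name and the describe_allocation helper are hypothetical, and the SLURM_* variables are only set inside a running job, so the fallbacks let the script run outside SLURM as well.

```shell
#!/bin/bash
#SBATCH --job-name=example_job    # hypothetical job name
#SBATCH --nodes=2                 # same request as on the command line
#SBATCH --ntasks-per-node=8
#SBATCH --time=02:00:00

# Hypothetical helper: report what SLURM allocated.
# SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE are set by SLURM inside a job;
# outside a job they are missing, so "unset" is printed instead.
describe_allocation() {
    echo "nodes=${SLURM_JOB_NUM_NODES:-unset} tasks_per_node=${SLURM_NTASKS_PER_NODE:-unset}"
}

describe_allocation
```

With the directives in the script, the submission command shrinks to sbatch my_job_script.sh. Flags given on the command line take precedence over the #SBATCH lines.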

TRES (Trackable RESources): A TRES is a resource that SLURM tracks for accounting and for enforcing limits. It is part of the SLURM accounting system. Examples of TRES are CPUs (cpu), memory (mem), nodes (node), and GRES such as GPUs (gres/gpu). System administrators choose which resources are tracked (the AccountingStorageTRES setting in slurm.conf) and can attach TRES-based limits to accounts, users, or QOS to ensure fair resource allocation and prevent resource hogging.

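SLURM reports TRES as a comma-separated list of name=value pairs, for example in the AllocTRES field of sacct. The sketch below shows one way to pull a single value out of such a string; the tres_value helper, the sample string, and the job ID in the comment are illustrative assumptions, not output from a real cluster.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one TRES from a TRES string.
# Usage: tres_value "<tres string>" <tres name>
tres_value() {
    local tres_string=$1 name=$2
    # Split on commas, then print the value whose name matches exactly.
    echo "$tres_string" | tr ',' '\n' | awk -F= -v n="$name" '$1 == n { print $2 }'
}

# Sample string in the format sacct uses for AllocTRES (made-up values).
sample="billing=16,cpu=16,gres/gpu=2,mem=32G,node=2"

tres_value "$sample" cpu        # prints 16
tres_value "$sample" gres/gpu   # prints 2

# On a cluster with accounting enabled, a real string could come from, e.g.:
#   sacct -j 12345 --noheader --format=AllocTRES%80
# where 12345 is a placeholder job ID.
```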
GRES (Generic Resource): GRES in SLURM is used to request and manage generic resources, which are hardware resources other than standard CPU and memory. GRES allows users to request resources such as GPUs, specialized accelerators (e.g., FPGAs), networking resources, or custom hardware components that administrators have defined on the cluster. To request a GRES, use the --gres flag with the resource name and a count when submitting the job. For example:

   sbatch --gres=gpu:2 my_gpu_job_script.sh

In this example, the --gres=gpu:2 flag requests two GPUs on each node allocated to the job.
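Inside a GPU job it can be useful to confirm what was actually allocated. The sketch below assumes the cluster's GPU GRES is configured so that SLURM exports CUDA_VISIBLE_DEVICES to the job, which is typical but site-dependent; the count_visible_gpus helper is hypothetical.

```shell
#!/bin/bash
#SBATCH --gres=gpu:2    # request two GPUs per node, as in the example above

# Hypothetical helper: count the GPU indices listed in CUDA_VISIBLE_DEVICES.
# Prints 0 when the variable is unset or empty (e.g. outside a GPU job).
count_visible_gpus() {
    local devices=${CUDA_VISIBLE_DEVICES:-}
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # One index per line, then count the lines; strip padding some wc versions add.
    echo "$devices" | tr ',' '\n' | wc -l | tr -d ' '
}

echo "GPUs visible to this job: $(count_visible_gpus)"
```

If the count is lower than requested, the GRES configuration on the cluster is the first thing to check with the system administrators.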

By using flags, TRES, and GRES effectively, users can ensure that their jobs have the necessary resources to execute correctly and administrators can manage the cluster’s resource allocation more efficiently. It’s essential to refer to the SLURM documentation and consult with system administrators to understand the specific flag, TRES, and GRES options available on your HPC cluster.
