7. Job Management
On the HPC cluster you do not run programs directly in the shell on the login nodes. Instead, you submit your program as a job to be run on the compute nodes. Jobs are managed by SLURM: they are queued by the system and executed when resources become available. Conceptually, each job is a two-step process:
You request certain resources from the system. The most common resources are CPU cores.
With the assigned resources, you run your computational tasks.
HPC supports several job types to match the resource and compute requirements of its users. The most commonly used job types are described below.
7.1. Batch Jobs
Batch jobs should be your default choice unless your requirements cannot be met without direct shell access.
A complete batch job workflow:
Write a job script, which consists of 2 parts:
Resource requirements.
Commands to be executed.
Submit the job.
Relax, have a coffee, log off if you wish. The computer will do the work.
Come back to examine the result.
7.1.1. Batch Job Script
A job script is a text file describing the job. As discussed, the first part specifies how many resources you want; the second part is what you want to run.
Points to be noted:
Request only what you need
Serial jobs would need only one CPU (#SBATCH -n 1)
Make sure the walltime specified is not greater than the allowed time limit.
Difference between CPUs, Cores and Tasks
On Greene HPC, one CPU is equivalent to one core.
In SLURM, resources (CPUs) are allocated in terms of tasks, which are requested with -n or --ntasks. By default, the value of -n or --ntasks is one if left undefined, and each task is allocated one CPU. If you also define -c or --cpus-per-task in your job script, the total number of CPUs allocated to you is the product of -n and -c.
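For example, here is a minimal sketch of a script combining the two options (the executable name my_program is just a placeholder); this job would be allocated 2 x 4 = 8 CPUs in total:
#!/bin/bash
#SBATCH --ntasks=2          # -n: 2 tasks
#SBATCH --cpus-per-task=4   # -c: 4 CPUs per task, so 2 x 4 = 8 CPUs total

srun ./my_program           # placeholder executable, launched once per task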
7.1.1.1. Syntax
#!/bin/bash
# Define the resource requirements here using the #SBATCH tag
# (the '#' before SBATCH is required; do not remove it).
#------ Resource requirement commands start here
#SBATCH <option> <value>
#SBATCH <option> <value>
#SBATCH <option> <value>
...
#------ Resource requirement commands end here
#------ Commands to be executed
<command executable on shell>
<command executable on shell>
<command executable on shell>
...
Save the job script with a .sbatch file extension.
7.1.1.2. Options
The options tell SLURM information about the job, such as what resources will be needed. These can be specified in the job-script as SBATCH directives, or on the command line as options while submitting a job, or both (in which case the command line options take precedence should the two contradict each other).
For each option there is a corresponding #SBATCH directive that you can place in the job script before submitting it as usual:
#!/bin/bash
#SBATCH --nodes=2
sbatch abc.sbatch
or as a command-line option to sbatch when you submit the job:
sbatch --nodes=2 abc.sbatch
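For example, if abc.sbatch already contains #SBATCH --nodes=2, submitting it as shown below allocates only one node, because the command-line option takes precedence over the directive in the script:
sbatch --nodes=1 abc.sbatch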
Available options:
Option | Description
---|---
--job-name / -J | Give the job a name. The default is the filename of the job script.
--output / -o | Send stdout to the specified path. The default filename is slurm-${SLURM_JOB_ID}.out, e.g. slurm-12345.out
--error / -e | Send stderr to the specified path
--mail-user | Send email to the specified email address when certain events occur
--mail-type | Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL…
--export | Pass variables to the job, either with a specific value (the VAR= form) or from the submitting environment
--time / -t | Set a limit on the total run time. Acceptable formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds"
--mem | Maximum memory per node the job will need; the unit can be MB or GB, e.g. 3GB
--mem-per-cpu | Memory required per allocated CPU; the unit can be MB or GB
--nodes / -N | Number of nodes required. Default is 1 node
--ntasks / -n | Specify the number of tasks to run, e.g. -n 4. Default is one CPU core per task
--ntasks-per-node | Request that ntasks be invoked on each node
--cpus-per-task / -c | Require ncpus CPU cores per task. Without this option, one core is allocated per task
--pty | Execute the first task in pseudo terminal mode, e.g. --pty /bin/bash, to start a bash command shell
--x11 | Enable X forwarding, so programs using a GUI can be used during the session (provided you have X forwarding set up on your workstation)
--begin | Delay starting this job until after the specified date and time, e.g. --begin=9:42:00, to start the job at 9:42am
--array / -a | Submit an array of jobs with array ids as specified. Array ids can be specified as a numerical range, a comma-separated list of numbers, or a combination of the two
Note
A more comprehensive list of options can be found here.
7.1.2. Submitting a batch Job
To submit a job, use the sbatch command.
# sbatch [options] <filename>
sbatch abc.sbatch
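On success, sbatch prints the ID assigned to the job, for example:
Submitted batch job 12345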
Attention
Requesting the resources you need, as accurately as possible, allows your job to be started at the earliest opportunity as well as helping the system to schedule work efficiently to everyone’s benefit.
7.1.3. Reading Outputs
After a job finishes, an output file is generated with the name specified in the sbatch script; it contains the stdout (shell output) of the job. You can read it by opening this file.
cat job_1323234.out
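You can also follow the output of a job while it is still running, as the file is being written (the filename matches the example above):
tail -f job_1323234.out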
7.1.4. Examples
Job with 1 core:
Create a file python_program.sbatch to run abc.py.
#!/bin/bash
#SBATCH --ntasks=1 # Set number of tasks to run
#SBATCH --time=00:30:00 # Walltime format hh:mm:ss
#SBATCH -o job_%J.out # Output file (%J: JobID)
#SBATCH -e job_%J.err # Error file
# ---- Put all #SBATCH directives above this line! ---- #
# ---- Otherwise they will not be effective! ---- #
# ---- Actual commands start here ---- #
# Load modules here (safety measure)
module purge
module load python
python abc.py
Submit the job:
sbatch python_program.sbatch
A typical batch job script with GPU:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=5:00:00 # Walltime, expected time of job
# completion, SLURM can kill it
# after this time
#SBATCH --mem=2GB # RAM
#SBATCH --gres=gpu:1 # 1 GPU
#SBATCH --job-name=myTest # Assigned Job Name
#SBATCH --mail-type=END # Email on completion
#SBATCH --mail-user=bob.smith@nyu.edu # Email
#SBATCH --output=slurm_%j.out # Output file
# Clean environment
module purge
module load cuda/11.3.1
module load anaconda3/2020.07
eval "$(conda shell.bash hook)"
conda activate pytorch_env
cd $SCRATCH/Projects/DLProject
python resnet.py
Tip
If you want to request a specific GPU model, specify gpu:rtx8000:1 for a Quadro RTX8000 or gpu:v100:1 for a Tesla V100.
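For example, to request one RTX8000 in a batch script, the GPU line from the example above becomes:
#SBATCH --gres=gpu:rtx8000:1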
Warning
A job that has been allocated a GPU is terminated if its GPU usage stays very low for 2 continuous hours.
7.2. Interactive Sessions
Instead of submitting a job, you can request an interactive session on a compute node from your terminal and run your program in the shell directly.
Warning
Only short interactive jobs should be run this way (e.g., experimenting with new hyper-parameters in your source code, where each execution has a short runtime).
To start an interactive session, use the srun command.
Request a session with 4 CPU cores:
srun -c 4 --pty /bin/bash
Expected output:
[wz22@login-0-1]$ srun -c 4 --pty /bin/bash
srun: job 775175 queued and waiting for resources
srun: job 775175 has been allocated resources
[wz22@compute-21-1 ~]$
Then you can run your applications on the terminal directly.
Warning
In practice, the cluster may be busy and no resources may be immediately available; in that case your request waits in the queue.
For an interactive session, you can pass srun the same resource arguments that you would put in a job script for sbatch.
To request a GPU session with 32 GB RAM and 10 CPU cores for 1 hour:
srun -c 10 --mem=32GB --gres=gpu:1 -t 1:00:00 --pty /bin/bash
To leave an interactive session, type exit at the command prompt.
7.3. Checking Job Status
7.3.1. Running or Pending Job
This command shows all your current jobs.
squeue -u $USER
Example output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
31408 ser_std job1.sh wz22 R 0:02 1 compute-21-4
This means the job with job ID 31408 is running (ST: R) and has been running for 2 minutes on the compute node compute-21-4.
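For a job that is still pending, you can ask SLURM for its estimated start time (shown only when the scheduler can compute one):
squeue -u $USER --start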
You can also get information about job status by running:
sstat --format=TresUsageInMax%80,TresUsageInMaxNode%80 -j <JobID> --allsteps
For more verbose information, use scontrol show job.
scontrol show job <jobid>
7.3.2. Completed Job
Once a job has finished, it can no longer be inspected with squeue or scontrol show job. At this point, inspect the job with sacct.
sacct -j <jobid>
The following command gives you extremely verbose information on a job.
sacct -j <jobid> -l
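If the -l output is too verbose, you can select just a few useful fields instead; the field list below is one reasonable choice:
sacct -j <jobid> --format=JobID,JobName,State,Elapsed,MaxRSS,ExitCode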
7.4. Cancelling a Job
If you decide to end a job prematurely, use the scancel command.
scancel <jobid>
Use with caution!
To cancel all jobs from your account, run the following on the HPC terminal:
scancel -u $USER
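scancel can also target a subset of your jobs; for example, to cancel only jobs that are still pending while leaving running jobs untouched:
scancel -u $USER --state=PENDING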
7.5. Job Resource Usage Statistics
7.5.1. Completed Job
A useful command that allows you to better understand how resources were utilized by completed jobs is seff:
seff <job-ID>
Example Output:
Job ID: 8932105
Cluster: greene
User/Group: NetID/GROUPID
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 02:22:45
CPU Efficiency: 99.99% of 02:22:46 core-walltime
Job Wall-clock time: 02:22:46
Memory Utilized: 2.18 GB
Memory Efficiency: 21.80% of 10.00 GB
This example shows statistics for a completed job that was run with a request of 1 CPU core and 10 GB of RAM. While CPU utilization was nearly 100%, RAM utilization was very poor: only 2.2 GB of the requested 10 GB was used. The batch script for this job should be adjusted to something like #SBATCH --mem=2250MB.
7.5.2. Running job
While the job is running, SSH from the login node to the compute node it is running on (you can find the node name with squeue -u $USER), then run:
top -u $USER
Check how fully you are using the CPUs and how much RAM your job is consuming.
To exit, press Ctrl+C
For a GPU job also run:
nvidia-smi
Check how much GPU processing power your job is using.
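To keep this view refreshing automatically, you can wrap it in watch (also run on the compute node):
watch -n 1 nvidia-smi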
7.5.3. Visualize Job Statistics
You can use the dashboard below to visualize the efficiency and utilization of your jobs:
Note
You will have to login using your NYU email and you need to be on NYU Network or connected to NYU VPN.
7.6. Resource Limitations
Within SLURM there are multiple resource limits defined on different levels and applied to different objects. Some of the important limits are listed below:
Note
Note that these limits are frequently updated by the HPC team based on cluster usage patterns. The numbers listed here are therefore not exact and should be used only as general guidelines.
7.6.1. Job Limitations
Resource / Object per User | Limit
---|---
Concurrent Jobs | 2000
Job Lifetime | 7 days / 168 hours (extendible)
7.6.2. CPU, GPU, RAM Limitations
These limitations are account specific and you need to run the command below to check yours:
sacctmgr list qos format=name,maxwall,maxtresperuser%40,flags%40 where name=interact,cpu48,cpu168,gpu48,gpu168,gpuamd,cds,cpuplus,gpuplus
Example Output:
Name MaxWall MaxTRESPU Flags
---------- ----------- ------------------------ ---------------
cpu48 2-00:00:00 cpu=3000,mem=6000G
cpu168 7-00:00:00 cpu=1000,mem=2000G
gpu48 2-00:00:00 gres/gpu=24
gpu168 7-00:00:00 gres/gpu=4
From this you can see that in the "short queue" (under 48 hours, or 2 days) each user is allowed to use up to 3000 cores, while jobs in the "long queue" (under 168 hours, or 7 days) can use up to 1000 cores. The basic idea is that users can run more short jobs and fewer long jobs. The same logic applies to GPU resources.
7.6.3. CPU with GPU Limitations
For Tesla V100:
# GPUs | Max CPUs | Max Memory (GB)
---|---|---
1 | 20 | 200
2 | 24 | 300
3 | 44 | 350
4 | 48 | 369
For Quadro RTX8000:
# GPUs | Max CPUs | Max Memory (GB)
---|---|---
1 | 20 | 200
2 | 24 | 300
3 | 44 | 350
4 | 48 | 369
From these tables you can see, for example, that a job asking for 8 V100 GPUs will not be queued, and that a request for 2 V100s with 48 cores will also not be granted.
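For example, a request for 2 V100s that stays within the limits above (at most 24 CPUs and 300 GB of memory) might include:
#SBATCH --gres=gpu:v100:2
#SBATCH --cpus-per-task=24
#SBATCH --mem=300GB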
7.7. HPC Resource Status
You can check the current compute resource status of the whole HPC cluster using the dashboards below:
Note
You need to be on NYU Network or connected to NYU VPN.