Empire AI Grace — Job Submission and QoS Overview

Created by Cesar Arias, Modified on Fri, 22 May at 2:47 AM by Cesar Arias

This guide explains how to submit jobs on Grace, how Grace QoS tiers affect cost and scheduling behavior, and which submission pattern to use for interactive work, quick validation, production runs, and longer-running CPU-only jobs.

Partition note: Grace has a single CPU partition, grace. All Grace job examples use --partition=grace.

Mixed-architecture note: Grace is an ARM64 (aarch64) CPU environment. If part of your workflow also runs on Alpha x86_64 GPU nodes, environments and compiled packages may need to be rebuilt. Use the Alpha and Grace mixed-architecture guidance article when moving software between Alpha and Grace.
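One way to keep Alpha and Grace software separate is to select an environment by architecture at the top of a job script. The following is a minimal sketch, not site guidance: the environment names myenv-grace and myenv-alpha are hypothetical placeholders, and how you activate them depends on your own tooling.

```shell
#!/usr/bin/env bash
# Map a machine architecture to a per-architecture environment name.
# "myenv-grace" and "myenv-alpha" are hypothetical names, not real site environments.
env_for_arch() {
  case "$1" in
    aarch64) echo "myenv-grace" ;;   # Grace (ARM64)
    x86_64)  echo "myenv-alpha" ;;   # Alpha (x86_64)
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

arch=$(uname -m)
if env_name=$(env_for_arch "$arch"); then
  echo "On $arch, activate: $env_name"
fi
```

Keeping one environment per architecture avoids loading binaries that were compiled for the other system.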

Environment overview

Grace is the CPU-only environment for MPI simulations, genomics pipelines, statistical analysis, parameter sweeps, and data preprocessing ahead of GPU training on Alpha. Each Grace node has 144 CPU cores and 478 GiB RAM, and all nodes belong to the single grace partition.

Submission pattern

The basic submission pattern on Grace is to use the grace partition, select the QoS tier that matches the job type, and set node count, task layout, and wall time. Unlike Alpha, there is no hardware-tier partition change planned; Grace remains a single CPU partition.

Workflow	Pattern	Example
Batch job	Submit a script to run unattended on Grace nodes.	`sbatch --partition=grace --qos=standard --nodes=4 --ntasks-per-node=144 --time=12:00:00 my_job.sh`
Interactive shell	Get a live shell on a Grace node for debugging and setup.	`srun --partition=grace --qos=interactive --nodes=1 --ntasks-per-node=1 --cpus-per-task=144 --time=01:00:00 --pty bash`
Interactive allocation first	Request an interactive allocation first, then launch one or more commands inside it.	`salloc --partition=grace --qos=interactive --nodes=1 --ntasks-per-node=1 --cpus-per-task=4 --time=01:00:00`
Array job	Run many independent tasks with one submission.	`sbatch --partition=grace --qos=long --array=1-100 --nodes=1 --ntasks-per-node=1 --time=01:00:00 sweep.sh`

Example syntax note: Where a command example includes ..., it stands in for additional optional Slurm flags such as --time, --nodes, --cpus-per-task, --mem, --array, or --requeue. Replace ... with the extra options you need, or omit it if you do not need any additional flags.
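The same options can also be written as #SBATCH directives at the top of the script instead of on the sbatch command line. The sketch below mirrors the batch-job row above; the workload line is left as a comment because the program name is not specified here. SLURM_JOB_NUM_NODES and SLURM_NTASKS are set by Slurm inside a job, and the fallback values only exist so the script can be dry-run outside Slurm.

```shell
#!/usr/bin/env bash
#SBATCH --partition=grace
#SBATCH --qos=standard
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=144
#SBATCH --time=12:00:00

# Slurm sets these inside a job; the defaults allow a dry run elsewhere.
nodes=${SLURM_JOB_NUM_NODES:-1}
tasks=${SLURM_NTASKS:-1}
echo "Running on $nodes node(s) with $tasks task(s), arch $(uname -m)"

# Replace this comment with the real workload, for example: srun ./my_program
# (my_program is a hypothetical name).
```

With the directives in the script, the submission reduces to sbatch my_job.sh; flags given on the command line still override the directives.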

QoS tiers

Every Grace job runs under a QoS tier. The tier controls scheduling priority, wall-time limit, node limits, and SU billing behavior.

QoS	Best for	Max wall time	Max nodes/user	SU rate
test	Quick validation and script checks	2 hours	2 nodes (288 cores)	0.25 SU/node-hr
interactive	Live shell for debugging and setup	2 hours	1 node (144 cores), 1 running job	0.5 SU/node-hr
standard	Default production jobs	48 hours (2 days)	10 nodes (1,440 cores)	0.5 SU/node-hr
long	Longer, budget-conscious runs	7 days	5 nodes (720 cores)	0.25 SU/node-hr
priority	Deadline-driven urgent work	24 hours	20 nodes (2,880 cores)	1.0 SU/node-hr
burst	System-assigned overflow when SUs are exhausted	7 days	4 nodes (576 cores)	FREE (0 SU)

Every tier has a wall-time limit. No QoS tier gives unlimited runtime. If your job may exceed the tier wall-time limit, use checkpointing and consider --requeue --signal=B:SIGTERM@900 so you can resume cleanly on a later run.
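Before submitting, it can help to check a request against the tier limits. The helper below is an illustrative sketch only: the limits are copied from the table above, the function names qos_limits and check_request are made up for this example, and Slurm itself remains the authority on what is accepted.

```shell
#!/usr/bin/env bash
# Per-user limits copied from the QoS table, printed as "max_hours max_nodes".
qos_limits() {
  case "$1" in
    test)        echo "2 2" ;;
    interactive) echo "2 1" ;;
    standard)    echo "48 10" ;;
    long)        echo "168 5" ;;
    priority)    echo "24 20" ;;
    burst)       echo "168 4" ;;
    *) return 1 ;;
  esac
}

# usage: check_request QOS NODES HOURS   (whole numbers only)
check_request() {
  local limits max_hours max_nodes
  limits=$(qos_limits "$1") || { echo "unknown QoS: $1"; return 2; }
  read -r max_hours max_nodes <<< "$limits"
  if [ "$2" -gt "$max_nodes" ]; then
    echo "$1: $2 nodes exceeds the ${max_nodes}-node limit"; return 1
  fi
  if [ "$3" -gt "$max_hours" ]; then
    echo "$1: $3 h exceeds the ${max_hours}-hour limit"; return 1
  fi
  echo "$1: $2 nodes for $3 h is within limits"
}

check_request standard 4 12
check_request test 4 1 || true   # rejected: test allows at most 2 nodes
```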

Interactive methods

On Grace, interactive work can start with either srun or salloc. Both are valid, but they are useful in slightly different situations.

Command	What it does	Best for
`srun --pty bash`	Requests resources and immediately launches an interactive shell on the compute node	Quick CPU debugging sessions, environment setup, and short interactive checks
`salloc`	Requests an allocation first, then lets you run one or more commands inside that allocation	Workflows where you want more control after the allocation starts, such as launching multiple commands or mixing shell work with explicit `srun` steps

srun and salloc examples

Direct interactive shell with srun

srun --partition=grace --qos=interactive --nodes=1 --ntasks-per-node=1 --cpus-per-task=4 --time=00:10:00 --pty bash

Check the interactive shell

uname -m && nproc && free -h && echo interactive works && exit

Allocate first with salloc

salloc --partition=grace --qos=interactive --nodes=1 --ntasks-per-node=1 --cpus-per-task=4 --time=00:10:00

Then launch work inside the allocation

srun --pty bash
uname -m
nproc
python smoke_test.py

Interactive tier reminder: Grace interactive jobs are intended for short live sessions and are limited to 1 node, 1 running interactive job, and a 2-hour wall time. For unattended work, use sbatch on test, standard, long, or priority instead.

Example jobs

The examples below illustrate Grace submission workflows and can be adapted to match your project, node count, and runtime needs.

Quick QoS smoke tests

Test tier (2h, 2 nodes max)

sbatch --partition=grace --qos=test --nodes=1 --ntasks-per-node=1 --time=00:05:00 \
  --wrap="echo 'test tier works on' \$(hostname); uname -m; nproc; sleep 10"

Standard tier (default production)

sbatch --partition=grace --qos=standard --nodes=1 --ntasks-per-node=1 --time=00:05:00 \
  --wrap="echo 'standard tier works on' \$(hostname); uname -m; nproc; sleep 10"

Long tier (7 days, half price)

sbatch --partition=grace --qos=long --nodes=1 --ntasks-per-node=1 --time=00:05:00 \
  --wrap="echo 'long tier works on' \$(hostname); uname -m; nproc; sleep 10"

Priority tier (highest priority)

sbatch --partition=grace --qos=priority --nodes=1 --ntasks-per-node=1 --time=00:05:00 \
  --wrap="echo 'priority tier works on' \$(hostname); uname -m; nproc; sleep 10"

Interactive examples

Interactive shell on Grace

srun --partition=grace --qos=interactive --nodes=1 --ntasks-per-node=1 --cpus-per-task=144 --time=00:10:00 --pty bash

Check the interactive environment

uname -m && nproc && free -h && echo 'interactive works' && exit

Production and array examples

Single-node production job with requeue

sbatch --partition=grace --qos=standard --requeue --signal=B:SIGTERM@900 \
  --nodes=1 --ntasks-per-node=144 --time=24:00:00 my_job.sh

Multi-node MPI job

sbatch --partition=grace --qos=standard --requeue --signal=B:SIGTERM@900 \
  --nodes=4 --ntasks-per-node=144 --time=2-00:00:00 my_mpi_job.sh

Array job (3 tasks)

sbatch --partition=grace --qos=test --nodes=1 --ntasks-per-node=1 --time=00:05:00 \
  --array=0-2 --wrap='echo "Array task $SLURM_ARRAY_TASK_ID of job $SLURM_ARRAY_JOB_ID running on $(hostname)"; sleep 5'
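In a real sweep, each array task usually maps its index to a different input. The sketch below shows one possible mapping, assuming a hypothetical parameter file with one parameter set per line; the function name param_for_task and the alpha values are illustrative only. SLURM_ARRAY_TASK_ID is set by Slurm inside an array task.

```shell
#!/usr/bin/env bash
# Print the parameter line for an array task (0-based task ID -> 1-based line).
param_for_task() {  # usage: param_for_task PARAM_FILE TASK_ID
  sed -n "$(( $2 + 1 ))p" "$1"
}

# Demo with a hypothetical three-line parameter file, matching --array=0-2.
params=$(mktemp)
printf '%s\n' 'alpha=0.1' 'alpha=0.5' 'alpha=0.9' > "$params"

# Outside Slurm the variable is unset, so default to task 0 for local testing.
task_id=${SLURM_ARRAY_TASK_ID:-0}
echo "task $task_id uses: $(param_for_task "$params" "$task_id")"

rm -f "$params"
```

Keeping the array range and the number of lines in the parameter file in step avoids tasks that read an empty line.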

Requeue and checkpointing

If a run may exceed the wall-time limit or may be preempted (as burst jobs can be), combine application-level checkpointing with Slurm requeue eligibility and a warning signal sent before the time limit expires.

Flag	What it does
`--requeue`	Makes the batch job eligible to be requeued after preemption, node failure, or a requeue action.
`--signal=B:SIGTERM@900`	Sends `SIGTERM` to the batch shell 900 seconds (15 minutes) before wall time so your script can save a checkpoint and exit cleanly.
`--signal=SIGTERM@900`	Sends the warning signal directly to job steps when the application itself handles the signal.

A useful mental model is that Slurm can restart the batch script, but it does not save application state. Your code must write and reload checkpoints so requeued jobs resume instead of starting from scratch.
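The pattern can be sketched as a batch script that traps the warning signal, saves its progress, and requeues itself. This is a minimal illustration under stated assumptions: the checkpoint file checkpoint.txt, the TOTAL_STEPS counter, and the sleep that stands in for real work are all hypothetical, and calling scontrol requeue from inside the job is a common approach that assumes your site permits it.

```shell
#!/usr/bin/env bash
#SBATCH --partition=grace
#SBATCH --qos=standard
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=24:00:00
#SBATCH --requeue
#SBATCH --signal=B:SIGTERM@900

# Hypothetical checkpoint file and step count; a real application defines its own.
CKPT=${CKPT:-checkpoint.txt}
TOTAL_STEPS=${TOTAL_STEPS:-3}

# With B:, Slurm sends SIGTERM to this batch shell 900 s before the time limit.
# Bash runs the trap only after the current foreground command returns, so keep
# each unit of work short, or run it in the background and use "wait".
stop_requested=0
trap 'stop_requested=1' TERM

# Resume from the last saved step if a checkpoint exists.
step=0
[ -f "$CKPT" ] && step=$(cat "$CKPT")

while [ "$step" -lt "$TOTAL_STEPS" ]; do
  sleep 1                    # stand-in for one unit of real work
  step=$((step + 1))
  echo "$step" > "$CKPT"     # save progress after every step
  if [ "$stop_requested" -eq 1 ]; then
    echo "SIGTERM received at step $step; requesting requeue"
    # scontrol requeue is only meaningful inside a Slurm job.
    [ -n "${SLURM_JOB_ID:-}" ] && scontrol requeue "$SLURM_JOB_ID"
    exit 0
  fi
done

echo "finished all $TOTAL_STEPS steps"
rm -f "$CKPT"                # a completed run should not leave a stale checkpoint
```

Because the requeued job reruns the same script, the resume logic at the top is what turns a restart into a continuation.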

How to choose a tier

If you need...	Use...	Why
A quick script validation	`test`	Low cost, short wall time, designed for verifying code before full runs.
A live shell on a compute node	`interactive` with `srun --pty bash` or `salloc` followed by `srun`	Best for debugging, profiling, installation, and exploratory work.
A normal production batch job	`standard`	Balanced cost and scheduling. Default tier if you omit `--qos`.
A cheaper but less urgent long run	`long`	Half the SU rate of standard, longer wall time, lower scheduling priority.
An urgent deadline-driven job	`priority`	Highest scheduling priority at higher SU cost.

SU billing

SU = Nodes × Hours × Tier rate. For example, 4 nodes for 24 hours on standard costs 48 SU, while the same job costs 24 SU on long and 96 SU on priority.

Tier	Rate	4 nodes × 24h	Best for
priority	1.0 SU/node-hr	96 SU	Deadlines and urgent results
standard	0.5 SU/node-hr	48 SU	Normal production work
interactive	0.5 SU/node-hr	Max ~1 SU (1 node × 2h)	Live debugging sessions
test	0.25 SU/node-hr	Max 1 SU (2 nodes × 2h)	Script validation and small tests
long	0.25 SU/node-hr	24 SU	Long runs and budget-conscious work
burst	FREE	0 SU	Out-of-budget, preemptable work
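The billing formula is easy to script when comparing tiers. The following sketch uses the rates from the table above; the function names su_rate and su_cost are illustrative, and it does not check whether a request fits within a tier's node or wall-time limits.

```shell
#!/usr/bin/env bash
# SU rate per node-hour for each QoS tier, copied from the table above.
su_rate() {
  case "$1" in
    priority)             echo 1.0 ;;
    standard|interactive) echo 0.5 ;;
    test|long)            echo 0.25 ;;
    burst)                echo 0 ;;
    *) echo "unknown QoS: $1" >&2; return 1 ;;
  esac
}

# usage: su_cost QOS NODES HOURS   -> prints Nodes x Hours x Tier rate
su_cost() {
  local rate
  rate=$(su_rate "$1") || return 1
  awk -v n="$2" -v h="$3" -v r="$rate" 'BEGIN { printf "%g\n", n * h * r }'
}

su_cost standard 4 24   # prints 48
su_cost long 4 24       # prints 24
su_cost priority 4 24   # prints 96
```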

Monitoring

Command	What it does
`squeue -u $USER`	Shows your currently queued and running jobs.
`sacct -u $USER --format=JobID%10,JobName%20,QOS%12,AllocTRES%35,Elapsed,State -S today`	Shows your job history since the start of today, including job name, QoS, allocated resources, elapsed time, and state.
`sacctmgr show assoc where user=$USER format=Account%20,QOS%60`	Lists the Slurm accounts and QoS tiers associated with your user.
`sshare -u $USER -Ul`	Shows your fairshare usage and priority information.
`scancel JOBID`	Cancels a job; replace `JOBID` with the numeric ID of the job you want to stop.