Empire AI Grace — Job Submission and QoS Overview

Created by Cesar Arias, Modified on Fri, 22 May at 2:47 AM by Cesar Arias

This guide explains how to submit jobs on Grace, how Grace QoS tiers affect cost and scheduling behavior, and which submission pattern to use for interactive work, quick validation, production runs, and longer-running CPU-only jobs.

Partition note: Grace has a single CPU partition, grace. All Grace job examples use --partition=grace.
Mixed-architecture note: Grace is an ARM64 (aarch64) CPU environment. If part of your workflow also runs on Alpha x86_64 GPU nodes, environments and compiled packages may need to be rebuilt. Use the Alpha and Grace mixed-architecture guidance article when moving software between Alpha and Grace.
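
One way to keep Alpha and Grace software apart is to key the environment location on the output of uname -m. The sketch below assumes a per-architecture directory layout under $HOME/envs; that layout and the directory names are hypothetical, not a site default.

```shell
#!/usr/bin/env bash
# Sketch: pick a per-architecture environment directory.
# The $HOME/envs/<name> layout is a hypothetical convention, not a site default.

env_dir_for_arch() {
    # $1: machine string as printed by `uname -m`
    case "$1" in
        aarch64|arm64) echo "$HOME/envs/grace-aarch64" ;;
        x86_64)        echo "$HOME/envs/alpha-x86_64" ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Typical use in a job script or login shell:
#   ENV_DIR="$(env_dir_for_arch "$(uname -m)")" && source "$ENV_DIR/bin/activate"
```

Keeping one environment per architecture avoids loading x86_64 binaries on Grace by accident; each environment still has to be built on the matching system.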

Environment overview

Grace is the CPU-only environment for MPI simulations, genomics pipelines, statistical analysis, parameter sweeps, and data preprocessing ahead of GPU training on Alpha. Each Grace node has 144 CPU cores and 478 GiB RAM, and all nodes belong to the single grace partition.

Submission pattern

The basic submission pattern on Grace is to use the grace partition, select the QoS tier that matches the job type, and set node count, task layout, and wall time. Unlike Alpha, there is no hardware-tier partition change planned; Grace remains a single CPU partition.

Workflow | Pattern | Example
Batch job | Submit a script to run unattended on Grace nodes. | sbatch --partition=grace --qos=standard --nodes=4 --ntasks-per-node=144 --time=12:00:00 my_job.sh
Interactive shell | Get a live shell on a Grace node for debugging and setup. | srun --partition=grace --qos=interactive --nodes=1 --ntasks-per-node=144 --time=01:00:00 --pty bash
Interactive allocation first | Request an interactive allocation first, then launch one or more commands inside it. | salloc --partition=grace --qos=interactive --nodes=1 --ntasks-per-node=1 --cpus-per-task=4 --time=01:00:00
Array job | Run many independent tasks with one submission. | sbatch --partition=grace --qos=long --array=1-100 --nodes=1 --ntasks-per-node=1 --time=01:00:00 sweep.sh

Example syntax note: If a command example contains ..., it stands in for additional optional Slurm flags such as --time, --nodes, --cpus-per-task, --mem, --array, or --requeue. Replace ... with the extra options you need, or omit it entirely if you do not need any.

QoS tiers

Every Grace job runs under a QoS tier. The tier controls scheduling priority, wall-time limit, node limits, and SU billing behavior.

QoS | Best for | Max wall time | Max nodes/user | SU rate
test | Quick validation and script checks | 2 hours | 2 nodes (288 cores) | 0.25 SU/node-hr
interactive | Live shell for debugging and setup | 2 hours | 1 node (144 cores), 1 running job | 0.5 SU/node-hr
standard | Default production jobs | 48 hours (2 days) | 10 nodes (1,440 cores) | 0.5 SU/node-hr
long | Longer, budget-conscious runs | 7 days | 5 nodes (720 cores) | 0.25 SU/node-hr
priority | Deadline-driven urgent work | 24 hours | 20 nodes (2,880 cores) | 1.0 SU/node-hr
burst | System-assigned overflow when SUs are exhausted | 7 days | 4 nodes (576 cores) | FREE (0 SU)

Every tier has a wall-time limit; no QoS tier gives unlimited runtime. If your job may exceed the limit of its tier, use checkpointing and consider --requeue --signal=B:SIGTERM@900 so you can resume cleanly on a later run.
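
Before submitting, it can help to check a request against the tier limits listed above. The following is a minimal sketch with the limits copied from this article; the function names are made up for illustration, and the node check covers a single request only, whereas the real limit is per user and also counts your other running jobs.

```shell
#!/usr/bin/env bash
# Sketch: check one request against the per-tier limits listed in this article.
# Confirm the limits on the cluster before relying on them.

tier_limits() {
    # Prints "<max_hours> <max_nodes>" for a Grace QoS tier.
    case "$1" in
        test)        echo "2 2" ;;
        interactive) echo "2 1" ;;
        standard)    echo "48 10" ;;
        long)        echo "168 5" ;;    # 7 days
        priority)    echo "24 20" ;;
        burst)       echo "168 4" ;;    # 7 days, system-assigned
        *)           return 1 ;;
    esac
}

fits_tier() {
    # Usage: fits_tier <tier> <nodes> <hours>   (whole numbers)
    local limits max_hours max_nodes
    limits="$(tier_limits "$1")" || { echo "unknown tier: $1" >&2; return 2; }
    read -r max_hours max_nodes <<< "$limits"
    [ "$2" -le "$max_nodes" ] && [ "$3" -le "$max_hours" ]
}

# Example:
#   fits_tier standard 4 12 && echo "request fits the standard tier"
```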

Interactive methods

On Grace, interactive work can start with either srun or salloc. Both are valid, but they are useful in slightly different situations.

Command | What it does | Best for
srun --pty bash | Requests resources and immediately launches an interactive shell on the compute node | Quick CPU debugging sessions, environment setup, and short interactive checks
salloc | Requests an allocation first, then lets you run one or more commands inside that allocation | Workflows where you want more control after the allocation starts, such as launching multiple commands or mixing shell work with explicit srun steps

srun and salloc examples

Direct interactive shell with srun

srun --partition=grace --qos=interactive --nodes=1 --ntasks-per-node=1 --cpus-per-task=4 --time=00:10:00 --pty bash

Check the interactive shell

uname -m && nproc && free -h && echo interactive works && exit

Allocate first with salloc

salloc --partition=grace --qos=interactive --nodes=1 --ntasks-per-node=1 --cpus-per-task=4 --time=00:10:00

Then launch work inside the allocation

srun --pty bash
uname -m
nproc
python smoke_test.py

Interactive tier reminder: Grace interactive jobs are intended for short live sessions and are limited to 1 node, 1 running interactive job, and a 2-hour wall time. For unattended work, use sbatch on test, standard, long, or priority instead.

Example jobs

The examples below illustrate Grace submission workflows and can be adapted to match your project, node count, and runtime needs.

Quick QoS smoke tests

Test tier (2h, 2 nodes max)

sbatch --partition=grace --qos=test --nodes=1 --ntasks-per-node=1 --time=00:05:00 \
  --wrap="echo 'test tier works on' \$(hostname); uname -m; nproc; sleep 10"

Standard tier (default production)

sbatch --partition=grace --qos=standard --nodes=1 --ntasks-per-node=1 --time=00:05:00 \
  --wrap="echo 'standard tier works on' \$(hostname); uname -m; nproc; sleep 10"

Long tier (7 days, half price)

sbatch --partition=grace --qos=long --nodes=1 --ntasks-per-node=1 --time=00:05:00 \
  --wrap="echo 'long tier works on' \$(hostname); uname -m; nproc; sleep 10"

Priority tier (highest priority)

sbatch --partition=grace --qos=priority --nodes=1 --ntasks-per-node=1 --time=00:05:00 \
  --wrap="echo 'priority tier works on' \$(hostname); uname -m; nproc; sleep 10"

Interactive examples

Interactive shell on Grace

srun --partition=grace --qos=interactive --nodes=1 --ntasks-per-node=144 --time=00:10:00 --pty bash

Check the interactive environment

uname -m && nproc && free -h && echo 'interactive works' && exit

Production and array examples

Single-node production job with requeue

sbatch --partition=grace --qos=standard --requeue --signal=B:SIGTERM@900 \
  --nodes=1 --ntasks-per-node=144 --time=24:00:00 my_job.sh

Multi-node MPI job

sbatch --partition=grace --qos=standard --requeue --signal=B:SIGTERM@900 \
  --nodes=4 --ntasks-per-node=144 --time=2-00:00:00 my_mpi_job.sh

Array job (3 tasks)

sbatch --partition=grace --qos=test --nodes=1 --ntasks-per-node=1 --time=00:05:00 \
  --array=0-2 --wrap='echo "Array task $SLURM_ARRAY_TASK_ID of job $SLURM_ARRAY_JOB_ID running on $(hostname)"; sleep 5'
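
In a real sweep, each array task usually needs its own input. A common pattern, sketched here under the assumption that a file named params.txt holds one parameter set per line, is to use SLURM_ARRAY_TASK_ID as a line index. The file name, the helper name, and my_program are all placeholders.

```shell
#!/usr/bin/env bash
# Sketch: map an array index to one line of a parameter file.
# params.txt is a hypothetical file with one parameter set per line.

param_for_task() {
    # Usage: param_for_task <file> <zero-based index>
    # sed line numbers start at 1, so shift the index by one.
    sed -n "$(( $2 + 1 ))p" "$1"
}

# Inside a job submitted with --array=0-2, Slurm sets SLURM_ARRAY_TASK_ID:
#   PARAMS="$(param_for_task params.txt "$SLURM_ARRAY_TASK_ID")"
#   ./my_program $PARAMS        # my_program is a placeholder
```

If the array starts at 1 instead of 0 (for example --array=1-100), drop the shift and pass the task ID directly as the line number.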

Requeue and checkpointing

If a run may exceed the wall-time limit of its tier, or may be preempted (as burst jobs can be), combine checkpointing in your application with Slurm requeue eligibility and a warning signal sent before the time limit expires.

Flag | What it does
--requeue | Makes the batch job eligible to be requeued after preemption, node failure, or a requeue action.
--signal=B:SIGTERM@900 | Sends SIGTERM to the batch shell only, about 900 seconds (15 minutes) before the wall-time limit, so your script can save a checkpoint and exit cleanly. The script must trap the signal itself.
--signal=SIGTERM@900 | Sends the warning signal directly to the job steps rather than the batch shell. Use this when the application itself handles the signal.

A useful mental model is that Slurm can restart the batch script, but it does not save application state. Your code must write and reload checkpoints so requeued jobs resume instead of starting from scratch.
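
With --signal=B:SIGTERM@900 the signal reaches only the batch shell, so the script has to pass it on to the application. Below is a minimal sketch of one way to do that in bash. The helper name is invented for illustration, and it assumes the application writes its own checkpoint when it receives SIGTERM.

```shell
#!/usr/bin/env bash
# Sketch of a helper for batch scripts submitted with
#   --requeue --signal=B:SIGTERM@900
# Assumption: the application saves a checkpoint when it receives SIGTERM.

run_with_forwarded_sigterm() {
    # Usage: run_with_forwarded_sigterm <command> [args...]
    local app_pid status=0 got_term=0

    "$@" &                      # run in the background so bash can act on the trap
    app_pid=$!

    # On SIGTERM: remember it and forward it to the application.
    trap 'got_term=1; kill -TERM "$app_pid" 2>/dev/null || true' TERM

    wait "$app_pid" || status=$?
    if [ "$got_term" -eq 1 ]; then
        # The first wait was interrupted by the trap; wait again so the
        # application can finish writing its checkpoint.
        status=0
        wait "$app_pid" || status=$?
    fi

    trap - TERM
    return "$status"
}

# In a batch script, after the #SBATCH lines (my_app is a placeholder):
#   run_with_forwarded_sigterm ./my_app
```

The application runs in the background because bash defers trap handling until a foreground command finishes, but interrupts the wait builtin immediately. Some sites also have the script call scontrol requeue "$SLURM_JOB_ID" once the checkpoint is written; whether that is appropriate on Grace is not stated in this article.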

How to choose a tier

If you need... | Use... | Why
A quick script validation | test | Low cost, short wall time, designed for verifying code before full runs.
A live shell on a compute node | interactive with srun --pty bash or salloc followed by srun | Best for debugging, profiling, installation, and exploratory work.
A normal production batch job | standard | Balanced cost and scheduling. Default tier if you omit --qos.
A cheaper but less urgent long run | long | Half the SU rate of standard, longer wall time, lower scheduling priority.
An urgent deadline-driven job | priority | Highest scheduling priority at higher SU cost.

SU billing

SU = Nodes × Hours × Tier rate. For example, 4 nodes for 24 hours on standard costs 4 × 24 × 0.5 = 48 SU, while the same job costs 24 SU on long and 96 SU on priority.

Tier | Rate | 4 nodes × 24h | Best for
priority | 1.0 SU/node-hr | 96 SU | Deadlines and urgent results
standard | 0.5 SU/node-hr | 48 SU | Normal production work
interactive | 0.5 SU/node-hr | Max 1 SU (1 node × 2h) | Live debugging sessions
test | 0.25 SU/node-hr | Max 1 SU (2 nodes × 2h) | Script validation and small tests
long | 0.25 SU/node-hr | 24 SU | Long runs and budget-conscious work
burst | FREE | 0 SU | Out-of-budget, preemptable work
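
The billing formula is easy to script for quick estimates. This is a minimal sketch using the rates listed in this article; the function names are made up for illustration, and actual charges are whatever the scheduler records.

```shell
#!/usr/bin/env bash
# Sketch: SU = nodes x hours x tier rate, with rates copied from this article.

tier_rate() {
    case "$1" in
        priority)             echo "1.0" ;;
        standard|interactive) echo "0.5" ;;
        test|long)            echo "0.25" ;;
        burst)                echo "0" ;;
        *)                    return 1 ;;
    esac
}

su_cost() {
    # Usage: su_cost <tier> <nodes> <hours>
    local rate
    rate="$(tier_rate "$1")" || { echo "unknown tier: $1" >&2; return 2; }
    # awk handles the fractional arithmetic that bash integers cannot.
    awk -v n="$2" -v h="$3" -v r="$rate" 'BEGIN { printf "%.2f\n", n * h * r }'
}

# Example:
#   su_cost standard 4 24    # prints 48.00
```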

Monitoring

  • squeue -u $USER shows your currently queued and running jobs.
  • sacct -u $USER --format=JobID%10,JobName%20,QOS%12,AllocTRES%35,Elapsed,State -S today shows your recent job history from today, including job name, QoS, allocated resources, elapsed time, and state.
  • sacctmgr show assoc where user=$USER format=Account%20,QOS%60 lists the Slurm accounts and QoS tiers associated with your user.
  • sshare -u $USER -Ul shows your fairshare usage and priority information.
  • scancel JOBID cancels a job; replace JOBID with the numeric job ID you want to stop.
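
For a quick summary rather than a full listing, the sacct output can be piped through a small filter. The sketch below counts jobs by state; count_states is a hypothetical helper, and it assumes sacct is run with --parsable2 so fields are separated by |.

```shell
#!/usr/bin/env bash
# Sketch: count jobs by state from sacct's machine-readable output.

count_states() {
    # Reads "JobID|State" lines on stdin and prints "<State> <count>" per state.
    # Only the first word of the state is kept, so "CANCELLED by <uid>"
    # is counted as CANCELLED.
    awk -F'|' 'NF >= 2 { split($2, s, " "); n[s[1]]++ }
               END { for (k in n) print k, n[k] }' | sort
}

# On Grace (requires Slurm):
#   sacct -u "$USER" -X -S today --noheader --parsable2 --format=JobID,State | count_states
```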
