Empire AI Alpha — Interactive Slurm Quickstart

Created by Cesar Arias, Modified on Fri, 22 May at 2:36 AM by Cesar Arias

This guide shows how to start an interactive session on Empire AI Alpha, then optionally load modules, create a Python virtual environment, and run a quick PyTorch test. It covers CPU-only and interactive GPU workflows and explains where Grace fits into CPU-heavy work.

Partition map: Use -p cpu for the x86 CPU-only node alphacpu01 and -p grace for the Grace CPU-only nodes, which run under the same Slurm instance. For GPU access, currently use -p YOUR_PARTITION -A YOUR_ACCOUNT; this applies until the future alpha partition becomes the standard GPU target.

Mixed-architecture note: Alpha and the cpu partition are x86_64. Grace is ARM64 (aarch64). If you move work between Alpha and Grace, keep a separate environment for each architecture and review the Alpha and Grace mixed-architecture guidance.
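
One way to keep environments separate is to key the virtual environment path on the machine architecture. The sketch below is an illustration only: the ~/venvs/torch-<arch> naming scheme and the function name venv_dir_for are assumptions, not a site convention.

```python
import platform
from pathlib import Path


def venv_dir_for(machine: str, base: str = "~/venvs") -> Path:
    """Return a per-architecture venv path (hypothetical naming scheme)."""
    # Normalize common aliases so arm64/aarch64 and amd64/x86_64 match.
    aliases = {"arm64": "aarch64", "amd64": "x86_64"}
    arch = aliases.get(machine.lower(), machine.lower())
    return Path(base).expanduser() / f"torch-{arch}"


if __name__ == "__main__":
    # platform.machine() reports the architecture of the node you are on.
    print(venv_dir_for(platform.machine()))
```

Running this on an x86_64 node and on a Grace node should print two different paths, so a venv created on one architecture is not reused by accident on the other.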

1) Log in

ssh <username>@alpha1.empireai.edu

2) Discover what you can run

sinfo
sacctmgr show assoc user=$USER format=User,Account,QOS%30

sinfo shows the available partitions, and sacctmgr shows which accounts your user can charge and which QoS values each account allows.
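
If you want to use the account list in a script, sacctmgr can emit pipe-delimited output with its --parsable2 and --noheader options. The sketch below parses that format. The function names parse_assoc and query_assoc are made up for illustration, and query_assoc only works where sacctmgr is on the PATH.

```python
import subprocess


def parse_assoc(text: str) -> list[dict]:
    """Parse 'User|Account|QOS' lines from sacctmgr --parsable2 --noheader."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        user, account, qos = line.strip().split("|")
        rows.append({
            "user": user,
            "account": account,
            # sacctmgr lists multiple QoS values separated by commas.
            "qos": [q for q in qos.split(",") if q],
        })
    return rows


def query_assoc(user: str) -> list[dict]:
    """Run sacctmgr on a login node and parse its output."""
    cmd = ["sacctmgr", "--parsable2", "--noheader", "show", "assoc",
           f"user={user}", "format=User,Account,QOS"]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_assoc(out.stdout)
```

Each returned entry pairs one account with its QoS list, which maps directly onto the -A and --qos values used in the salloc commands in this guide.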

3) Interactive CPU job on alphacpu01

# Quick x86 CPU-only setup or debugging on alphacpu01
salloc -p cpu -A YOUR_ACCOUNT -c 4 --mem=16G -t 0:30:00 --job-name=cpu-interactive
srun --pty bash -l

Use the cpu partition for small CPU-only tests, package setup, or quick debugging on alphacpu01.

4) CPU-heavy work on Grace nodes

# Heavier CPU or memory-bound work on Grace nodes
salloc -p grace --qos=interactive -c 4 --mem=16G -t 01:00:00 --job-name=grace-interactive
srun --pty bash -l

Use the grace partition for larger CPU-heavy preprocessing, MPI, and memory-intensive workflows. Grace and Alpha share the same Slurm environment, but binaries and Python environments built on one architecture generally will not run on the other.

5) Interactive GPU job — current pattern

salloc -p YOUR_PARTITION -A YOUR_ACCOUNT -c 4 --mem=16G -t 0:30:00 \
  --job-name=gpu-interactive --qos=interactive --gres=gpu:1
srun --pty bash -l

Inside the GPU session:

nvidia-smi

Current capacity: Alpha’s GPU pool is 192 GPUs across 24 GPU nodes, with 8 GPUs per node. If you need more than one GPU, request --gres=gpu:2, --gres=gpu:4, and so on, up to the 8 GPUs available on a single node.
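
The capacity figures above imply a simple check before requesting GPUs: --gres counts are per node, so a single-node request cannot exceed 8. A minimal sketch of that arithmetic follows. The constants come from the numbers above, and the function name nodes_needed is hypothetical.

```python
import math

GPUS_PER_NODE = 8   # from the capacity note above
GPU_NODES = 24      # from the capacity note above


def nodes_needed(total_gpus: int) -> int:
    """Smallest number of 8-GPU nodes that can supply total_gpus."""
    if total_gpus < 1:
        raise ValueError("request at least one GPU")
    if total_gpus > GPUS_PER_NODE * GPU_NODES:
        raise ValueError("request exceeds the 192-GPU pool")
    return math.ceil(total_gpus / GPUS_PER_NODE)
```

For example, nodes_needed(4) is 1 and nodes_needed(12) is 2, so a 12-GPU job has to span at least two nodes.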

6) Interactive GPU job — future alpha pattern

# Future pattern once the alpha partition becomes the standard GPU target
salloc -p alpha -A YOUR_ACCOUNT -c 4 --mem=16G -t 0:30:00 \
  --job-name=gpu-interactive --qos=interactive --gres=gpu:1
srun --pty bash -l
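
Because the current and future GPU commands differ only in the partition name, a small wrapper can keep the rest of the request in one place. The sketch below only assembles the salloc argument list shown above. The function name gpu_salloc_cmd and its defaults are illustrative, and YOUR_PARTITION and YOUR_ACCOUNT remain placeholders.

```python
import shlex


def gpu_salloc_cmd(partition: str, account: str, gpus: int = 1,
                   cpus: int = 4, mem: str = "16G",
                   time: str = "0:30:00") -> list[str]:
    """Build the interactive GPU salloc command used in this guide."""
    return [
        "salloc", "-p", partition, "-A", account,
        "-c", str(cpus), f"--mem={mem}", "-t", time,
        "--job-name=gpu-interactive", "--qos=interactive",
        f"--gres=gpu:{gpus}",
    ]


if __name__ == "__main__":
    # Swap "YOUR_PARTITION" for "alpha" once that partition is the GPU target.
    print(shlex.join(gpu_salloc_cmd("YOUR_PARTITION", "YOUR_ACCOUNT")))
```

Returning a list rather than a string means the command can be passed to subprocess.run without shell quoting issues, or printed with shlex.join for copy and paste.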

7) Modules and Python venv

module avail
module load Python/3.10.15
python -V
python -m venv ~/venvs/torch
source ~/venvs/torch/bin/activate
python -m pip install -U pip setuptools wheel

8) Install PyTorch

CPU-only:

pip install -U torch numpy

GPU-enabled PyTorch inside a GPU session:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install -U numpy

9) Quick GPU test

python torch_gpu_test.py
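
The contents of torch_gpu_test.py are not shown in this guide. A minimal sketch of what such a test could look like, assuming PyTorch is installed in the active virtual environment, is below. It is not the site's official test script, and the function name gpu_report is hypothetical.

```python
"""torch_gpu_test.py: a minimal sketch, not an official test script."""
import importlib.util


def gpu_report() -> dict:
    """Collect basic PyTorch and CUDA facts; safe to call without torch."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
        # Small matrix multiply on the GPU to confirm kernels actually run.
        x = torch.rand(1024, 1024, device="cuda")
        report["matmul_ok"] = bool(torch.isfinite((x @ x).sum()).item())
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

Inside a GPU session, cuda_available should print True and device_count should match the number requested with --gres. In a CPU-only session, or with the CPU-only install, cuda_available is expected to be False.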