Over the summer of 2025 we are upgrading Alpha (adding more H100 GPU nodes and expanding both file systems) and installing a brand-new computer named Beta. Alpha and Beta will share the same login nodes, data transfer nodes, parallel file systems, and Slurm batch system. Beta should be up to ∼7x faster than Alpha on ML training and up to ∼20x faster on inference. Its DGX GB200 NVL72 SuperPOD architecture tightly integrates the GPUs, enabling efficient training of multi-trillion-parameter models.
Note that Alpha uses x86 CPUs and H100 GPUs, whereas Beta uses ARM CPUs and B200 GPUs. Between Alpha, the Grace-Grace nodes, and the SuperPOD, researchers can therefore pick the tool best suited to their needs.
Many of NVIDIA's containers (https://catalog.ngc.nvidia.com/containers) are multi-architecture: they work transparently on all of the CPUs and GPUs of Alpha+Beta, and likely also at your home institution. This compatibility, together with NVIDIA's additional testing and the anticipated performance advantage, is a key reason to adopt those containers.
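As a sketch of how a multi-arch NGC image is used (the specific image tag and the use of Docker are assumptions, not details from this article), the same tag resolves to the right build on either architecture:

```shell
# Hypothetical example: the NGC PyTorch image is published as a
# multi-arch manifest, so this one tag fetches an x86_64 build on
# Alpha (H100 nodes) and an arm64 build on Beta (GB200 nodes).
docker pull nvcr.io/nvidia/pytorch:25.01-py3

# Sanity check: report which GPU the container sees on this node.
docker run --gpus all --rm nvcr.io/nvidia/pytorch:25.01-py3 \
  python -c "import torch; print(torch.cuda.get_device_name(0))"
```

On shared clusters the same image is typically run via Apptainer rather than Docker, e.g. `apptainer pull docker://nvcr.io/nvidia/pytorch:25.01-py3`.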
Alpha+ (the plus referring to the upgraded Alpha)
- 18 HGX nodes each with
Beta
- NVIDIA DGX GB200 NVL72 SuperPOD with 288 B200 GPUs for AI applications
(https://www.nvidia.com/en-us/data-center/dgx-gb200/)
(https://docs.nvidia.com/dgx-superpod/reference-architecture-scalable-infrastructure-gb200/latest/dgx-superpod-components.html)
- 60 nodes of NVIDIA Grace-Grace ARM Superchip for HPC and data processing
(https://dl.acm.org/doi/10.1145/3636480.3637097)
Storage
- 20+ Petabytes of effective flash storage from VAST, our primary advanced data platform, initially served via NFS
- 10+ Petabytes of flash storage from DDN for high-performance data, served via Lustre
The SuperPOD will include ∼70 TByte of LPDDR5X memory at up to 18.4 TByte/s aggregate and ∼54 TByte of HBM3e memory at up to 512 TByte/s aggregate. The tensor cores will achieve 5.76 exaFLOP/s in FP4, 2.88 in FP8, 1.44 in BF16/FP16, and 0.72 in FP32. Non-tensor performance is 23.0 petaFLOP/s in FP32 and 11.0 petaFLOP/s in FP64. Grace is a general-purpose processor with high memory bandwidth that delivers excellent floating-point and integer performance across a wide range of applications, and it maintains binary compatibility with the rest of Beta. Each Grace-Grace node has 144 ARM Neoverse V2 cores at 3.5 GHz (8 teraFLOP/s FP64) and 0.9 TByte of LPDDR5X memory (0.95 TByte/s).
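The aggregate tensor numbers follow directly from per-GPU rates. As a minimal sketch (the ∼20 petaFLOP/s dense FP4 rate per B200 and the 16 FP64 FLOP/cycle per Neoverse V2 core are assumptions used for illustration, not figures stated above):

```python
# Aggregate tensor throughput: 288 GPUs, assuming ~20 petaFLOP/s dense
# FP4 per B200, halving at each step to a wider precision.
N_GPUS = 288
per_gpu_fp4_pflops = 20.0  # assumed per-B200 dense FP4 tensor rate

agg = {p: per_gpu_fp4_pflops / (2 ** i) * N_GPUS / 1000  # exaFLOP/s
       for i, p in enumerate(["FP4", "FP8", "BF16/FP16", "FP32"])}
print(agg)  # {'FP4': 5.76, 'FP8': 2.88, 'BF16/FP16': 1.44, 'FP32': 0.72}

# Grace-Grace node: 144 Neoverse V2 cores at 3.5 GHz, assuming
# 16 FP64 FLOP/cycle per core, matches the quoted ~8 teraFLOP/s.
node_fp64_tflops = 144 * 3.5e9 * 16 / 1e12
print(round(node_fp64_tflops, 1))  # 8.1
```

The same halving pattern (FP4 → FP8 → FP16 → FP32) is why the quoted exaFLOP/s figures form a factor-of-two ladder.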