Over the summer of 2025 we are upgrading Alpha (adding more H100 GPU nodes and expanding both file systems) and installing a brand-new computer named Beta. Alpha and Beta will share the same login nodes, data transfer nodes, parallel file systems, and Slurm batch system. Beta should be up to ∼7x faster than Alpha on ML training and up to ∼20x faster on inference. Its DGX GB200 NVL72 SuperPOD architecture tightly integrates the GPUs, enabling efficient training of multi-trillion-parameter models.
Note that Alpha uses x86 CPUs and H100 GPUs, whereas Beta uses ARM CPUs and B200 GPUs. Between Alpha, the Grace-Grace nodes, and the SuperPOD, researchers can therefore pick the tool best suited to their needs.
Many of NVIDIA's containers (https://catalog.ngc.nvidia.com/containers) are multi-architecture: they work transparently on all of the CPUs and GPUs of Alpha+Beta, and likely also at your home institution. This compatibility, together with NVIDIA's additional testing and the anticipated performance advantage, is a key reason to adopt those containers.
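As a sketch of how a multi-arch NGC image is used (the specific image tag and the use of Docker are assumptions, not details from this article), the same tag resolves to the right build on either architecture:

```shell
# Hypothetical example: the NGC PyTorch image is published as a
# multi-arch manifest, so this one tag fetches an x86_64 build on
# Alpha (H100 nodes) and an arm64 build on Beta (GB200 nodes).
docker pull nvcr.io/nvidia/pytorch:25.01-py3

# Sanity check: report which GPU the container sees on this node.
docker run --gpus all --rm nvcr.io/nvidia/pytorch:25.01-py3 \
  python -c "import torch; print(torch.cuda.get_device_name(0))"
```

On shared clusters the same image is typically run via Apptainer rather than Docker, e.g. `apptainer pull docker://nvcr.io/nvidia/pytorch:25.01-py3`.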
Alpha+ (the plus referring to the upgraded Alpha)
- 18 HGX nodes each with
Beta
- NVIDIA DGX GB200 NVL72 SuperPOD with 288 B200 GPUs for AI applications
(https://www.nvidia.com/en-us/data-center/dgx-gb200/)
(https://docs.nvidia.com/dgx-superpod/reference-architecture-scalable-infrastructure-gb200/latest/dgx-superpod-components.html)
- 60 nodes of NVIDIA Grace-Grace ARM Superchip for HPC and data processing
(https://dl.acm.org/doi/10.1145/3636480.3637097)
Storage
- 20+ Petabytes of effective flash storage from VAST, our primary advanced data platform, initially served via NFS
- 10+ Petabytes of flash storage from DDN for high-performance data, served via Lustre
The SuperPOD will include ∼70 TByte of LPDDR5X memory at up to 18.4 TByte/s aggregate and ∼54 TByte of HBM3e memory at up to 512 TByte/s aggregate. The tensor cores will achieve 5.76 exaFLOP/s in FP4, 2.88 in FP8, 1.44 in BF16/FP16, and 0.72 in FP32. Non-tensor performance is 23.0 petaFLOP/s in FP32 and 11.0 petaFLOP/s in FP64. Grace is a general-purpose processor with high memory bandwidth that delivers excellent floating-point and integer performance across a wide range of applications, and it maintains binary compatibility with the rest of Beta. Each Grace-Grace node has 144 ARM Neoverse V2 cores at 3.5 GHz (8 teraFLOP/s FP64) and 0.9 TByte of LPDDR5X memory (0.95 TByte/s).
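The aggregate tensor numbers follow directly from per-GPU rates. As a minimal sketch (the ∼20 petaFLOP/s dense FP4 rate per B200 and the 16 FP64 FLOP/cycle per Neoverse V2 core are assumptions used for illustration, not figures stated above):

```python
# Aggregate tensor throughput: 288 GPUs, assuming ~20 petaFLOP/s dense
# FP4 per B200, halving at each step to a wider precision.
N_GPUS = 288
per_gpu_fp4_pflops = 20.0  # assumed per-B200 dense FP4 tensor rate

agg = {p: per_gpu_fp4_pflops / (2 ** i) * N_GPUS / 1000  # exaFLOP/s
       for i, p in enumerate(["FP4", "FP8", "BF16/FP16", "FP32"])}
print(agg)  # {'FP4': 5.76, 'FP8': 2.88, 'BF16/FP16': 1.44, 'FP32': 0.72}

# Grace-Grace node: 144 Neoverse V2 cores at 3.5 GHz, assuming
# 16 FP64 FLOP/cycle per core, matches the quoted ~8 teraFLOP/s.
node_fp64_tflops = 144 * 3.5e9 * 16 / 1e12
print(round(node_fp64_tflops, 1))  # 8.1
```

The same halving pattern (FP4 → FP8 → FP16 → FP32) is why the quoted exaFLOP/s figures form a factor-of-two ladder.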