[MAINTENANCE] Alpha Cluster: Hardware Expansion and Grace-Grace Integration (May 6–8)

Posted 19 days ago by Cesar Arias


Overview: Empire AI will perform scheduled maintenance to integrate new Grace-Grace hardware resources into the Alpha cluster. This 48-hour window is needed to configure the new hardware and scale our compute capacity to support the growing needs of the user community.


Maintenance Window:

  • Start: Wednesday, May 6, 2026 @ 8:00 AM ET

  • End: Friday, May 8, 2026 @ 8:00 AM ET

  • Duration: 48 Hours

What this means for you:

  • Access: Login and compute access will be unavailable during this time.

  • Automated Job Handling: A Slurm reservation is in place. The scheduler will automatically hold any job whose requested time limit would carry it past the 8:00 AM ET start time on Wednesday, May 6. These jobs will remain in a pending state until the maintenance is complete.

  • Resume Policy: Once the cluster is back online and the system has been verified, all pending and held jobs will automatically resume their position in the queue.
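
To check whether a job could finish before the window opens, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not an Empire AI tool: the function names are hypothetical, it assumes the job starts running the moment you call it (queue wait is ignored), and it assumes a job ending exactly at 8:00 AM counts as fitting. The actual scheduler decision is based on the time limit you request and on when the job would really start.

```python
from datetime import datetime, timedelta, timezone

# "ET" in early May is Eastern Daylight Time, i.e. UTC-4.
EDT = timezone(timedelta(hours=-4))

# Maintenance window as announced in this post.
MAINT_START = datetime(2026, 5, 6, 8, 0, tzinfo=EDT)
MAINT_END = datetime(2026, 5, 8, 8, 0, tzinfo=EDT)


def fits_before_maintenance(start_time, walltime):
    """Return True if a job starting at start_time with the requested
    walltime would end at or before the maintenance start.

    Assumption: a job ending exactly at the start time is treated as fitting.
    """
    return start_time + walltime <= MAINT_START


def max_walltime(start_time):
    """Longest walltime that still ends by the maintenance start.

    Returns a zero timedelta if start_time is already at or past the start.
    """
    return max(MAINT_START - start_time, timedelta(0))


if __name__ == "__main__":
    # Example: a job that would start Tuesday, May 5 at 8:00 PM ET.
    start = datetime(2026, 5, 5, 20, 0, tzinfo=EDT)
    print(fits_before_maintenance(start, timedelta(hours=12)))
    print(fits_before_maintenance(start, timedelta(hours=13)))
    print(max_walltime(start))
```

If a job does not fit, requesting a shorter time limit (for example through the `--time` option of `sbatch`) may let it run before the window; otherwise it simply waits until maintenance is over.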

Purpose of Work: This downtime allows our systems team to perform the back-end configuration needed to make the new Grace-Grace resources available. The upgrade increases the total throughput and the specialized hardware available to all researchers. Thank you for your continued partnership as we grow the Empire AI ecosystem. For real-time updates, please monitor this discussion thread or our Slack channel.
