Kestrel Maintenance Progress

Feb. 9, 2024

The Kestrel upgrades are at their mid-point and have made considerable progress. As a reminder, the changes visible to users will be an operating system upgrade to Red Hat Enterprise Linux (RHEL) 8.8 and a new Cray Programming Environment, version 23.12.

Other updates are also being made to improve system performance and reliability, including software upgrades to the Slingshot high-speed network, storage system updates, and a new version of the cluster management software used by system administrators.

Phase II of the Kestrel installation is the addition of 132 new nodes, each with dual 64-core AMD CPUs (128 cores total), 384GB RAM, and four NVIDIA H100 80GB GPUs. These GPU-based nodes will arrive 2/6, followed by integration, testing, and eventual release to users over the next few months.

The outage remains on schedule for its two-week window, which runs through Friday, February 9th, 2024, with a return to service during the week of February 12th. We expect to use the full period for this work.

No Kestrel systems will be available during this maintenance period. They will remain unavailable until we send a follow-up announcement returning the system to regular use.

