WHEN IS IT HAPPENING?
PACE’s next Maintenance Period starts at 6:00 AM on Monday, January 13, 2025 (01/13/2025), and is tentatively scheduled to conclude by 11:59 PM on Thursday, January 16, 2025 (01/16/2025). The additional day is needed for extra testing because both RHEL7 and RHEL9 versions of our systems are present while we migrate to the new operating system. PACE will release each cluster (Phoenix, Hive, Firebird, ICE, and Buzzard) as soon as its maintenance work and testing are completed. We will prioritize ICE to support Spring courses as soon as possible. For the other clusters, we plan to focus on the largest portion of each system first (for Phoenix and Firebird, where both OSs are present) to restore access to data and compute capabilities.
WHAT DO YOU NEED TO DO?
As usual, the scheduler will hold any job whose resource request would have it running during the Maintenance Period until after the maintenance. During this Maintenance Period, all PACE-managed computational and storage resources will be unavailable. This includes Phoenix, Hive, Firebird, ICE, and Buzzard. Please plan accordingly for the projected downtime. CEDAR storage will not be affected.
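To estimate whether a job is likely to be held, you can compare its expected start time and requested walltime against the maintenance window. The following is a minimal sketch, not the scheduler's actual logic: the function names are hypothetical, times are assumed to be local cluster time, and the window is taken from the dates in this announcement (the end time is tentative).

```python
from datetime import datetime, timedelta

# Maintenance window from this announcement (end time is tentative).
MAINTENANCE_START = datetime(2025, 1, 13, 6, 0)
MAINTENANCE_END = datetime(2025, 1, 16, 23, 59)


def would_be_held(start: datetime, walltime: timedelta) -> bool:
    """Return True if a job starting at `start` with the requested
    walltime would still be running once maintenance begins."""
    if start >= MAINTENANCE_END:
        # The job starts after the window has closed.
        return False
    return start + walltime > MAINTENANCE_START


def longest_walltime_before_maintenance(now: datetime) -> timedelta:
    """Largest walltime that still finishes before maintenance starts
    (zero if maintenance has already begun)."""
    return max(MAINTENANCE_START - now, timedelta(0))


if __name__ == "__main__":
    submit = datetime(2025, 1, 12, 6, 0)
    print(would_be_held(submit, timedelta(hours=48)))
    print(longest_walltime_before_maintenance(submit))
```

A job that fits within the remaining time can still run before the Maintenance Period; the real scheduler also accounts for queue wait time, which this sketch ignores.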
WHAT IS HAPPENING?
ITEMS REQUIRING USER ACTION:
- [Phoenix] Continue migrating nodes to the RHEL 9 operating system, which will complete post-MD. After this, Phoenix will be 75% on the RHEL 9 OS.
- [Hive] COMPLETE migrating nodes to the RHEL 9 operating system.
- [Phoenix and Hive] Default login behavior will change so that login-phoenix and login-hive will point to RHEL 9 login nodes rather than RHEL 7 nodes, which WILL trigger SSH warnings. For more information on SSH at PACE, see our documentation.
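The SSH warnings mentioned above typically appear because the host key stored on your own machine no longer matches the node that the login name now points to. As a hedged illustration only, the sketch below builds the standard OpenSSH `ssh-keygen -R` command that removes a stale entry from a known_hosts file. The helper name is hypothetical, the short host names are taken from this announcement (use whatever name you normally type when connecting), and PACE's documentation remains the authoritative guide, including how to verify the new host key before accepting it.

```python
from pathlib import Path
from typing import List, Optional


def stale_key_removal_command(host: str,
                              known_hosts: Optional[Path] = None) -> List[str]:
    """Build (but do not run) the OpenSSH command that removes the
    stored host key for `host` from a known_hosts file."""
    cmd = ["ssh-keygen", "-R", host]
    if known_hosts is not None:
        # -f selects a known_hosts file other than the default one.
        cmd += ["-f", str(known_hosts)]
    return cmd


if __name__ == "__main__":
    # Host names as written in this announcement; adjust to the name you use.
    for host in ("login-phoenix", "login-hive"):
        print(" ".join(stale_key_removal_command(host)))
```

The helper only prints the command so you can review it before running it yourself in a terminal.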
ITEMS NOT REQUIRING USER ACTION:
- [Phoenix, Hive, Firebird, ICE] Upgrade Slurm to 24.11.10
- [all] DataBank will perform cooling tower cleaning requiring all machines in the research hall to be powered off
- [all] Upgrade border firewall hardware
- [Phoenix, ICE] Upgrade IB (InfiniBand) switch firmware
- [Phoenix, Hive] Move Globus endpoints to new network to improve performance
- [ICE] Enable self-service container builds
- [Phoenix] Upgrade all storage servers to the latest version to support performance improvements, covering scratch and project (coda1) storage
- [Firebird] Upgrades to underlying storage servers to improve functionality
WHY IS IT HAPPENING?
Regular maintenance periods are necessary to reduce unplanned downtime and maintain a secure and stable system.
WHO IS AFFECTED?
All users across all PACE clusters.
WHO SHOULD YOU CONTACT FOR QUESTIONS?
Please contact PACE at pace-support@oit.gatech.edu with questions or concerns.
Thank you,
-The PACE Team