[Update 5/11/23]
The Phoenix, Hive, Firebird, and Buzzard clusters are now ready for research. We have released all jobs that were held by the scheduler.
The ICE instructional cluster remains under maintenance until tomorrow. Summer instructors will be notified when the upgraded ICE is ready for use.
The next maintenance period for all PACE clusters is August 8, 2023, at 6:00 AM through August 10, 2023, at 11:59 PM. An additional maintenance period for 2023 is tentatively scheduled for October 24-26, 2023 (note revised date).
Status of activities:
- [Complete][Login nodes] Implement enforcement of usage limits on the login nodes, limiting each individual to 1 CPU, 4 GB memory, and 5000 open files. These limits should reduce the possibility of one individual’s processes causing a login node outage. Researchers are reminded to use interactive jobs for resource-intensive activities via OnDemand Interactive Shell or the command line (Phoenix, Hive, and Firebird instructions).
- [In progress][ICE] The instructional cluster will be migrated to the Slurm scheduler; new Lustre-based scratch storage will be added; and home directories will be migrated. PACE-ICE and COC-ICE will be merged. Additional information will be available for instructors on ICE.
- [Complete][Phoenix Storage] Phoenix scratch will be migrated to a new Lustre device, which will result in fully independent project & scratch filesystems. Researchers will find their scratch data remains accessible at the same path via symbolic link or directly via the same mount location.
- [Complete][Datacenter] Connect new cooling doors to power for datacenter expansion
- [Complete][Datacenter] High-temperature loop pump maintenance
- [Complete][Storage] Replace cables on Hive and Phoenix parallel filesystems
- [Complete][Network] Upgrade ethernet switch code in Enterprise Hall
- [Complete][Network] Configure virtual pair between ethernet switches in Research Hall
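As a rough illustration of the login node limits listed above, the following Python sketch checks a planned workload against the announced figures (1 CPU, 4 GB memory, 5000 open files). The function name and the dictionary keys are hypothetical and chosen only for this example; it does not reflect how PACE actually enforces the limits, only the numbers stated in this announcement.

```python
# Limits quoted in the announcement (per individual on a login node).
# The key names are hypothetical labels used only in this sketch.
ANNOUNCED_LIMITS = {"cpus": 1, "memory_gb": 4, "open_files": 5000}


def exceeds_login_limits(cpus, memory_gb, open_files, limits=ANNOUNCED_LIMITS):
    """Return the names of the announced limits a planned workload would exceed."""
    planned = {"cpus": cpus, "memory_gb": memory_gb, "open_files": open_files}
    # A value equal to the limit is treated as allowed (an assumption).
    return [name for name, value in planned.items() if value > limits[name]]


if __name__ == "__main__":
    over = exceeds_login_limits(cpus=4, memory_gb=2, open_files=100)
    if over:
        print("Use an interactive job instead; over the limit for:", ", ".join(over))
    else:
        print("Within the announced login node limits.")
```

If the returned list is not empty, the work is probably better suited to an interactive job than to a login node.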
If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.
[Update 5/2/23]
This is a reminder that the next PACE Maintenance Period starts at 6:00 AM on Tuesday, 05/09/2023, and is tentatively scheduled to conclude by 11:59 PM on Thursday, 05/11/2023.
Maintenance on the ICE instructional cluster is expected to continue through Friday, 05/12/2023.
Updated planned activities:
WHAT IS HAPPENING?
ITEMS NOT REQUIRING USER ACTION:
- [Login nodes] Implement enforcement of usage limits on the login nodes, limiting each individual to 1 CPU, 4 GB memory, and 5000 open files. These limits should reduce the possibility of one individual’s processes causing a login node outage. Researchers are reminded to use interactive jobs for resource-intensive activities via OnDemand Interactive Shell or the command line (Phoenix, Hive, and Firebird instructions).
- [ICE] The instructional cluster will be migrated to the Slurm scheduler; new Lustre-based scratch storage will be added; and home directories will be migrated. PACE-ICE and COC-ICE will be merged. Additional information will be available for instructors on ICE.
- [Phoenix Storage] Phoenix scratch will be migrated to a new Lustre device, which will result in fully independent project & scratch filesystems. Researchers will find their scratch data remains accessible at the same path via symbolic link or directly via the same mount location.
- [Datacenter] Connect new cooling doors to power for datacenter expansion
- [Datacenter] High-temperature loop pump maintenance
- [Storage] Replace cables on Hive and Phoenix parallel filesystems
- [Network] Upgrade ethernet switch code in Enterprise Hall
- [Network] Configure virtual pair between ethernet switches in Research Hall
[Original Announcement 4/24/23]
WHEN IS IT HAPPENING?
The next PACE Maintenance Period starts at 6:00 AM on Tuesday, 05/09/2023, and is tentatively scheduled to conclude by 11:59 PM on Thursday, 05/11/2023.
Maintenance on the ICE instructional cluster is expected to continue through Friday, 05/12/2023.
WHAT DO YOU NEED TO DO?
As usual, the scheduler will hold any job whose resource request would have it running during the Maintenance Period until after the maintenance is complete. During the Maintenance Period, all PACE-managed computational and storage resources will be unavailable. This includes Phoenix, Hive, Firebird, PACE-ICE, COC-ICE, and Buzzard.
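To estimate whether a job is likely to be held, you can compare its requested walltime against the start of the Maintenance Period. The Python sketch below is a simplified illustration, not the scheduler's actual logic: it assumes the job would start at the given time and treats a job that ends exactly at 6:00 AM on 05/09/2023 as not held. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Start of the Maintenance Period as stated in this announcement.
MAINTENANCE_START = datetime(2023, 5, 9, 6, 0)


def would_be_held(start_time, walltime, maintenance_start=MAINTENANCE_START):
    """Return True if a job starting at start_time with the requested walltime
    would still be running when the maintenance begins.

    Assumption: the job starts at start_time; the real scheduler may start it later.
    """
    return start_time + walltime > maintenance_start


if __name__ == "__main__":
    start = datetime(2023, 5, 8, 12, 0)
    for hours in (8, 18, 24):
        held = would_be_held(start, timedelta(hours=hours))
        print(f"{hours:>2} h walltime from {start}: {'held' if held else 'can run'}")
```

Requesting a shorter walltime that ends before the maintenance begins may allow a job to run beforehand instead of being held.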
WHAT IS HAPPENING?
ITEMS NOT REQUIRING USER ACTION:
- [Login nodes] Implement enforcement of usage limits on the login nodes, limiting each individual to 1 CPU, 4 GB memory, and 5000 open files. These limits should reduce the possibility of one individual’s processes causing a login node outage. Researchers are reminded to use interactive jobs for resource-intensive activities via OnDemand Interactive Shell or the command line (Phoenix, Hive, and Firebird instructions).
- [ICE] The instructional cluster will be migrated to the Slurm scheduler; new Lustre-based scratch storage will be added; and home directories will be migrated. PACE-ICE and COC-ICE will be merged. Additional information will be available for instructors on ICE.
- [Datacenter] Connect new cooling doors to power for datacenter expansion
- [Datacenter] High-temperature loop pump maintenance
- [Storage] Replace Input/Output Modules on two storage devices
WHY IS IT HAPPENING?
Regular maintenance periods are necessary to reduce unplanned downtime and maintain a secure and stable system. Future maintenance dates may be found on our homepage.
WHO IS AFFECTED?
All users across all PACE clusters.
WHO SHOULD YOU CONTACT FOR QUESTIONS?
Please contact PACE at pace-support@oit.gatech.edu with questions or concerns.