[Update – 02/04/2022 10:24AM]
Dear PACE Researchers,
We are following up to inform you that all PACE clusters have resumed normal operations and are accepting new user jobs. After the cooling loop was restored last night, the datacenter’s operating temperatures returned to normal and have remained stable.
As previously mentioned, this outage should not have impacted any running jobs because PACE powered off only idle compute nodes, so no user action is required. Thank you for your patience as we worked through this emergency outage in coordination with Databank. If you have any questions or concerns, please contact us at pace-support@oit.gatech.edu.
Best,
The PACE Team
[Original Post]
Dear PACE Researchers,
Due to a cooling issue in the Coda datacenter, we were asked to power off as many nodes as possible to control temperature in the research hall. At this time, Databank has recovered the cooling loop, and temperatures have stabilized. However, all PACE job schedulers will remain paused to help expedite the return to normal operating temperatures in the datacenter.
These events should have had no impact on running jobs, so no action is required at this time. We expect normal operation to resume in the morning. As always, if you have any questions, please contact us at pace-support@oit.gatech.edu.
Best,
The PACE Team