[3/18/24 10:00 AM]
Full functionality of all PACE clusters has been restored, and the schedulers have resumed launching queued jobs. Please resubmit any jobs that may have failed over the weekend.
A migration of GT’s DNS services from BlueCat to Efficient IP on Saturday, March 16, caused widespread outages of PACE and other campus services over the weekend. DNS records began to disappear at 5 PM on Saturday and were patched late Saturday night; PACE login access returned on Sunday morning as the changes propagated.
All jobs running on Phoenix and Firebird between 5:30 PM on Saturday, March 16, and 9:00 AM on Monday, March 18, will be refunded.
Thank you for your patience as we recovered from the DNS outage.
[3/16/24 7:15 PM]
Summary: All PACE clusters (Phoenix, Hive, ICE, Firebird, and Buzzard) are currently unreachable due to a Domain Name System (DNS) resolution issue.
Details: We are investigating a DNS issue that has left all PACE clusters unreachable. No further information is known at this time. We are pausing the scheduler on all clusters to prevent additional jobs from starting.
Impact: It is not currently possible to access any PACE cluster via SSH or OnDemand. On all clusters except Firebird, running jobs may be impacted, and if you are already connected, scheduler and other commands may fail with address resolution errors.
Thank you for your patience as we work to restore access to PACE clusters. Please contact us at pace-support@oit.gatech.edu with any questions. Please visit status.gatech.edu for updates.