[Update 7/22/20 1:00 PM]
Hive and testflight-coda systems were restored early this morning. Systems have returned to normal operation, and user jobs are running. If you were notified of a lost job, please resubmit it at this time.
Georgia Power does not plan to conduct any tests today. No additional information about the cause of yesterday’s outage is available at this time.
[Update 7/21/20 11:00 PM]
[Update 7/21/20 3:15 PM]
Unfortunately, the planned testing of the Georgia Power MicroGrid this week has led to a loss of power in the Coda research hall, home to compute nodes for Hive & testflight-coda. Any running jobs on those clusters will have failed at this time. Access to login nodes and storage, housed in the Coda enterprise hall, is uninterrupted.
We are sorry for what we know is a significant interruption to your work.
At this time, teams are working to restore power to the system. We will provide an update when available.
[Update 7/14/20 4:00 PM]
Georgia Power will be conducting additional bypass tests for the MicroGrid power generation facility for the Coda datacenter (Hive & testflight-coda clusters) during the week of July 20-24. These tests represent a slightly higher risk of disruption than the tests conducted in June, but the risk has been substantially lowered by additional testing last month.
As before, we do not expect any disruption to PACE compute resources. PACE’s storage and head nodes have UPS and generator backup power, but compute nodes do not. In the event of an unexpected complication during testing, compute nodes could lose power for a brief period, disrupting running jobs. Georgia Power, DataBank, OIT’s network team, and PACE will all have staff on standby during these tests to ensure a quick repair in the event of an unexpected outage.
Please contact us at pace-support@oit.gatech.edu with any questions.
Visit https://blog.pace.gatech.edu/?p=6778 for full details on this power testing.