Summary: The scratch file system became unresponsive yesterday evening (about 5:50 PM) when some of the network controllers stopped working. The resulting outage may have caused difficulties logging in to login nodes and writing to scratch.
Details: The file system was recovered this morning after the controllers and all Lustre components were restarted. The Slurm scheduler was also paused to troubleshoot issues with the cluster and has since been resumed.
Impact: The file system and scheduler should now be fully functional. Users may have had issues accessing the Phoenix cluster yesterday evening and this morning. Compute jobs running during that period may also have been affected, so we recommend reviewing any jobs that ran then.
Thank you for your patience. Please contact us at pace-support@oit.gatech.edu with any questions.