[Original Post 1/25/22 9:30 AM]
Summary: Phoenix project & scratch storage cable replacement, with a potential outage and temporarily decreased performance afterward
What’s happening and what are we doing: A cable connecting one enclosure of the Phoenix Lustre device, which hosts project and scratch storage, to one of its controllers needs to be replaced. The replacement will begin around 12:00 noon on Wednesday, January 26. Afterward, the storage pools will need to rebuild, which will take about a day.
How does this impact me: Because the device has a redundant controller, we do not expect an outage during the cable replacement. However, a similar previous replacement caused storage to become unavailable, so an outage remains possible. If storage becomes unavailable, your job may fail or run without making progress. If you have such a job, please cancel it and resubmit it once storage availability is restored.
In addition, performance will be slower than usual for a day following the repair as pools rebuild. Jobs may progress more slowly than normal. If your job runs out of wall time and is cancelled by the scheduler, please resubmit it to run again.
What we will continue to do: PACE will monitor Phoenix Lustre storage throughout this procedure. If a loss of availability occurs, we will update you.
Please accept our sincere apology for any inconvenience that this temporary limitation may cause you. If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu.