At approximately 10:40 this morning, a top-of-rack network switch in rack P31 of our data center failed. This caused a loss of network connectivity for approximately 44 compute nodes across a wide variety of queues (listed below). No other compute nodes are affected. Jobs running on these nodes have likely failed as a result. The OIT network team is currently swapping in a replacement switch, and PACE staff are working to restore service as quickly as possible.
If you have access to any of the queues below, please check the status of your jobs and resubmit them as needed. You can check which queues you have access to by using the ‘pace-whoami’ command.
We apologize for the inconvenience and will bring these nodes back online as soon as possible. If you have additional questions, please email pace-support@oit.gatech.edu.
aces
athena-intel
biocluster-6
bioforce-6
blue
chow
cochlea
dimer-6
dimerforce-6
granulous
hygene-6
hygeneforce-6
iw-shared-6
joe-6-intel
math-6
mathforce-6
orbit
prometforce-6
prometheus
sonar-6
sonarforce-6
starscream