We’ve identified the source of the problems that affected all of the clusters this afternoon (4/7). While preparing to deploy firewall upgrades for PACE, a member of the campus network team inadvertently applied a misconfiguration to one of our core network links. This resulted in widespread packet loss across the PACE internal network.
The head nodes appear to have recovered properly, but please let us know if you see continued issues there. It is possible that some jobs were lost, but we believe that most things have recovered without loss.
We’ll continue to monitor the situation and address any remaining problems as soon as we are able.
PACE Team