We appear to be back up at this point. The root cause seems to have been a problem with the subnet manager that controls the InfiniBand network. Since GPFS uses this network, the issue initially manifested as a storage problem. However, many MPI codes use this network as well and may have crashed.
Again, we apologize for the inconvenience. Please do check on your jobs if you use MPI.