update (5/18/2018, 4:15pm): We’ve identified a large number of jobs that were overloading the storage and worked with their owners to delete them. This resulted in an immediate improvement in performance. Please let us know if you observe any of the slowness returning over the weekend.
original post: PACE is aware of GPFS (storage) slowness that impacts a large fraction of users on the pace1 and menon1 systems. We are actively working, with guidance from the vendor, to identify the root cause and resolve this issue ASAP.
This slowness is observed on all nodes mounting this storage, including headnodes, compute nodes, and the datamover.
We believe that we’ve found the culprit, but more investigation is needed for verification. Please continue to report any slowness problems to us.