Update: GPFS storage has been stabilized. Several steps remain to complete this work, all of which can be done without downtime. We may need to take some nodes offline temporarily; we will coordinate this with you so that running jobs are not affected.
—–
PACE systems started experiencing widespread problems with the GPFS storage shortly after we released jobs following the completion of the maintenance tasks. At first glance, the problems seem to be related to the InfiniBand network.
We advise against submitting new jobs until these issues are fully resolved. We will continue to work on a resolution and keep you updated on our progress.
Thank you for your patience.