
Systematic offlining of PACE nodes to address storage slowness

Posted on Tuesday, 21 November 2017

We have identified a problem with the way some nodes mount our main (GPFS) storage server, which is causing slow storage performance. The fix requires restarting the storage services on each affected node individually, while the node is not running any jobs. For this reason, we have started draining (offlining) all affected nodes. Each node will be brought back online as soon as its jobs are complete and the fix is applied.

Apart from slower storage, this issue does not affect running jobs, but you will notice offline nodes in your queues until all affected nodes have been fixed.

It’s safe to continue submitting jobs and there is no risk of data loss.

We are sorry for this inconvenience and thank you for your cooperation.
