PACE: A Partnership for an Advanced Computing Environment

November 21, 2017

Systematic offlining of PACE nodes to address storage slowness

Filed under: Uncategorized — Semir Sarajlic @ 2:34 pm

We identified a problem with the way some nodes mount our main (GPFS) storage server, which causes slow storage performance. The fix requires restarting the storage services on each affected node individually, while it is not running any jobs. For this reason, we have started draining (offlining) all affected nodes. Each node is brought back online as soon as its running jobs complete and the fix is applied.
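For readers curious about how such a rolling drain works, the procedure described above can be sketched roughly as follows. This is only an illustrative simulation, not the tooling PACE actually uses: the `Node` class, the `drain_affected` and `fix_idle_nodes` functions, and the node names are all hypothetical, and the step that restarts the storage services is represented by a placeholder comment.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    running_jobs: int     # jobs still executing on the node
    affected: bool        # True if the node has the slow storage mount
    online: bool = True   # False while the node is drained (offline)


def drain_affected(nodes):
    """Mark every affected node offline so no new jobs start on it."""
    for node in nodes:
        if node.affected:
            node.online = False


def fix_idle_nodes(nodes):
    """Fix and re-enable drained nodes that have no running jobs.

    Returns the names of the nodes brought back online in this pass.
    """
    restored = []
    for node in nodes:
        if node.affected and not node.online and node.running_jobs == 0:
            # Placeholder: the storage services would be restarted here.
            node.affected = False
            node.online = True
            restored.append(node.name)
    return restored


# Hypothetical cluster: two affected nodes, one healthy node.
nodes = [
    Node("node-a", running_jobs=2, affected=True),
    Node("node-b", running_jobs=0, affected=True),
    Node("node-c", running_jobs=1, affected=False),
]

drain_affected(nodes)
first_pass = fix_idle_nodes(nodes)   # only the idle affected node is fixed

nodes[0].running_jobs = 0            # node-a's jobs finish later
second_pass = fix_idle_nodes(nodes)  # now node-a can be fixed too

print(first_pass, second_pass)
```

In this sketch, running jobs are never interrupted: a drained node simply stops accepting new work and is repaired once it becomes idle, which is why offline nodes remain visible in the queues for a while.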

Apart from slower storage access, this issue does not affect running jobs. You will, however, notice offline nodes in your queues until we have fixed all affected nodes.

It’s safe to continue submitting jobs and there is no risk of data loss.

We are sorry for this inconvenience and thank you for your cooperation.
