PACE: A Partnership for an Advanced Computing Environment

July 25, 2017

Storage (GPFS) Issue Update

Filed under: Uncategorized — Semir Sarajlic @ 8:37 pm

We saw a reduction in GPFS filesystem problems over the past weekend and are continuing to work actively with the vendor. We do not have a complete solution yet, but compute nodes have been noticeably more stable on the GPFS filesystem. Thank you for your patience. We will keep you updated as the situation changes.

July 14, 2017

Storage (GPFS) Issue Update

Filed under: Uncategorized — Semir Sarajlic @ 12:54 am

Although the problem was not widespread and we have improved reliability, we have not yet found a full solution and are still actively working on it. We now believe the problem stems from the recent addition of many compute nodes, which has pushed the cluster into the next tier of system-level filesystem tuning. Thank you for your patience. We will continue to post updates as they become available.

July 12, 2017

Storage (GPFS) Issue

Filed under: Uncategorized — Semir Sarajlic @ 4:32 pm

We are experiencing intermittent problems with the GPFS storage system that hosts the scratch and project directories (~/scratch and ~/data). We are currently working with the vendor to determine whether these failures are related to the cluster nodes that were recently brought online.

This issue may affect running jobs. We are actively working on the problem, apologize for the inconvenience, and will post an update as soon as possible.
