PACE: A Partnership for an Advanced Computing Environment

May 8, 2017

PACE quarterly maintenance – May 11, 2017

Filed under: Uncategorized — Semir Sarajlic @ 5:15 pm

PACE clusters and systems will be offline from 6am this Thursday (May 11) through the end of Saturday (May 13). Jobs with long walltimes will be held by the scheduler to prevent them from getting killed when we power off the nodes. These jobs will be released as soon as the maintenance activities are complete.
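For users deciding how much walltime to request before Thursday, the check can be sketched in a few lines of Python. This is only an illustration: it assumes the scheduler holds any job whose requested walltime would run past the 6am start of the maintenance window, and the function name `would_be_held` is hypothetical, not part of any PACE or scheduler tool.

```python
from datetime import datetime, timedelta

# Maintenance start taken from this announcement: 6am on Thursday, May 11, 2017.
MAINTENANCE_START = datetime(2017, 5, 11, 6, 0)


def would_be_held(start_time, walltime):
    """Return True if a job starting at start_time could still be running at 6am Thursday.

    Assumption (not confirmed by the scheduler configuration): a job is held
    when start_time + requested walltime crosses the maintenance start.
    """
    return start_time + walltime > MAINTENANCE_START


now = datetime(2017, 5, 8, 17, 15)  # the time this post was filed
print(would_be_held(now, timedelta(hours=48)))  # False: ends May 10, 5:15 pm
print(would_be_held(now, timedelta(hours=72)))  # True: would end May 11, 5:15 pm
```

Under that assumption, requesting a walltime that ends before 6am Thursday lets a job start right away instead of waiting for the maintenance to finish.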

Planned improvements are mostly transparent to users, requiring no user action before or after the maintenance.

Systems

  • We will deploy a recompiled kernel that is identical to the current version except for a patch that addresses the Dirty COW vulnerability (CVE-2016-5195). Currently, we have a mitigation in place that prevents the use of debuggers and profilers (e.g. gdb, strace, Allinea DDT). After the patched kernel is deployed, these tools will once again be available on all nodes. Please let us know if you continue to have problems debugging or profiling your codes after the maintenance day.

Storage

  • Firmware updates on all of the DDN GPFS storage (scratch and most of the project storage)

Network

  • Upgrades to DNS servers, as recommended and performed by OIT Network Engineering
  • Software upgrades to the PACE firewall appliance to address a known bug
  • New subnets and re-assignment of IP addresses for some of the clusters

Power

  • Fixes for PDU problems that are impacting 3 nodes in the c29 rack

The date for the next maintenance day is not certain yet, but we will announce it as soon as we have it.
