PACE A Partnership for an Advanced Computing Environment

November 3, 2018

PACE clusters ready for research

Filed under: Uncategorized — Semir Sarajlic @ 8:44 pm

Our November 2018 maintenance (https://blog.pace.gatech.edu/?p=6360) was completed on schedule. We have brought compute nodes online and released previously submitted jobs. Login nodes are accessible and your data are available. As usual, a small number of straggling nodes remain, and we will address them over the coming days; these include nodes that need their PCIe connectors replaced as a preventive measure.

Completed Tasks

Compute

  • Complete (no user action needed) Replace power components in a rack in Rich 133
  • Complete (no user action needed) Replace defective PCIe connectors on multiple servers
      • As a precaution, additional identified nodes will have their PCIe connectors replaced when parts are delivered. No user action will be needed.

Network

  • Complete (no user action needed) Stress test new InfiniBand subnet managers to prepare for the move to Coda
  • Complete (no user action needed) Change uplink connections from management switches

Storage

  • Complete (no user action needed) Verify integrity of GPFS file systems
  • Complete (no user action needed) Upgrade firmware on DDN / GPFS storage systems
  • Complete (no user action needed) Upgrade firmware on TrueNAS storage systems

Other

  • Complete (some user action needed) Replace PACE ICE schedulers with a physical server to increase capacity and reliability. Some jobs on the PACE ICE cluster need to be re-submitted, and we have contacted the affected users individually.
