Our November 2018 maintenance (https://blog.pace.gatech.edu/?p=6360) was completed on schedule. We have brought compute nodes online and released previously submitted jobs. Login nodes are accessible and your data are available. As usual, a small number of straggling nodes remain, and we will address them over the coming days. These include nodes that need their PCIe connectors replaced as a preventative measure.
Completed Tasks
Compute
- Complete – (no user action needed) Replace power components in a rack in Rich 133
- Complete – (no user action needed) Replace defective PCIe connectors on multiple servers
  - As a precaution, additional identified nodes will have their PCIe connectors replaced when parts are delivered. No user action will be needed.
Network
- Complete – (no user action needed) Stress test new InfiniBand subnet managers, to prepare for the move to Coda
- Complete – (no user action needed) Change uplink connections from management switches
Storage
- Complete – (no user action needed) Verify integrity of GPFS file systems
- Complete – (no user action needed) Upgrade firmware on DDN / GPFS storage systems
- Complete – (no user action needed) Upgrade firmware on TrueNAS storage systems
Other
- Complete – (some user action needed) Replace PACE ICE schedulers with a physical server to increase capacity and reliability. Some jobs on the PACE ICE cluster need to be resubmitted, and we have contacted the affected users individually.