PACE: A Partnership for an Advanced Computing Environment

July 23, 2019

[Complete] PACE Quarterly Maintenance – August 8-10

Filed under: Uncategorized | Semir Sarajlic @ 8:03 pm

[August 9, 2019 Update] Our August 2019 maintenance (https://blog.pace.gatech.edu/?p=6511) is complete, one day ahead of schedule! We have brought compute nodes online and released previously submitted jobs. Login nodes are accessible, and your data are available.

[August 2, 2019 Update]

ITEMS REQUIRING NO USER ACTION:

  • Network connections to PACE-RTR will be upgraded. Connectivity into and out of the Rich Data Center will be disrupted on Friday morning (August 9). The VAPOR network will not be affected.
  • Additional disk space will be configured for the license server.
  • OS and application patches will be applied to Red Hat Enterprise Linux (RHEL) 7 servers, effectively upgrading them to RHEL 7.6.
  • OS and application patches will be applied to testflight nodes to begin testing new versions of the kernel and libraries.
  • PACE management scripts and utilities will be upgraded to improve reliability and performance.
  • The job submit filter on the RHEL 6 clusters will be modified to allow proper formatting of commands. This filter is already in place on the RHEL 7 clusters.
  • DNS appliances will be upgraded; no downtime is expected thanks to their redundant configuration.

Please send any questions or comments to pace-support@oit.gatech.edu.


[July 23, 2019] We are preparing for maintenance on August 8-10, 2019. This maintenance is planned to last three days, starting on Thursday, August 8, and running through Saturday, August 10.

As usual, the scheduler will hold jobs whose requested walltimes would extend into the maintenance period, so that no jobs are running when systems are powered off. These jobs will be released as soon as the maintenance activities are complete.
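The hold rule described above can be sketched as a simple check: a job is held if its requested walltime, counted from the earliest time it could start, would run past the start of maintenance. This is a simplified model, not the scheduler's actual implementation. The function name `would_be_held` is hypothetical, and so is the midnight start time, since the post does not give the exact hour the maintenance begins.

```python
from datetime import datetime, timedelta

# Hypothetical: the post does not state the exact start time of the
# maintenance window; midnight on Thursday, August 8, 2019 is assumed here.
MAINTENANCE_START = datetime(2019, 8, 8, 0, 0)


def would_be_held(earliest_start: datetime, walltime: timedelta,
                  maintenance_start: datetime = MAINTENANCE_START) -> bool:
    """Return True if a job starting at earliest_start and running for its
    full requested walltime would still be running when maintenance begins.

    Simplified model only: a job that ends exactly at the start of
    maintenance is treated as safe to run.
    """
    projected_end = earliest_start + walltime
    return projected_end > maintenance_start


if __name__ == "__main__":
    # A 3-day job that could start at noon on August 6 would overlap the window.
    print(would_be_held(datetime(2019, 8, 6, 12), timedelta(days=3)))
    # A 12-hour job starting at the same time would finish on August 7.
    print(would_be_held(datetime(2019, 8, 6, 12), timedelta(hours=12)))
```

Under this model, a job requesting three days of walltime that could start at noon on August 6 would still be running when maintenance begins, so it would be held; a 12-hour job starting at the same time would finish on August 7 and could run normally.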

In general, we will be upgrading all of the RHEL 7 production nodes to the latest RHEL 7.6 kernel, updating connections to and from the PACE routers, and adding disk capacity to our license server. While we are still finalizing the task list and details, none of these tasks is expected to require any user action.

October 18, 2011

Maintenance Day Has Begun (All Clusters are Down)

Filed under: tech support | Semir Sarajlic @ 1:20 pm

As scheduled, the compute clusters have been brought down for maintenance activities.

Some of the work now in progress:

  • Network redundancy changes
  • Filesystem moves
  • Updates to critical systems
  • Changes to directory services

We’ll let you know when we’re back and ready to compute.
