
PACE maintenance day – July 16

This entry was posted on Wednesday, 3 July, 2013.

Dear PACE cluster users,

The time has come again for our quarterly maintenance day, and we would like to remind you that all systems will be powered off starting at 6:00am on Tuesday, July 16, and will be down for the entire day.

None of your jobs will be killed: the job scheduler knows about the planned downtime and will not start any job that would still be running when the maintenance window begins. If possible, please check the walltimes of the jobs you will be submitting and shorten them so the jobs can complete before the maintenance day. Submitting jobs with longer walltimes is still OK, but they will be held by the scheduler and released right after the maintenance day.
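As a sketch, a Torque/PBS job script can request a walltime short enough to finish before the window opens. The job name, node count, and the 48-hour value below are placeholders for illustration, not PACE-specific recommendations:

```shell
#PBS -N my_job             # job name (placeholder)
#PBS -l nodes=1:ppn=8      # one node, eight cores (placeholder)
#PBS -l walltime=48:00:00  # request 48 hours; pick a value that ends before July 16
cd $PBS_O_WORKDIR          # run from the directory the job was submitted from
./my_simulation            # your actual workload goes here (placeholder)
```

Submitted as usual (e.g. `qsub my_job.pbs`), a job whose requested walltime would overlap the maintenance window is simply held and released afterwards.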

We have many tasks to complete; here are the highlights:

  1. transition to a new method of managing our configuration files – We’ve referred to this in the past as ‘database-based configuration makers’. We’ve been testing this extensively over the last few months and have things ready to go. I don’t expect this to cause any visible change to your experience; it simply gives us greater capability to manage a growing amount of equipment.
  2. network redundancy – we’re beefing up our Ethernet network core for compute nodes. Again, this is not a change I expect you to notice, just an improvement to the infrastructure.
  3. Panasas code upgrade – This work will complete the series of bug fixes from Panasas, and allow us to reinstate the quotas on scratch space. We’ve been testing this code for many weeks and have not observed any detrimental behavior. This is potentially a visible change to you. We will reinstate the 10TB soft and 20TB hard quotas. If you are using more than 20TB of our 215TB scratch space, you will not be able to add additional files or modify existing files in scratch.
  4. decommissioning of the RHEL5 version of the FoRCE cluster – This will allow us to add 240 CPU cores to the RHEL6 side of the FoRCE cluster, pushing force-6 over 2,000 CPU cores. We’ve been gradually shrinking this resource for some time; this step finishes it off. Users with access to FoRCE currently have access to both the RHEL5 and RHEL6 sides; access to RHEL6 via the force-6 head node will not change as part of this process.
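If you want to check how close you are to the scratch quotas before they are reinstated, a generic `du` sketch like the following works on any filesystem (the `SCRATCH` path is a placeholder; point it at your actual scratch directory):

```shell
# Sketch: report how much data a directory holds, to compare against the
# reinstated scratch quotas (10 TB soft / 20 TB hard).
# SCRATCH is a placeholder path; substitute your real scratch directory.
SCRATCH="${SCRATCH:-$HOME}"
used_kb=$(du -sk "$SCRATCH" 2>/dev/null | awk '{print $1}')
echo "${SCRATCH} is using ${used_kb} KB"
```

Cleaning up below the 20TB hard limit ahead of time avoids failed writes once the quotas are back in force.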

As always, please contact us via pace-support@oit.gatech.edu for any questions/concerns you may have.
