Greetings, everybody. It’s time again for the quarterly PACE maintenance. As usual, all PACE clusters will be down Tuesday and Wednesday of next week, April 21 and 22. We’ll get started at 6:00am on Tuesday and have things back to you as soon as possible. There are some significant changes this time around, so please read on.
Moab/Torque scheduler:
Last maintenance period, we deployed a new scheduler for the Atlas and Joe clusters. This time around, we’re continuing that rollout to the rest of our clusters. Some of the highlights are:
- increased responsiveness from commands like qsub & showq
- need to resubmit jobs that haven’t run yet
- removal of older, incompatible versions of mvapich and openmpi
- required changes to our /usr/local software repository
Mehmet Belgin has posted a detailed note about the scheduler upgrade on our blog here. He has also posted a similar note about the related updates to our software repository here.
Additionally, the x2200-6.3 queues on the Atlas cluster will be renamed to atlas-6-sunib and atlas-6-sunge.
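If you have job scripts that name one of the renamed queues, the `#PBS -q` line will need updating before you resubmit. Here is a minimal Python sketch of one way to do that in bulk. The helper name `rename_queue` and the old queue name `old-queue-name` are hypothetical placeholders (this post does not spell out the exact old names), so substitute the queue you actually use and whichever new name applies to it.

```python
import re

# Matches a Torque/PBS queue directive at the start of a line,
# e.g. "#PBS -q somequeue". Group 1 is the directive prefix, group 2 the queue.
_QUEUE_DIRECTIVE = re.compile(r"^(#PBS[ \t]+-q[ \t]+)(\S+)", re.MULTILINE)


def rename_queue(script_text, renames):
    """Return script_text with queue names in '#PBS -q' lines replaced.

    renames maps old queue names to new ones; queues not in the
    mapping are left untouched.
    """
    def swap(match):
        old = match.group(2)
        return match.group(1) + renames.get(old, old)

    return _QUEUE_DIRECTIVE.sub(swap, script_text)


if __name__ == "__main__":
    # Hypothetical old name: replace with the queue your script really uses.
    renames = {"old-queue-name": "atlas-6-sunib"}
    example = "#!/bin/bash\n#PBS -N myjob\n#PBS -q old-queue-name\necho hello\n"
    print(rename_queue(example, renames))
```

Only the queue directive is touched; everything else in the script, including other `#PBS` lines, passes through unchanged.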
Networking:
We’ve deployed new network equipment to upgrade the core of the PACE network to 40-gigabit Ethernet, and will transition to this new core during the maintenance period. The new network lets us use data center space outside of the Rich building, and provides a path for future 100-gigabit external connectivity and ScienceDMZ services. Stay tuned for further developments. 😉 Additionally, the campus network team will be upgrading the firmware on a number of our existing switches to apply some security-related fixes.
Storage:
The network upgrades above will allow us to relocate some of our project directory servers to OIT data center space on Marietta Street, as we’re pressed for generator-protected space in Rich. We will also be doing some security patching, highly recommended updates, and performance optimizations on the DDN/GPFS storage. As a stretch goal, we will also migrate some filesystems to GPFS. If we are pressed for time, they will move with their old servers instead. Details about which filesystems will move are available on our blog here.
Operating System patching:
Last, but not least, we have a couple of OS patches. We’ll complete the rollout of a glibc patch for the highly publicized “Ghost” vulnerability, and deploy an autofs fix for a bug that would sometimes cause /nv filesystems to fail to mount.