The January maintenance is complete, and the clusters have started accepting and running jobs. We accomplished all of the primary objectives and even found time to address a few bonus items.
Most importantly, we finished upgrading the resource manager and scheduler (Torque and Moab) across the entire PACE realm. This upgrade should bring visible improvements in speed and reliability. Please note that the job submission process has changed in some respects with this update, so we strongly encourage you to read the transition guide: http://www.pace.gatech.edu/job-submissionmanagement-transition-guide-jan-2014
Also, please check the FAQ for common problems and their solutions by running the following command on your headnode: jan2014-faq (press the spacebar to page through it).
We had a hardware failure in the DDN storage system, which interrupted the planned Biocluster data transfer. We expect to receive the replacement parts and repair the system within a few days. This failure has not caused any data loss, and the system will remain up and running in the meantime, possibly with some performance degradation. The repairs will require a short downtime, and we will soon contact the users of the Gryphon, Biocluster and Skadi clusters (the current users of this system) to schedule this work.
Other accomplishments include:
– Optimus is now a shared cluster. All Optimus users now have access to optimusforce-6 and iw-shared-6.
– All of the Atlas nodes have been upgraded to RHEL6.
– Most of the Athena nodes have been upgraded to RHEL6.
– The old scheduler server (repace) has been replaced by the upgraded server (shared-sched). You may notice differences in the generated job numbers and files.
– Network cabling cleanup and improvements.
– Gryphon has new scheduler and login servers, and the nodes previously used for these purposes have been returned to the computation pool.
– Project file space quotas, as previously agreed with PIs, have been deployed for users who did not have quotas before the maintenance. For users who were already over their new quota, the limits were adjusted to leave some headroom before they reach it. To check your quotas, use "quota -s".