Since January, we have been running an upgraded scheduler version on the Joe and Atlas clusters, which graciously volunteered to test it. This version brings significant performance and stability improvements, and we are looking forward to rolling out the upgrade to the rest of the PACE universe during this maintenance period. Please note the following important changes, which will apply to all PACE users.
- The new schedulers are not compatible with MPI versions older than mvapich/1.9 and openmpi/1.6.2. If you are using one of the older MPI stacks (a warning is printed when you load their modules), you will need to replace it with one of the more recent versions. This motivated the creation of a new and improved software repository, which will be available after the maintenance day. For more details, please see our related post.
- The new version uses a different type of database than the current one, so we will not be able to migrate submitted jobs. The scheduler will start with an empty queue, and you will need to resubmit your jobs after the maintenance day. This applies to Joe and Atlas jobs as well, because we are merging the exclusive queues (with the exception of Tardis, Gryphon and Testflight) onto a new and more powerful server.
- We will start using the “node packing” policy, which places as many jobs on a node as possible before moving on to the next one. With the current version, users can submit many single-core jobs, each landing on a separate node, which makes it more difficult for the scheduler to start jobs that require entire nodes.
- This version fixes a bug that prevents the use of msub for interactive jobs. The vendor recommends using “qsub” for everything (we confirmed that it is much faster than msub), but this bug fix gives you the freedom to pick either tool.
- There will no longer be a discrepancy between job IDs generated by msub (Moab.###) and qsub (####). You will always see a single job ID (in plain number format) regardless of your msub/qsub preference.
- Speed: the new versions of Moab and Torque are multithreaded, making it possible for some query commands (e.g. showq) to return instantly regardless of the load on the scheduler. Currently, when a user submits a large job array, these commands usually time out.
- Introduction of cpusets: when a user is given X cores, they will not be able to use more than that. Currently, users can easily exceed their requested limits by spawning any number of processes/threads, and Torque cannot do much to stop that. The use of cpusets will significantly reduce job interference and allow us to finally use “node packing” as explained above.
- Other benefits from bug fixes and improvements include (but are not limited to) fewer zombie processes, lost output files, and missing array jobs. We also expect visible improvements in job allocation times and less frequent command timeouts.
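To illustrate why node packing helps whole-node jobs, here is a toy model in Python. It is only a sketch: the function names (`place_jobs`, `idle_nodes`) and the cluster size are made up for illustration, and the scheduler's actual placement logic is more involved than the simple first-fit rule assumed here.

```python
from typing import List, Optional


def place_jobs(job_cores: List[int], node_count: int, cores_per_node: int,
               pack: bool) -> List[Optional[int]]:
    """Return the node index chosen for each job, or None if no node has room.

    pack=True  : first node (in index order) with enough free cores,
                 so a node fills up before the next one is touched.
    pack=False : node with the most free cores, which spreads jobs out.
    Both rules are illustrative assumptions, not the scheduler's real policy.
    """
    free = [cores_per_node] * node_count
    placement: List[Optional[int]] = []
    for cores in job_cores:
        candidates = [n for n in range(node_count) if free[n] >= cores]
        if not candidates:
            placement.append(None)
            continue
        if pack:
            node = candidates[0]
        else:
            # Ties go to the lowest node index.
            node = max(candidates, key=lambda n: (free[n], -n))
        free[node] -= cores
        placement.append(node)
    return placement


def idle_nodes(placement: List[Optional[int]], node_count: int) -> int:
    """Count nodes that received no job and can still host a whole-node job."""
    used = {n for n in placement if n is not None}
    return node_count - len(used)


if __name__ == "__main__":
    single_core_jobs = [1, 1, 1, 1]  # four single-core jobs
    for pack in (False, True):
        placement = place_jobs(single_core_jobs, node_count=4,
                               cores_per_node=8, pack=pack)
        label = "packed" if pack else "spread"
        print(label, placement, "idle nodes:", idle_nodes(placement, 4))
```

In this model, spreading four single-core jobs across four nodes leaves no node completely free, while packing them onto one node keeps three nodes available for jobs that request entire nodes.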
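Once cpusets are in place, code that sizes its thread or process pool from the total core count of the node may start more workers than the job is allowed to use. The sketch below shows one way a Python program could check its usable cores instead. It assumes Linux compute nodes and that the cpuset restriction is visible to the process as a CPU affinity mask; `usable_cores` is a hypothetical helper name.

```python
import os


def usable_cores() -> int:
    """Number of CPU cores this process is allowed to run on.

    On Linux, os.sched_getaffinity(0) returns the set of CPUs the current
    process may use, which is limited by any cpuset applied to it.
    os.cpu_count() reports every core on the node and serves only as a
    fallback on platforms without sched_getaffinity.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"cores on node: {os.cpu_count()}, "
          f"cores usable by this process: {usable_cores()}")
```

Sizing worker pools with a check like this, rather than with the node's total core count, should keep a job within the cores it requested.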
We hope these improvements will provide you with a more efficient and productive computing environment. Please let us know (pace-support@oit.gatech.edu) if you have any concerns or questions!