As you may know, we are preparing to upgrade the schedulers to newer versions (which are known to be faster and less buggy) on the next maintenance day (01/26/2015, Tue).
The “testflight-sched” scheduler, which runs the “testflight” and “ligo-6” queues, will receive these updates earlier for testing, most likely today. The upgrades will be mostly transparent to users, with the exception of an estimated 30 minutes of downtime on the scheduler server and on the “testflight-6” and “ligo-6” headnodes. For the duration of the scheduler upgrade, your queries and commands will return “Cannot reach server”. The headnodes will also need to be rebooted several times, so please make sure you don’t use them for anything critical (text editing, interactive matlab sessions, etc.). We have confirmed that the old client services on the compute nodes can still communicate with the new server, so we will be able to upgrade the nodes one by one as they become idle, without killing any running jobs.
Once the upgrades are complete (we will let you know), we strongly encourage every PACE user to run at least a few test jobs on testflight to make sure everything will work after the upgrades. Given our past experience with scheduler upgrades, we cannot stress enough the importance of testing the new version. Please contact us as soon as possible if you notice any problems or odd behavior.
Another reason for this early upgrade is to finalize our upgrade procedures, which have not been tested yet. Therefore, expect problems and don’t rely on testflight for anything critical (a warning that applies to testflight at all times, as the name suggests).
Thank you in advance for your cooperation and feedback!