Reminder, folks: the clusters will be down this coming Tuesday, October 18.
All currently running jobs will have completed by then, and the scheduler has been instructed not to start any new job that would not finish before the outage. Jobs that have been submitted but wouldn’t complete by Tuesday morning are being held by the scheduler and will be released as nodes become available after our maintenance activities.
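For anyone wondering why a queued job is not starting, the hold rule can be sketched roughly as follows. This is only an illustration, not the scheduler's actual configuration or code: the function names, the job names, and the walltimes are all hypothetical, and it assumes the decision is based on each job's requested walltime.

```python
from datetime import timedelta


def fits_before_outage(requested_walltime, time_until_outage):
    """Return True if a job started now would finish before the outage begins."""
    return requested_walltime <= time_until_outage


def split_queue(jobs, time_until_outage):
    """Partition (name, walltime) pairs into jobs to start and jobs to hold."""
    startable, held = [], []
    for name, walltime in jobs:
        if fits_before_outage(walltime, time_until_outage):
            startable.append(name)
        else:
            held.append(name)
    return startable, held


# Hypothetical queue: job names and walltimes are made up for illustration.
queue = [
    ("short_job", timedelta(hours=6)),
    ("long_job", timedelta(days=5)),
]

# Assume the outage begins three days from now.
startable, held = split_queue(queue, time_until_outage=timedelta(days=3))
print("start now:", startable)
print("held until after maintenance:", held)
```

In practice this means that a job with a shorter requested walltime may still be able to start before Tuesday, while a job with a long walltime waits until after the maintenance.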
Major items on the list this time around are:
- swap over to redundant network switches for the core of the HPC network
- Panasas software update to version 4.1
- routine Solaris and RedHat patching of non-user-facing infrastructure services
- routine security patches to ssh everywhere
- migration of infrastructure services to virtual machines
- migration to new infrastructure-facing LDAP schema
- reinstating storage quotas missed in our previous maintenance
Some further minor things we’ll take care of as well:
- load testing on some infrastructure servers
- migrate the /hp3 filesystem to a different fileserver, since we put it on the wrong one (no user impact expected)
- OIT/Operations will be performing preventative maintenance on the UPS
- OIT/Operations will be verifying some electrical circuit locations
- update ganglia monitoring agents on all RHEL5 machines
- reboot everything