GT Home : : Campus Maps : : GT Directory

[Complete] PACE Maintenance – May 14-16

This entry was posted by on Friday, 1 May, 2020 at

[Update 5/15/20 9:30 PM]

We are pleased to announce that our May 2020 maintenance period has completed ahead of schedule. We have restored access to computational resources, and previously queued jobs will start as resources allow. The login nodes and storage systems are now accessible.
As usual, there are a small number of straggling nodes that will require additional intervention.

A summary of the changes and actions accomplished during this maintenance period:
– (Completed) [Hive/Testflight-Coda] Georgia Power began work to establish a Micro Grid power generation facility for Coda. Power has been restored.
– (Completed) [Hive] Default modules were changed to point to pace/2020.01 from pace/2019.08, which uses an updated MPI and compiler. Users employing default modules will need to update their PBS scripts to ensure their workflows will succeed. This is described at http://docs.pace.gatech.edu/hive/software_guide/#meta-module-updates-2020.
Functionally, default MPI and compiler are only patch updates, with the use of Intel 19.0.5 (previously 19.0.3) and MVAPICH 2.3.2 (previously 2.3.1). However, the base of software under the new hierarchy have been rebuilt, which may impact specific use of compiled and MPI-compiled applications.
Users may be impacted. Access to the old PACE software basis can be done interactively and within scripts simply by loading “module load pace/2019.08”.
PACE encourages users to migrate to using the new defaults, as new software will continue to be added to the newest software basis. Users may preserve current workflows by using the older pace module described above.

– (Completed) Performed upgrades and replacements on several infiniband switches in the Rich datacenter.
– (Completed) Replaced other switches and hardware in the Rich datacenter.
– (Completed) Updated software modules in Hive.
– (Completed) Updated salt configuration management settings on all the production servers.

If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.
Thank you for your patience!

[Update 5/13/20 10:30 AM]

We would like to remind you that we are preparing for our next PACE maintenance period, which will begin at 6:00 AM tomorrow and conclude at 11:59 PM on May 16. As usual, jobs with long walltimes will be held by the scheduler to ensure that no active jobs will be running when systems are powered off. These jobs will be released as soon as the maintenance activities are complete.

 

Georgia Power will begin work to establish a Micro Grid power generation facility for Coda beginning on Thursday, after initial testing during the February maintenance period. This means that the research hall of the Coda datacenter, including the Hive and testflight-coda clusters, will be powered down for an expected 12-14 hours. Should any issues and resultant delays occur that extend the outage for Hive & testflight-coda, users will be notified accordingly.

 

ITEMS REQUIRING USER ACTION:

– [Hive] Default modules will change to point to pace/2020.01 from pace/2019.08, which uses an updated MPI and compiler. Users employing default modules will need to update their PBS scripts to ensure their workflows will succeed. This is described at http://docs.pace.gatech.edu/hive/software_guide/#meta-module-updates-2020.

Functionally, default MPI and compiler are only patch updates, with the use of Intel 19.0.5 (previously 19.0.3) and MVAPICH 2.3.2 (previously 2.3.1). However, the base of software under the new hierarchy have been rebuilt, which may impact specific use of compiled and MPI-compiled applications.

Users may be impacted. Access to the old PACE software basis can be done interactively and within scripts can be done simply by loading “module load pace/2019.08”.

PACE encourages users to migrate to using the new defaults, as new software will continue to be added to the newest software basis. Users may preserve current workflows by using the older pace module described above.

 

ITEMS NOT REQUIRING USER ACTION:

– Perform upgrades and replacements on several infiniband switches in the Rich datacenter.

– Replace other switches and hardware in the Rich datacenter.

– Update software modules in Hive.

– Update salt configuration management settings on all the production servers.

 

If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.

[Update 5/11/20 8:30 AM]

We would like to remind you that we are preparing for our next PACE maintenance period, which will begin at 6:00 AM on May 14 and conclude at 11:59 PM on May 16. As usual, jobs with long walltimes will be held by the scheduler to ensure that no active jobs will be running when systems are powered off. These jobs will be released as soon as the maintenance activities are complete.

Georgia Power will begin work to establish a Micro Grid power generation facility for Coda beginning on Thursday, after initial testing during the February maintenance period. This means that the research hall of the Coda datacenter, including the Hive and testflight-coda clusters, will be powered down for an expected 12-14 hours. Should any issues and resultant delays occur that extend the outage for Hive & testflight-coda, users will be notified accordingly.

We are still finalizing planned activities for the maintenance period. Here is a current list:

ITEMS REQUIRING USER ACTION:
– [Hive] Default modules will change to point to pace/2020.01 from pace/2019.08, which uses an updated MPI and compiler. Users employing default modules will need to update their PBS scripts to ensure their workflows will succeed. This is described at http://docs.pace.gatech.edu/hive/software_guide/#meta-module-updates-2020.
Functionally, default MPI and compiler are only patch updates, with the use of Intel 19.0.5 (previously 19.0.3) and MVAPICH 2.3.2 (previously 2.3.1). However, the base of software under the new hierarchy have been rebuilt, which may impact specific use of compiled and MPI-compiled applications.
Users may be impacted. Access to the old PACE software basis can be done interactively and within scripts can be done simply by loading “module load pace/2019.08”.
PACE encourages users to migrate to using the new defaults, as new software will continue to be added to the newest software basis. Users may preserve current workflows by using the older pace module described above.

ITEMS NOT REQUIRING USER ACTION:
– Perform upgrades and replacements on several infiniband switches in the Rich datacenter.
– Replace other switches and hardware in the Rich datacenter.
– Update software modules in Hive.
– Update salt configuration management settings on all the production servers.

If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.

 

[Original Post]

We are preparing for our next PACE maintenance period, which will begin at 6:00 AM on May 14 and conclude at 11:59 PM on May 16. As usual, jobs with long walltimes will be held by the scheduler to ensure that no active jobs will be running when systems are powered off. These jobs will be released as soon as the maintenance activities are complete.

Georgia Power will begin work to establish a Micro Grid power generation facility for Coda beginning on Thursday, after initial testing during the February maintenance period. This means that the research hall of the Coda datacenter, including the Hive and testflight-coda clusters, will be powered down for an expected 12-14 hours. Should any issues and resultant delays occur that extend the outage for Hive & testflight-coda, users will be notified accordingly.

We are still finalizing planned activities for the maintenance period. Here is a current list:

ITEMS REQUIRING USER ACTION:
– [Hive] Default modules will change to point to pace/2020.01 from pace/2019.08, which uses an updated MPI and compiler. Users employing default modules will need to update their PBS scripts to ensure their workflows will succeed. A link with detailed documentation of this change and necessary action by users will be provided prior to the maintenance period.

ITEMS NOT REQUIRING USER ACTION:
– Perform upgrades and replacements on several infiniband switches in the Rich datacenter.
– Replace other switches and hardware in the Rich datacenter.
– Update software modules in Hive.
– Update salt configuration management settings on all the production servers.

If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.

Comments are closed.