
[COMPLETED] PACE Quarterly Maintenance – November 7-9

This entry was posted on Saturday, 26 October, 2019.

[Update 11/5/19]

We would like to remind you that PACE’s maintenance period begins tomorrow. This quarterly maintenance period is planned for three days and will start on Thursday, November 7, and go through Saturday, November 9.  As usual, jobs with long walltimes will be held by the scheduler to ensure that no active jobs will be running when systems are powered off. These jobs will be released as soon as the maintenance activities are complete.

These activities will be performed:
ITEM REQUIRING USER ACTION:
– Anaconda distributions switched to a year.month (YYYY.MM) version scheme late last year (https://www.anaconda.com/anaconda-distribution-2018-12-released/). Because this scheme is easier for users to track, PACE will adopt the same convention and provide anaconda2/2019.10 and anaconda3/2019.10 modules across all PACE resources. The default Anaconda modules will now be set to the latest YYYY.MM release. To avoid ambiguity, the Anaconda module files for “latest” will be removed. However, software installations that rely on “latest” will still be retained to preserve any critical user workflows. Users currently loading an Anaconda module ending in “latest” should modify their commands to reference a specific version of Anaconda, or load the default without specifying a version (e.g., “module load anaconda3”). Please email PACE Support if you need help updating your scripts or accessing older versions of Anaconda that are no longer available via the modules system.
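
For users with many job scripts, the change described above could be applied with a small helper along the following lines. This is only a sketch, not a PACE-provided tool: the function name pin_anaconda is hypothetical, and it assumes the retired modules appear in scripts as “anaconda2/latest” or “anaconda3/latest” on lines beginning with “module load”. Please review any rewritten script before submitting it.

```python
import re
from typing import Optional

# Assumption: the retired modules are written as "anaconda2/latest" or
# "anaconda3/latest"; adjust the pattern if your scripts use another form.
_LATEST = re.compile(r"\b(anaconda[23])/latest\b")


def pin_anaconda(script_text: str, version: Optional[str] = "2019.10") -> str:
    """Rewrite "module load" lines that reference an Anaconda "latest" module.

    With a version string, "anaconda3/latest" becomes "anaconda3/<version>".
    With version=None, the suffix is dropped so the module default is loaded.
    Lines that do not begin with "module load" are returned unchanged.
    """
    replacement = r"\1" if version is None else rf"\1/{version}"
    rewritten = []
    for line in script_text.splitlines(keepends=True):
        if line.lstrip().startswith("module load"):
            line = _LATEST.sub(replacement, line)
        rewritten.append(line)
    return "".join(rewritten)


if __name__ == "__main__":
    # Hypothetical job script used only for illustration.
    job = "module load anaconda3/latest\npython run.py\n"
    print(pin_anaconda(job), end="")
    print(pin_anaconda(job, version=None), end="")
```

Passing version=None corresponds to loading the default module without a version, as in “module load anaconda3”; pinning an explicit version keeps a workflow on the same release even after the default moves to a newer YYYY.MM.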

ITEMS NOT REQUIRING USER ACTION:
– (Completed) Scheduler settings will be modified to improve the scheduler’s ability to handle large numbers of job submissions rapidly. These changes, along with the new policy implemented last week (10/29/19) limiting simultaneous job submissions (http://blog.pace.gatech.edu/?p=6550), will help stabilize the shared scheduler (accessed via login-s[X] headnodes) and make it more reliable. These scheduler settings are already implemented on the Hive cluster.
– (Completed) PBSTools, which records user job submissions, will be upgraded.
– (Completed) Upgrades to routers and network connections for PACE in Rich and Hive in Coda will be made in order to improve high-speed data transfer.
– (Completed) [Hive cluster] Infiniband switch firmware will be upgraded.
– (Completed) [Hive cluster] Storage system firmware will be updated.
– (Completed) [Hive cluster] Subnet managers will be reconfigured for better redundancy.
– (Completed) [Hive cluster] Lmod, the environment module system, will be updated to a newer version.
– (Completed) The athena-6 queue will be upgraded to RHEL7.

If you have any questions or concerns, please don’t hesitate to contact us at pace-support@oit.gatech.edu . You can follow our Maintenance blog post at http://blog.pace.gatech.edu/?p=6614.

[Update 11/1/19]

We would like to remind you that we are preparing for PACE’s next quarterly maintenance days on November 7-9, 2019. This maintenance period is planned for three days and will start on Thursday, November 7, and go through Saturday, November 9.  As usual, jobs with long walltimes will be held by the scheduler to ensure that no active jobs will be running when systems are powered off. These jobs will be released as soon as the maintenance activities are complete.

We are still finalizing planned activities for the maintenance period. Here is a current list:

ITEM REQUIRING USER ACTION:

– Anaconda distributions switched to a year.month (YYYY.MM) version scheme late last year (https://www.anaconda.com/anaconda-distribution-2018-12-released/). Because this scheme is easier for users to track, PACE will adopt the same convention and provide anaconda2/2019.10 and anaconda3/2019.10 modules across all PACE resources. The default Anaconda modules will now be set to the latest YYYY.MM release. To avoid ambiguity, the Anaconda module files for “latest” will be removed. However, software installations that rely on “latest” will still be retained to preserve any critical user workflows. Users currently loading an Anaconda module ending in “latest” should modify their commands to reference a specific version of Anaconda, or load the default without specifying a version (e.g., “module load anaconda3”). Please email PACE Support if you need help updating your scripts or accessing older versions of Anaconda that are no longer available via the modules system.

ITEMS NOT REQUIRING USER ACTION:

– Scheduler settings will be modified to improve the scheduler’s ability to handle large numbers of job submissions rapidly. These changes, along with the new policy being implemented on Tuesday (10/29/19) limiting simultaneous job submissions (http://blog.pace.gatech.edu/?p=6550), will help stabilize the shared scheduler (accessed via login-s[X] headnodes) and make it more reliable. These scheduler settings are already implemented on the Hive cluster.

– RHEL7 clusters will receive critical patches.

– Updates will be made to PACE databases and configurations.

– PBSTools, which records user job submissions, will be upgraded.

– Upgrades to routers and network connections for PACE in Rich and Hive in Coda will be made in order to improve high-speed data transfer.

– [Hive cluster] Infiniband switch firmware will be upgraded.

– [Hive cluster] Storage system software will be updated.

– [Hive cluster] Subnet managers will be reconfigured for better redundancy.

– [Hive cluster] Lmod, the environment module system, will be updated to a newer version.

If you have any questions or concerns, please don’t hesitate to contact us at pace-support@oit.gatech.edu . You can follow our Maintenance blog post at http://blog.pace.gatech.edu/?p=6614.

[Original post]

We are preparing for PACE’s next maintenance days on November 7-9, 2019. This maintenance period is planned for three days and will start on Thursday, November 7, and go through Saturday, November 9.  As usual, jobs with long walltimes will be held by the scheduler to ensure that no active jobs will be running when systems are powered off. These jobs will be released as soon as the maintenance activities are complete.

We are still finalizing planned activities for the maintenance period. Here is a current list:
ITEM REQUIRING USER ACTION:
– Anaconda distributions switched to a year.month (YYYY.MM) version scheme late last year. Because this scheme is easier for users to track, PACE will adopt the same convention and provide anaconda2/2019.10 and anaconda3/2019.10 modules across all PACE resources. The default Anaconda modules will now be set to the latest YYYY.MM release. To avoid ambiguity, the Anaconda module files for “latest” will be removed. However, software installations that rely on “latest” will still be retained to preserve any critical user workflows. Users currently loading an Anaconda module ending in “latest” should modify their commands to reference a specific version of Anaconda, or load the default without specifying a version (e.g., “module load anaconda3”). Please email PACE Support if you need help updating your scripts or accessing older versions of Anaconda that are no longer available via the modules system.

ITEMS NOT REQUIRING USER ACTION:
– Scheduler settings will be modified to improve the scheduler’s ability to handle large numbers of job submissions rapidly. These changes, along with the new policy being implemented on Tuesday (10/29/19) limiting simultaneous job submissions, will help stabilize the shared scheduler (accessed via login-s[X] headnodes) and make it more reliable. These scheduler settings are already implemented on the Hive cluster.
– RHEL7 clusters will receive critical patches.
– Updates will be made to PACE databases and configurations.
– Firmware for DDN storage will be updated.

– Upgrades to routers and network connections for PACE in Rich and Hive in Coda will be made in order to improve high-speed data transfer.
– [Hive cluster] Infiniband switch firmware will be upgraded.
– [Hive cluster] Subnet managers will be reconfigured for better redundancy.
– [Hive cluster] Lmod, the environment module system, will be updated to a newer version.

If you have any questions or concerns, please don’t hesitate to contact us at pace-support@oit.gatech.edu.
