[Updated 2023/02/03, 4:33 PM EST]
Dear Phoenix Users,
The Phoenix cluster is now ready for research and learning. We have released all jobs that were held by the scheduler.
Phoenix users will no longer be able to use the Torque/Moab scheduler and should make sure their workflows work on the Slurm-based cluster. Please contact us if you need additional help shifting your workflows to the Slurm-based cluster. PACE provides documentation, PACE Consulting Sessions, and PACE Slurm Orientation Sessions to support the smooth transition of your workflows to Slurm. We will host a Slurm Orientation Session (for users new to Slurm) on Friday, February 17, at 11 AM.
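If you are converting batch scripts by hand, most of the translation is a mechanical rename of directives. Below is a minimal sketch of a Slurm batch script with its old Torque/PBS equivalents noted; the account, partition, and module names are placeholders, so substitute the values from our documentation:

    #!/bin/bash
    # Minimal Slurm batch script; submit with: sbatch myjob.sbatch
    # (replaces a Torque script submitted with: qsub myjob.pbs)
    #SBATCH --job-name=myjob
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=8
    #SBATCH --time=2:00:00
    # The account and partition below are placeholders; use your own values.
    #SBATCH --account=gts-example
    #SBATCH --partition=example-partition

    cd "$SLURM_SUBMIT_DIR"    # analogue of Torque's $PBS_O_WORKDIR
    module load anaconda3     # placeholder module name
    srun python script.py

The old "#PBS -l nodes=1:ppn=8,walltime=2:00:00" maps onto the --nodes, --ntasks-per-node, and --time directives above, and "#PBS -N" onto --job-name.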
The transfer of remaining funds from Phoenix Torque/Moab to Slurm is ongoing and is expected to be completed next week. January statements will reflect accurate balances when they are sent out.
The next maintenance period for all PACE clusters is May 9, 2023, at 6:00 AM through May 11, 2023, at 11:59 PM. Additional maintenance periods are tentatively scheduled for 2023 on August 8-10, and October 31-November 2.
Status of activities:
ITEMS REQUIRING USER ACTION:
- [Complete] [Phoenix] Slurm migration for the sixth and final phase of the Phoenix cluster (123 additional nodes, for a final total of about 1323). Phoenix users will no longer be able to use the Torque/Moab scheduler and should make sure their workflows work on the Slurm-based cluster.
- [Complete] [Phoenix] New Phoenix login servers might trigger a security warning due to changes in the SSH host keys. If you see this warning, remove the stale entry from your local known_hosts cache (see the example after this list).
- [Complete] [Software] Singularity -> Apptainer Migration for PACE-apps, OOD. Users running Singularity on the command line need to use Apptainer commands moving forward.
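If your SSH client shows a changed-host-key warning when connecting to a re-imaged login server, the fix is to remove the stale entry from your local known_hosts cache. A sketch (the hostname is illustrative; use the host named in your warning message):

    # Remove the stale cached host key:
    ssh-keygen -R login-phoenix.pace.gatech.edu
    # Reconnect and accept the new host key when prompted:
    ssh yourusername@login-phoenix.pace.gatech.edu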
ITEMS NOT REQUIRING USER ACTION:
- [Complete] [ALL CLUSTERS] Update GID on all file systems. Over 1.7 billion files were updated
- [Complete] [Phoenix] Re-image last Phoenix login node; re-enable load balancer
- [Complete] [Phoenix] Migrate Remaining Phoenix-Moab Funds to Phoenix-Slurm
- [Complete] [Network] Code upgrade to PACE departmental Palo Alto
- [Complete] [Network] Upgrade ethernet switch firmware to 9.3.10 (research hall)
- [Complete] [Storage] Reduce the memory available to ZFS caches to 60% of installed memory
- [Complete] [Storage] Update the number of NFS threads to 4 times the number of cores (a sketch of both storage changes follows this list)
- [Complete] [Storage] Update sysctl parameters on ZFS servers
- [Complete] [Datacenter] Georgia Power: Microgrid tests and reconfiguration
- [Complete] [Datacenter] Databank: High Temp Chiller & Tower Maintenance
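For those curious about the storage items above, the sketch below shows how such settings are commonly applied on a Linux ZFS/NFS file server. The parameter names follow standard OpenZFS and nfsd conventions; the exact mechanism PACE used may differ:

    # Cap the ZFS ARC at 60% of installed memory (OpenZFS zfs_arc_max, in bytes).
    total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
    arc_max=$(( total_kb * 1024 * 60 / 100 ))
    echo "options zfs zfs_arc_max=${arc_max}" > /etc/modprobe.d/zfs-arc.conf   # persist across reboots
    echo "${arc_max}" > /sys/module/zfs/parameters/zfs_arc_max                 # apply immediately

    # Run 4 NFS server threads per core (persist via [nfsd] threads= in
    # /etc/nfs.conf or RPCNFSDCOUNT, depending on the distribution).
    rpc.nfsd "$(( $(nproc) * 4 ))"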
If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.
Thank You,
– The PACE Team
[Updated 2023/02/02, 4:20 PM EST]
Dear Hive Users,
The Hive cluster is now ready for research and learning. We have released all jobs that were held by the scheduler.
The next maintenance period for all PACE clusters is May 9, 2023, at 6:00 AM through May 11, 2023, at 11:59 PM. Additional maintenance periods are tentatively scheduled for 2023 on August 8-10, and October 31-November 2.
We are still working on maintenance for the Phoenix cluster and will provide more updates as more information becomes available.
Status of activities:
ITEMS REQUIRING USER ACTION:
- [Software] Singularity -> Apptainer Migration for PACE-apps, OOD. Users running Singularity on the command line need to use Apptainer commands moving forward.
ITEMS NOT REQUIRING USER ACTION:
- [ALL CLUSTERS] Update GID on all file systems. Over 1.7 billion files will be updated
- [Network] Code upgrade to PACE departmental Palo Alto
- [Network] Upgrade ethernet switch firmware to 9.3.10 (research hall)
- [Hive][Storage] Replace 40G cables on storage-hive
- [Storage] Reduce the memory available to ZFS caches to 60% of installed memory
- [Storage] Update the number of NFS threads to 4 times the number of cores
- [Storage] Update sysctl parameters on ZFS servers
- [Datacenter] Georgia Power: Microgrid tests and reconfiguration
- [Datacenter] Databank: High Temp Chiller & Tower Maintenance
If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.
Thank You,
– The PACE Team
[Updated 2023/02/02, 4:05 PM EST]
Dear Firebird Users,
The Firebird cluster is now ready for research and learning. We have released all jobs that were held by the scheduler.
The next maintenance period for all PACE clusters is May 9, 2023, at 6:00 AM through May 11, 2023, at 11:59 PM. Additional maintenance periods are tentatively scheduled for 2023 on August 8-10, and October 31-November 2.
We are still working on maintenance for the Phoenix cluster and will provide more updates as more information becomes available.
Status of activities:
ITEMS REQUIRING USER ACTION:
- [Software] Singularity -> Apptainer Migration for PACE-apps, OOD. Users running Singularity on the command line need to use Apptainer commands moving forward.
ITEMS NOT REQUIRING USER ACTION:
- [ALL CLUSTERS] Update GID on all file systems. Over 1.7 billion files will be updated
- [Network] Code upgrade to PACE departmental Palo Alto
- [Network] Upgrade ethernet switch firmware to 9.3.10 (research hall)
- [Storage] Reduce the memory available to ZFS caches to 60% of installed memory
- [Storage] Update the number of NFS threads to 4 times the number of cores
- [Storage] Update sysctl parameters on ZFS servers
- [Datacenter] Georgia Power: Microgrid tests and reconfiguration
- [Datacenter] Databank: High Temp Chiller & Tower Maintenance
If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.
Thank You,
– The PACE Team
[Updated 2023/02/01, 4:05 PM EST]
Dear Buzzard Users,
The Buzzard cluster is now ready for research and learning. We have released all jobs that were held by the scheduler.
The next maintenance period for all PACE clusters is May 9, 2023, at 6:00 AM through May 11, 2023, at 11:59 PM. Additional maintenance periods are tentatively scheduled for 2023 on August 8-10, and October 31-November 2.
We are still working on maintenance for the Phoenix cluster and will provide more updates as more information becomes available.
Status of activities:
ITEMS NOT REQUIRING USER ACTION:
- [ALL CLUSTERS] Update GID on all file systems. Over 1.7 billion files will be updated
- [Network] Code upgrade to PACE departmental Palo Alto
- [Network] Upgrade ethernet switch firmware to 9.3.10 (research hall)
- [Storage] Reduce the memory available to ZFS caches to 60% of installed memory
- [Storage] Update the number of NFS threads to 4 times the number of cores
- [Storage] Update sysctl parameters on ZFS servers
- [Datacenter] Georgia Power: Microgrid tests and reconfiguration
- [Datacenter] Databank: High Temp Chiller & Tower Maintenance
If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.
Thank You,
– The PACE Team
[Updated 2023/02/01, 4:00 PM EST]
The PACE-ICE and COC-ICE instructional clusters are ready for learning. As usual, we have released all user jobs that were held by the scheduler. You may resume using PACE-ICE and COC-ICE at this time. PACE’s research clusters remain under maintenance as planned.
[Updated 2023/01/31, 6:00 AM EST]
WHEN IS IT HAPPENING?
Maintenance Period starts now at 6 AM EST on Tuesday, 01/31/2023, and is tentatively scheduled to conclude by 11:59 PM on Tuesday, 02/07/2023.
The Phoenix project file system changes are estimated to take seven days to complete. The other PACE clusters (Hive, Firebird, CoC-ICE, PACE-ICE, and Buzzard) are anticipated to finish in under seven days. PACE will release them as soon as maintenance and migration work is complete.
WHAT DO YOU NEED TO DO?
During this extended Maintenance Period, access to all the PACE-managed computational and storage resources will be unavailable. This includes Phoenix, Hive, Firebird, PACE-ICE, COC-ICE, and Buzzard. Phoenix is expected to take the full week, while the other PACE clusters (Hive, Firebird, CoC-ICE, PACE-ICE, and Buzzard) will be released as storage updates are completed during the maintenance window.
Torque/Moab will no longer be available to Phoenix users starting now. We strongly encourage all researchers to shift their workflows to the Slurm-based cluster. PACE provides documentation, PACE Consulting Sessions, and PACE Slurm Orientation Sessions to support the smooth transition of your workflows to Slurm. These can be found at: https://pace.gatech.edu/
Users running Singularity on the command line need to use Apptainer commands moving forward.
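For most container workflows this is a drop-in rename: Apptainer keeps Singularity's subcommands and flags, and existing .sif images and definition files continue to work. A hypothetical example (image and script names are illustrative):

    # Before, with the Singularity CLI:
    #   singularity exec mycontainer.sif python script.py
    # After, same subcommand and flags under Apptainer:
    apptainer exec mycontainer.sif python script.py
    apptainer pull docker://ubuntu:22.04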
WHAT IS HAPPENING?
PACE Maintenance Period starts now and will run until it is complete. Phoenix downtime could last until Tuesday, 02/07/2023 or beyond.
ITEMS REQUIRING USER ACTION:
- [Phoenix] Slurm migration for the sixth and final phase of the Phoenix cluster (123 additional nodes, for a final total of about 1323). Phoenix users will no longer be able to use the Torque/Moab scheduler and should make sure their workflows work on the Slurm-based cluster.
- [Software] Singularity -> Apptainer Migration for PACE-apps, OOD. Users running Singularity on the command line need to use Apptainer commands moving forward.
ITEMS NOT REQUIRING USER ACTION:
- [ALL CLUSTERS] Update GID on all file systems. Over 1.7 billion files will be updated
- [Phoenix] Re-image last Phoenix login node; re-enable load balancer
- [Phoenix] Migrate Remaining Phoenix-Moab Funds to Phoenix-Slurm
- [ICE] Update cgroups limits on ICE head nodes
- [Network] Code upgrade to PACE departmental Palo Alto
- [Network] Upgrade ethernet switch firmware to 9.3.10 (research hall)
- [Hive][Storage] Replace 40G cables on storage-hive
- [Storage] Reduce the memory available to ZFS caches to 60% of installed memory
- [Storage] Update the number of NFS threads to 4 times the number of cores
- [Storage] Update sysctl parameters on ZFS servers
- [Datacenter] Georgia Power: Microgrid tests and reconfiguration
- [Datacenter] Databank: High Temp Chiller & Tower Maintenance
WHY IS IT HAPPENING?
The extended maintenance period is required to remove GIDs that conflict with campus systems, allowing the expansion of research storage across campus. This work is a required component of a strategic initiative and lays the foundation for additional storage options and capacity for researchers. The additional items are part of our regularly scheduled Maintenance Periods, which are announced in advance at https://pace.gatech.edu/. Regular maintenance periods are necessary to reduce unplanned downtime and maintain a secure and stable system.
WHO IS AFFECTED?
All users across all PACE clusters.
WHO SHOULD YOU CONTACT FOR QUESTIONS?
Please contact PACE at pace-support@oit.gatech.edu with questions or concerns.
Thank You,
– The PACE Team
[Updated 2023/01/27, 2:06 PM EST]
WHEN IS IT HAPPENING?
Reminder that the next Maintenance Period starts at 6:00 AM on Tuesday, 01/31/2023, and is tentatively scheduled to conclude by 11:59 PM on Tuesday, 02/07/2023.
The Phoenix project file system changes are estimated to take seven days to complete. The other PACE clusters (Hive, Firebird, CoC-ICE, PACE-ICE, and Buzzard) are anticipated to finish in under seven days. PACE will release them as soon as maintenance and migration work is complete.
WHAT DO YOU NEED TO DO?
As usual, jobs with resource requests that would overlap the Maintenance Period will be held by the scheduler until after the maintenance. During this extended Maintenance Period, access to all PACE-managed computational and storage resources will be unavailable. This includes Phoenix, Hive, Firebird, PACE-ICE, COC-ICE, and Buzzard. Phoenix is expected to take the full week, while the other PACE clusters (Hive, Firebird, CoC-ICE, PACE-ICE, and Buzzard) will be released as storage updates are completed during the maintenance window. If you have a critical deadline that this will impact, please let us know, and we can collaborate on possible alternatives. Please plan accordingly for the projected downtime.
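On the Slurm-based clusters, you can see this hold for yourself with standard Slurm commands; a quick check (output details vary by version and site configuration):

    # Show scheduler reservations, including any maintenance window:
    scontrol show reservation
    # List your pending jobs with the reason they are waiting; jobs blocked by a
    # maintenance reservation typically show (ReqNodeNotAvail, Reserved for maintenance):
    squeue --user "$USER" --states=PENDING --format="%.10i %.9P %.20j %.30r"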
Torque/Moab will no longer be available to Phoenix users starting January 31 at 6 AM ET. We strongly encourage all researchers to shift their workflows to the Slurm-based cluster. PACE provides documentation, PACE Consulting Sessions, and PACE Slurm Orientation Sessions to support the smooth transition of your workflows to Slurm. These can be found at: https://pace.gatech.edu/
Users running Singularity on the command line need to use Apptainer commands moving forward.
WHAT IS HAPPENING?
The next PACE Maintenance Period starts on 01/31/2023 at 6 AM and will run until complete. Phoenix downtime could last until Feb 7 or beyond.
ITEMS REQUIRING USER ACTION:
- [Phoenix] Slurm migration for the sixth and final phase of the Phoenix cluster (123 additional nodes, for a final total of about 1323). Phoenix users will no longer be able to use the Torque/Moab scheduler and should make sure their workflows work on the Slurm-based cluster.
- [Software] Singularity -> Apptainer Migration for PACE-apps, OOD. Users running Singularity on the command line need to use Apptainer commands moving forward.
ITEMS NOT REQUIRING USER ACTION:
- [ALL CLUSTERS] Update GID on all file systems. Over 1.7 billion files will be updated
- [Phoenix] Re-image last Phoenix login node; re-enable load balancer
- [Phoenix] Migrate Remaining Phoenix-Moab Funds to Phoenix-Slurm
- [ICE] Update cgroups limits on ICE head nodes
- [Network] Code upgrade to PACE departmental Palo Alto
- [Network] Upgrade ethernet switch firmware to 9.3.10 (research hall)
- [Hive][Storage] Replace 40G cables on storage-hive
- [Storage] Reduce the memory available to ZFS caches to 60% of installed memory
- [Storage] Update the number of NFS threads to 4 times the number of cores
- [Storage] Update sysctl parameters on ZFS servers
- [Datacenter] Georgia Power: Microgrid tests and reconfiguration
- [Datacenter] Databank: High Temp Chiller & Tower Maintenance
WHY IS IT HAPPENING?
The extended maintenance period is required to remove GIDs that conflict with campus systems, allowing the expansion of research storage across campus. This work is a required component of a strategic initiative and lays the foundation for additional storage options and capacity for researchers. The additional items are part of our regularly scheduled Maintenance Periods, which are announced in advance at https://pace.gatech.edu/. Regular maintenance periods are necessary to reduce unplanned downtime and maintain a secure and stable system.
WHO IS AFFECTED?
All users across all PACE clusters.
WHO SHOULD YOU CONTACT FOR QUESTIONS?
Please contact PACE at pace-support@oit.gatech.edu with questions or concerns.
Thank You,
– The PACE Team
[Updated 2023/01/20, 8:45 AM EST]
WHEN IS IT HAPPENING?
The next Maintenance Period starts at 6:00 AM on Tuesday, 01/31/2023, and is tentatively scheduled to conclude by 11:59 PM on Tuesday, 02/07/2023.
The Phoenix project file system changes are estimated to take seven days to complete. The other PACE clusters (Hive, Firebird, CoC-ICE, PACE-ICE, and Buzzard) are anticipated to finish in under seven days. PACE will release them as soon as maintenance and migration work is complete.
WHAT DO YOU NEED TO DO?
As usual, jobs with resource requests that would overlap the Maintenance Period will be held by the scheduler until after the maintenance. During this extended Maintenance Period, access to all PACE-managed computational and storage resources will be unavailable. This includes Phoenix, Hive, Firebird, PACE-ICE, COC-ICE, and Buzzard. Phoenix is expected to take the full week, while the other PACE clusters (Hive, Firebird, CoC-ICE, PACE-ICE, and Buzzard) will be released as storage updates are completed during the maintenance window. If you have a critical deadline that this will impact, please let us know, and we can collaborate on possible alternatives. Please plan accordingly for the projected downtime.
Torque/Moab will no longer be available to Phoenix users starting January 31 at 6 AM ET. We strongly encourage all researchers to shift their workflows to the Slurm-based cluster. PACE provides documentation, PACE Consulting Sessions, and PACE Slurm Orientation Sessions to support the smooth transition of your workflows to Slurm. These can be found at: https://pace.gatech.edu/
Users running Singularity on the command line need to use Apptainer commands moving forward.
WHAT IS HAPPENING?
The next PACE Maintenance Period starts on 01/31/2023 at 6 AM and will run until complete. Phoenix downtime could last until Feb 7 or beyond.
ITEMS REQUIRING USER ACTION:
- [Phoenix] Slurm migration for the sixth and final phase of the Phoenix cluster (about 119 additional nodes, for a final total of about 1319). Phoenix users will no longer be able to use the Torque/Moab scheduler and should make sure their workflows work on the Slurm-based cluster.
- [Software] Singularity -> Apptainer Migration for PACE-apps, OOD. Users running Singularity on the command line need to use Apptainer commands moving forward.
ITEMS NOT REQUIRING USER ACTION:
- [ALL CLUSTERS] Update GID on all file systems. Over 1.7 billion files will be updated
- [Phoenix] Re-image last Phoenix login node; re-enable load balancer
- [Phoenix] Migrate Remaining Phoenix-Moab Funds to Phoenix-Slurm
- [ICE] Update cgroups limits on ICE head nodes
- [Network] Code upgrade to PACE departmental Palo Alto
- [Network] Upgrade ethernet switch firmware to 9.3.10 (research hall)
- [Hive][Storage] Replace 40G cables on storage-hive
- [Storage] Reduce the memory available to ZFS caches to 60% of installed memory
- [Storage] Update the number of NFS threads to 4 times the number of cores
- [Storage] Update sysctl parameters on ZFS servers
- [Datacenter] Georgia Power: Microgrid tests and reconfiguration
- [Datacenter] Databank: High Temp Chiller & Tower Maintenance
WHY IS IT HAPPENING?
The extended maintenance period is required to remove GIDs that conflict with campus systems, allowing the expansion of research storage across campus. This work is a required component of a strategic initiative and lays the foundation for additional storage options and capacity for researchers. The additional items are part of our regularly scheduled Maintenance Periods, which are announced in advance at https://pace.gatech.edu/. Regular maintenance periods are necessary to reduce unplanned downtime and maintain a secure and stable system.
WHO IS AFFECTED?
All users across all PACE clusters.
WHO SHOULD YOU CONTACT FOR QUESTIONS?
Please contact PACE at pace-support@oit.gatech.edu with questions or concerns.
Thank You,
– The PACE Team