PACE A Partnership for an Advanced Computing Environment

October 24, 2022

PACE Maintenance Period (November 2 – 4, 2022)

Filed under: Uncategorized — Jeff Valdez @ 4:58 pm

[11/4/2022 Update]

The Phoenix (Moab/Torque and Slurm), Hive, Firebird, PACE-ICE, COC-ICE, and Buzzard clusters are now ready for research and learning. We have released all jobs that were held by the scheduler. 

The second phase of the Phoenix-Slurm cluster migration, covering 300 additional nodes (for a combined total of 800 nodes [out of 1319]), completed successfully, and researchers can resume using the cluster.

The next maintenance period for all PACE clusters is January 31, 2023, at 6:00 AM through February 2, 2023, at 11:59 PM. Additional maintenance periods are tentatively scheduled for 2023 on May 9-11, August 8-10, and October 31-November 2. Additional phases of the Phoenix-Slurm cluster migration are tentatively scheduled for November 29, 2022, and January 4, 17, and 31, 2023.

Status of activities: 

ITEMS REQUIRING USER ACTION: 

  • [Complete] [Hive] The new Hive login servers might trigger a security warning because their SSH keys have changed. If you see this warning, clear the cached key on your local machine to dismiss it.
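
For anyone unsure how to clear the cached key, here is a minimal sketch. It assumes an OpenSSH client, where cached host keys live in a known_hosts file and `ssh-keygen -R` removes the entries for a given host. The hostname below is a hypothetical placeholder, not an actual PACE address; substitute the Hive login hostname you normally connect to.

```python
import subprocess

# Hypothetical placeholder hostname (an assumption, not a real PACE address).
HIVE_LOGIN_HOST = "hive-login.example.edu"


def build_remove_key_command(host, known_hosts=None):
    """Build the ssh-keygen command that removes cached keys for `host`.

    `known_hosts` optionally names a non-default known_hosts file;
    when omitted, ssh-keygen uses its default (~/.ssh/known_hosts).
    """
    cmd = ["ssh-keygen", "-R", host]
    if known_hosts is not None:
        cmd += ["-f", str(known_hosts)]
    return cmd


def remove_cached_host_key(host, known_hosts=None):
    """Run ssh-keygen to delete the stale key; raises if the command fails."""
    subprocess.run(build_remove_key_command(host, known_hosts), check=True)
```

Calling `remove_cached_host_key(HIVE_LOGIN_HOST)` is equivalent to running `ssh-keygen -R` with that hostname in a terminal. On the next login, your SSH client should prompt you to accept the new key.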

ITEMS NOT REQUIRING USER ACTION: 

  • [Complete] [Phoenix] Slurm migration for second phase of Phoenix cluster (300 additional nodes for a combined total of 800 nodes [out of 1319]) 
  • [Complete] [Phoenix] Reconfigure Phoenix in PACE DB 
  • [Complete] [Hive][Storage] Cable replacement for GPFS (project/scratch) controller 
  • [Complete] [Firebird][Storage] Migrate some Firebird projects from current file servers to new file server 
  • [Complete] [Firebird] Reconfigure Firebird in PACE DB 
  • [Complete] [OSG] Update Nvidia drivers 
  • [Complete] [OSG][Network] Remove IB drivers on osg-login2 
  • [Complete] [Datacenter] Transformer repairs 
  • [Complete] [Network] Update VRF configuration on compute racks 
  • [Complete] [Storage] Upgrade Globus to 5.4.50 for new CA 

If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.

[11/2/2022 Update]

This is a reminder that our next PACE Maintenance period has now begun and is scheduled to end at 11:59PM on Friday, 11/04/2022. During this Maintenance Period, access to all PACE-managed computational and storage resources will be unavailable. This includes Phoenix, Hive, Firebird, PACE-ICE, COC-ICE, and Buzzard.

Tentative list of activities: 

ITEMS REQUIRING USER ACTION: 

  • [Hive] The new Hive login servers might trigger a security warning because their SSH keys have changed. If you see this warning, clear the cached key on your local machine to dismiss it.

ITEMS NOT REQUIRING USER ACTION: 

  • [Phoenix] Slurm migration for second phase of Phoenix cluster (300 additional nodes for a combined total of 800 nodes [out of 1319]) 
  • [Phoenix] Reconfigure Phoenix in PACE DB 
  • [Hive][Storage] Cable replacement for GPFS (project/scratch) controller 
  • [Firebird][Storage] Migrate some Firebird projects from current file servers to new file server 
  • [Firebird] Reconfigure Firebird in PACE DB 
  • [OSG] Update Nvidia drivers 
  • [OSG][Network] Remove IB drivers on osg-login2 
  • [Datacenter] Transformer repairs 
  • [Network] Update VRF configuration on compute racks 
  • [Storage] Upgrade Globus to 5.4.50 for new CA 

If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.

[10/31/2022 Update]

This is a reminder that our next PACE Maintenance period is scheduled to begin later this week at 6:00AM on Wednesday, 11/02/2022, and it is tentatively scheduled to conclude by 11:59PM on Friday, 11/04/2022. As usual, jobs with resource requests that would have them running during the Maintenance Period will be held by the scheduler until after the maintenance. During this Maintenance Period, access to all PACE-managed computational and storage resources will be unavailable. This includes Phoenix, Hive, Firebird, PACE-ICE, COC-ICE, and Buzzard.
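
To illustrate which jobs would be held, the sketch below checks whether a job's requested walltime would overlap the maintenance window. It is a simplified illustration only: the function name and the plain interval test are assumptions made for this example, and the actual scheduler's logic may differ.

```python
from datetime import datetime, timedelta

# Maintenance window as announced (local time).
MAINTENANCE_START = datetime(2022, 11, 2, 6, 0)
MAINTENANCE_END = datetime(2022, 11, 4, 23, 59)


def overlaps_maintenance(start, walltime,
                         window_start=MAINTENANCE_START,
                         window_end=MAINTENANCE_END):
    """Return True if a job starting at `start` and requesting `walltime`
    could still be running at any point inside the maintenance window."""
    job_end = start + walltime
    return start < window_end and job_end > window_start


# A job started at noon on 11/01 requesting 24 hours would run past 6:00AM on 11/02.
long_job = overlaps_maintenance(datetime(2022, 11, 1, 12, 0), timedelta(hours=24))
# The same job requesting only 12 hours would finish before the window opens.
short_job = overlaps_maintenance(datetime(2022, 11, 1, 12, 0), timedelta(hours=12))
```

In practice, this means that requesting a shorter walltime, where your work allows it, can let a job start and finish before the maintenance begins.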

Tentative list of activities: 

ITEMS REQUIRING USER ACTION: 

  • [Hive] The new Hive login servers might trigger a security warning because their SSH keys have changed. If you see this warning, clear the cached key on your local machine to dismiss it.

ITEMS NOT REQUIRING USER ACTION: 

  • [Phoenix] Slurm migration for second phase of Phoenix cluster (300 additional nodes for a combined total of 800 nodes [out of 1319]) 
  • [Phoenix] Reconfigure Phoenix in PACE DB 
  • [Hive][Storage] Cable replacement for GPFS (project/scratch) controller 
  • [Firebird][Storage] Migrate some Firebird projects from current file servers to new file server 
  • [Firebird] Reconfigure Firebird in PACE DB 
  • [OSG] Update Nvidia drivers 
  • [OSG][Network] Remove IB drivers on osg-login2 
  • [Datacenter] Transformer repairs 
  • [Network] Update VRF configuration on compute racks 
  • [Storage] Upgrade Globus to 5.4.50 for new CA 

If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.

[10/24/2022 Early Reminder]

Dear PACE Users,

This is a friendly reminder that our next PACE Maintenance period is scheduled to begin at 6:00AM on Wednesday, 11/02/2022, and it is tentatively scheduled to conclude by 11:59PM on Friday, 11/04/2022. As usual, jobs with resource requests that would have them running during the Maintenance Period will be held by the scheduler until after the maintenance. During this Maintenance Period, access to all PACE-managed computational and storage resources will be unavailable. This includes Phoenix, Hive, Firebird, PACE-ICE, COC-ICE, and Buzzard.

Tentative list of activities:

ITEMS REQUIRING USER ACTION:

  • [Hive] The new Hive login servers might trigger a security warning because their SSH keys have changed. If you see this warning, clear the cached key on your local machine to dismiss it.

ITEMS NOT REQUIRING USER ACTION:

  • [Phoenix] Slurm migration for second phase of Phoenix cluster (300 additional nodes for a combined total of 800 nodes [out of 1319])
  • [Phoenix] Reconfigure Phoenix in PACE DB
  • [Hive][Storage] Cable replacement for GPFS (project/scratch) controller
  • [Firebird][Storage] Migrate some Firebird projects from current file servers to new file server
  • [Firebird] Reconfigure Firebird in PACE DB
  • [OSG] Update Nvidia drivers
  • [OSG][Network] Remove IB drivers on osg-login2
  • [Datacenter] Transformer repairs
  • [Network] Update VRF configuration on compute racks
  • [Storage] Upgrade Globus to 5.4.50 for new CA

If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.

Best,

The PACE Team
