TensorFlow update required due to identified security vulnerability

Wednesday, October 20, 2021

Summary: TensorFlow update required due to identified security vulnerability

What’s happening and what are we doing: A security vulnerability was discovered in TensorFlow. PACE has installed the patched version 2.6.0 of TensorFlow in our software repository, and we will retire the older versions on November 3, 2021, during our maintenance period.

How does this impact me: Both researchers who use PACE’s TensorFlow installation and those who have installed their own are impacted.

The following PACE installations will be retired:

Modules: tensorflow-gpu/2.0.0 and tensorflow-gpu/2.2.0

Virtual envs under anaconda3/2020.02: pace-tensorflow-gpu-2.2.0 and pace-tensorflow-2.2.0

Please use the tensorflow-gpu/2.6.0 module instead of the older versions identified above. If you were previously using a PACE-provided virtual env inside the anaconda3 module, please switch to the separate new module instead. You can find more information about using PACE’s TensorFlow installation in our documentation. You will need to update your PBS scripts to load the new module, and you may need to update your Python code to ensure compatibility with the latest version of the package.

If you have created your own conda environment on PACE and installed TensorFlow in it, please create a new virtual environment and install the necessary packages. You can build this environment from the tensorflow-gpu/2.6.0 virtual environment as a base if you would like, then install other packages you need, as described in our documentation. In order to protect Georgia Tech’s cybersecurity, please discontinue use of any older environments running prior versions of TensorFlow on PACE.
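To help identify environments that still contain a TensorFlow release older than 2.6.0, a quick version check can be run inside each environment. The sketch below is illustrative only. It assumes Python 3.8+ (for the standard-library `importlib.metadata`), and it assumes TensorFlow was installed under the distribution name `tensorflow` or `tensorflow-gpu`. It treats 2.6.0 as the minimum acceptable version, as stated in this notice. The helper names are hypothetical, not part of any PACE tool.

```python
from importlib import metadata

# Minimum version named in this notice; older releases are being retired.
MINIMUM_PATCHED = (2, 6, 0)


def parse_release(version: str) -> tuple:
    """Turn a version string such as '2.2.0' into a tuple of ints.

    Only the leading numeric part of the first three segments is kept, so
    pre-release suffixes (e.g. 'rc1') are ignored; this is a simplification.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version: str, minimum: tuple = MINIMUM_PATCHED) -> bool:
    """Compare as integer tuples so that '2.10.0' correctly sorts after '2.6.0'."""
    return parse_release(version) >= minimum


def installed_tensorflow_version():
    """Return the TensorFlow version installed in this environment, or None.

    Assumes the distribution is named 'tensorflow' or 'tensorflow-gpu'.
    """
    for dist in ("tensorflow", "tensorflow-gpu"):
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = installed_tensorflow_version()
    if found is None:
        print("TensorFlow not found in this environment")
    elif is_patched(found):
        print(f"TensorFlow {found}: at or above 2.6.0")
    else:
        print(f"TensorFlow {found}: older than 2.6.0, please rebuild this environment")
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating "2.10.0" as older than "2.6.0".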

What we will continue to do: We are happy to assist researchers with the transition to the new version of TensorFlow. PACE will offer support to researchers upgrading TensorFlow at our upcoming consulting sessions. The next sessions are Thursday, October 28, 10:30-12:15, and Tuesday, November 2, 2:00-3:45. Visit our training page for the full schedule and BlueJeans links.

Thank you for your prompt attention to this security update, and please accept our sincere apology for any inconvenience that this may cause you. If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu.

Hive scheduler recurring outages

Friday, October 15, 2021

[Update 10/15/21 5:15 PM]

The Hive scheduler is functioning at this time. The PACE team disabled several system utilities that may have contributed to earlier issues with the scheduler. We will continue to monitor the scheduler status and to work with our support vendor to improve stability of Hive’s scheduler. Please check this blog post for updates.

[Update 10/15/21 4:15 PM]

The Hive scheduler is again functional. The PACE team and our vendor are continuing our investigation in order to restore stability to the scheduler.

[Original Post 10/15/21 12:35 PM]

Summary: Hive scheduler recurring outages

What’s happening and what are we doing: The Hive scheduler has been experiencing intermittent outages over the past few weeks requiring frequent restarts. At this time, the PACE team is running a diagnostic utility and will restart the scheduler shortly. The PACE team is actively investigating the outages in coordination with our scheduler vendor to restore stability to Hive’s scheduler.

How does this impact me: Hive researchers may be unable to submit or check the status of jobs, and jobs may be unable to start. You may find that the “qsub” and “qstat” commands and/or the “showq” command are not responsive. Already-running jobs will continue.
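During an outage like this one, commands such as `qstat` may hang instead of failing. Scripts that poll the scheduler can be affected as a result. The following is a minimal sketch. It assumes Python 3 and assumes `qstat` is on your PATH. It runs the command with a timeout so that a hung scheduler does not block the script indefinitely. The function name and the 30-second default are illustrative choices, not PACE recommendations.

```python
import subprocess


def scheduler_responds(command=("qstat",), timeout_seconds=30.0) -> bool:
    """Return True if `command` exits with status 0 within the timeout.

    A timeout, an executable that cannot be started (e.g. not on PATH),
    or a non-zero exit status are all treated as "not responsive" here.
    """
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,  # child is killed if it exceeds this
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if scheduler_responds():
        print("Scheduler responded.")
    else:
        print("Scheduler did not respond; check the PACE blog for status.")
```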

What we will continue to do: PACE will continue working to restore functionality to the Hive scheduler and coordinating with our support vendor. We will provide updates on our blog, so please check here for current status.

Please accept our sincere apology for any inconvenience that this temporary limitation may cause you. If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu.

PACE’s centralized OSG service, powered with a new cluster “Buzzard”

Thursday, October 14, 2021

We are happy to announce a new addition to PACE’s service portfolio to support Open Science Grid (OSG) efforts on campus and beyond. This service is kick-started by a brand-new cluster named “Buzzard”, funded by an NSF award* led by Dr. Mehmet Belgin and Semir Sarajlic of PACE, in collaboration with Drs. Laura Cadonati, Nepomuk Otte, and Ignacio Taboada of the Center for Relativistic Astrophysics (CRA).

Open Science Grid (OSG) is a unique consortium that provides shared infrastructure and services to unify access to supercomputing sites across the nation, making a vast array of High Throughput Computing (HTC) resources available to US-based researchers. OSG has been instrumental in groundbreaking scientific advances, including the Nobel Prize-winning gravitational-wave research of LIGO.

Did you know that all GT researchers already qualify for OSG? This means you can join today and start running jobs on this vast resource at no cost. We highly encourage you to register for PACE’s next OSG orientation class, which will get you started with the basics of running on OSG. As an added resource, PACE offers documentation to help researchers get started with OSG quickly.

In addition to training and documentation, PACE offers resource integration services. More specifically, GT faculty members now have the option to acquire new resources that expand Buzzard with their own OSG projects, similar to the High Performance Computing (HPC) services PACE successfully offered from 2009 until the introduction of the new cost model. As part of the NSF award, PACE has already started supporting several exceptional OSG projects, namely LIGO, IceCube, and CTA/VERITAS, and we look forward to supporting more OSG projects in the future!

If you are interested in the OSG service, please feel free to reach out to us (pace-support@oit.gatech.edu) and we’ll be happy to discuss how our new service can transform your research. 

Thank you!

* This material is based upon work supported by the National Science Foundation under grant number 1925541. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. 

Announcing the PACE OSG Orientation Class

Thursday, October 7, 2021

Dear PACE Researchers, 

PACE is pleased to announce the launch of the PACE Open Science Grid (OSG) Orientation class, which introduces Georgia Tech’s research community to OSG and the distributed high throughput computing resources available via OSG Connect. Join us for this virtual orientation to learn about OSG and how it may benefit your research.

Please see the session dates and the registration form below:

Dates and times:
  • October 15, 10:30am – 12:15pm
  • November 11, 1:30pm – 3:15pm

Registration: https://b.gatech.edu/3Bi4Yie

This class is based in part on work supported by the NSF CC* award 1925541: “Integrating Georgia Tech into the Open Science Grid for Multi-Messenger Astrophysics”. With this award, PACE, in collaboration with the Center for Relativistic Astrophysics, added CPU, GPU, and storage resources to the existing OSG capacity. It also added the first regional StashCache service, which benefits all OSG institutions in the Southeast region, not just Georgia Tech.

This orientation is the first step in PACE’s longer-term plans to support OSG initiatives on campus. Please be on the lookout for more exciting announcements from our team in the very near future.

We look forward to you joining us for the OSG orientation. 

Best,

The PACE Team

Hive Project & Scratch Storage Battery Replacement

Thursday, September 23, 2021

[Update 9/23/21 3:15 PM]

The replacement batteries have reached a sufficient charge, and Hive GPFS performance has been restored. Thank you for your patience during this maintenance.

[Original Post 9/23/21 12:30 PM]

Summary: Battery replacement on Hive project & scratch storage will impact performance today.
What’s happening and what are we doing: UPS batteries on the Hive GPFS storage device, holding project (data) and scratch storage, need to be replaced. During the replacement, which will begin shortly this afternoon, storage will shift to write-through mode, and performance will be impacted. Once the new batteries are sufficiently charged, performance will return to normal.
How does this impact me: Hive project and scratch performance will be impacted until the fresh batteries have sufficiently charged, which should take approximately 3 hours. Jobs may progress more slowly than normal. If your job runs out of wall time and is cancelled by the scheduler, please resubmit it to run again.
What we will continue to do: PACE will monitor Hive GPFS storage throughout this procedure.
Please accept our sincere apology for any inconvenience that this temporary limitation may cause you. If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu.

Hive and Phoenix Scheduler Configuration Change

Wednesday, September 22, 2021

Dear PACE Researchers, 

We would like to announce an upcoming change to the scheduler configuration on the Phoenix and Hive clusters at 9:00 AM on Thursday, September 23rd. This change should improve the scheduler performance given the large number of jobs executed by our users. 

What will PACE be doing: PACE will reduce the retention time for job-specific logs from 24 hours to 6 hours after job completion. Reducing the amount of job information the scheduler needs to process regularly should provide a more stable and faster job submission environment. Additionally, the downtime associated with scheduler restarts should decrease, since job ingestion time will be reduced accordingly.

Who does this message impact: Going forward, qstat will no longer return information about a job more than 6 hours after it completes. In addition to the scheduler job STDOUT/STDERR files, job statistics for completed jobs on Phoenix and Hive can be queried at https://pbstools-coda.pace.gatech.edu.
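The new retention rule comes down to simple time arithmetic. Under this change, a completed job remains visible to qstat for 6 hours after completion, down from 24 hours. The sketch below shows how a script might decide whether to query qstat or fall back to the job's STDOUT/STDERR files or https://pbstools-coda.pace.gatech.edu. The function and constant names are hypothetical. The sketch models only the retention window described above, and the scheduler's exact cutoff behavior may differ slightly.

```python
from datetime import datetime, timedelta

# Retention window for completed-job records per this announcement
# (previously 24 hours).
QSTAT_RETENTION = timedelta(hours=6)


def qstat_still_has_job(completed_at: datetime, now: datetime,
                        retention: timedelta = QSTAT_RETENTION) -> bool:
    """Return True if a job that completed at `completed_at` should still be
    visible to qstat at time `now` under the given retention window."""
    return now - completed_at <= retention


if __name__ == "__main__":
    finished = datetime(2021, 9, 23, 9, 30)  # illustrative completion time
    for hours_later in (2, 6, 7, 20):
        check_time = finished + timedelta(hours=hours_later)
        visible = qstat_still_has_job(finished, check_time)
        print(f"{hours_later:>2} h after completion: qstat visible = {visible}")
```

Under the old 24-hour setting, the 7- and 20-hour checks would also have returned True. This gap is why older jobs now have to be looked up through the output files or the pbstools-coda site.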

What PACE will continue to do: We will monitor the clusters for issues during and after the configuration change to assess any immediate impacts from the update. We will continue to assess the scheduler health to ensure a stable job submission environment. 

As always, please contact us at pace-support@oit.gatech.edu with any questions or concerns regarding this change. 

Best Regards, 
The PACE Team

PACE Maintenance Period (November 3 – 5, 2021)

Monday, September 13, 2021

[Full announcement 10/20/21 10:30 AM]

As previously announced, our next PACE maintenance period is scheduled to begin at 6:00 AM on Wednesday, November 3, and end at 11:59 PM on Friday, November 5. As usual, jobs that request durations that would extend into the maintenance period will be held by the scheduler to run after maintenance is complete. During the maintenance window, access to all PACE-managed computational and storage resources will be unavailable. This includes Phoenix, Hive, Firebird, PACE-ICE, COC-ICE, and Buzzard.
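The hold rule above can be expressed as a comparison between a job's requested end time and the start of the maintenance window. The following is a minimal sketch. It uses the 6:00 AM, November 3 start time from this post and assumes the job starts at the given time, before maintenance begins. The real scheduler also accounts for queue wait, which this sketch ignores. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Start of the November 2021 maintenance period announced above.
MAINTENANCE_START = datetime(2021, 11, 3, 6, 0)


def held_for_maintenance(start_time: datetime, walltime: timedelta,
                         maintenance_start: datetime = MAINTENANCE_START) -> bool:
    """Return True if a job starting at `start_time` (assumed to be before
    maintenance) with the requested `walltime` would still be running when
    maintenance begins, and would therefore be held by the scheduler."""
    return start_time + walltime > maintenance_start


if __name__ == "__main__":
    now = datetime(2021, 11, 1, 12, 0)  # Monday at noon, for illustration
    for hours in (12, 42, 48):
        held = held_for_maintenance(now, timedelta(hours=hours))
        print(f"walltime {hours:>2} h -> held until after maintenance: {held}")
```

In this example, a 42-hour request submitted at noon on November 1 ends exactly at 6:00 AM on November 3 and is not held, while a 48-hour request would be. Shortening the requested walltime is therefore one way to get a job to run before the maintenance period.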

Please see below for a tentative list of activities:

ITEMS REQUIRING USER ACTION:

  • TensorFlow upgrade due to security vulnerability. PACE will retire older versions of TensorFlow, and researchers should shift to using the new module. We also request that you replace any self-installed TensorFlow packages. Additional details and instructions will follow in a separate message.

ITEMS NOT REQUIRING USER ACTION:

  • [Datacenter] Databank will clean the water cooling tower, requiring that all PACE compute nodes be powered off.
  • [System] Operating system patch installs
  • [Storage/Phoenix] Lustre controller firmware and other upgrades
  • [Storage/Phoenix] Lustre scratch upgrade and expansion
  • [System] System configuration management updates
  • [System] Updates to NVIDIA drivers and libraries
  • [System] Upgrade some PACE infrastructure nodes to RHEL 7.9
  • [System] Reorder group file
  • [Headnode/COC-ICE] Configure c-group controls on COC-ICE headnode
  • [Scheduler/Hive] Separate Torque & Moab servers to improve scheduler reliability
  • [Network] Update Ethernet switch firmware
  • [Network] Update IP addresses of switches in BCDC

If you have any questions or concerns, please contact us at pace-support@oit.gatech.edu.

[Early announcement]

Dear PACE Users,

This is a friendly reminder that our next Maintenance Period is tentatively scheduled to begin at 6:00 AM on Wednesday, 11/03/2021, and to conclude by 11:59 PM on Friday, 11/05/2021. As usual, jobs with resource requests that would be running during the Maintenance Period will be held by the scheduler until after the Maintenance Period. During the Maintenance Period, access to all PACE-managed computational and storage resources will be unavailable.

As we get closer to the Maintenance Period, we will communicate the list of activities to be completed and update this blog post.

If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.

Best,

The PACE Team

Globus maintenance downtime on September 18

Friday, September 10, 2021
Summary: Globus maintenance downtime on September 18
What’s happening and what are we doing: Globus will be undergoing maintenance worldwide on September 18, beginning at 11:00 AM and expected to last for up to 30 minutes, to complete database upgrades. Details are available on the Globus website.
How does this impact me: You will not be able to access Globus or start a transfer during this time. Any transfers in progress will be paused and will automatically resume once maintenance is complete. This affects all Globus services, including endpoints at PACE on our Phoenix and Hive clusters, as well as those you may use at other computing sites.
If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu.

[Complete] PACE is transitioning from its current ticketing system, FootPrints, to ServiceNow

Wednesday, September 1, 2021

[Update – September 3]

Dear PACE Users,

PACE has successfully transitioned to ServiceNow, and we have begun receiving user tickets as expected in ServiceNow.

As previously mentioned, you may continue to use the pace-support@oit.gatech.edu email address to reach PACE support. For your reference, the three links listed below are direct links to the ServiceNow forms you may use going forward to request help, request new software for the PACE Apps software repository, and request access to the ICE cluster.

The PACE team will continue to work on the remaining support requests in the FootPrints system. Thank you all for your attention and patience throughout this transition.

If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu 

Best, 

The PACE Team

[Original Message – September 1]

Dear PACE Users,  

We are reaching out to inform you that PACE is transitioning from our current ticketing system FootPrints to ServiceNow. 

What’s happening and what we are doing: The PACE team is transitioning from its current ticketing system, FootPrints, to ServiceNow. Starting September 3, all new PACE support requests will be processed in ServiceNow. PACE will continue to work on any existing support requests that are in FootPrints. As part of this transition, we have created two new request forms that replace our existing Software Request Form and PACE ICE Instructional Cluster Request Form.

How does this impact me: For most users, the transition will be seamless. The exception is the links to our software and ICE request forms, which are changing. On Friday, September 3, the PACE support email address, pace-support@oit.gatech.edu, will begin redirecting users’ emails and requests to ServiceNow, and the new software and ICE request form links will be available on our website. Please use the new forms if you would like to request new software for the PACE Apps software repository or if you are a course instructor interested in using PACE-ICE for your students. Users who previously submitted requests directly through FootPrints may instead use ServiceNow at https://services.gatech.edu (navigate to “Technology” and then the “PACE” tile) and submit their request from the available forms.

The following direct links to ServiceNow forms will be live and available to users on September 3: 

What we will continue to do: We will continue to work on the existing tickets in FootPrints, and you may check the status of this transition on this blog post.

If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu 

Best, 

The PACE Team 

Email Relay Reconfiguration that’s Impacting PACE Utilities

Friday, August 27, 2021

Dear PACE Users,

We are reaching out to inform you that on Monday, August 30, PACE will begin reconfiguring its utilities that send messages to users. As a result, the “from” address on those messages will change to no-reply@pace.gatech.edu. These changes are required for us to comply with the Institute’s email notification requirements. We want to bring this to your attention so that you recognize the new address used for messages from PACE.

What’s happening and what we are doing: PACE will be making changes to utilities that send messages to users, which will change the address listed in the “from” field. PACE will begin updating its utilities on Monday, August 30, and this work will continue over the coming weeks. More specifically, the following utilities will be reconfigured:

  • [Complete] Scheduler (all clusters): Emails from the scheduler with job status information will change from moabadmin@<scheduler>.pace.gatech.edu to being from no-reply@pace.gatech.edu.
  • PACE Support script (all clusters): The pace-support script is currently disabled. Going forward, the script will send information to the ticketing system from no-reply@pace.gatech.edu and embed your email address to identify you as the source of the ticket, rather than sending the message as though it came from you. Previously, the script sent the message to the ticketing system as though it were from you so that the ticket’s source would be identified properly. This change should be transparent to you.
  • [Complete] PI and Department CSR Monthly statements for Phoenix and Firebird clusters: These will change from having a pace-support@oit.gatech.edu from address to being from no-reply@pace.gatech.edu, with a reply-to of pace-support@oit.gatech.edu.
  • Security/system information (all clusters): Security violation notices and general system mail will be sent from no-reply@pace.gatech.edu. This will include mail sent using the mail commands. System mail will be redirected to your email account as identified in GT systems. As a result, you may receive messages that were previously left on the system in an undeliverable state.
  • Head node violation messages (all clusters): The “from” address for these messages will change from pace-support@oit.gatech.edu to no-reply@pace.gatech.edu, with the reply-to set to pace-support@oit.gatech.edu.
  • Scratch storage deleter messages (Phoenix & Hive): The “from” address for these messages will change from pace-support@oit.gatech.edu to no-reply@pace.gatech.edu, with the reply-to set to pace-support@oit.gatech.edu.
  • Reconfigure PACE servers to send via GT outgoing mail servers (all clusters): This will increase the likelihood of email messages being delivered and also not being identified as spam. This should be transparent to you, but adds email headers for signatures and changes the server that will deliver the email.

How does this impact me: All messages that you receive from PACE utilities will come from no-reply@pace.gatech.edu. If you have created inbox rules for earlier messages from PACE, please update them to use this new address, no-reply@pace.gatech.edu.
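If you sort mail with a script in addition to inbox rules in your mail client, the key check is the sender address. The following is a minimal sketch using Python's standard `email` package. It parses a raw message and compares the From address to no-reply@pace.gatech.edu. The function name and the sample message are hypothetical illustrations, not actual PACE output. The Reply-To header shown reflects the pace-support@oit.gatech.edu reply-to described above for some message types.

```python
from email import message_from_string
from email.utils import parseaddr

PACE_SENDER = "no-reply@pace.gatech.edu"


def is_from_pace_utilities(raw_message: str) -> bool:
    """Return True if the message's From address is PACE's new no-reply sender.

    The comparison is case-insensitive. Email domains are case-insensitive;
    treating the local part the same way is a simplification for this sketch.
    """
    msg = message_from_string(raw_message)
    _display_name, address = parseaddr(msg.get("From", ""))
    return address.lower() == PACE_SENDER


# Hypothetical sample message for illustration only.
SAMPLE = """From: PACE <no-reply@pace.gatech.edu>
Reply-To: pace-support@oit.gatech.edu
To: researcher@gatech.edu
Subject: Example PACE notification

Example body.
"""

if __name__ == "__main__":
    print(is_from_pace_utilities(SAMPLE))
```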

What we will continue to do: In the coming weeks, PACE will work on implementing the changes listed above. You may check the status of each change on this blog post.

If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu

Best,

The PACE Team