Author Archive

[Resolved]: PACE Maintenance Days 8/6/2020-8/8/2020

Posted by on Friday, 24 July, 2020

Dear PACE Users,

RESOLVED: PACE is now ready for research.

We are preparing for our next PACE maintenance period, which will begin at 6:00 AM on August 6th, 2020 and conclude at 11:59 PM on August 8th, 2020. As usual, jobs with long walltimes will be held by the scheduler to ensure that no active jobs will be running when systems are powered off. These jobs will be released as soon as the maintenance activities are complete.
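As a rough illustration of the hold described above, the sketch below checks whether a job's requested walltime would let it finish before the maintenance period begins. It is not the scheduler's actual logic. The function names, the HH:MM:SS walltime format, and the example start time are assumptions made for illustration; only the maintenance start time comes from this announcement.

```python
from datetime import datetime, timedelta

# Start of the maintenance period, taken from the announcement above.
MAINTENANCE_START = datetime(2020, 8, 6, 6, 0)


def parse_walltime(walltime: str) -> timedelta:
    """Convert an HH:MM:SS walltime string (assumed format) into a timedelta."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def finishes_before_maintenance(start: datetime, walltime: str) -> bool:
    """Return True if a job starting at `start` would end by the maintenance start."""
    return start + parse_walltime(walltime) <= MAINTENANCE_START


# Hypothetical job that could start on August 5th at 8:00 AM.
start = datetime(2020, 8, 5, 8, 0)
print(finishes_before_maintenance(start, "12:00:00"))  # True
print(finishes_before_maintenance(start, "48:00:00"))  # False
```

In this sketch, a job that would still be running at 6:00 AM on August 6th is the kind of job the scheduler would hold until maintenance is complete. Requesting a shorter walltime is one way to let a job run beforehand.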

We are still finalizing planned activities for the maintenance period. Here is a current list:

ITEMS REQUIRING USER ACTION:

– None currently.

ITEMS NOT REQUIRING USER ACTION:

– [Resolved] Coda Lustre upgrade. This will start on Wednesday, August 5, and will impact testflight-coda only. A scheduler reservation was put in place to prevent any jobs from running past 6:00 AM on Wednesday, August 5.

– [Resolved] Install additional line cards for CS8500 InfiniBand switch.

– [Resolved] Deploy PBSTools RPM on schedulers.

– [Resolved] Upgrade Hive InfiniBand switch firmware to version 3.9.0914.

– [Resolved] Upgrade Coda InfiniBand director switch firmware to version 3.9.0914.

– [Resolved] Move DNS appliance from Rich to Coda.

– [Resolved] Update coda-apps file system mounts to use qtrees from NetApp on all servers.

– [Deferred] Update NVIDIA GPU drivers in Coda to support the CUDA 11 SDK.

– [Resolved] Reboot of all nodes.

– [Resolved] Reboot the subnet manager.

 

If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.

 

The PACE Team

[Resolved] Issue with InfiniBand Fabric and subnet managers

Posted by on Friday, 26 June, 2020

Early today, the InfiniBand Fabric located in the Rich Datacenter (where most PACE resources are located) developed issues reaching the subnet managers. After on-site troubleshooting, the subnet manager was initialized. As of 11:30 AM local time, the InfiniBand Fabric is operational.

Some running jobs might have been affected during the outage, and new jobs using MPI may also have encountered issues.

Please check your jobs for potential issues. We deeply apologize for any inconvenience this may have caused.

OIT Network Services Team Firewall upgrades (5/5/2020)

Posted by on Monday, 27 April, 2020

PACE has been informed that the OIT Network Services Team is preparing for software upgrades on multiple firewall servers across the Georgia Institute of Technology Atlanta campus during three windows: 5/5/2020 20:00 – 23:59, 5/7/2020 20:00 – 23:59, and 5/8/2020 19:00 – 5/9/2020 02:00. While there are no direct impacts on the Rich and Coda Datacenter networks, there is potential for interruptions in connections to license servers, which can lead to job failures. Applications that may be impacted include

  • Abaqus
  • Ansys
  • Comsol
  • Dymola
  • Matlab

and any other application that may have a license server not internal to PACE. Due to potential interruptions, please check any jobs scheduled to run during these periods. PACE apologizes for any impact on your research workflow that this may cause. 

The Network Team will report the status of the project via status.gatech.edu. Please check blog.pace.gatech.edu for updates.

OIT Network Maintenance 12/18/2019-12/19/2019

Posted by on Monday, 16 December, 2019

To Our Valued PACE Research Community,

We are writing to inform our research community of upcoming maintenance, as follows: 

The Office of Information Technology (OIT) will be performing a series of upgrades to the networking infrastructure to improve the performance and reliability of networking operations. Some of these upcoming enhancements may impact PACE users’ ability to connect and interact with computational and storage resources. We do not expect this network maintenance to have any impact on currently running jobs.

12/18/2019 20:00-23:59 (Router Code Upgrade) An upgrade to the software on some routers is scheduled and will include an approximate 30-minute disruption to telecommunication services.  

12/18/2019 20:00 – 12/19/2019 02:00 (Data Center Router Code Upgrade & Routing Engine Upgrade) An upgrade to the software on multiple devices will impact network connectivity across the main campus of the Georgia Institute of Technology. This disruption will include the CODA Building.

OIT Technical Teams will be actively monitoring the progress of upgrades during the maintenance windows described above. These teams will be providing ongoing communications to student, faculty, and staff members of the Institute. A central location for progress communications will be available at http://status.gatech.edu 

Issues during the upgrade may be reported to the OIT Network Operations Center at (404)894-4669. 

We do not expect any impact on running jobs and no changes to the PACE computational and storage resources are part of this OIT Network maintenance. 

Thank you for your time and diligence,

PACE Outreach and Faculty Interaction Team

The Launcher Documentation Available

Posted by on Tuesday, 10 September, 2019

The Launcher is a framework for running large collections of serial or multi-threaded applications as a single job on a batch-scheduled HPC system. The Launcher was developed at the Texas Advanced Computing Center (TACC) and has been deployed at multiple HPC centers throughout the world. The Launcher allows High-Throughput Computing users to take advantage of the benefits of scheduling larger single jobs and to better fit within the HPC environment.

To better serve our High-Throughput Computing users, we have adapted this software for use on the PACE systems.

Information on using Launcher on PACE is available at PACE Documentation.
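To illustrate the general idea of bundling many independent serial tasks into one job, here is a minimal Python sketch. It does not use the Launcher or reproduce its interface. The function names `run_task` and `run_bundle`, the worker count, and the task list are all hypothetical, and a real job would draw its tasks from the user's own workload.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_task(command: list[str]) -> int:
    """Run one independent serial task and return its exit code."""
    return subprocess.run(command).returncode


def run_bundle(commands: list[list[str]], workers: int) -> list[int]:
    """Run every command with at most `workers` at a time; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_task, commands))


# Hypothetical task list: four small, independent serial tasks.
tasks = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(4)]
exit_codes = run_bundle(tasks, workers=2)
print(exit_codes)
```

The sketch runs on a single machine and only shows the pattern of working through a task list with a fixed number of concurrent slots. For actual usage on PACE, refer to the PACE Documentation mentioned above.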