PACE: A Partnership for an Advanced Computing Environment

December 16, 2019

OIT Network Maintenance 12/18/2019-12/19/2019

Filed under: Maintenance,News — Semir Sarajlic @ 9:41 pm

To Our Valued PACE Research Community,

We are writing to inform our research community of upcoming maintenance, as follows: 

The Office of Information Technology (OIT) will be performing a series of upgrades to the networking infrastructure to improve the performance and reliability of networking operations. Some of these upcoming enhancements may impact PACE users’ ability to connect to and interact with computational and storage resources. We do not expect this network maintenance to have any impact on currently running jobs.

12/18/2019 20:00-23:59 (Router Code Upgrade) An upgrade to the software on some routers is scheduled and will include an approximately 30-minute disruption to telecommunication services.

12/18/2019 20:00 – 12/19/2019 02:00 (Data Center Router Code Upgrade & Routing Engine Upgrade) An upgrade to the software on multiple devices will impact network connectivity across the main campus of the Georgia Institute of Technology. This disruption will include the CODA Building.

OIT technical teams will be actively monitoring the progress of the upgrades during the maintenance windows described above and will provide ongoing communications to students, faculty, and staff of the Institute. A central location for progress updates will be available at http://status.gatech.edu.

Issues during the upgrade may be reported to the OIT Network Operations Center at (404)894-4669. 

We do not expect any impact on running jobs, and no changes to PACE computational or storage resources are part of this OIT network maintenance.

Thank you for your time and diligence,

PACE Outreach and Faculty Interaction Team

March 30, 2015

Python users of PACE: Meet “Anaconda”

Filed under: News — Semir Sarajlic @ 5:25 pm

We are happy to announce the availability of the “Anaconda” Python distribution for data analytics and scientific computing. This distribution is a commercial product that is free for academic use.

https://store.continuum.io/cshop/anaconda/

The default Python (2.7) on PACE systems already offers a comprehensive list of scientific software, so here’s a quick list of pros and cons to help you decide when to use Anaconda:

pros of Anaconda
—————
1. Standalone distribution with no particular dependency on a compiler or MPI stack.
2. A long list of supported libraries for Python 2.7 and Python 3.4 alike (PACE support for Python 3.4 is little to none).
3. Fully tested and supported distribution, backed by the creators of NumPy and SciPy.
4. Very easy to add and upgrade packages using the “conda” package manager. PACE will regularly upgrade all packages to their latest versions.
5. Constitutes an alternative to the PACE distribution when some libraries are found to be missing, outdated, or buggy.

cons of Anaconda
—————
1. Lacks MKL optimizations (which require a separate license). PACE-compiled libraries usually come with MKL and may outperform Anaconda, particularly for linear algebra routines and fast Fourier transforms.
2. Limited to the libraries supported by the “conda” package manager (although the list is pretty comprehensive). In comparison, the PACE distribution can be extended with any compatible modules and libraries.
3. Cannot be substituted for the PACE Python by scientific packages that are compiled against it; those packages still require the PACE distribution.

Should you decide to give it a try, here’s how to use it:

# Make sure you remove all PACE python module(s)
module rm python        # Or better yet: module purge

# For python 2.7:
module load anaconda2/2.1.0

# For python 3.4:
module load anaconda3/2.1.0

Then use “python” as usual.
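As a quick sanity check, you can confirm which interpreter you are running and whether the packages you need are present (a minimal sketch; numpy is just an example package, and the version reported will depend on the Anaconda release you load):

# Confirm that the Anaconda interpreter is the one on your PATH
which python

# Check whether a package you need is installed in the Anaconda distribution
conda list | grep -i numpy

# Import an example package and print its version
python -c "import numpy; print(numpy.__version__)"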

Please let us know if you find that a library you need is provided by Anaconda but not by PACE Python, so we can add it. PACE Python will continue to be the primary Python distribution, and hence it must continue to offer all of the libraries that you need for your research.

As always, please contact us at pace-support@oit.gatech.edu and let us know if you have any problems. You can also contact mehmet.belgin@oit.gatech.edu directly to leave general feedback about your experience with this distribution.

Happy computing!

December 4, 2014

Georgia Tech mention in HPCWire Intel IPCC article

Filed under: News — Semir Sarajlic @ 9:29 pm

From the article:

Georgia Tech is conducting research that seeks to modernize quantum chemistry codes used in materials science. By designing a parallel code called GTFock, scientists can closely predict properties of materials using fundamental physical principles. This allows scalability to previously unattainable numbers of computing nodes. The team at Georgia Tech ran large batches of code on the Tianhe-2, one of the world’s most powerful computers, along with two Xeon Phi coprocessors. The experiment produced computations using more than 1.6 million cores, all working in parallel.

The GTFock code is developed by Xing Liu, Aftab Patel, and Associate Professor Edmond Chow of the School of Computational Science and Engineering, with assistance from Professor David Sherrill of the School of Chemistry and Biochemistry.

Original article found here:
http://www.hpcwire.com/off-the-wire/intel-piece-reveals-details-ipccs-penn-state-university-oregon-georgia-tech/?utm_source=rss&utm_medium=rss&utm_campaign=intel-piece-reveals-details-ipccs-penn-state-university-oregon-georgia-tech

September 30, 2014

A Bold New Vision For Tech Square

Filed under: News — admin @ 1:41 pm

You may have seen or heard reference to this in other places, but I wanted to highlight some exciting things coming to Tech Square.

–Neil Bright

 

http://www.news.gatech.edu/2014/09/29/bold-new-vision-tech-square

Ron Hutchins is a man on a mission. He wants to raise the visibility of Information Technology on a university campus in ways we’ve seldom seen. Hutchins, Tech’s Associate Vice Provost for Research & Technology and Chief Technology Officer, is the visionary behind the plan to build a data center in the heart of Midtown Atlanta. He’s quick to point out though that the High Performance Computing Center is more than just a building to store equipment and disseminate data. Construction of the HPCC marks the beginning of a new phase in the expansion of Tech Square.

August 29, 2014

Recent staff changes in PACE

Filed under: News — admin @ 4:21 pm

I’m sorry to report that Dr. Wesley Emeneker has left the team for a position in industry. We are sad to see him leave, and we wish him and his family the best in their future endeavors. We will be posting a Research Scientist position soon to fill this vacancy.

Ann Zhou <dzhou62@mail.gatech.edu> has joined the team as a Systems Support Engineer II. Ann joins us from Columbus State University and will be initially focused on user and hardware support, and taking over some of the system administration work that Wes had been doing.

We are concluding a search to fill the Senior System Support Engineer position vacated by Adam Munro earlier this year. An offer is pending, and I’m hopeful this person will start soon.

Finally, we have a search currently underway for an Applications Developer II. The position description is available at http://pace.gatech.edu/application-developer-ii. Please pass the word along to anybody who may have interest.

June 17, 2014

Physical host failure for VMs – potential job impact

Filed under: News,tech support — Semir Sarajlic @ 1:05 pm

This morning (approximately between 3am and 8am) we suffered a failure in one of the physical hosts that makes up part of our VM farm. This failure caused several head nodes to go offline, as well as one of the PACE-run software license servers.

**********
For ALL PACE-run clusters, it would be wise to double-check your jobs in case they lost contact with their license server, whether they started this morning or were already running during this window (one way to check is sketched just below this notice).
**********
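A minimal sketch of such a check, assuming the Torque/Moab scheduler in use on PACE clusters; the output file names below are placeholders for your own job’s files:

# List your currently queued and running jobs
qstat -u $USER

# Scan a job's output and error files for license checkout failures
# (replace myjob.o12345 and myjob.e12345 with your job's actual files)
grep -i "license" myjob.o12345 myjob.e12345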

The following head nodes went offline, but have returned:
cygnus-6
granulous
megatron
microcluster
mps
rozell
testflight-6

The following license server went offline, but has returned:
license-gt

In the case of the head nodes, no jobs should have been affected, nor any data lost, as a result of the nodes being offline.

May 13, 2014

Disk failure rate spike

Filed under: News,tech support — Semir Sarajlic @ 9:48 pm

Hey everyone,

We’ve noticed an increase in a type of disk failure on some of the storage nodes that ultimately has a severe negative impact on storage performance. In particular, we observe that certain models of drives in certain manufacturing date ranges seem to be more prone to failure.

As a result, we’re looking a bit more closely at our logs to keep an eye on how widespread this is. Most of the older storage seems fine; the failures have tended to occur in some of the newer storage using both 2TB and 4TB drives. The 2TB drives are the more surprising to us, as the model line involved has generally been performing as expected, with many older storage units using the same drives without having these issues.

We are also engaging our vendor to see if this is something that they are seeing elsewhere, and making sure we keep a close eye on our stock of replacements to deal with these failures.

March 31, 2014

Images requested for annual CASC brochure

Filed under: News — admin @ 10:36 pm

The time has come again to gather images for the annual CASC brochure. CASC is the Coalition for Academic Scientific Computation, and GT is a member institution. We use the brochure in our advocacy efforts at the funding agencies and in D.C. Previous brochures are online at http://casc.org/research-publications.

If you have something you would be interested in sharing, please let me know. Below is some text from the CASC regarding what they are looking for.

This year marks the 25th anniversary of CASC and we want to recognize that milestone in the new brochure. If you have historical pictures, scientific visualizations and/or stories that can help us illustrate how CASC and HPC have evolved over the years, please start gathering those now. We will set up a website soon where you can upload your images and text. We hope to have everything we need by June 1, 2014.

As always, we are looking for high-quality images and stories that illustrate the impact of HPC and related technologies. The more we have the better, but we are especially interested in images and stories about research and accomplishments in Energy, Health and Medicine, Industrial Innovation, Environment and Natural Resources, Matter and the Universe, Education and Outreach, and Big Data. More information about how to upload your images and text will be sent shortly. The deadline will be earlier this year: June 15, 2014.

May 17, 2013

PC1 & PB1 filesystems back online

Filed under: News,tech support — Tags: — Semir Sarajlic @ 3:25 am

Hey folks,

It looks like we may have finally found the issue tying up the PB1 file server and the occasional lock up of the PC1 file server. We’ve isolated the compute nodes that seemed to be generating the bad traffic, and have even isolated the processes which appear to have compounded the problem on a pair of shared nodes (thus linking the two server failures). With any luck, we’ll get those nodes online once their other jobs complete or are cancelled.

Thank you for your patience while we tracked this problem down. We know it was quite inconvenient, but we now have a decent picture of what occurred, and thankfully it is something that is very unlikely to repeat itself.

February 5, 2013

Breaking news from NSF

Filed under: News — admin @ 8:14 pm

It looks like Dr. Subra Suresh will be stepping down from his position as Director of NSF, effective late March, to become the next President of Carnegie Mellon.

Click the link here: Staff Letter 2-4-13 to download a copy of his letter to the NSF community.

Interesting times are ahead for both NSF and DOE.

