
Archive for April, 2014

SC14 Program Offers Immersive Program in HPC for Undergrads

Posted on Thursday, 24 April, 2014

Applications are now being accepted for Experiencing HPC for Undergraduates, a program designed to introduce high performance computing (HPC) research topics and techniques to undergraduate students at the sophomore level and above. The program introduces various aspects of HPC research at the SC14 Conference to increase awareness of opportunities to perform research as an undergraduate and potentially in graduate school or in a job related to HPC topics in computer science and computational science.

SC14 will be held Nov. 16-21, 2014 in New Orleans. Complete conference information can be found at: http://sc14.supercomputing.org

The Experiencing HPC for Undergraduates Program contains selected parts of the main SC Technical Program, with several additional elements. Special sessions include panels with current graduate students in HPC areas to discuss graduate school and research, and panels with senior HPC researchers from universities, government and industrial labs to discuss career opportunities in HPC fields.

Prof. Jeff Hollingsworth, co-chair of Experiencing HPC for Undergraduates, discusses the program and the need to develop the next generation of HPC professionals in an HPCwire podcast at: http://www.hpcwire.com/soundbite/toward-next-generation-hpc-professionals/

Applications must be submitted using the SC14 submission site at https://submissions.supercomputing.org/. The deadline to apply is Sunday, June 15.

ANSYS version 15 and Matlab R2014a installed

Posted on Thursday, 24 April, 2014

ANSYS version 15 and Matlab version R2014a have been installed on PACE clusters.
To see examples of how to properly load and use the new versions, execute the following commands and follow the instructions provided.

$ module help ansys/15.0

$ module help matlab/r2014a

If you have any problems executing the examples given by “module help”, please contact pace-support@oit.gatech.edu
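Once you have read the help text, loading a version might look like the sketch below. This is only an illustration, not PACE's documented procedure: `load_version` is a hypothetical helper, and the only assumption it makes is that the standard `module load` and `module list` subcommands are available in your shell.

```shell
# Hypothetical helper: load one module version and show what is loaded.
# It first checks that the `module` command exists in this shell, since
# it is normally only defined on systems that use environment modules.
load_version() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found in this shell" >&2
        return 1
    fi
    module load "$1" && module list
}

# Module names taken from the announcement above
load_version matlab/r2014a || echo "could not load matlab/r2014a"
```

The same call works for `ansys/15.0`; the guard simply avoids a confusing "command not found" error when the snippet is run somewhere without the module system.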

Mvapich2 2.0rc1 available in PACE repository

Posted on Friday, 18 April, 2014

We have installed the most recent Mvapich2 stack (2.0rc1), which is available via module “mvapich2/2.0rc1”. Please see this changelog if you would like to know more about the improvements this version provides.

Also, please note that we have not started rebuilding any applications with this stack yet. If you think it will provide significant benefits for an existing application, please email us at pace-support@oit.gatech.edu and we will be happy to recompile that application for you.

Another quick note: versions mvapich1.6 to mvapich1.8 are known to have performance problems, which are fixed with 1.8 (hint: search for “Georgia Institute of Technology” in the changelog).  We are still keeping them in the repository for backwards compatibility, but please avoid using these old versions as much as you can.

Happy computing!

Linux Cluster Institute Workshop

Posted on Friday, 18 April, 2014

FYI –

Please pass along to anybody you think may be interested.  You may see some familiar faces there!  😉
We haven’t fleshed out all of the details yet, but registration is likely to be somewhere in the $200-$300 range, which is pretty reasonable for a week’s worth of training. Official announcement follows below.

–Neil Bright

Save the date and plan to attend!
Linux Cluster Institute (LCI) Workshop
August 4-5, 2014
National Center for Supercomputing Applications (NCSA)
Urbana, Illinois

If you are a user of HPC or are responsible for maintaining an HPC resource, this is the workshop for you!  In just four days you will learn:

  • How to be an HPC cluster system administrator
  • How to be an effective HPC cluster user
  • The key issues of HPC
  • Current and emerging HPC hardware and software technologies

All sessions taught by some of the world’s best experts in HPC.
Program details and registration information coming soon!
www.linuxclusterinstitute.org

PACE clusters ready for research

Posted on Thursday, 17 April, 2014

Our quarterly maintenance is now complete, and the clusters are running previously submitted jobs and awaiting new submissions.

We have successfully completed a number of things:

  • Athena has been fully migrated to RedHat 6.3
  • The BioCluster /nv/pb4 filesystem has been migrated to the DDN space
  • All our Solaris storage servers have been patched
  • Firewall upgrades are complete
  • Electrical distribution repairs are complete
  • DDN updates have been applied
  • VMware updates have been applied
  • The mathlocal collection of software has been migrated to /nv/pma1

However, we were unable to complete the upgrade of the TestFlight cluster to RedHat 6.5.  At the moment TestFlight is down, and we will complete the upgrade over the next couple of days.

As always, please contact us (pace-support@oit.gatech.edu) with any problems or concerns you may have. Your feedback is very important to us, especially regarding file transfers in and out of the clusters (i.e. between your workstations and the PACE clusters).

PACE quarterly maintenance – April 15-16 2014

Posted on Wednesday, 9 April, 2014

PACE Quarterly maintenance has begun

See this space for updates.

PACE Quarterly maintenance notification

It’s time again for our quarterly maintenance.  We will have the clusters down April 15 & 16.

As usual, we’ve instructed the schedulers to avoid running jobs that would cross into a planned maintenance window.  This will prevent running jobs from being killed, but it also means that jobs you submit now may not run until after maintenance completes.  I would suggest checking the wall times for the jobs you will be submitting and, if possible, modifying them so they will complete before the maintenance begins. Submitting jobs with longer wall times is still OK, but they will be held by the scheduler and released after maintenance completes.
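As a rough way to check a wall time against the window, the sketch below tests whether a job would finish before maintenance starts. It is only an illustration: the helper names are hypothetical, it assumes GNU date (for `date -d`) and wall times written as HH:MM:SS, and it assumes the window opens at 00:00 on April 15, since the exact start time is not stated here.

```shell
# Hypothetical helpers, not part of any scheduler.

# Convert a wall time in HH:MM:SS form to seconds.
# awk reads fields as decimal numbers, so "08" is handled correctly.
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed if a job with wall time $1 that starts running at $2
# ends no later than the maintenance start $3.
# Returns 2 if a date cannot be parsed. Times are treated as UTC.
fits_before_maintenance() {
    start_epoch=$(date -u -d "$2" +%s) || return 2
    maint_epoch=$(date -u -d "$3" +%s) || return 2
    run_seconds=$(walltime_to_seconds "$1")
    [ $(( start_epoch + run_seconds )) -le "$maint_epoch" ]
}

# Example: a 72-hour job starting at noon on April 9,
# with the maintenance start assumed to be midnight on April 15.
if fits_before_maintenance "72:00:00" "2014-04-09 12:00" "2014-04-15 00:00"; then
    echo "job should finish before maintenance"
else
    echo "job would be held until after maintenance"
fi
```

The start time passed in should be when you expect the job to begin running, not when you submit it, so leave some margin for time spent waiting in the queue.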

Many of our activities this time around are not directly visible, with a couple of notable exceptions.

We will be upgrading the operating system on our TestFlight cluster from RedHat 6.3 to RedHat 6.5.  Please do test your codes on this cluster over the coming weeks and months, as we plan to roll it out (along with any needed fixes) to all other RedHat 6 clusters in July.  This update is expected to bring some performance improvements, as well as some critical security fixes.  Additionally, it adds support for the Intel Ivy Bridge platform, which many of you are ordering.  Any new Ivy Bridge platforms will start with RedHat 6.5.

Other user-visible changes include:

  • conclude the migration of the Athena cluster to RedHat 6.3.  We’ll plan to take Athena to 6.5 in July.
  • conclude the migration of the BioCluster /nv/pb4 filesystem to the DDN/GPFS space.
  • migrate mathlocal from /nv/hp24 to /nv/pma1 (Math cluster project space)
  • application of recommended and security patches to our Solaris storage systems.  This widespread update will affect filesystems that start with /nv.  A rapid reversion process is available should unanticipated events occur.
  • firewall upgrades to increase bandwidth between PACE and campus

Not so apparent changes include:

  • repairing some electrical distribution to compute node racks
  • minor software/firmware update to DDN to enable support of DDN/WOS evaluation
  • updates to VMware “hardware” levels, enabled by previous migration to VMware 5.1

As always, please follow our blog for communications, especially for announcements during our maintenance activities – and let us know of any concerns via pace-support@oit.gatech.edu.

[RESOLVED] PACE clusters experiencing problems

Posted on Monday, 7 April, 2014

We’ve identified the source of problems which impacted all of the clusters this (4/7) afternoon.  While making preparations to deploy some firewall upgrades for PACE, one of the campus network team members inadvertently applied a misconfiguration to one of our core network links.  This resulted in widespread packet loss across the PACE internal network.

The head nodes seem to have recovered properly, but please let us know if you see continued issues there.  While it is possible that jobs have been lost, we believe that most things will have recovered without loss.

We’ll continue to monitor the situation and address any remaining problems as soon as we are able.

PACE Team