PACE - A Partnership for an Advanced Computing Environment

April 30, 2015

GPFS storage troubles

Filed under: tech support — admin @ 8:15 pm

Dear PACE users,

As part of last week’s maintenance activities, we upgraded the GPFS client software on head nodes and compute nodes to a level recommended by the vendor.

Thanks to some troubling reports from PACE users, we have determined that the new client software has a subtle bug that will cause writes to fail under certain circumstances. We have identified two reproducible cases so far: “CMake” failing to compile codes, and “LAMMPS” silently exiting after printing a single line of text.

We have been in close contact with the vendor for an urgent resolution, and have escalated the incident to the highest executive levels. At this point, we have a couple of paths to resolution: either moving forward to a newer release or reverting to the version we were running before last week. We are moving quickly to evaluate the merits of both approaches. Implementing either will likely involve a rolling reboot of compute and head nodes. We understand the inconvenience a downtime will cause, and will engage the vendor to find ways to address this problem with minimal interruption.

One way to find out whether you are using GPFS is to run the “pace-quota” command and check whether any of the listed paths begin with “gpfs”, “pme2” or “pet1”. If you are running on GPFS and having unexplained problems with your codes, please contact pace-support@oit.gatech.edu and try to use other storage locations to which you have access (e.g. ~/scratch).
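
For example, a quick check from a head node might look like the sketch below; this is a minimal example that assumes pace-quota prints your storage paths one per line, so the exact pattern may need adjusting for your output.

# Flag any of your storage paths that appear to be on GPFS.
$ pace-quota | grep -E 'gpfs|pme2|pet1'

# No output suggests none of your listed paths are GPFS-backed.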

A more detailed description of this bug and the code we used to replicate it can be found here.

We will continue to keep you updated on the progress.

April 24, 2015

PACE clusters ready for research

Filed under: Uncategorized — admin @ 3:10 am

Greetings,

Our quarterly maintenance is now complete.  We have no known outstanding issues affecting general operations, but do have some notes for specific clusters which have been sent separately.

Just a reminder that we have removed the modules for old MPI versions (and applications compiled with them), which are known to be incompatible with the new scheduler servers. Please make sure to check your module lists for compatibility before submitting new jobs. Accordingly, we have new default versions of the MPI modules: if you do not explicitly specify a version, mvapich2/1.9 or openmpi/1.6.2 will be loaded by default. Our new repository is almost ready for testing, but it requires a post-processing step and migration to shared storage, which may take another couple of days. We will send another communication when this is ready.
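
A quick way to check from a head node is sketched below; this is a minimal example assuming the environment-modules commands used elsewhere on PACE, and the exact modules you need will depend on your application.

# List currently loaded modules and look for an old MPI stack.
$ module list

# If needed, load one of the new default MPI stacks explicitly.
$ module load mvapich2/1.9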

Given the delays, we have deferred our stretch goals (the additional GPFS migrations) until a future maintenance window. The relocated filesystems did successfully move to their new locations across campus and now traverse the new dual 40-gigabit path between data centers.

I’ll take another opportunity to apologize for the extended downtime this week. We will be taking a critical look at the events that led to these delays and learn as much as we can from them.

–Neil Bright

April 23, 2015

PACE maintenance continues into Thursday

Filed under: tech support — admin @ 3:26 am

Folks,

We’ve had some setbacks and things are taking much longer than expected. We have our issues resolved and a path forward at this point, but we’re going to have to extend the maintenance period into tomorrow. I apologize for this, and ask for your continued patience while we get things back into working order.

 

–Neil Bright

April 21, 2015

PACE maintenance is underway

Filed under: tech support — admin @ 10:29 am

For details, please see our previous post, here.

April 16, 2015

PACE quarterly maintenance – April ’15

Filed under: tech support — admin @ 8:19 pm

Greetings everybody.  It’s again time for the quarterly PACE maintenance.  As usual, we will have all PACE clusters down Tuesday and Wednesday of next week, April 21 and 22.  We’ll get started at 6:00am on Tuesday, and have things back to you as soon as possible. There are some significant changes this time around, so please read on.

Moab/Torque scheduler:
Last maintenance period, we deployed a new scheduler for the Atlas and Joe clusters.  This time around, we’re continuing that rollout to the rest of our clusters.  Some of the highlights are:

  • increased responsiveness from commands like qsub & showq
  • the need to resubmit jobs that haven’t run yet
  • removal of older, incompatible versions of mvapich and openmpi
  • required changes to our /usr/local software repository

Mehmet Belgin has posted a detailed note about the scheduler upgrade on our blog here.  He has also posted a similar note about the related updates to our software repository here.

Additionally, the x2200-6.3 queues on the Atlas cluster will be renamed to atlas-6-sunib and atlas-6-sunge.

Networking:
We’ve deployed new network equipment to upgrade the core of the PACE network to 40-gigabit Ethernet, and will transition to this new core during the maintenance period.  This new network brings additional capability to utilize data center space outside of the Rich building, and provides a path for future 100-gigabit external connectivity and ScienceDMZ services.  Stay tuned for further developments. 😉  Additionally, the campus network team will be upgrading the firmware of a number of our existing switches with some security-related fixes.

Storage:
The network upgrades above will allow us to relocate some of our project directory servers to OIT data center space on Marietta Street, as we’re pressed for generator-protected space in Rich.  We will also be doing some security patching, highly recommended updates, and performance optimizations on the DDN/GPFS storage.  As a stretch goal, we will also migrate some filesystems to GPFS; if we are pressed for time, they will move with their old servers.  Details regarding which filesystems are affected are available on our blog here.

Operating System patching:
Last, but not least, we have a couple of OS patches.  We’ll complete the rollout of a glibc patch for the highly publicized “Ghost” vulnerability, as well as deploy an autofs fix that addresses a bug which would sometimes cause /nv filesystems to fail to mount.
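
If you would like to confirm on a node that the glibc patch is in place after the maintenance, one generic check on RPM-based systems (not a PACE-specific tool) is to look for the GHOST CVE in the package changelog:

# GHOST is tracked as CVE-2015-0235; a patched glibc lists it in its changelog.
$ rpm -q glibc
$ rpm -q --changelog glibc | grep CVE-2015-0235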

Important Changes to PACE storage

Filed under: tech support — admin @ 8:07 pm

During our quarterly maintenance period next week, we will relocate some of our project directory servers to OIT data center space on Marietta Street, as we’re pressed for generator-protected space in Rich.  This is a major undertaking, with over 20 servers moving.  Our intent is that no change is needed on your part, but we wanted to ensure transparency in our activities.  The list below contains all of the affected filesystems.  The list of filesystems to which you have access can be obtained with the ‘pace-quota’ command.

  • home directories for all clusters except Gryphon and Tardis
  • /nv/pb4, /nv/archive-bio1 (BioCluster)
  • /nv/hchpro1, /nv/pchpro1 (Chemprot)
  • /nv/pas1 (Enterprise)
  • /nv/pase1 (Ase1)
  • /nv/pb2 (Optimus)
  • /nv/pbiobot1 (BioBot)
  • /nv/pc4, /nv/pc5, /nv/pc6 (Cygnus)
  • /nv/pccl2 (Gryphon, legacy)
  • /nv/pcoc1 (Monkeys)
  • /nv/pe1, /nv/pe2, /nv/pe3, /nv/pe4, /nv/pe5, /nv/pe6, /nv/pe7, /nv/pe8, /nv/pe9, /nv/pe10, /nv/pe11, /nv/pe12, /nv/pe13, /nv/pe14 (Atlas)
  • /nv/hp1, /nv/pf1, /nv/pf2 (FoRCE)
  • /nv/pface1 (Faceoff)
  • /nv/pg1 (Granulous)
  • /nv/pggate1 (GGate)
  • /nv/planns (Lanns)
  • /nv/pmart1 (Martini)
  • /nv/pmeg1 (Megatron)
  • /nv/pmicro1 (Microcluster)
  • /nv/pska1 (Skadi)
  • /nv/ptml1 (Tmlhpc)
  • /nv/py2 (Uranus)
  • /nv/pz2 (Athena)
  • /nv/pzo1, /nv/pzo2 (backups for Zohar and NeoZhoar)

Additionally, the following filesystems will be migrated to GPFS:

  • /nv/pcee1 (cee.pace)
  • /nv/pme1, /nv/pme2, /nv/pme3, /nv/pme4, /nv/pme5, /nv/pme6, /nv/pme7, /nv/pme8 (Prometheus)

As a stretch goal, we will also migrate the following filesystems to GPFS.  If we are pressed for time, they will move with their old servers as listed above.

  • /nv/hp1, /nv/pf1, /nv/pf2 (FoRCE)
  • /nv/pas1 (Enterprise)
  • /nv/pbiobot1 (BioBot)
  • /nv/pccl2 (Gryphon, legacy)
  • /nv/pggate1 (GGate)
  • /nv/planns (Lanns)
  • /nv/ptml1 (Tmlhpc)

Important Notes on Coming PACE Scheduler Upgrades

Filed under: tech support — Semir Sarajlic @ 6:21 pm

Since January, we have been running an upgraded scheduler version on the Joe and Atlas clusters, whose users graciously volunteered to test it out. This version brings significant performance and stability improvements, and we are looking forward to rolling out the upgrade to the rest of the PACE universe during this maintenance period. Please note the following important changes, which will apply to all PACE users.

  • The new schedulers are not compatible with MPI versions older than mvapich2/1.9 and openmpi/1.6.2. If you are using one of the older MPI stacks (a warning is printed when you load their modules), you will need to replace it with one of the more recent versions. This motivated the creation of a new and improved software repository, which will be available after the maintenance day. For more details, please see our related post.

  • The new version uses a different type of database, so we will not be able to migrate submitted jobs. The scheduler will start with an empty queue, and you will need to resubmit your jobs after the maintenance day. This applies to Joe and Atlas jobs as well, since we are merging exclusive queues onto a new and more powerful server, with the exception of Tardis, Gryphon and Testflight.

  • We will start using a “node packing” policy, which places as many jobs as possible on a node before moving on to the next one. With the current version, users can submit many single-core jobs, each landing on a separate node, which makes it more difficult for the scheduler to start jobs that require entire nodes.

  • This version fixes a bug that prevented the use of msub for interactive jobs. The vendor’s recommendation is to use “qsub” for everything (we confirmed that it is much faster than msub), but this fix gives you the freedom to pick either tool (see the sketch after this list).

  • There will no longer be a discrepancy between job IDs generated by msub (Moab.###) and qsub (####). You will always see a single job ID (in plain number format) regardless of your msub/qsub preference.

  • Speed: new versions of Moab and Torque are now multithreaded, making it possible for some query commands (e.g. showq) to return instantly regardless of the load on the scheduler. Currently, when a user submits a large job array, these commands usually time out.

  • Introduction of cpusets: when a user is given X cores, they will not be able to use more than that. Currently, users can easily exceed the requested limits by spawning any number of processes/threads, and Torque cannot do much to stop that. The use of cpusets will significantly reduce job interference and allows us to finally use “node packing” as explained above.

  • Other benefits from bug fixes and improvements include (but are not limited to) fewer zombie processes, lost output files, and missing array jobs. We also expect visible improvements in job allocation times and less frequent command timeouts.
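
As a rough illustration of the interactive-job item above, an interactive session via qsub might look like the sketch below; the queue name and resource request are placeholders rather than PACE-specific recommendations.

# Request an interactive session; msub should also work once the fix is deployed.
# The queue name "force-6" and the resource request are illustrative placeholders.
$ qsub -I -q force-6 -l nodes=1:ppn=4,walltime=01:00:00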

 

We hope these improvements will provide you with a more efficient and productive computing environment. Please let us know (pace-support@oit.gatech.edu) if you have any concerns or questions!

Important Changes to PACE Scientific Software Repository

Filed under: tech support — Semir Sarajlic @ 6:09 pm

As announced earlier, we will remove a set of old MPI stacks (and applications that use them) from the PACE software repository after the April maintenance day. This is required by the planned upgrade of the schedulers (Torque and Moab), which use libraries that are incompatible with the old MPI stacks. Some MPI-related Python modules (e.g. mpi4py) are built on one of these old MPI versions (namely mvapich2/1.6) and will also stop working with the new scheduler.

Old MPI versions are also known to have significant performance and scalability problems, and they are no longer supported by their developers, so their removal was inevitable regardless of the scheduler upgrade. Namely, all versions older than “mvapich2/1.9” and “openmpi/1.6.2” are known to be incompatible, and will be removed along with the applications that are compiled with them. MPI stacks newer than these versions are compatible with the new scheduler, so they will continue to be available. The PACE team is ready to offer assistance with all the changes you may need to replace these old MPI versions with newer ones, with minimal interruption to your research.
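
For example, replacing an old stack and rebuilding an application might look like the sketch below; the module versions follow the ones mentioned above, while the source and binary names are hypothetical.

# Swap the old MPI stack for a supported one (mvapich2/1.9 shown here).
$ module unload mvapich2/1.6
$ module load mvapich2/1.9

# Rebuild your application against the new MPI libraries ("my_app" is hypothetical).
$ mpicc -O2 -o my_app my_app.c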

We saw these problems as an opportunity to start creating a new and improved software repository almost from scratch, which not only fixes the MPI problems, but also provides added benefits such as:

* a cleaner MPI versioning without long, confusing subversions such as “1.9rc1” or “2.0ga”: You will see only a single subversion for each major release, e.g.:

mvapich2: 1.9, 2.0, 2.1, …
openmpi: 1.6, 1.7, 1.8, …

* latest software versions: We made a best effort to compile the most recent stable versions, unless they had compilation problems or proved to be buggy.

* a new python that allows parallelization without requiring the InfiniBand (IB) network: The current python uses mvapich2, which requires IB. The new python, on the other hand, will employ openmpi, which can run on *any* node regardless of its network connection, while still taking advantage of IB when available (see the sketch below).
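
As a rough illustration of that last point, once the new repository and its python module are loaded (switching repositories is shown further below), a trivial mpi4py check might look like this; the process count is arbitrary, and mpi4py availability in the new repo is an assumption:

# Each of the 4 processes prints its MPI rank; openmpi picks whichever network is available.
$ mpirun -np 4 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"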

We will start offering this new repository as an alternative after the April maintenance day. Switching between the old and the new repository will be as easy as loading/unloading a module named “newrepo”. E.g.:

# Make sure there are no loaded modules, then switch to the new repo
$ module purge
$ module load newrepo

# ... you are now using the new repo ...

# Since newrepo is itself a module, another 'module purge' puts you back in the old repo
$ module purge

# ... you are back in the old repo ...

The current plan is to decommission the old repository after the July maintenance, so we strongly encourage you to try the new repository (which is still in beta) as soon as possible to ensure a smooth transition. If the new repository is working for you, continue to use it and never look back. If you notice problems or missing components, you can continue to use the old repository while we work on fixing them.

Please keep in mind that the new repo was created almost from scratch, so expect changes in module names, as well as a new set of dependencies/conflicts between the modules. The PACE team is always ready to provide module suggestions for your applications, or to answer any other questions that you may have.

We hope the new repository will make a positive contribution to your research environment with visible improvements in performance, stability and scalability.

Thanks!
PACE Team
