PACE A Partnership for an Advanced Computing Environment

June 28, 2013

Grace 5.1.23 installed

Filed under: Uncategorized — Semir Sarajlic @ 2:16 pm

Grace

Grace is a WYSIWYG 2D plotting tool for the X Window System and M*tif.
Grace is a descendant of ACE/gr, also known as Xmgr.

Example Usage

$ module load grace/5.1.23
$ xmgrace

June 27, 2013

Newest LAMMPS 17Jun13 is Installed

Filed under: Uncategorized — Cherry Liu @ 3:46 pm

The new version is built with various compiler combinations and with FFTW 3.3.

After loading your compiler's module, run:

module load mkl/10.3

module load fftw/3.3

module load lammps/17Jun13
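Put together, the sequence above can be sketched as a small shell function. This is only an illustration: the function name `load_lammps`, the `DRY_RUN` switch, and the `compiler/version` placeholder are made up for this sketch, and the post does not say which compiler modules are available (check `module avail`).

```shell
# Hypothetical helper: load a compiler module, then the LAMMPS 17Jun13
# prerequisites, in the order given in the post.
# With DRY_RUN set, it only prints the commands instead of running them.
load_lammps() {
    compiler_module="$1"
    for m in "$compiler_module" mkl/10.3 fftw/3.3 lammps/17Jun13; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "module load $m"
        else
            module load "$m" || return 1
        fi
    done
}

# "compiler/version" is a placeholder, not a real module name.
DRY_RUN=1 load_lammps "compiler/version"
```

On a headnode, drop `DRY_RUN=1` and pass the compiler module you actually use.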

 

June 25, 2013

Intel Cluster Studio 2013 XE Installed

Filed under: Uncategorized — Semir Sarajlic @ 1:36 pm

The Intel Cluster Studio 2013 XE software suite installation adds several new and useful tools for PACE users.

  • VTune: Intel® VTune™ Amplifier XE 2013 is a serial and parallel performance profiler for C, C++, C#, Fortran, Assembly and Java.
  • Inspector: Intel® Inspector XE is an easy to use memory debugger and thread debugger for serial and parallel applications.
  • Advisor: Intel® Advisor XE is a threading prototyping tool for C, C++, C# and Fortran.

This installation includes updated versions of many currently installed packages. The updates include:

  • MKL – updated to 11.0.1
  • TBB – updated to 4.1
  • IPP – updated to 7.1.1
  • Compilers (C, C++, Fortran) – updated to 13.2.146

To use the new or updated software, please load whichever modules are appropriate:

  • intel/13.2.146 (loads the C, C++, and Fortran compilers)
  • vtune/2013xe (loads VTune)
  • advisor/2013xe (loads Advisor)
  • inspector/2013xe (loads Inspector)
  • tbb/4.1 (loads the Threading Building Blocks)
  • ipp/7.1.1 (loads the Performance Primitives)
  • mkl/11.0.1 (loads the Math Kernel Library)
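As an illustration, the snippet below loads the new compilers together with VTune in one command. The pairing is only an example, and the `INTEL_MODULES` variable name is made up for this sketch. It checks for the `module` command first, so it prints a message instead of failing when run somewhere other than a PACE headnode.

```shell
# Example selection from the module list above; adjust to what you need.
INTEL_MODULES="intel/13.2.146 vtune/2013xe"

if command -v module >/dev/null 2>&1; then
    # Unquoted on purpose: each module name becomes its own argument.
    module load $INTEL_MODULES
    # Print the compiler version to confirm which icc is now on the PATH.
    icc --version
else
    echo "module command not found; run this on a PACE headnode" >&2
fi
```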

For information on using VTune, Inspector, Advisor, or any of the Intel tools, see the Intel Cluster Studio XE site.

June 24, 2013

New 128-procs Allinea DDT license on PACE clusters

Filed under: Uncategorized — Semir Sarajlic @ 5:37 pm

Allinea DDT is a powerful parallel debugger with an easy-to-use GUI. You can run it by loading its module (module load ddt/3.2) and entering “ddt”. Introductory information can be found at http://pace.gatech.edu/workshop/DebuggingProfiling.pdf.

We upgraded our single-user, 32-processor license to a multi-user, 128-processor license. Aside from the increased number of processors, this license allows multiple users to use the software at the same time, as long as the total number of processors does not exceed 128. For example, two users can each run the software on 64 processors.
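The shared limit comes down to simple addition. Here is a minimal sketch, with a made-up function name `fits_license`, that sums the processor counts of concurrent DDT sessions and compares the total against 128. It is not part of DDT or of any license tool.

```shell
# Hypothetical check: do these concurrent DDT sessions fit within
# the 128-processor license?
LICENSE_PROCS=128

fits_license() {
    total=0
    for procs in "$@"; do
        total=$((total + procs))
    done
    if [ "$total" -le "$LICENSE_PROCS" ]; then
        echo "ok: $total of $LICENSE_PROCS processors in use"
    else
        echo "over limit: $total exceeds $LICENSE_PROCS"
        return 1
    fi
}

fits_license 64 64              # two users, 64 processors each
fits_license 64 64 32 || true   # a third 32-processor session would not fit
```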

Happy debugging!

 

 

June 6, 2013

PACE Systems Back Online

Filed under: Uncategorized — Semir Sarajlic @ 6:48 pm

The fileserver has recovered, and all headnodes are now accessible. Jobs running off scratch should continue from where they left off. You have access to all files, including those on scratch. The server is still reconstructing data, which may slow down the system (especially on volumes v0 and v3) for a few more hours. This slowness will go away when the reconstruction is complete.

We expect to receive the replacement for the failed part tomorrow (6/6). The fileserver can function without this part, and its installation will not cause any interruptions.

Once again, thank you for bearing with us while we were working on this problem. If you have jobs that you think crashed due to this problem, please send us an email at pace-support@oit.gatech.edu.

Login Problems, current situation

Filed under: Uncategorized — Semir Sarajlic @ 4:28 pm

The Panasas fileserver (scratch storage) crashed today while recovering from a hardware problem. This caused the headnodes that mount Panasas to hang, and they are currently not accessible via SSH.

We do have a way to disable Panasas and give you access to the headnodes right away, without the Panasas storage. However, doing so would crash all of the jobs using the scratch space. We do not want that, especially considering that some jobs have been running for days.

We are now running a filesystem check on the system, which will take 3 to 4 hours. This is required to prevent data corruption. After this process, Panasas should recover and the jobs will continue running. At that point, the headnodes will become accessible again.

If you urgently need to access your data in your home or project directories, please contact us at pace-support@oit.gatech.edu. We might be able to help you access your files via a headnode that does not mount Panasas.

The filesystem check has been running for 40 minutes and is currently at 26% (as of 12:25pm EST).

Thank you once again for your understanding and patience, and we apologize for this inconvenience.

Login Problems

Filed under: Uncategorized — rlara3 @ 1:20 pm

With the exception of RHEL-5 Atlas users, it is currently not possible for regular users to log into PACE, due to a problem with the PANFS storage system. We are working to get the problem resolved as quickly as possible.

June 4, 2013

PC1 back online, troublesome process identified

Filed under: Uncategorized — Semir Sarajlic @ 8:34 pm

Hey Cygnus users!

It looks like we have finally identified the cause of the recent file server crashes and tracked it down to a particular job and how it handles file I/O. We’re in contact with the user now to try to improve the job’s I/O behavior to prevent this from happening again (at least, with this job).

Thank you for your patience. We know this has been inconvenient.
