PACE: A Partnership for an Advanced Computing Environment

September 29, 2012

Joe Cluster Status

Filed under: tech support — Semir Sarajlic @ 8:08 pm

Between about 8:00pm and 8:30pm on September 28, 2012, a power event took down the TSRB data center, knocking a significant fraction of the Joe cluster offline.

With assistance from Operations, we determined that several of the management switches for these nodes did not recover gracefully from the event. Because these switches control our ability to manage the nodes, we had to wait until the switches were available before bringing the nodes back online, which began at about 4pm on September 29, 2012.

Jobs that were running on these nodes (iw-a2-* and iw-a3-*) at the time of the outage may have terminated abnormally. Jobs scheduled but not running should be fine.

UPDATE @ 4:40pm, 2012-09-29: All nodes are online.

September 25, 2012

New and Updated Software: GCC, Maxima, OpenCV, Boost, ncbi_blast

Filed under: tech support — Semir Sarajlic @ 4:17 pm

Software Installation and Updates

We have had several requests for new or updated software since the last post on August 14.
Here are the details about the updates.
All of this software is installed on the RHEL6 clusters (including force-6, uranus-6, ece, math, apurimac, and joe-6).

GCC 4.7.2

The GNU Compiler Collection (GCC) includes compilers for many languages (C, C++, Fortran, Java, and Go).
This latest version of GCC supports advanced optimizations for the latest compute nodes in PACE.

Here is how to use it:

$ module load gcc/4.7.2
$ gcc <source.c>
$ gfortran <source.f>
$ g++ <source.cpp>

Versions of GCC already installed on the RHEL6 clusters are gcc/4.4.5, gcc/4.6.2, and gcc/4.7.0.

Maxima 5.28.0

Maxima is a system for the manipulation of symbolic and numerical expressions, including differentiation, integration, Taylor series, Laplace transforms, ordinary differential equations, systems of linear equations, polynomials, and sets, lists, vectors, matrices, and tensors. Maxima yields high precision numeric results by using exact fractions, arbitrary precision integers, and variable precision floating point numbers. Maxima can plot functions and data in two and three dimensions.

Here is how to use it:

$ module load clisp/2.49.0 maxima/5.28.0
$ maxima
# If you have X forwarding turned on, "xmaxima" will display a GUI with a tutorial
$ xmaxima

OpenCV 2.4.2

OpenCV (Open Source Computer Vision) is a library of programming functions for real time computer vision.

OpenCV is released under a BSD license and is free for both academic and commercial use. It has C++, C, Python, and (soon) Java interfaces, and runs on Windows, Linux, Android, and Mac. The library has more than 2500 optimized algorithms.

This installation of OpenCV was built with support for Python and NumPy, and without support for Intel TBB, Intel IPP, or CUDA.

Here is how to use it:

$ module load gcc/4.4.5 opencv/2.4.2
$ g++ <source.cpp> $(pkg-config --cflags --libs opencv)

Boost

Boost provides free peer-reviewed portable C++ source libraries.
Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications.

Here is how to use it:

$ module load boost/1.51.0
$ g++ <source.cpp>

NCBI BLAST

Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.

Here is how to use it:

$ module load gcc/4.4.5 ncbi_blast/2.2.27
$ blastn
$ blastp
$ blastx
...
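
For a first test run, something like the following may help. It is only a sketch: the file names and the two short sequences are made up for illustration, and the -subject option is used so that no BLAST database is needed.

```shell
# Illustrative only: the file names and sequences below are made up.
# On a PACE RHEL6 cluster, first run: module load gcc/4.4.5 ncbi_blast/2.2.27
workdir=$(mktemp -d)

# Two tiny nucleotide sequences in FASTA format; the subject contains the query.
printf '>query1\nATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG\n' > "$workdir/query.fasta"
printf '>subject1\nGGCTTAATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAGCCTTAA\n' > "$workdir/subject.fasta"

# Compare the query directly against the subject sequence and ask for
# tabular output (-outfmt 6). Skipped if blastn is not on the PATH.
if command -v blastn >/dev/null 2>&1; then
    blastn -query "$workdir/query.fasta" -subject "$workdir/subject.fasta" -outfmt 6
else
    echo "blastn not found; load the ncbi_blast module first"
fi
```

For real searches you would normally point blastn at a formatted database with -db instead of -subject.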

September 19, 2012

Registration open for OpenACC GPU Programming Workshop

Filed under: Events,News — Semir Sarajlic @ 1:47 pm

Extreme Science and Engineering Discovery Environment
http://xsede.org/

Registration open for October 2012
OpenACC GPU Programming Workshop

One hundred registrants will be accepted for the OpenACC GPU Programming Workshop, to be held October 16 and 17, 2012. The workshop includes hands-on access to Keeneland, the newest XSEDE resource, which is managed by the Georgia Institute of Technology (Georgia Tech) and the National Institute for Computational Sciences, an XSEDE partner institution.

Based on demand, the workshop is scheduled to be held at ten different sites around the country. Anyone interested in participating is asked to follow the link below and then register by clicking on the preferred site. Only the first 100 registrants will be accepted.

The workshop is offered by the Pittsburgh Supercomputing Center, the National Institute for Computational Sciences, and Georgia Tech.

Questions? Contact Tom Maiden at tmaiden@psc.edu.

Register and read more about the workshop at:
http://www.psc.edu/index.php/training/openacc-gpu-programming

[XSEDE is supported by the National Science Foundation; https://www.xsede.org, info@xsede.org.]

September 14, 2012

Free MATLAB Technical Seminars on Tuesday

Filed under: Events,News — Semir Sarajlic @ 6:43 pm

As a friendly reminder, you are invited to join MathWorks for complimentary MATLAB seminars on Tuesday, September 18, 2012, in Room 144 of the Clough Undergraduate Commons.

Register now: http://www.mathworks.com/seminars/GATech2012

Agenda:

5:30 – 6:30 p.m.
Session 1: What’s New in MATLAB?
Presented By: Loren Shure, Principal MATLAB Developer (KEYNOTE SPEAKER)

In this session, we will demonstrate workflow examples that highlight and use new MATLAB features. The latest MATLAB release, R2012b, introduces a redesigned Desktop that makes it easier for both new and experienced users to navigate the continuously expanding capabilities within MATLAB.

Loren has worked at MathWorks for over 25 years. She has co-authored several MathWorks products in addition to adding core functionality to MATLAB. Loren currently works on the design of the MATLAB language. She graduated from MIT with a B.Sc. in physics and has a Ph.D. in marine geophysics from the University of California, San Diego, Scripps Institution of Oceanography. Loren writes about MATLAB on her blog, The Art of MATLAB.

6:30 – 7:00 p.m.
Georgia Tech Alumni Panel

Hear from a selection of Georgia Tech Alumni who now work at The MathWorks as they discuss their career paths. (Pizza will be served.)

7:00 – 8:30 p.m.
Session 2: Parallel and GPU Computing with MATLAB
Presented By: Jiro Doke, Ph.D., Senior Application Engineer and Georgia Tech alumnus

In this session you will learn how to solve computationally and data-intensive problems using multicore processors, GPUs, and computer clusters. We will introduce you to high-level programming constructs that allow you to parallelize MATLAB applications and run them on multiple processors. We will show you how to overcome the memory limits of your desktop computer by distributing your data on a large scale computing resource, such as a cluster. We will also demonstrate how to take advantage of GPUs to speed up computations without low-level programming. Highlights include:
· Toolboxes with built-in support for parallel computing
· Creating parallel applications to speed up independent tasks
· Scaling up to computer clusters, grid environments or clouds
· Employing GPUs to speed up your computations

Jiro joined MathWorks in May 2006 as an application engineer. He received his B.S. from Georgia Institute of Technology and Ph.D. from the University of Michigan, both in Mechanical Engineering. His Ph.D. research was in biomechanics of human movement, specifically in human gait. His experience in MATLAB comes from extensive use in graduate school, using the tool for data acquisition, analysis, and visualization. At MathWorks, Jiro focuses on core MATLAB; math, statistics and optimization tools; and parallel computing tools.

September 13, 2012

Joe file server back online

Filed under: tech support — Semir Sarajlic @ 7:17 pm

After working with the network team, we appear to have stabilized the networking for the file server. We apologize for the inconvenience.

Joe file server still having difficulties

Filed under: tech support — Semir Sarajlic @ 1:48 pm

The network interfaces on the file server that serves the Joe cluster are currently having trouble determining which interface is up and which is down. This started around 4:30am, and we are engaging the network team to isolate the problem to the machine, cables, or switches.

September 12, 2012

Joe Fileserver fixed

Filed under: tech support — Semir Sarajlic @ 8:51 pm

The fileserver that houses Joe users’ data (hp3/pj1) started acting squirrelly this morning, finding itself unable to connect to the PACE LDAP server. That, in turn, caused Joe users to have problems logging in and caused their jobs to hang, because the fileserver could not authenticate users or jobs.

Restarting all the services on the fileserver rectified the problem.

September 11, 2012

Scratch storage issues: update

Filed under: tech support — pm35 @ 3:02 pm

Scratch storage status update:

We continue to work with Panasas on the difficulties with our high-speed scratch storage system. Since the last update, we have received and installed two PAS-11 test shelves and have successfully reproduced our problems on them under the current production software version. We then updated to their latest release and re-tested, only to observe a similar problem with the new release.

We’re continuing to do what we can to encourage the company to find a solution but are also exploring alternative technologies. We apologize for the inconvenience and will continue to update you with our progress.

[updated] Scratch Storage and Scheduler Concerns

Filed under: tech support — admin @ 2:29 pm

Scheduler

The move to the new server for the workload scheduler seems to have gone well. We haven’t received much user feedback, but what we have received has been positive. This matches our own observations as well. Presuming things continue to go well, we will relax some of our rate-limiting tuning parameters on Thursday morning. This shouldn’t cause any interruptions (even of submitting new jobs) but should allow the scheduler to start new jobs at a faster rate. The goal is to decrease the wait times some users have been seeing. We’ll increase the limits slowly and monitor for bad behavior.

Scratch Storage

The story of the Panasas scratch storage does not go as well.  Last week, we received two “shelves” worth of storage to test.  (For comparison, we have five in production.)  Over the weekend, we put these through synthetic tests, designed to mimic the behavior that causes them to fail.  The good news is that we were able to replicate the problem in the testbed.  The bad news is that the highly anticipated new firmware provided by the vendor still does not fix the issues.  We continue to press Panasas quite aggressively for resolution and are looking into contingency plans – including alternate vendors.  Given that we are five weeks out from our normal maintenance day and have no viable fix, an emergency maintenance between now and then seems unlikely at this point.

September 7, 2012

RFI-2012, a competitive vendor selection process

Filed under: News — admin @ 2:29 pm

Greetings GT community,

PACE is in the midst of our annual competitive vendor selection process. As outlined on the “Policy” page of our web site, we have issued a set of documents to various state contract vendors. This time around we have Dell, HP, IBM and Penguin Computing. Contained within these documents are general specifications based on the computing demand we are anticipating coming from the faculty over the next year. I’ve included a link to the documents (GT login required) below. Please bear in mind that these specs are not intended to limit configurations you may wish to purchase, but rather to normalize vendor responses and help us choose a vendor for the next year.

The document I’m sure you will be most interested in is the timeline. The overall timeline has not been published to the vendors, and I would appreciate it if it were kept confidential. The first milestone, which obviously has been published, is that responses are due to us by 5:00pm today. The next step is for us to evaluate those responses. If any of you are interested in commenting on those responses, please let me know. Your feedback is appreciated.

Please watch this blog, as we will post updates as we move through the process.  We already have a number of people interested in a near-term purchase.  If you are as well, or you know somebody who is, now is the time to get the process started.  Please contact me at your convenience.

 

--
Neil Bright
Chief HPC Architect
neil.bright@oit.gatech.edu