PACE: A Partnership for an Advanced Computing Environment

August 31, 2011

Joe Cluster storage issues

Filed under: tech support — Semir Sarajlic @ 4:22 pm

Hey folks,

It looks like the project server for Joe started having hardware issues around 3:40pm on August 30. The affected hardware component impaired the server's ability to store and retrieve data from the storage array. As a result, some data may have been lost, and some jobs may have been affected.

Please check the status of your jobs if they were running on the Joe cluster between 3:39pm on August 30 and 12:00pm on August 31.

The project server has been brought back online and should be functioning normally. We are monitoring the system so that, if this error recurs, we will know about it immediately and can address it. We are also checking additional equipment on hand for similar issues.

August 26, 2011

Fileserver issues this morning (8/26)

Filed under: tech support — Semir Sarajlic @ 2:06 pm

Shortly before 1am this morning, one of the primary fileservers that hosts home directories suffered a hard drive failure that left the machine unresponsive.

The machine was rebooted just after 9am and appears to be fine. We’re running diagnostics and watching its behavior closely to see if this was a one-off event or whether we have a more serious issue.

No compute jobs were lost. Data should not have been lost either: project spaces were unaffected, and any data being written to the affected home directories should have been queued until the server returned.

Affected home directories:

  • hp9 Uranus
  • hp11 Cygnus
  • hp13 Joe
  • hp17 Critcel
  • hp19 Apurimac

Thanks for your patience. If there are further issues, we will let you know.

August 18, 2011

New GT courses this fall

Filed under: Events — admin @ 2:21 pm

I would like to call your attention to two new courses being offered this fall that may interest your students:

CS 8803-EA: Towards Exascale Analytics (Instructor: Joel Saltz): covers topics in HPC and data analysis (see below for a list of topics).

CSE 8803-CPS: Computational Problem Solving (Instructors: Edmond Chow and Richard Fujimoto): an introductory course intended for math, science and engineering students to develop computing knowledge and skills; includes an introduction to parallel programming.

Details available at: http://www.cc.gatech.edu/~echow/cs4803.html

Richard Fujimoto

List of Topics to be covered in Fall Semester CS 8803: Towards Exascale Analytics

CLUSTERING, DATA AND GRAPH MINING: Large-scale clustering, data mining and graph algorithms. Scalable parallel graph algorithms, high-end techniques to support dimensionality reduction and summarization of high-dimensional data, massive-scale clustering and data mining, collaborative clustering methods, performance modeling for distributed data mining applications, system software to support graph mining, data mining and clustering, scalable distributed reasoning.

DATA SYSTEMS SOFTWARE: Active semantic caching, filter stream middleware, in-transit data processing, data staging services, adaptable I/O systems, collaborative threads, active storage, storage management for complex array processing, SciDB (an array-oriented database management system for science).

MAPREDUCE, DATABASES AND FINE-GRAINED PARALLELISM: MapReduce, Hadoop, the Google File System, HDFS, Bigtable, Hive, Llama, Sawzall, Pig, Twister, MapReduce for multicore and multiprocessor systems, MapReduce and parallel database management systems, HadoopDB, Hadoop-GIS.

OPTIMIZATION OF HIGH-END FILE SYSTEM PERFORMANCE: Object-based storage, overview of the Panasas parallel file system, checkpointing, scalable directories for shared file systems, collective I/O and parallel file systems, active storage strategies for parallel file systems, the relationship between data-intensive scalable computing systems (e.g., the Google File System and HDFS) and cluster file systems (e.g., Lustre, Panasas, GPFS).

STREAMS, ONLINE AGGREGATION AND CONTINUOUS QUERY SUPPORT: High-performance stream processing, System S, DataCutter, applications of stream processing to sensor data analysis, workflow and quality of service, performance-insightful query languages, MapReduce and stream processing, streaming query languages, stateful key-value storage with performance service-level objectives.

SPATIAL ANALYTICS – SYSTEMS SOFTWARE AND MACHINE LEARNING: Spatial object association algorithms, crossmatch, parallel databases for multidimensional data, spatial data mining, content-based image retrieval, high-level image representation for scene classification.

TEMPORAL ANALYSES AND TIME SERIES – SYSTEMS SOFTWARE AND MACHINE LEARNING: Time series mining, finding semantics in time series, multiple-resolution time series, specifying and identifying temporal sequences, temporal RFID processing.

DRIVING APPLICATIONS – IMAGE ANALYSIS, GENE SEQUENCING, CLINICAL DATA ANALYTICS AND COMPUTATIONAL ASTRONOMY: Examples of challenging problems, drawn primarily from the biomedical domain.

August 11, 2011

IDH workshop on many-core processors

Filed under: Events — admin @ 1:59 pm

IDH is sponsoring Georgia Tech’s participation in an HPC training course called “Proven Algorithmic Techniques for Many-Core Processors”. The course will take place August 15-19, 2011. Georgia Tech and ten other sites will participate, with lectures delivered remotely, primarily from the University of Illinois. This is a great follow-up to the IDH one-day short courses we offered earlier this summer, which many of you and your students attended. Additional information and registration instructions are available at:

http://www.vscse.org/summerschool/2011/manycore.html

The course will be offered on-site in Room 129 of the Global Learning Center. There will be hands-on exercises and local support (teaching assistants) to administer the course. Please note that spaces are limited, so do not delay signing up!

Please pass this information on to any interested parties.

Thanks,

Richard Fujimoto

August 5, 2011

Cygnus head node failure and recovery

Filed under: tech support — Semir Sarajlic @ 8:30 pm

Hey folks,

Unfortunately, the Cygnus head node failed at about 3:50pm. The failure was caused partly by the disk failure on the virtual machine storage complex a couple of weeks ago and partly by the virtual machine’s limited resources.

We had a secondary machine ready in case this happened, and that secondary is now operating as the primary. It has twice the resource allocation of the original, which should help eliminate some of the resource-related slowdowns.

No running or scheduled jobs were affected, and no data was lost in the failure.

August 1, 2011

FoRCE headnode emergency restart

Filed under: tech support — pm35 @ 3:52 pm

As most of you have noticed over the past week, the FoRCE headnode has had difficulty keeping up with high utilization and was often unresponsive. We have reconfigured this node with more memory and CPU power to address this issue. The reconfiguration and reboot required a short offline period, but no jobs already submitted to the compute nodes were affected.

We would like to remind our users once again that the FoRCE headnode, like any other headnode, is not intended for running computations. Head nodes are meant for editing, compiling, and submitting jobs, not for the computations themselves. Please limit your use of GUI sessions, such as browsers, Comsol, and Matlab, which put heavy pressure on shared system resources. We will continue working to improve the responsiveness and function of the FoRCE headnode.

Thanks for your understanding!

-PACE support
