PACE: A Partnership for an Advanced Computing Environment

December 19, 2013

PACE quarterly maintenance – 2 days; January 2014

Filed under: tech support — admin @ 6:07 pm

…back to regularly scheduled events.

Our next maintenance window is fast approaching. We will continue with two-day downtimes; this one will take place on Tuesday, January 14 and Wednesday, January 15. The list of major changes is short this time around, but the changes are significant.

The largest change, affecting all clusters, is a major update to the Moab and Torque software that schedules and manages your jobs. The upgraded versions fix a number of long-standing problems with command timeouts, stability, and scaling when processing large sets of jobs.

The testflight cluster has already been updated and is available to anyone who wishes to test their job submission process against the new versions. In many cases, the commands you use to submit and query your jobs will remain the same. For some users, however, a change in how you use the system may be required: you will still be able to accomplish the same things, but may need to use different commands to do so.

We have updated our usage documentation to include a simple transition guide here.

In addition to the guide, we have also written a FAQ, which can be viewed by running the command ‘jan2014-faq’ after logging in.

Because of the version differences between the old and new software, we will unfortunately not be able to preserve any jobs that are still queued when maintenance begins. Any jobs still in the queue at that point will need to be resubmitted after maintenance.
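
Before maintenance begins, it may help to record which of your jobs are still queued, so you know what to resubmit afterwards. The following is a minimal sketch, assuming Torque's ‘qstat -u’ output layout, in which the job state is the second-to-last column. The function name ‘queued_job_ids’ is hypothetical, not a PACE-provided command, so check the column layout against your own ‘qstat’ output before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a PACE command): print the IDs of jobs that are
# still queued (state "Q") from `qstat -u <user>` output read on stdin.
# Assumption: the job state is the second-to-last column, as in Torque's
# `qstat -u` layout. Verify this against your own qstat output first.
queued_job_ids() {
  # Only consider lines whose first field starts with a digit (job IDs),
  # which skips the server name, header, and separator lines.
  awk '$1 ~ /^[0-9]/ && $(NF-1) == "Q" { print $1 }'
}

# Usage on a login node (file name is only an example):
#   qstat -u "$USER" | queued_job_ids > queued-before-maintenance.txt
```

Saving the list to a file before the maintenance window gives you a checklist of jobs to resubmit once the clusters are back.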

The fixes planned for January also include the following:

Infrastructure:

  • Operating system upgrades on the server that runs the scheduling software for the “shared” clusters. This will bring it up to the same level as the other scheduler servers.
  • Adjustments to scalability & performance parameters on our GPFS filesystem.

Optimus cluster:

  • Optimus users will gain access to a new queue, ‘optimus-force-6’, as well as to the ‘iw-shared-6’ queue.

Gryphon cluster:

  • The current (temporary) head node and scheduler server will return to their roles as compute nodes for the cluster.
  • New servers will be brought into production for the head node & scheduler servers.

BioCluster cluster:

  • Data migrations between the pb1, pb4, and DDN filesystems. These should be transparent to users and should ease the space crunch everyone has been experiencing.

Power loss in Rich Datacenter

Filed under: tech support — Semir Sarajlic @ 1:33 pm

UPDATE: All clusters are up and ready for service.

At this time, all PACE-managed clusters are believed to be working.
You should be able to log in to your clusters and submit and run jobs.

Any jobs that were running before the power outage have failed, so please resubmit them.

Please let us know immediately if anything is still broken.

PACE Team

What happened

At around 08:10 Thursday morning, Rich lost its N6 feed, one of the two feeds powering the Rich building and the Rich chiller plant. This also caused multiple failures in the high-voltage vault in the Rich back alley, so the building lost its other feed, N5, as well. The N5 feed remained up in the chiller plant, however. Although the chillers still had power, operators transferred cooling to the campus loop as a precaution. Rich office space was without power, but the machine rooms failed over to the generator and UPSes.

PACE systems were powered down gracefully to prevent a hard-shutdown that would make recovery more difficult.

Original Post

This morning (December 19), the Rich datacenter suffered a power loss.
We had to perform an emergency shutdown of all nodes.

As we receive new information we will update this blog and the pace-availability email list.

December 18, 2013

COMSOL 4.4 Installed

Filed under: tech support — Semir Sarajlic @ 12:30 pm

COMSOL 4.4 – Student and Research versions

COMSOL Multiphysics version 4.4 contains many new functions and additions to the COMSOL product suite.
See the COMSOL Release Notes for information on new functionality in existing products and an overview of new products in this version.

Using the research version of COMSOL

# Load the research version of COMSOL
$ module load comsol/4.4-research
$ comsol ...
# To use the MATLAB LiveLink, also load a MATLAB module
$ module load matlab/r2013b
$ comsol -mlroot "${MATLAB}"

Using the classroom/student version of COMSOL

# Load the classroom/student version of COMSOL
$ module load comsol/4.4
$ comsol ...
# To use the MATLAB LiveLink, also load a MATLAB module
$ module load matlab/r2013b
$ comsol -mlroot "${MATLAB}"
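
If the MATLAB module has not been loaded, ‘${MATLAB}’ is empty and COMSOL is started without a valid MATLAB root. The sketch below guards against that, for example inside a job script. It assumes, as the commands above suggest, that ‘module load matlab/r2013b’ sets MATLAB to the MATLAB installation directory; the function name ‘comsol_with_matlab’ is hypothetical, not a PACE-provided command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not a PACE command): start COMSOL with the MATLAB
# LiveLink only if $MATLAB points at a directory.
# Assumption: `module load matlab/r2013b` exports MATLAB as the install root.
comsol_with_matlab() {
  if [ -z "${MATLAB:-}" ] || [ ! -d "$MATLAB" ]; then
    echo "MATLAB is not set to a directory; run 'module load matlab/r2013b' first" >&2
    return 1
  fi
  # Pass any extra arguments straight through to comsol.
  comsol -mlroot "$MATLAB" "$@"
}
```

Failing early with a clear message is easier to diagnose than a COMSOL session that starts without a working LiveLink.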
