
PACE clusters ready for research

This entry was posted on Thursday, 21 April, 2016.

Our April maintenance window is now complete.  As usual, a number of compute nodes still need to be brought back online, but we are substantially online and processing jobs at this point.

We did run into an unanticipated maintenance item with the GPFS storage – no data has been lost.  As we’ve added disks to the DDN storage system, we’ve neglected to perform a required rebalancing operation to spread load amongst all the disks.  The rebalancing operation has been running over the majority of our maintenance window, but the task is large and progress has been much slower than expected.  We will continue to perform the rebalancing during off-peak times in order to mitigate the impact on storage performance as best we are able.

Removal of /nv/gpfs-gateway-* mount points

Task complete as described.  The system should no longer generate these paths.  If you have used these paths explicitly, your jobs will likely fail.  Please continue to use paths relative to your home directory (e.g. ~/data, ~/scratch) for future compatibility.
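If you are unsure whether a job script still contains one of the retired paths, a quick scan along the following lines may help. This is only a minimal sketch, not a PACE-provided tool: the function names and the sample script text are hypothetical, and the prefix list simply collects the two prefixes retired in this announcement (/nv/gpfs-gateway- and /panfs).

```python
import os

# Prefixes retired in this maintenance window, as listed in this announcement.
RETIRED_PREFIXES = ("/nv/gpfs-gateway-", "/panfs")


def find_retired_paths(text, prefixes=RETIRED_PREFIXES):
    """Return (line_number, line) pairs for lines that mention a retired prefix."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(prefix in line for prefix in prefixes):
            hits.append((number, line.strip()))
    return hits


def home_relative(*parts):
    """Build a path under the current user's home directory, e.g. ~/data."""
    return os.path.join(os.path.expanduser("~"), *parts)


if __name__ == "__main__":
    # Hypothetical job script contents, for illustration only.
    sample = "cd /nv/gpfs-gateway-example/data\ncd ~/scratch\n"
    for number, line in find_retired_paths(sample):
        print(f"line {number}: {line}")
    print("use instead:", home_relative("data"))
```

To check a real script, read its contents (for example with `open(path).read()`) and pass the text to `find_retired_paths`; any line it reports should be changed to a home-relative path.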

New GPFS gateway

Task complete as described

GPFS server and client tuning

Task complete as described

Decommission old Panasas scratch

Task complete as described.  Paths starting with /panfs no longer work.  Everybody should have been transitioned to the new scratch long ago, so we do not expect anybody to have issues here.

Enabling debug mode

Task complete as described.  You may see additional warning messages if your code is not well behaved with regard to memory utilization.  Such a warning is a hint that your code may have a bug.

Removal of compatibility links for migrated storage 

Task complete as described.  Affected users (Prometheus and CEE clusters) were contacted before maintenance day.  No user impact is expected, but please send in a ticket if you think there is a problem.

Scheduler updates

Task complete as described

Networking improvements

Task complete as described

Diskless node transition

Task complete as described

Security updates

Task complete as described
