Greetings!
The PACE team is once again preparing for maintenance activities, which will start at 6:00am on Tuesday, April 19 and continue through Wednesday, April 20. We are planning several improvements that we hope will provide a much better PACE experience.
GPFS storage improvements
Removal of all /nv/gpfs-gateway-* mount points (user action recommended): In the past, we noticed performance and reliability problems when mounting GPFS natively on machines with slow network connections (including most headnodes, some compute nodes, and some system servers). To address this problem, we deployed a physical ‘gateway’ machine that mounts GPFS natively and serves its content via NFS to machines with slow network connections (see https://blog.pace.gatech.edu/?p=5842).
We have been mounting this gateway on *all* of the machines using these locations:
/nv/gpfs-gateway-pace1
/nv/gpfs-gateway-scratch1
/nv/gpfs-gateway-menon1
Unfortunately, these mount points caused some problems in the longer run, especially when a system variable (PBS_O_WORKDIR) was assigned these locations as the “working directory” for jobs, even on machines with fast network connections. As a result, a large fraction of data operations went through the gateway server instead of the GPFS server, causing significant slowness.
We partially addressed this problem by fixing the root cause of the unintended PBS_O_WORKDIR assignments, and also through user communication and education.
On this maintenance day, we are getting rid of these mount points completely. Instead, GPFS will always be mounted on:
/gpfs/pace1
/gpfs/scratch1
/gpfs/menon1
These paths will be available regardless of how a particular node mounts GPFS (natively or via the gateways).
User action: We ask all of our users to please check their scripts to ensure that the old locations are not being used (a quick check is sketched below). Jobs that try to use these locations will fail after the maintenance day, including jobs that have already been submitted.
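One simple way to find lingering references is to search your job scripts for the old paths. A minimal sketch, assuming your scripts live in a directory such as ~/job_scripts (adjust the location to your own layout):

# Recursively search your job scripts for the old gateway mounts
grep -rn '/nv/gpfs-gateway-' ~/job_scripts

# Update any matches to the corresponding new path, for example:
#   /nv/gpfs-gateway-pace1/...    ->  /gpfs/pace1/...
#   /nv/gpfs-gateway-scratch1/... ->  /gpfs/scratch1/...
#   /nv/gpfs-gateway-menon1/...   ->  /gpfs/menon1/...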
A new GPFS gateway (no user action required): We increasingly rely on the GPFS filesystem for multiple storage needs, including scratch, the majority of project directories, and some home directories. While the gateway provided some benefits, some users continued to report unresponsive or slow commands on headnodes due to a combination of high levels of activity and limited NFS performance.
During this maintenance, we are planning to deploy a second gateway server to separate headnodes from other functions (compute nodes and backup processes). This will improve the responsiveness of headnodes and provide better interactivity; in other words, you should see much less slowness when running system commands such as “ls”.
GPFS server and client tuning (no user action required): Based on vendor recommendations and our own analysis, we identified several configuration parameters that should improve the performance and reliability of GPFS. We are planning to apply these changes on this maintenance day as a fine-tuning step.
Decommissioning old Panasas scratch (no user action required)
When we made the switch to the new scratch space (GPFS) during the January maintenance, we kept the old (Panasas) system accessible as read-only. Some users received a link to their old data if their migration had not completed within the maintenance window. We are finally ready to pull the plug on this Panasas system. You should have no dependencies on it anymore, but please contact PACE support as soon as possible if you have any concerns or questions regarding its decommissioning.
Enabling debug mode (limited user visibility)
RHEL6, which has been used on all PACE systems for a long while, optionally comes with an implementation of the memory-allocation functions that performs additional heap error/consistency checks at runtime. We’ve had this functionality installed, but memory errors have been silently ignored per our configuration, which is not ideal. We are planning to change the configuration to print diagnostics on stderr when an error is detected. Please note that you should not see any differences in the way your codes run; this only changes how memory errors are reported. This behavior is controlled by the MALLOC_CHECK_ environment variable. A simple example is when a dynamically allocated array is freed twice (e.g. using the ‘free’ function in C). Here’s a demo of the different behaviors for three different values of MALLOC_CHECK_ when an array is freed twice:
MALLOC_CHECK_=0
(no output)
MALLOC_CHECK_=1
*** glibc detected *** ./malloc_check: free(): invalid pointer: 0x0000000000601010 ***
MALLOC_CHECK_=2
Aborted (core dumped)
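If you would like to reproduce this behavior yourself, the following is a minimal sketch (the file name, program name, and compiler invocation are only illustrative):

# Create a tiny C program that frees the same allocation twice
cat > malloc_check.c << 'EOF'
#include <stdlib.h>

int main(void)
{
    int *a = malloc(10 * sizeof(int));
    free(a);
    free(a);   /* double free: the error MALLOC_CHECK_ reports */
    return 0;
}
EOF

# Build it and run it under the three MALLOC_CHECK_ settings
gcc -o malloc_check malloc_check.c
MALLOC_CHECK_=0 ./malloc_check   # checks disabled: errors silently ignored
MALLOC_CHECK_=1 ./malloc_check   # diagnostic printed to stderr
MALLOC_CHECK_=2 ./malloc_check   # program aborts (core dumped)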
We currently have this value set to “0” and will make “1” the new default so that a description of the error(s) is printed. If this change causes any problems for you, or you simply don’t want any changes in your environment, you can assign “0” to this variable in your “~/.bashrc” to override the new default, as shown below.
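For example, adding a line like the following to your ~/.bashrc keeps the current behavior:

# Keep memory errors silently ignored (pre-maintenance default)
export MALLOC_CHECK_=0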
Removal of compatibility links for migrated storage (some user action may be required)
We had migrated some of the NFS project storage locations (namely pcee1 and pme[1-8]) to GPFS in the past. When we did that, we placed links in the older storage (paths starting with /nv/…) that point to the new GPFS locations (starting with /gpfs/pace1/project/…) to protect active jobs from crashing. This was only a temporary measure to facilitate the transition.
As a part of this maintenance day, we are planning to remove these links completely. We have already contacted all of the users whose projects are on these locations and confirmed that their ~/data links are updated accordingly, so we expect no user impact. That said, if you are one of these users, please make sure that none of your scripts reference the old locations mentioned in our email; a quick check is sketched below.
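If you would like to double-check, resolving the ~/data link shows which location it points to (the exact project path will differ for each group):

# The resolved path should start with /gpfs/pace1/project/...
# rather than an old /nv/... location
readlink -f ~/data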
Scheduler updates (no user action required)
A patched version of the resource manager (Torque) was deployed on the scheduler servers shortly after the January maintenance day. This patch addresses a bug in the administration functions only. While it is not critical for compute nodes, we will go ahead and update all compute nodes to bring their version on par with the scheduler for consistency. This update will not cause any visible differences for users.
Networking Improvements (no user action required)
Spring is here and it’s time for some cleanup. We will get rid of unused cables in the datacenter and remove some unused switches from the racks. We are also planning some recabling to take better advantage of existing switches and improve redundancy. We will continue to test and enable jumbo frames (where possible) to lower networking overhead. None of these tasks requires user action.
Diskless node transition (no user action required)
We will continue the transition away from diskless nodes that we started in October 2015. This mainly affects nodes in the 5-to-6-year-old range. Apart from more predictable performance on these nodes, this change should be transparent.
Security updates (no user action required)
We are also planning to update some system packages and libraries to address known security vulnerabilities and bugs. There should be no user impact.