As some of you are already aware, the dirty cow exploit was a source of great concern for PACE. This exploit can allow a local user to gain elevated privileges. For more details, please see “https://access.redhat.com/blogs/766093/posts/2757141”.
In response, PACE has applied a mitigation on all of the nodes. While this mitigation is effective in protecting the systems, it has a downside of causing debugging tools (e.g. strace, gdb and DDT) to stop working. Unfortunately, none of the new (and patched) kernel versions made available by Red Hat supports our Infiniband network drivers (OFED), so we had to leave the mitigation running for a while. This caused inconvenience, particularly for users who are actively developing codes and relying on these debuggers.
As a long term solution, we patched the source code of the kernel and recompiled it, without changing anything else. Our initial tests were successful, so we deployed it on three of the four online nodes in the testflight queue:
rich133-k43-34-l recompiled kernel
rich133-k43-34-r recompiled kernel
rich133-k43-35-l original kernel
rich133-k43-35-r recompiled kernel
We would like to ask you to please test your codes on this queue. Our plan is to deploy this recompiled kernel to all of the PACE nodes, including headnodes and compute nodes. We would like to make sure that your codes will continue to run after this deployment without any difference.
The deployment will be a rolling update, that is, we will opportunistically patch nodes starting from the idle nodes. So, there will be a mix of nodes with old and recompiled kernels in the same queues until the deployment is complete. For this reason, we strongly recommend testing multi-node parallel applications that will include the node with the original kernel (rich133-k43-35-l) in the hostlist to test the behavior of your code with mixed hostlists.
As always, please keep your testflight runs short to allow other users to test their own codes. Please report any problems to pace-support@oit.gatech.edu and we will be happy to help. Hopefully, this deployment will be completely transparent to most users, if not all.