
Inchworm federation issues

This entry was posted on Friday, 6 August 2010.

As with almost any new cluster of significant size, some issues crop up once the system starts experiencing heavy load. Please know that we are committed to isolating and resolving them as quickly as possible, and that I appreciate your continued patience as we work through them.

Last night, we applied a potential fix to the server running the job scheduler. We hope it will resolve some of the “missing home directory” issues, as well as problems starting new jobs and the premature termination of running jobs. We are optimistic that this fix will provide significant relief from these problems.

We are still working on an issue with the InfiniBand fabric that causes a significant decrease in scratch storage performance. We do not believe this causes problems with normal MPI traffic over the InfiniBand fabric.

If you continue to experience problems, please let us know via the usual support methods.
