PACE: A Partnership for an Advanced Computing Environment

August 6, 2010

Inchworm federation issues

Filed under: tech support — admin @ 9:50 pm

As with any new cluster of significant size, some issues crop up once the system starts experiencing heavy load. Please know that we are committed to isolating and resolving them as quickly as possible, and that I appreciate your continued patience as we work through them.

Last night, we applied a potential fix to the server running the job scheduler. We hope it will resolve some of the “missing home directory” issues, as well as problems with starting new jobs and the premature termination of running jobs. We are optimistic that this fix will provide significant relief from these problems.

We are still working on an issue with the InfiniBand fabric that causes a significant decrease in the performance of the scratch storage. We do not believe this issue affects normal MPI traffic over InfiniBand.

If you continue to experience problems, please let us know via the usual support methods.
