PACE: A Partnership for an Advanced Computing Environment

July 23, 2015

Changes in qstat format

Filed under: tech support — Semir Sarajlic @ 6:32 pm

Before the July maintenance, the “qstat” command did not allow querying jobs belonging to other users. The only way to list cluster-wide or scheduler-wide information was the “showq” command. However, we received (and confirmed) multiple reports that showq may get out of sync from time to time.

For this reason, we configured qstat to display all of the jobs managed by the scheduler (regardless of users or queues).

You will notice two differences:

(1) qstat, when run without any parameters, lists all of the jobs in the scheduler (not just yours).
(2) You can still filter the results to show only your jobs using “qstat -u <username>”, but the output format will be slightly different.

If you have scripts that parse the qstat output, please modify and test them to make sure they are working as intended.
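
One way to make such a script less sensitive to this change is to filter the full listing by user inside the script, rather than relying on the scheduler to do it. The sketch below assumes the plain “qstat” listing has a header line, a dashed separator line, and then one job per line with the job ID in the first column and the user name in the third. That layout, the helper names and the sample data are assumptions for illustration only, so please check them against the actual output on your cluster.

```python
import subprocess


def jobs_for_user(qstat_text, username):
    """Return job IDs owned by `username` from plain `qstat` output.

    Assumed layout (not confirmed by this post): a header line, a dashed
    separator line, then one job per line with the job ID in column 1 and
    the user name in column 3.
    """
    job_ids = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header line, and the dashed separator.
        if len(fields) < 3 or fields[0].startswith("-") or fields[0] == "Job":
            continue
        if fields[2] == username:
            job_ids.append(fields[0])
    return job_ids


def my_jobs(username):
    """Run `qstat` with no arguments and keep only `username`'s jobs."""
    result = subprocess.run(
        ["qstat"], capture_output=True, text=True, check=True
    )
    return jobs_for_user(result.stdout, username)
```

Keeping the parsing in its own function makes it possible to test the script against a saved copy of the output before running it against the live scheduler.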

PACE clusters ready for research

Filed under: tech support — admin @ 2:15 am

Greetings,

Our quarterly maintenance is now complete. We have no known outstanding issues affecting general operations, but have a few straggling nodes that we will address over the next couple of days.

GPFS client

All compute, login and interactive nodes have been updated to version 3.5.0-25 of the GPFS client per recommendation of DDN. This update addresses the bugs identified in the -20 version that caused problems during our April maintenance. No user changes should be needed.

Software Repository

The “newrepo” software repository has been made the default. Please note that there are a significant number of changes in available versions of software relative to the old repository. Jobs that reference versions that are no longer available will have difficulty running. If you were already running ‘module load newrepo’ in your jobs before our maintenance activities, you should not experience any difference.
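
To find job scripts that may be affected, it can help to list every version-pinned module they load and compare that list with what the new repository provides. The following is a minimal sketch that assumes your job scripts load software with lines of the form ‘module load name/version’. The helper name and the module names in the example are hypothetical and are not taken from the PACE repository.

```python
import re

# Matches lines such as "module load gcc/4.9.0 openmpi" (hypothetical names)
# and captures everything after "load" (or its alias "add").
MODULE_LOAD = re.compile(r"^\s*module\s+(?:load|add)\s+(.+)$")


def pinned_modules(script_text):
    """Return the name/version arguments of `module load` lines in a script.

    Only arguments containing a "/" are reported, since those pin a specific
    version that may no longer exist after a repository change.
    """
    pinned = []
    for line in script_text.splitlines():
        # Drop comments, including scheduler directive lines starting with "#".
        code = line.split("#", 1)[0]
        match = MODULE_LOAD.match(code)
        if not match:
            continue
        for name in match.group(1).split():
            if "/" in name and name not in pinned:
                pinned.append(name)
    return pinned
```

Each name this returns can then be checked by hand against the modules available on the cluster.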

Reset InfiniBand fabric

We’ve reset our InfiniBand fabric and it appears to be in good health.

New home directory and /usr/local storage

The storage devices for this project finally arrived earlier today. This item will be deferred until a future maintenance period.

New “data mover” servers

We weren’t quite ready to complete this bonus objective, so we’ll try to find a period of inactivity to do so between now and our next maintenance period. Whenever this happens, no user changes will be needed.

July 21, 2015

UNDERWAY: PACE quarterly maintenance – July ’15

Filed under: tech support — admin @ 10:11 am

Our maintenance activities are now underway.  All PACE clusters are down.  Please watch this space for updates.

For details on work to be completed, please see our previous posts, here and here.

July 15, 2015

REMINDER & UPDATE: PACE quarterly maintenance – July ’15

Filed under: tech support — admin @ 6:50 pm

First, I’d like to remind folks of our quarterly maintenance activities NEXT WEEK starting at 6:00am Tuesday morning.

Second, we have a little more information regarding some of our high-level tasks. The storage we plan to use for home directories and /usr/local isn’t due to be delivered until Friday of this week. As such, we won’t be able to get it installed and tested before the maintenance begins. We’ll defer this until a future maintenance period.

Our new data mover servers have been delivered, and we are beginning some tests. We’ll consider these a bonus objective at this point, pending the outcome of testing.
