PACE: A Partnership for an Advanced Computing Environment

September 29, 2012

Joe Cluster Status

Filed under: tech support | Semir Sarajlic @ 8:08 pm

Between about 8:00 and 8:30 pm on September 28, 2012, a power event took down the TSRB data center, knocking a significant fraction of the Joe cluster offline.

With assistance from Operations, we determined that several of the management switches for these nodes did not recover gracefully from the event. Because these switches control our ability to manage the nodes, we had to wait until they were available again before bringing the nodes back online. As of about 4 pm on September 29, 2012, we are now bringing these nodes back online.

Jobs that were running on these nodes (iw-a2-* and iw-a3-*) at the time of the outage may have terminated abnormally. Jobs that were queued but not yet running should be unaffected.

UPDATE @ 4:40pm, 2012-09-29: All nodes are online.
