PACE A Partnership for an Advanced Computing Environment

December 16, 2024

[Resolved] Phoenix login nodes outage on Dec 5, 2024

Filed under: Uncategorized — Grigori Yourganov @ 2:26 pm

On the morning of December 5, 2024, the RHEL9 login nodes of the Phoenix cluster became unresponsive. The problems began at 4:37 AM, when one of the two login nodes developed a memory issue; that node crashed at 6:27 AM. The second login node crashed at 9:37 AM, making the RHEL9 environment on Phoenix inaccessible. Both login nodes were restarted at 11:30 AM, which resolved the issue. Jobs that crashed between 4:37 AM and 11:30 AM have been refunded.
