On the morning of December 5, 2024, the RHEL9 login nodes of the Phoenix cluster became unresponsive. The problems started at 4:37 AM, when one of the two login nodes developed a memory problem; that node crashed at 6:27 AM. The other login node crashed at 9:37 AM, leaving the RHEL9 environment on Phoenix inaccessible. Both login nodes were restarted at 11:30 AM, which resolved the issue. Jobs that crashed between 4:37 AM and 11:30 AM have been refunded.
December 16, 2024
[Resolved] Phoenix login nodes outage on Dec 5, 2024