[Resolved] GPFS outage on Red Hat 7 queues

This entry was posted on Friday, 30 August, 2019.

Around 3:30 AM today, a number of nodes in several queues running the Red Hat 7 operating system failed to mount GPFS, the storage system for our project (data) and scratch space. As a result, these nodes were taken offline and were unavailable for jobs. We repaired the affected nodes at approximately 9:30 AM, and all queues should now be functioning normally. Any jobs that were held should have started. Please check your overnight jobs for errors.

The following queues were impacted:
atlas-he
ece-gpu
flamel-gpu
gaanam-gpu
gemini-cpu
gemini-gpu
megatron
ml_gpu
sake
skylake-test
starscream
swarm
swarm-gpu

If you notice the problem recurring, or if you have any other concerns, please contact us at pace-support@oit.gatech.edu and we will be happy to help. We apologize for the inconvenience this morning.
