Summary: A configuration issue with the Firebird scheduler caused Firebird jobs to fail over the weekend and this morning because storage was not accessible on compute nodes. The issue was resolved by 2:00 PM today.
Details: During last week's maintenance period (May 7-9), we changed the Firebird scheduler configuration to facilitate future updates to Firebird. We made a repair on Friday, after which jobs ran successfully. Over the weekend, a different issue occurred, and jobs were launched on compute nodes without the proper storage mounted. We have now fully reverted the Firebird scheduler configuration to its state before the maintenance period, and jobs should no longer encounter these errors.
Impact: Some jobs launched on Firebird over the last three days may have failed because home and project storage was missing on the compute nodes, producing errors such as "no such file or directory" or a missing output file. Jobs submitted around midday on Monday, May 13, may have remained queued for an extended period while we repaired the scheduler configuration.
Thank you for your patience as we resolved this issue. Please contact us at pace-support@oit.gatech.edu with questions or if you continue to experience errors.