PACE: A Partnership for an Advanced Computing Environment

December 21, 2022

Storage-eas read-only during configuration change

Filed under: Uncategorized — Michael Weiner @ 11:08 am

[Update 1/9/23 10:58 AM]

The migration of storage-eas data to a new location is complete, and full read/write capability is available for all research groups on the device. Researchers may resume regular use of storage-eas, including writing new data to it.

Thank you for your patience as we completed these configuration changes to improve stability of storage-eas. Please email us at pace-support@oit.gatech.edu with any questions.


[Original Post 12/21/22 11:08 AM]

Summary: Researchers have reported multiple outages of the storage-eas server recently. To stabilize the storage, PACE will make configuration changes. The storage-eas server will become read-only at 3 PM today and will remain read-only until after the Winter Break, while the changes are being implemented. We will provide an update when write access is restored.

Details: PACE will remove the deduplication setting on storage-eas, which is causing performance and stability issues. Beginning this afternoon, the system will become read-only while all data is copied to a new location. After the copy is complete, we will enable access to the storage in the new location, with full read/write capabilities.

Impact: Researchers will not be able to write to storage-eas for up to two weeks. You may continue reading files from it on both PACE and external systems where it is mounted. While this move is in progress, PACE recommends that researchers copy any files needed for Phoenix jobs into their scratch directories and have those jobs read from and write to scratch instead. Scratch provides each researcher with 15 TB of temporary storage on the Lustre parallel filesystem. Files in scratch can be copied to non-PACE storage via Globus.

Thank you for your patience as we complete these configuration changes to improve stability of storage-eas. Please email us at pace-support@oit.gatech.edu with any questions.

December 8, 2022

Phoenix Project & Scratch Storage Cable Replacement for Redundant Controller

Filed under: Uncategorized — Jeff Valdez @ 5:52 pm
[Update 2022/12/08, 5:52PM EST]
Work on the cable replacement for the redundant storage controller has been completed, and the associated systems connected to the storage have been restored to normal. We replaced two cables on the controller without any interruption to service.

[Update 2022/12/05, 9:00AM EST]
Summary: A cable on the redundant controller for Phoenix project & scratch storage will be replaced, with a potential outage and temporarily decreased performance afterward.

Details: A cable connecting enclosures of the Phoenix Lustre device, which hosts project and scratch storage, to the redundant controller needs to be replaced, beginning around 10 AM on Thursday, December 8th, 2022. The cable replacement is expected to take about 3-4 hours. After the replacement, the storage pools will need about a day to rebuild.

Impact: Because we are replacing a cable on the redundant controller while the main controller remains in service, there should not be an outage during the cable replacement. However, a similar replacement has previously caused storage to become unavailable, so an outage is possible. If this happens, your job may fail or run without making progress. If you have such a job, please cancel it and resubmit it once storage availability is restored. In addition, storage performance may be slower than usual for about a day after the repair while the pools rebuild, so jobs may progress more slowly than normal. If your job runs out of wall time and is cancelled by the scheduler, please resubmit it. PACE will monitor Phoenix Lustre storage throughout this procedure and will update you if a loss of availability occurs.

Please accept our sincere apology for any inconvenience that this temporary limitation may cause you. If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu.

December 2, 2022

Slow Storage on Phoenix

Filed under: Uncategorized — Michael Weiner @ 1:11 pm

[Update 12/5/22 10:45 AM]

Performance on Phoenix project & scratch storage has returned to normal. PACE continues to investigate the root cause of last week’s slowness, and we would like to thank the researchers we have contacted with questions about their workflows. Please contact us at pace-support@oit.gatech.edu with any questions.

[Original Post 12/2/22 1:11 PM]

Summary: Researchers may experience slow performance on Phoenix project & scratch storage.

Details: Over the past three days, Phoenix has experienced intermittent slowness on the Lustre filesystem hosting project & scratch storage due to heavy utilization. PACE is investigating the source of the heavy load on the storage system.

Impact: Any jobs or commands that read from or write to project or scratch storage may run more slowly than normal.

Thank you for your patience as we continue to investigate. Please contact us at pace-support@oit.gatech.edu with any questions.
