[Update – October 19 – 5:30pm]
We are following up to inform you that maintenance for the Testflight-Coda and Phoenix clusters is complete. At this time, all Rich and Coda datacenter clusters are ready for research. We appreciate everyone’s patience as we worked through this partially extended maintenance period to address our activities in the Coda datacenter.
At this time, we are updating you on the status of tasks:
ITEMS REQUIRING USER ACTION:
- [COMPLETE] [some user action may be needed] Rename primary groups for future p- and d-
ITEMS NOT REQUIRING USER ACTION:
- [COMPLETE] [Compute] Applying a tuned profile to the Hive compute nodes
- [COMPLETE] [Compute] Update Nvidia GPU drivers on Coda to support the CUDA 11 SDK
- [COMPLETE] [Network] Rebooting the Hive IB switch (atl1-1-01-014-3-cs7500)
- [COMPLETE] [Network] Rebooting PACE IB switch (09-010-3-cs7520)
- [COMPLETE] [Network] Update Phoenix subnet managers to RHEL7.8
- [COMPLETE] [Storage] Replace DDN 7700 storage controller 1
- [COMPLETE] [Storage] Replace DDN SFA18KE storage enclosure 8
- [COMPLETE] [Data Management] Update globus-connect-server on globus-hive from version 4 to version 5.4.
- [COMPLETE] [Coda Datacenter] Databank: Hi-Temp Cooling Tower reboot
- [COMPLETE] [Emergency readiness test] Test emergency power down scripts for CODA and Hive compute nodes
- [COMPLETE] [Storage] Lustre Client Patches
- [COMPLETE] [Storage] Lustre filesystem controller to be replaced
- [COMPLETE – 10/19/2020] We conducted further testing of Lustre storage in coordination with our vendor.
- [COMPLETE] [Coda Datacenter] Top500 test run across Coda datacenter resources (excludes Hive, COC-ICE and PACE-ICE clusters).
ITEMS REQUIRING USER ACTION:
As previously mentioned regarding the primary group renaming task, which may require some user action, we will be adjusting the names of most users’ Linux primary groups to reflect a new standardized format as part of our preparation for the migration to Coda that’s starting in October. Most users will see the name of their primary group change from school-pisurname (e.g., chem-burdell) to p-piusername (e.g., p-gburdell3) or d-school (e.g., d-chem). This change will be reflected across all PACE systems, including Hive and CUI. The “gid” (group id number) is not changing, so this will not affect any file permissions you have set. Most users will not need to take action. However, if you manually change file permissions using the group name or use group names in your scripts, you may need to make an adjustment. You can always run the “id” command on yourself (“id gburdell3”) to see all of your groups. Not all primary groups will change name, so do not be concerned if yours is left unchanged.
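As a rough illustration of the script check described above, the following Python sketch scans the text of a script for a hard-coded old-style group name. The function name find_group_references and the sample job script are hypothetical and exist only for this example; chem-burdell is the example group name used in this announcement, so substitute your own former group name.

```python
import re


def find_group_references(script_text, old_group):
    """Return (line_number, line) pairs where old_group appears as a whole token."""
    # Match the name only when it is not embedded in a longer identifier,
    # so "chem-burdell" does not match "chem-burdell2".
    pattern = re.compile(r"(?<![\w-])" + re.escape(old_group) + r"(?![\w-])")
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical job script, used only for illustration.
sample_script = """#!/bin/bash
chgrp chem-burdell results/
chmod g+rw results/
chown gburdell3:chem-burdell output.dat
echo done
"""

for number, line in find_group_references(sample_script, "chem-burdell"):
    print(f"line {number}: {line}")
```

Any line reported this way likely needs the new group name (for example, p-gburdell3 or d-chem). Because the gid is not changing, referring to the group by its numeric gid is another option.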
If you have any questions, please don’t hesitate to contact us at pace-support@oit.gatech.edu
Best,
PACE Team
[UPDATE – October 16 – 6:31pm]
We are following up with an update on the PACE maintenance period. As mentioned yesterday, our maintenance for the Rich datacenter was completed one day ahead of schedule, and we are partially complete with the CODA datacenter. All clusters in the Rich datacenter are ready for research. The Hive, COC-ICE, and PACE-ICE clusters in the Coda datacenter are ready for research and instructional learning. We have released users’ jobs on the Hive, COC-ICE, and PACE-ICE clusters and on the Rich datacenter clusters. The Phoenix cluster in CODA will remain under maintenance through Monday, October 19, as scheduled. We also need to extend the maintenance for the Testflight-Coda cluster through Monday, October 19, to address the remaining pending task.
At this time, we are updating you on the status of tasks:
ITEMS REQUIRING USER ACTION:
- [COMPLETE] [some user action may be needed] Rename primary groups for future p- and d-
ITEMS NOT REQUIRING USER ACTION:
- [COMPLETE] [Compute] Applying a tuned profile to the Hive compute nodes
- [COMPLETE] [Compute] Update Nvidia GPU drivers on Coda to support the CUDA 11 SDK
- [COMPLETE] [Network] Rebooting the Hive IB switch (atl1-1-01-014-3-cs7500)
- [COMPLETE] [Network] Rebooting PACE IB switch (09-010-3-cs7520)
- [COMPLETE] [Network] Update Phoenix subnet managers to RHEL7.8
- [COMPLETE] [Storage] Replace DDN 7700 storage controller 1
- [COMPLETE] [Storage] Replace DDN SFA18KE storage enclosure 8
- [COMPLETE] [Data Management] Update globus-connect-server on globus-hive from version 4 to version 5.4.
- [COMPLETE] [Coda Datacenter] Databank: Hi-Temp Cooling Tower reboot
- [COMPLETE] [Emergency readiness test] Test emergency power down scripts for CODA and Hive compute nodes
- [COMPLETE] [Storage] Lustre Client Patches
- [COMPLETE] [Storage] Lustre filesystem controller to be replaced
- [PENDING] [Coda Datacenter] Top500 test run across Coda datacenter resources (excludes Hive, COC-ICE and PACE-ICE clusters).
ITEMS REQUIRING USER ACTION:
As previously mentioned regarding the primary group renaming task, which may require some user action, we will be adjusting the names of most users’ Linux primary groups to reflect a new standardized format as part of our preparation for the migration to Coda that’s starting in October. Most users will see the name of their primary group change from school-pisurname (e.g., chem-burdell) to p-piusername (e.g., p-gburdell3) or d-school (e.g., d-chem). This change will be reflected across all PACE systems, including Hive and CUI. The “gid” (group id number) is not changing, so this will not affect any file permissions you have set. Most users will not need to take action. However, if you manually change file permissions using the group name or use group names in your scripts, you may need to make an adjustment. You can always run the “id” command on yourself (“id gburdell3”) to see all of your groups. Not all primary groups will change name, so do not be concerned if yours is left unchanged.
We will follow up with further updates.
If you have any questions, please don’t hesitate to contact us at pace-support@oit.gatech.edu
[UPDATE – October 15, 2020, 8:44pm]
Our maintenance period for the Rich datacenter was completed one day ahead of schedule, and we are partially complete for the CODA datacenter. All clusters in the Rich datacenter are ready for research. Only the Hive cluster in the Coda datacenter is ready for research. We have released users’ jobs on the Hive cluster and the Rich datacenter clusters.
The remaining clusters in the CODA datacenter (Phoenix, Testflight-Coda, CoC-ICE, and PACE-ICE) will remain under maintenance for the remainder of the maintenance period as we address the remaining tasks.
At this time, we are updating you on the status of tasks:
ITEMS REQUIRING USER ACTION:
- [COMPLETE] [some user action may be needed] Rename primary groups for future p- and d-
ITEMS NOT REQUIRING USER ACTION:
- [COMPLETE] [Compute] Applying a tuned profile to the Hive compute nodes
- [COMPLETE] [Compute] Update Nvidia GPU drivers on Coda to support the CUDA 11 SDK
- [COMPLETE] [Network] Rebooting the Hive IB switch (atl1-1-01-014-3-cs7500)
- [COMPLETE] [Network] Rebooting PACE IB switch (09-010-3-cs7520)
- [COMPLETE] [Network] Update Phoenix subnet managers to RHEL7.8
- [COMPLETE] [Storage] Replace DDN 7700 storage controller 1
- [COMPLETE] [Storage] Replace DDN SFA18KE storage enclosure 8
- [COMPLETE] [Data Management] Update globus-connect-server on globus-hive from version 4 to version 5.4.
- [COMPLETE] [Coda Datacenter] Databank: Hi-Temp Cooling Tower reboot
- [COMPLETE] [Emergency readiness test] Test emergency power down scripts for CODA and Hive compute nodes
- [PENDING] [Coda Datacenter] Top500 test run across Coda datacenter resources (excludes Hive cluster).
- [PENDING] [Storage] Lustre Client Patches
- [PENDING] [Storage] Lustre filesystem controller to be replaced
ITEMS REQUIRING USER ACTION:
As previously mentioned regarding the primary group renaming task, which may require some user action, we will be adjusting the names of most users’ Linux primary groups to reflect a new standardized format as part of our preparation for the migration to Coda that’s starting in October. Most users will see the name of their primary group change from school-pisurname (e.g., chem-burdell) to p-piusername (e.g., p-gburdell3) or d-school (e.g., d-chem). This change will be reflected across all PACE systems, including Hive and CUI. The “gid” (group id number) is not changing, so this will not affect any file permissions you have set. Most users will not need to take action. However, if you manually change file permissions using the group name or use group names in your scripts, you may need to make an adjustment. You can always run the “id” command on yourself (“id gburdell3”) to see all of your groups. Not all primary groups will change name, so do not be concerned if yours is left unchanged.
We will follow up tomorrow regarding the remaining CODA datacenter tasks impacting Phoenix, CoC-ICE, PACE-ICE, and Testflight-CODA.
If you have any questions, please don’t hesitate to contact us at pace-support@oit.gatech.edu
[Update – October 12, 1:07PM]
We are following up with a reminder that our scheduled maintenance period begins at 6:00AM on October 14th, 2020 and concludes at 11:59PM on October 16th, 2020. Please note that our blog post (https://blog.pace.gatech.edu/?p=6905) contains an updated list of tasks for this upcoming maintenance period. For your reference, the updated list is provided below:
ITEMS REQUIRING USER ACTION:
- [some user action may be needed] Rename primary groups for future p- and d-
ITEMS NOT REQUIRING USER ACTION:
- [Compute] Applying a tuned profile to the Hive compute nodes
- [Compute] Update Nvidia GPU drivers on Coda to support the CUDA 11 SDK
- [Network] Rebooting the Hive IB switch (atl1-1-01-014-3-cs7500)
- [Network] Rebooting PACE IB switch (09-010-3-cs7520)
- [Network] Update Phoenix subnet managers to RHEL7.8
- [Storage] Replace DDN 7700 storage controller 1
- [Storage] Replace DDN SFA18KE storage enclosure 8
- [Data Management] Update globus-connect-server on globus-hive from version 4 to version 5.4.
- [Coda Datacenter] Databank: Hi-Temp Cooling Tower reboot
- [Coda Datacenter] Top500 test run across Coda datacenter resources (excludes Hive cluster).
- [Emergency readiness test] Test emergency power down scripts for CODA and Hive compute nodes
As previously mentioned regarding the primary group renaming task, which may require some user action, we will be adjusting the names of most users’ Linux primary groups to reflect a new standardized format as part of our preparation for the migration to Coda that’s starting in October. Most users will see the name of their primary group change from school-pisurname (e.g., chem-burdell) to p-piusername (e.g., p-gburdell3) or d-school (e.g., d-chem). This change will be reflected across all PACE systems, including Hive and CUI. The “gid” (group id number) is not changing, so this will not affect any file permissions you have set. Most users will not need to take action. However, if you manually change file permissions using the group name or use group names in your scripts, you may need to make an adjustment. You can always run the “id” command on yourself (“id gburdell3”) to see all of your groups. Not all primary groups will change name, so do not be concerned if yours is left unchanged.
If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.
[Original – September 30, 4:42PM]
We are preparing for our next PACE maintenance period, which will begin at 6:00 AM on October 14th, 2020 and conclude at 11:59 PM on October 16th, 2020. As usual, jobs with long walltimes will be held by the scheduler to ensure that no active jobs will be running when systems are powered off. These jobs will be released as soon as the maintenance activities are complete. Please note, during the maintenance period, users will not have access to Rich and Coda datacenter resources.
We are still finalizing planned activities for the maintenance period. Here is a current list:
ITEMS REQUIRING USER ACTION:
- [some user action may be needed] Rename primary groups for future p- and d-
ITEMS NOT REQUIRING USER ACTION:
- [Compute] Applying a tuned profile to the Hive compute nodes
- [Network] Rebooting the Hive IB switch (atl1-1-01-014-3-cs7500)
- [Network] Rebooting PACE IB switch (09-010-3-cs7520)
- [Storage] Replace DDN 7700 storage controller 1
- [Storage] Replace DDN SFA18KE storage enclosure 8
- [Coda Datacenter] Databank: Hi-Temp Cooling Tower reboot
- [Emergency readiness test] Test emergency power down scripts for CODA and Hive compute nodes
Regarding the primary group renaming task, which may require some user action, we will be adjusting the names of most users’ Linux primary groups to reflect a new standardized format as part of our preparation for the migration to Coda that’s starting in October. Most users will see the name of their primary group change from school-pisurname (e.g., chem-burdell) to p-piusername (e.g., p-gburdell3) or d-school (e.g., d-chem). This change will be reflected across all PACE systems, including Hive and CUI. The “gid” (group id number) is not changing, so this will not affect any file permissions you have set. Most users will not need to take action. However, if you manually change file permissions using the group name or use group names in your scripts, you may need to make an adjustment. You can always run the “id” command on yourself (“id gburdell3”) to see all of your groups. Not all primary groups will change name, so do not be concerned if yours is left unchanged.
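For users who would rather check from Python than with the “id” command, a minimal sketch along these lines reports the group the current process is running under, which is normally your primary group. The helper name primary_group is hypothetical; os.getgid and grp.getgrgid come from the Python standard library on Unix-like systems.

```python
import grp
import os


def primary_group():
    """Return (gid, name) for the group the current process runs under."""
    gid = os.getgid()
    try:
        name = grp.getgrgid(gid).gr_name
    except KeyError:
        # The gid has no entry in the group database on this host.
        name = None
    return gid, name


gid, name = primary_group()
print(f"primary group: {name} (gid {gid})")
```

If your group is one of those renamed, running this before and after the maintenance period should show the same gid with a different name.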
If you have any questions or concerns, please do not hesitate to contact us at pace-support@oit.gatech.edu.