PACE A Partnership for an Advanced Computing Environment

November 17, 2020

FAQs after user migration to the Phoenix cluster in CODA

Filed under: Uncategorized — Semir Sarajlic @ 8:35 pm

Dear PACE research community,

After completing our second wave of user migrations last week, we received several common questions about the new cost model announced on September 29 and about the new Phoenix cluster in general. We address them below for the benefit of the community:

  • The Phoenix scheduler has been redesigned. Unlike previous PACE-managed clusters, the Phoenix cluster has only two queues: inferno and embers. To submit a job, you must specify a charge account (i.e., a MAM account), which was or will be provided in the “welcome email” you receive after migrating to the Phoenix cluster in Coda. You may have access to multiple MAM accounts. For example, a PI and their user group may have access to an Institute-sponsored account (e.g., GT-gburdell3 -> $68/mo), an account for a refreshed PI cluster (e.g., GT-gburdell3-CODA20 -> $43,011.32), or an account for a recent FY20 purchase (e.g., GT-gburdell3-FY20Phase2 -> $17,860.75). For further details on submitting jobs on the Phoenix cluster, please refer to the documentation at http://docs.pace.gatech.edu/phoenix_cluster/submit_jobs_phnx/ .
  • Access to departmental PACE resources (e.g., CoC, CEE, Biology, …) has been restructured based on departmental preferences. As with the rest of PACE, access is now managed at the group level, with each group owned by a specific PI, although the distribution of available departmental credits may vary from one department to another.
  • We are in the process of providing PIs with further details about their cluster(s) from the Rich datacenter that were refreshed and converted into credits/MAM accounts under the new cost model. Additionally, PIs who participated in the FY20 purchases will receive further details about the conversion of their purchased equipment into credits/MAM accounts.
  • As mentioned in our initial announcement on September 29, users will not be charged for their usage of compute resources until at least January 1, 2021. Until then, all jobs that run on Phoenix are free while we migrate all users to the cluster and users become familiar with the new environment. Please note that your credit balance will decrease as your jobs run, but we will reset your total before we start billing.
  • All of your data has been migrated to Phoenix, but the directory structure has changed. Your data is now in your project storage under a different directory name, so symbolic links that pointed to the old locations are broken. For information on locating your group’s shared directory and on recreating symbolic links, please see http://docs.pace.gatech.edu/phoenix_cluster/where_is_my_rich_data/ . For further details, please refer to the documentation at http://docs.pace.gatech.edu/phoenix_cluster/storage_phnx/ .
  • The pace-vnc-job command is functional; however, you will need to set up VNC for the Phoenix cluster. To do so, remove the ~/.vnc directory, then run vncpasswd to set a new VNC password on the Phoenix cluster. You will then be able to run pace-vnc-job, which now also requires you to pass your MAM account to the command.
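
As an illustration of the steps above, here is a minimal shell sketch of a job script that names a charge account and a queue, a symbolic link being repointed, and the VNC reset. It assumes that Phoenix accepts PBS-style directives (-A for the charge account, -q for the queue) and that jobs are submitted with qsub; this post does not state either, so please confirm against the documentation linked above. The function names, job name, resource requests, and file paths are hypothetical placeholders, and GT-gburdell3 is the example account used in this post.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and resource values are illustrative assumptions,
# not PACE-provided tools.

# Write a minimal batch script that names a charge (MAM) account and a queue.
# Assumption: the scheduler accepts PBS-style "-A" and "-q" directives.
write_job_script() {
  local account="$1" queue="$2" out="$3"
  cat > "$out" <<EOF
#PBS -N example_job
#PBS -A ${account}
#PBS -q ${queue}
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
cd "\$PBS_O_WORKDIR"
echo "Running on \$(hostname)"
EOF
}

# Point an existing (possibly broken) symbolic link at a new target.
# Both paths are supplied by the caller; none are assumed here.
relink() {
  local target="$1" link="$2"
  ln -sfn -- "$target" "$link"
}

# Remove the old VNC settings so that vncpasswd can create new ones.
# Defaults to ~/.vnc, the directory named in this post.
reset_vnc_config() {
  local vnc_dir="${1:-$HOME/.vnc}"
  rm -rf -- "$vnc_dir"
}
```

For example, running write_job_script GT-gburdell3 inferno job.pbs writes a script that, under the assumptions above, could be submitted with qsub job.pbs. Running reset_vnc_config followed by vncpasswd corresponds to the VNC steps described in the last item.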

If you have any questions, concerns or comments about your recent migration to Phoenix, upcoming migration or the new cost model, please direct them to pace-support@oit.gatech.edu.

Best,

The PACE Team

November 13, 2020

[RESOLVED] PACE-archive storage – scheduled migration – November 17

Filed under: Uncategorized — Semir Sarajlic @ 7:20 pm

[Update – November 18, 10:08am] 

We are following up to inform you that the migration of pace-archive storage from the Rich datacenter to the BCDC datacenter is complete. The service is fully operational. You may now access your archived data via the Globus PACE Phoenix endpoint if you have migrated to the Phoenix cluster, or via the PACE Internal endpoint if you are still in the Rich datacenter.

If you have any questions, please don’t hesitate to contact us at pace-support@oit.gatech.edu .

Thank you for your patience during this brief outage while we migrated pace-archive.

[Update – November 17, 7:01am] 

The migration of pace-archive storage has started. During the migration, you will not have access to pace-archive. The migration is expected to last one day. We will keep you posted on its progress, and you may check our blog post for further updates: https://blog.pace.gatech.edu/?p=6990

If you have any questions, please don’t hesitate to contact us at pace-support@oit.gatech.edu.

Thank you for your attention to this notice.

[Update – November 16, 8:08pm] 

Dear PACE Users,

This is a reminder that the migration of pace-archive storage will begin tomorrow as scheduled. The migration is expected to last one day. Please note that during the archive storage migration from Rich to BCDC, you will not have access to pace-archive. Please make the necessary arrangements to access your data before this scheduled outage so that the impact on your research is minimized.

What is happening: Tomorrow, PACE users will not be able to access pace-archive storage during the scheduled migration of the storage servers from the Rich datacenter to the BCDC datacenter. The PACE team plans to restore access to archive storage by November 18, 2020. During this outage, users will not be able to access their data; for example, they will not be able to use the Globus pace-internal endpoint to access, retrieve, or upload data from or into pace-archive.

Who does this message impact and what should you do: This outage impacts all PACE users who have access to the pace-archive storage. Please use this notice to plan your data access around this scheduled outage so that the impact on your research is minimal.

What will PACE do: We will keep users updated on the progress of the archive storage migration, and you may check back on this blog post for further updates.

If you have any questions, please don’t hesitate to contact us at pace-support@oit.gatech.edu.

Thank you for your attention to this notice.

[Original Post – November 13, 7:20pm]

Dear PACE Users,

We are reaching out to inform you about the upcoming migration of the pace-archive storage servers, which is scheduled for November 17. The migration is expected to last one day. During the archive storage migration from the Rich datacenter to the BCDC datacenter, you will not have access to pace-archive. Please make the necessary arrangements to access your data before this scheduled outage so that the impact on your research is minimized.

What is happening: On November 17, 2020, PACE users will not be able to access pace-archive storage during the scheduled migration of the storage servers from the Rich datacenter to the BCDC datacenter. The PACE team plans to restore access to archive storage by November 18, 2020. During this outage, users will not be able to access their data; for example, they will not be able to use the Globus pace-internal endpoint to access, retrieve, or upload data from or into pace-archive.

Who does this message impact and what should you do: This outage impacts all PACE users who have access to the pace-archive storage. Please use this notice to plan your data access around this scheduled outage so that the impact on your research is minimal.

What will PACE do: We will keep users updated on the progress of the archive storage migration, and you may check back on this blog post for further updates.

If you have any questions, please don’t hesitate to contact us at pace-support@oit.gatech.edu.

Thank you for your attention to this notice.

November 9, 2020

December 1, 2020 – PACE Users will have Access to Rich Datacenter Disabled (This does not apply to users accessing CUI resources in Rich)

Filed under: Uncategorized — Semir Sarajlic @ 3:08 pm

Dear PACE Users,

Over the past couple of months, we have reached out to research groups regarding the required user migrations from the Rich datacenter to the CODA datacenter. We are actively migrating users into CODA, and we have another migration of research groups scheduled for December 1. Out of an abundance of caution, if you have not received an email about your migration to the CODA datacenter, please contact PACE about your migration at your earliest convenience.

What is happening: On December 1, the remaining PACE users (non-CUI) in the Rich datacenter will have their access disabled as part of the last migration to the CODA datacenter, which starts that day. Please note that this does not apply to CUI resources and their user migrations at this time.

Who does this message impact, and what should I do: If you have NOT already migrated to CODA, are not in the process of migrating to CODA, and have not received an email from a PACE research scientist about your planned migration to CODA, then please contact pace-support@oit.gatech.edu so that we may address your migration and prevent interruption to your research when we disable access to the Rich datacenter.

This message is being sent out of an abundance of caution to ensure that no user is left behind in the Rich datacenter when we disable access to all non-CUI resources there on December 1, 2020. If you have any questions or concerns, please direct them to pace-support@oit.gatech.edu.

Best,

The PACE Team

November 2, 2020

[Resolved] Phoenix Storage (Lustre) slowness that’s impacting data and scratch

Filed under: Uncategorized — Semir Sarajlic @ 12:50 pm

[Update – 11/03/2020 – 11:01am]

As of late last night, the slowness experienced on Phoenix storage was resolved.   Thank you for your patience and understanding while we worked to address this issue.

What is happening and what we have done: In response to user reports of slowness when accessing files on Phoenix’s Lustre storage, the PACE team replicated the issue during our investigation. Our troubleshooting, which included reboots of the Lustre Metadata Server (MDS), resolved the slowness. The Phoenix Lustre storage is stable at this time, and no user data was lost during this incident.

What we will continue to do: PACE will continue to monitor the Phoenix storage out of an abundance of caution, and we will provide updates as needed.

Again, this issue did not impact any of the other resources in the Coda and Rich datacenters.

Thank you for your attention to this message, and we apologize for this inconvenience.

[Original Post – 11/02/2020 – 1:03pm]

Dear PACE Users,

PACE is aware of the slowness experienced on Phoenix’s storage. We have been able to replicate the issue, and we are investigating its root cause.

What is happening and what we have done: We have received a couple of reports from users about slowness when accessing files in the ‘data’ and ‘scratch’ directories on Phoenix’s Lustre storage. For affected users, running commands such as ‘ls’ or opening a file with ‘vim’ may be very slow. The PACE team has replicated the issue and is investigating the root cause of the storage slowness.

What we will continue to do: This is an active situation, and we will follow up with updates as they become available.

This issue does not impact any of the other resources in the Coda and Rich datacenters.

Thank you for your attention to this message, and we apologize for this inconvenience.

The PACE Team
