PACE A Partnership for an Advanced Computing Environment

October 27, 2014

Major Storage Issue (why were the head nodes unavailable?)

Filed under: tech support — Semir Sarajlic @ 1:32 pm

Yesterday (10/26) in the early evening (4:50pm), one of our primary storage units appears to have suffered a serious crash (a page fault in the kernel, if you want more detail), which took offline a good share of the storage that supports our VM infrastructure. Since most of the head nodes we run are in fact VMs, the head nodes themselves began having problems handling new job requests and allowing logins.

Please note that jobs already submitted were not affected. Only jobs that were in the process of being submitted between about 4:50pm yesterday and 8:30am this morning were affected.

We have restored functionality to this array and will be submitting tickets with the vendor shortly to evaluate what has occurred on the machine, and any remediations we can apply. We may need to reboot the head nodes affected by this to get them to their proper state as well, but we are evaluating where we are before making that call.

UPDATE 1:
Unfortunately, upon review, we will have to restart the head node VMs, and that process will start immediately so that folks can submit jobs as soon as possible.

UPDATE 2:
With the vendor's help, we have identified the likely cause of this problem. The permanent fix requires a reboot, which would interrupt service right now, so it will be applied during our January maintenance. Thankfully, a work-around for the bug is available that can be applied without a reboot and should keep the system stable until then. We have now enacted that work-around.

October 23, 2014

PACE clusters ready for research

Filed under: tech support — admin @ 8:08 am

Greetings!

Our quarterly maintenance is now complete, and the clusters are running previously submitted jobs and awaiting new submissions.

In general, all tasks were successfully completed.  However, we do have some compute nodes that have not successfully applied the kernel update.  We will keep those offline for the moment and continue to work through those tomorrow.

As always, please contact us (pace-support@oit.gatech.edu) for any problems or concerns you may have. Your feedback is very important to us!

October 21, 2014

quarterly maintenance underway

Filed under: tech support — admin @ 10:24 am

Scheduled maintenance has begun.  Please see our previous post here for details.

October 1, 2014

Georgia Tech’s HPCC Initiative Planning – Second Industry/Research Partnership Meeting

Filed under: Events — admin @ 6:57 pm


For most of you receiving this email, the Technology Square Phase Two – High Performance Computer Center (HPCC) is not a new initiative. Following up on a successful first meeting this past March, where Georgia Tech hosted over 100 GT faculty and industry partners, I’m very happy to invite you to participate in the second planning meeting for the Tech Square Phase II/HPCC. GT faculty and researchers who work in cloud computing, smart grid, building information modeling, big data and secure storage, and networking (data centers as well as community networking, network virtualization, etc.) will be in attendance, as will researchers working on the model of using the data center as a key part of urban sustainability in our community (heat reuse, analytics capabilities for startup companies). Researchers and current industry partners in these areas will present their interests and capabilities in a tight 5-minute presentation format. You will have an opportunity to participate in our discussion and review the ideas that have been proposed, helping to guide us in this endeavor.

Continuing on the momentum from our first planning meeting, we are hosting this second meeting at Georgia Tech on November 11th, from 8 AM until 12 PM.  This meeting will immediately precede Georgia Tech’s People and Technology Forum, which you are invited to attend as well.

RSVP for the meeting is requested. To RSVP for this planning session click here.

If you would like to attend the IPaT Forum as well, you can register here.

As we finalize the agenda for this meeting, we will follow up with more details. If you have any questions, please don’t hesitate to reach out to me or the GT Corporate relations team directly.

See you in November!
ron

Ron Hutchins, PhD
Associate Vice Provost for Research and Technology and
Chief Technology Officer
Office of the Executive Vice President for Research
