PACE – A Partnership for an Advanced Computing Environment

April 30, 2010

HPC Status update – 4/30/2010

Filed under: Inchworm deployment: Winter 2009 — admin @ 10:40 pm

Much progress has been made, and we’ll likely be ready for our first users next week.  Please send along a list of users for whom you would like us to enable access to your clusters.  If at all possible, please include their GT account names (e.g., gtg123x).

We are still in need of names for the individual clusters.  Please send me an alternate if you would prefer something other than our working names below:

  • M. Chou (Physics) – “Atlantis” cluster
  • S. Harvey et al. (Biology) – “B1” cluster
  • S. Kumar (ME) – “K1” cluster
  • P. Laguna (Physics) – “L1” cluster
  • D. Sholl et al. (ChBE) – “Joe” cluster
  • V. Yang (AE) – “Y1” cluster
  • M. Zhou (ME) – “Z1” cluster

In order to meet purchasing deadlines, we intend to proceed with the purchase of Jacket and Matlab Distributed toolkit next week.  This is your final chance to object!

I’ve indicated changes from last week in blue.

Base networking – 97% complete

  • 1 Gb links to BCDC (for backups) – complete
  • 475 names and addresses defined in DNS & DHCP – complete
  • 360 Ethernet ports configured – complete
  • dual 10 GbE uplink from HPC backbone to campus backbone – complete
  • 10 GbE uplink from Inchworm to HPC backbone – complete
  • second 10 GbE uplink from Inchworm to HPC backbone – delayed (not critical)
  • 10 GbE links to BCDC (replacing the 1 Gb links) – one link complete, second delayed in favor of new OIT NAS connectivity (not critical)

home & project directories – 95% complete

  • iSCSI targets – complete
  • configure dual 10GbE interfaces on storage servers – complete
  • configure ZFS filesystems on storage servers – complete
  • configure file servers to use PACE LDAP – (new task, complete)
  • provision user filesystems and set quotas – deferred until next week (see the sketch after this list)
  • configure backups for user filesystems – deferred until next week
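
For the deferred provisioning step, here is a minimal sketch of what creating a per-user ZFS dataset with a quota could look like on the storage servers.  The pool name, the 5 GB quota, and the path layout are illustrative assumptions, not the actual PACE configuration.

    #!/usr/bin/env python
    # Hypothetical sketch of per-user filesystem provisioning on the ZFS storage
    # servers. The pool name, quota size, and path layout are assumptions made
    # for illustration, not PACE's actual values.
    import subprocess

    POOL = "pace"     # assumed pool name
    QUOTA = "5G"      # assumed per-user quota

    def provision_home(username):
        dataset = "%s/home/%s" % (POOL, username)
        # A dedicated dataset per user lets quotas and snapshots apply individually.
        subprocess.check_call(["zfs", "create", "-p", dataset])
        subprocess.check_call(["zfs", "set", "quota=%s" % QUOTA, dataset])
        # Hand ownership to the user; UID/GID are resolved via PACE LDAP on the server.
        subprocess.check_call(["chown", "%s:%s" % (username, username), "/" + dataset])

    if __name__ == "__main__":
        provision_home("gtg123x")   # example GT account name from above

Backups for the user filesystems would then be pointed at the resulting datasets; that piece is omitted here.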

scratch storage – 95% complete

  • Panasas setup and configuration – complete
  • Infiniband router setup and configuration – complete
  • basic host network configuration – complete
  • Panasas client software install & configuration – configuration script complete, need to deploy on nodes
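
The remaining Panasas work is pushing the finished configuration script out to the nodes.  The sketch below shows one way that deployment could be scripted; the hostnames, node count, and script path are hypothetical, not the real PACE mechanism.

    #!/usr/bin/env python
    # Illustrative sketch of pushing the Panasas client configuration script out
    # to the compute nodes over SSH. Hostnames, node count, and the script path
    # are placeholders.
    import subprocess

    NODES = ["iw-c%03d" % i for i in range(1, 276)]    # assumed names for ~275 nodes
    SCRIPT = "/opt/pace/sbin/panfs-client-setup.sh"    # hypothetical script path

    def deploy(node):
        # Copy the configuration script to the node, then run it remotely.
        subprocess.check_call(["scp", SCRIPT, "%s:/tmp/panfs-client-setup.sh" % node])
        subprocess.check_call(["ssh", node, "sh", "/tmp/panfs-client-setup.sh"])

    if __name__ == "__main__":
        for node in NODES:
            deploy(node)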

server infrastructure – 99% complete

  • install OS and configure support servers – complete
  • install OS and configure head nodes – complete
  • name head nodes – pending (we need your names!)
  • install and configure DNS & DHCP appliances – complete

compute nodes

  • support scripts – 90% complete
  • configure lights-out network consoles (new task) – 85% complete
  • creation of diskless system images (16 types) – 30% complete
  • 8 Community Cluster nodes online
  • bring-up of ~275 compute nodes

Moab workload scheduler

  • creation and testing of prologue & epilogue scripts (see the sketch after this list)
  • initial configuration of scheduler queues
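
As a rough illustration of the prologue/epilogue work, here is a sketch of a per-job prologue, assuming Moab is driving a Torque-style resource manager that passes the job id and user name as the first two arguments.  The scratch mount point is a placeholder, and this is only a shape sketch, not the script we are actually testing.

    #!/usr/bin/env python
    # Rough sketch of a per-job prologue, assuming a Torque-style resource manager
    # under Moab that passes the job id and user name as the first two arguments.
    # The scratch mount point is a placeholder.
    import os
    import pwd
    import sys

    def main():
        job_id, user = sys.argv[1], sys.argv[2]
        # Give each job a private scratch directory on the parallel filesystem.
        scratch = os.path.join("/panfs/scratch", user, job_id)   # assumed mount point
        os.makedirs(scratch)
        pw = pwd.getpwnam(user)
        os.chown(scratch, pw.pw_uid, pw.pw_gid)
        return 0    # a non-zero exit from a prologue typically aborts the job

    if __name__ == "__main__":
        sys.exit(main())

A matching epilogue would clean the directory up after the job exits.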

Software

  • GSL, GIT, ACML – installed
  • Intel Compiler Suite – installed
  • Portland Group PGI Server Complete – installed
  • mvapich2 (w/ gcc, intel and PGI permutations) – installed (build sketch after this list)
  • mpich2 (w/ gcc, intel and PGI permutations) – installed
  • mpich (w/ gcc and intel permutations) – installed
  • ATLAS – in progress
  • LAMMPS – in progress
  • Jacket & Matlab distributed toolkit – under discussion
  • GPU software
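
The “permutations” above refer to building the same MPI source once per compiler toolchain.  A hedged sketch of how that could be automated follows; the version number, install-prefix layout, and source path are assumptions, not the exact PACE build recipe.

    #!/usr/bin/env python
    # Sketch of how the gcc/Intel/PGI permutations of an MPI stack might be built
    # into separate install prefixes. Paths, the version number, and configure
    # flags are illustrative only.
    import os
    import subprocess

    COMPILERS = {
        "gcc":   {"CC": "gcc",  "CXX": "g++",  "F77": "gfortran", "FC": "gfortran"},
        "intel": {"CC": "icc",  "CXX": "icpc", "F77": "ifort",    "FC": "ifort"},
        "pgi":   {"CC": "pgcc", "CXX": "pgCC", "F77": "pgf90",    "FC": "pgf90"},
    }

    def build(src_dir, package, version, toolchain):
        prefix = "/usr/local/packages/%s/%s/%s" % (package, version, toolchain)  # assumed layout
        env = dict(os.environ, **COMPILERS[toolchain])
        # Start from a clean tree so one toolchain's objects don't leak into the next.
        subprocess.call(["make", "distclean"], cwd=src_dir, env=env)
        subprocess.check_call(["./configure", "--prefix=" + prefix], cwd=src_dir, env=env)
        subprocess.check_call(["make", "-j4"], cwd=src_dir, env=env)
        subprocess.check_call(["make", "install"], cwd=src_dir, env=env)

    if __name__ == "__main__":
        for toolchain in ("gcc", "intel", "pgi"):
            build("/tmp/build/mpich2-1.2.1", "mpich2", "1.2.1", toolchain)   # hypothetical source tree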

Penguin tasks

  • 17 of 50 Penguin blades out for capacitor/diode/resistor repair
  • Supermicro has identified a further resistor fix needed by all 50 blades
  • 50 of 50 Penguin blades in need of a BIOS information fix
