HPC Maintenance Complete

We are happy to announce that the HPC is back online. Thank you for your patience during our extended maintenance period.

Last week, we relocated a number of compute nodes and racks in the Sliger data center. We also installed 80 new compute nodes, adding 1,280 new cores, and removed all of the old 2008 and 2009 nodes (2,048 cores in total). In the coming weeks, we will repurpose some of the retired nodes for the Condor cluster.

We are also beginning to phase out the existing department-specific login nodes in favor of a new login cluster. The old nodes will remain online until they fail, but we encourage you to connect from now on via SSH to hpc-login.rcc.fsu.edu (e.g., ssh your-username@hpc-login.rcc.fsu.edu).

Some more details about the upgrade:

  • We upgraded to new firmware on the Panasas storage system.
  • We upgraded to a new version of our Moab scheduler software. This release fixes several scheduler problems we have encountered in the past.
  • We upgraded to a new version of the PGI compiler.
  • We upgraded Red Hat Enterprise Linux on all systems to version 6.5.
  • We installed new versions of Open MPI and MVAPICH2.
  • We installed new debugging software for all users: the TotalView Debugger (https://rcc.fsu.edu/software/totalview).
  • We added Python 2.7. The default 'python' command still runs version 2.6 (refer to our Python documentation); see the snippet after this list for a quick way to check which version you are running.
  • And more...
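Since two Python versions now coexist, it can be handy to confirm which interpreter a given command actually launches. The short snippet below is a minimal, system-agnostic check; nothing in it is specific to our cluster:

    import sys

    # Print the full version string of the interpreter running this script.
    print(sys.version)

    # Branch on the (major, minor) version tuple to confirm which Python you launched.
    if sys.version_info[:2] == (2, 7):
        print("Running the newly installed Python 2.7")
    else:
        print("Running Python %d.%d (the default 'python' is still 2.6)" % sys.version_info[:2])

Running it with the default 'python' command should report 2.6; running it with a 2.7 interpreter (for example, a 'python2.7' command, though the exact command name on our systems may vary) should report 2.7.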

One last note: moving as many nodes as we did inevitably causes some hardware problems. During our testing period, for example, we found malfunctioning nodes and bad InfiniBand and Ethernet connections. As a result, not all nodes are immediately available, but we are working hard to bring the remaining systems online.

If you run into any issues, let us know by emailing support@rcc.fsu.edu, and we will work to fix them.