HPC and Spear upgrade to occur May 6 - May 12

This May, we will perform periodic maintenance on our HPC and Spear clusters.  During this time, the HPC and Spear clusters will be unavailable.  We will also be performing brief maintenance on our storage system.

The maintenance window will begin on Monday, May 6 at 7am and last for one week.  All systems will be back online no later than Monday, May 13 at 9am.  Some systems may be available earlier.  We have timed the upgrade to occur between academic semesters in the hope of minimizing potential impact on research activities.

What we are doing

The 2019 software upgrade will allow us to accomplish the following:

  • Upgrade over 500 software packages to new versions (list and details)
  • Upgrade the Slurm scheduler to version 18.08 (release notes)
  • Run new benchmarks on the HPC and post the results on our website
  • Upgrade the network hardware configuration on portions of the HPC cluster
  • Perform critical storage system maintenance activities

Services Affected

  • GPFS and Archival storage will be unavailable briefly on Monday, May 6 from 9am until no later than 12pm.
  • HPC and Spear will remain offline all week until Monday, May 13 at 9am.

On Monday, we will perform brief maintenance on our Archival and GPFS storage systems.  We expect to have these services back online no later than 12pm that day.  After that, you will be able to read and write data via Globus and SFTP/RSYNC for the remainder of the maintenance period.

The "SKY" VM cluster will not be affected and will remain online throughout the maintenance period.

Tentative Schedule

  • Friday, May 3 - 9am
    • We will begin draining HPC compute nodes.
  • Sunday, May 5 - 5pm
    • We will disable HPC job submission in Slurm.  The cluster will stop accepting new jobs at this time.  Already-running jobs will continue to run.
  • Monday, May 6 - 7am - MAINTENANCE BEGINS
    • We will disable access to the following systems:
      • HPC Login nodes
      • Spear nodes
      • Export nodes (GPFS and Archival storage)
    • We will turn off and rebuild HPC login nodes and compute nodes.  Any jobs running at this time will be cancelled.
  • Monday, May 6 - 12pm (or earlier)
    • We will restore access to the Export nodes (GPFS and Archival storage).
  • Saturday, May 11 - 9am
    • We will run benchmarks and tests on the HPC and Spear.
  • Monday, May 13 - 9am
    • HPC and Spear will be back online.

If we are able to provide access to any service early, we will do so and notify RCC users.

Summary

We will publish updates and schedule changes as we get closer to the maintenance window.  In the meantime, we appreciate your patience and support.  If you have any questions, issues, or requests, please let us know: support@rcc.fsu.edu.