HPC and Spear software upgrade to occur this May

This May, we will conduct software maintenance on our HPC and Spear clusters, along with brief maintenance on our Lustre export nodes.  This maintenance will allow us to upgrade the operating system, the job scheduler, and most of the installed software on our clusters to newer versions.

From Monday, May 8 to no later than Monday, May 22, we will conduct a software upgrade on the HPC and Spear clusters.  Both clusters will be unavailable at certain times during this period (see the tentative schedule below).

What We Are Doing

These upgrades will include the following:

  • We are upgrading the operating system from CentOS 7.1 to CentOS 7.3.
  • We are installing new versions of most software on the cluster.  We will release a manifest listing all of the packages being upgraded, along with their new versions, as soon as it is available.
  • We are installing a new remote connectivity tool on Spear called Xpra.  This will replace the current NX server, and will provide a new set of features for secure graphical connectivity to Spear.
  • We are upgrading the HPC Slurm scheduler from version 15.08 to version 17.02.  The new version adds some features; you can see the details in the official Slurm release notes.
  • We are updating all software documentation on our website, and filling in documentation gaps where we find them.
  • We are improving our authentication code, and making a few other backend changes to simplify management and improve reliability of our services.

Tentative Schedule

This schedule reflects our current project planning, and it may change slightly as we get closer to the maintenance window.

  • Monday, May 8 at 7am - We will prevent any new job submissions to the HPC.
  • Monday, May 8 at 8am - We will disable access to the HPC, Spear, and Lustre export nodes.  This will kill any jobs still running on the HPC.
  • Tuesday, May 9 at 5pm - Spear and Lustre will be back online.
  • Monday, May 22 at 8am - HPC will be back online.  We may be able to bring up significant parts of the cluster before this date, and if so, we will let you know.
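
Because any job still running at 8am on May 8 will be killed, it may help to work out how late a job of a given length can start and still finish in time.  The following is only a rough sketch, not an official tool: the function names are hypothetical, and the year (2017) is an assumption, chosen because May 8 falls on a Monday in that year.  The schedule itself is tentative, so the times may need adjusting.

```python
from datetime import datetime, timedelta

# Assumption: the announcement gives no year; 2017 is used because
# May 8 is a Monday in that year. Times are cluster-local.
SUBMIT_CUTOFF = datetime(2017, 5, 8, 7, 0)  # new job submissions blocked
SHUTDOWN = datetime(2017, 5, 8, 8, 0)       # access disabled, running jobs killed


def latest_safe_start(walltime: timedelta, shutdown: datetime = SHUTDOWN) -> datetime:
    """Latest start time at which a job of this length ends by the shutdown."""
    return shutdown - walltime


def finishes_before_shutdown(
    start: datetime, walltime: timedelta, shutdown: datetime = SHUTDOWN
) -> bool:
    """True if a job starting at `start` ends no later than the shutdown."""
    return start + walltime <= shutdown


if __name__ == "__main__":
    # Example: a job that needs 36 hours of wall time.
    walltime = timedelta(hours=36)
    print("Latest safe start:", latest_safe_start(walltime))
```

Note that this only accounts for run time.  A job also has to be submitted before the 7am submission cutoff, and time spent waiting in the queue pushes its actual start later, so it is safer to leave a generous margin.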

Summary

We will publish updates as the project moves along and we get closer to the maintenance window date.  In the meantime, if you have any questions, issues, or requests, please let us know.