Upcoming Maintenance: May 2020

Starting on Monday, May 4, we will bring down the HPC cluster to perform critical upgrades and other maintenance.

Due to travel restrictions related to COVID-19, we are delaying this maintenance until at least August; read our announcement.

During this maintenance period, we will upgrade all major software on the HPC and Spear. Notable highlights include:

  1. We will upgrade Slurm to the latest version (v19.05 at the time of writing).
  2. We will update the firmware on all of our Nexus switches.
  3. We will upgrade GPFS to v5.0.
  4. We will perform hardware maintenance on the Research Archival System.
  5. We will reorganize part of our network configuration.
  6. We will audit and reconfigure our power infrastructure.
  7. We will update the software on our database server.
  8. We will reorganize our software stack to support multiple installed versions of the same package. This will allow for "live" rolling upgrades of software and reduce the number of maintenance periods that require downtime.

We will share more information as project planning moves along. Stay tuned!