UPDATE - Thursday, Aug 6: 4:30pm - Maintenance continues through tomorrow. Today's updates are as follows:
- All power upgrades and reconfiguration have been completed.
- The Slurm scheduler has been upgraded to v20.02 (release notes).
- The vendor reports that the storage upgrade is nearly complete and expects to finish no later than tomorrow morning.
- We optimized the IB network configuration in one of the two racks of HPC nodes, and we plan to do the other rack tomorrow.
We will post another status update tomorrow before 5pm.
UPDATE - Tuesday, Aug 4: 4pm - Maintenance continues to go smoothly. Today's updates are as follows:
- The vendor for our storage system is on-site, and we estimate two or three more days until they complete the GPFS and Archival system upgrades.
- We are almost done with our power reconfiguration. Tomorrow morning, we will move the last remaining customer circuits off Power Distribution Unit (PDU) "A".
- Work continues on all of the other items.
UPDATE - Monday, Aug 3: 4pm - Maintenance is off to a good start. We have powered off our storage systems for hardware and software upgrades, which we expect will take several days.
There are a lot of moving parts to a parallel storage system, and we have to make sure all of the steps are done in the correct order. Our storage vendor arrived on-site today, and they are working carefully on the upgrade.
Tomorrow, we plan to continue the power reconfiguration while the system is powered off, continue the storage system software upgrade, and perform some network maintenance.
We are planning to conduct system maintenance the week of Monday, August 3 through Friday, August 7. The affected services include:
- all HPC and Spear services, including login nodes, parallel storage, and compute nodes
- all Research Archival volumes
- all VMs, including those hosted for customers
Services not affected include:
- Most data center hosting customers will remain online; we have already reached out to the customers affected by the maintenance and are working with them directly.
Read more details in our news announcement.