HPC Services Restored - Sliger Cooling Issues

A cooling issue occurred in our data center earlier today. As of 6pm, we are bringing nodes back online.

UPDATE Thurs Oct 8; 9am

This is a final message to let you know that the HPC is fully functional as of 9am this morning. We are checking all compute nodes to ensure they are online. If you see any that are down, please let us know.

The original issue was caused by a failure of the cooling units at the Sliger Data Center that began yesterday afternoon. Temperatures were in excess of 95 degrees Fahrenheit when we began turning nodes off.

ITS and the vendor worked to correct the issues yesterday, and we were subsequently able to begin turning compute nodes back on. This morning, we restarted the HPC Slurm Controller, and the service is online.

If you have any questions, please let us know: support@rcc.fsu.edu. Thanks for your patience during this issue.


UPDATE 6pm -- Cooling issues are resolved. We are bringing nodes back online.


Hi RCC Users,

As of 3pm today, we have shut down all HPC compute nodes (login nodes are currently available, but may also be shut off if necessary). This is due to an emergency cooling situation in the Sliger Data Center.

The staff at the Sliger Data Center are reporting temperatures in excess of 90 degrees Fahrenheit and have requested that we shut off all equipment until the issue is resolved. Shutting down the systems reduces the risk of permanent hardware damage from excess heat.

Systems that are OFFLINE:

  • HPC Compute Nodes

Systems that are ONLINE:

  • Panasas
  • HPC Login nodes (not compute nodes)
  • Lustre
  • Spear
  • SKY VMs
  • NoleStor

Updates will follow. If you have any questions, please let us know: support@rcc.fsu.edu