News Items
-
Consolidating Condor into Slurm
As of today, over 229,300 jobs have run successfully under the Slurm scheduler. Given the stability and flexibility of the new scheduler, we are consolidating the Condor system into Slurm. This means that jobs you previously submitted to Condor will now be submitted to an HPC partition named Condor (see the example below).
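For reference, here is a minimal sketch of a batch script submitted to the new partition. The partition name (condor), script name, and resource values are illustrative assumptions; confirm the actual partition name with rcctool or sinfo before submitting.

#!/bin/bash
#SBATCH --job-name=example_job     # illustrative job name
#SBATCH --partition=condor         # assumed partition name; confirm with sinfo or rcctool
#SBATCH --ntasks=1                 # single task
#SBATCH --time=01:00:00            # one-hour wall time, for illustration

srun ./my_program                  # replace with your own executable

Submit the script with: sbatch example_job.sh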
-
Announcing the rcctool
We have created a CLI tool that lets you view your partitions and account information and reset your password. Simply run
rcctool
when logged into the HPC.
-
Lustre and Spear Status
Update (Oct 14, 4:45pm): Our Systems Team has been working hard today to restore the Lustre storage service. As of 4:45pm, the Lustre system is online but in recovery mode. It is currently working on Spear nodes, but not yet on export nodes. This means that Spear is now online. …
-
HPC Services Restored - Sliger Cooling Issues
A cooling issue occurred in our data center earlier today. As of 6pm, we are bringing nodes back online.
-
HPC Cheat Sheet
We've published a handy HPC Cheat Sheet. Download and print it if you want a quick reference.
-
Scheduler Update: Memory Limits
Recently, we noticed a substantial number of nodes crashing, causing job failures. We have been investigating this issue and have determined that it is memory related: jobs have been filling all available RAM and swap partitions (a sketch of requesting an explicit memory limit is shown below). Under Moab and RHEL 6.5, this issue did not …
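As a hedged illustration of how a job can declare its memory needs so the scheduler can avoid oversubscribing a node's RAM, here is a minimal Slurm script fragment. The values shown, and the assumption that per-job memory requests are enforced on our partitions, are for illustration only and do not describe current policy.

#!/bin/bash
#SBATCH --job-name=mem_example     # illustrative job name
#SBATCH --ntasks=1                 # single task
#SBATCH --mem=4096                 # request 4096 MB of RAM for the job (illustrative value)
#SBATCH --time=00:30:00            # 30-minute wall time, for illustration

srun ./my_program                  # replace with your own executable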
-
HPC Status Update
We've been tuning, tweaking, and fixing the HPC since we upgraded the system in July, and we have a number of updates to report.
-
We're Hiring (SysAdmin)!
The RCC is hiring a systems administrator to work on our team at FSU. If you're interested, you should apply!
-
Status Report on the HPC
Here are a few updates on the HPC, including the state of accounts, job preemption, and other items.
-
Slurm Scheduler Issues Resolved
UPDATE (7pm): The HPC issues are resolved. Thanks for your patience. Original post: We are currently experiencing issues on the HPC where Slurm commands are not responding. Our Systems Team is working to restore the service, and we will post an update as soon as the issue is resolved.