Active Alerts

No current alerts. All systems operational.

Alerts Archive

  • Check out our new website https://rcc.fsu.edu

    This is our old website. It will be going away soon. Check out our NEW website at https://rcc.fsu.edu (2021-12-31 00:00:00)

  • RESOLVED: VPN issues with the "/hpc" profile

    UPDATE - 3:30pm: We are pleased to inform you that the VPN issues that we encountered this morning with the "/hpc" profile are resolved. As always, if you have any issues with it or any of our services, you can send us a message at support@rcc.fsu.edu. We are experiencing network …

  • Slurm Scheduler Issues

    This morning, our Systems Team made an update to the job scheduler (Slurm) in order to fix an ongoing issue we've been having with authentication. This change affected job submissions and most jobs that were already running. If you had any jobs that were pending or running as of this morning, …

  • RESOLVED: Issues with Archival System this morning (Thurs, Aug 27)

    UPDATE - 5pm - The Systems Team reports that the issue has been resolved.  Thanks for your patience, and if you continue to experience issues, please send them to our support email (indicated below). We are having some issues with connectivity between our Export nodes and the Research …

  • Off-campus guest access to RCC resources via VPN not working

    As of Thursday, August 13, 2020, VPN access to FSU requires two-factor authentication (2FA) via Duo.  If you have an FSUID, follow the instructions to set up 2FA. As of now, we recommend that all users use the AnyConnect client; previously, we had advocated use of the OpenConnect client on …

  • Planned maintenance occurring NOW (Aug 3 - 7)

    UPDATE - Thursday, Aug 6: 4:30pm - Maintenance continues through tomorrow.  Today's updates are as follows: All power upgrades and reconfiguration have been completed. The Slurm Scheduler has been upgraded to v20.02 (release notes). The vendor reports that the storage …

  • Tropical Storm Cristobal Notice: Sliger Data Center fully operational at this time

    Tropical Storm Cristobal has formed near the Yucatan Peninsula in the Gulf of Mexico. We have completed checks at the Sliger Building Data Center and are prepared. We will be carefully monitoring the storm’s progress in the coming days, and will provide you with another update if the storm …

  • RESOLVED: Intermittent Off-campus VPN Access: ITS VIA VPN is available for Windows, Mac, and Linux users

    UPDATE, May 26 at 3pm: ITS has implemented a second VPN solution, Aruba VIA, to complement the existing AnyConnect/OpenConnect already in place. In addition, ITS has added licenses to the existing AnyConnect solution. This has mostly resolved people's inability to access the VPN, …

  • Coronavirus: RCC staff working remotely, but otherwise mostly business-as-usual

    RCC staff will follow the guidance by FSU leadership and work remotely. What does that mean for you? We will work on existing and future requests for software. We will still handle issues you may have with running your jobs on our cluster; we will continue to respond to …

  • Storm Update: No service outage expected at this time.

    A severe storm system is threatening the greater Tallahassee area.  We have completed checks at our data center in the Sliger Building, and we do not expect any interruption in service. We will let you know if the situation changes via this notice list, but for now, we are confident that we …

  • RESOLVED: GPFS storage issues

    UPDATE - Monday, January 20, 2020: The storage issues we reported on Saturday were apparently isolated to only a few customers, and there were no reports of any running jobs being affected. If you have any issues to report related to our storage system, please feel free to reach us at  …

  • RESOLVED: Globus login issues

    UPDATE - 1pm RESOLVED: This issue is resolved.  Thanks for your patience. This issue occurred because CAS was unable to load the Service Provider information from InCommon metadata. After manually creating metadata for these two services, authentication is working as expected. We are …

  • Winter Break - FSU Closed Mon, Dec 23 thru Wed, Jan 1

    It's that time of year again! The University will be closed from Monday, December 23 through Wednesday, January 1 for FSU Winter Break.  We will re-open on Thursday, January 2. During this break, our systems will remain online and available, but staff support will be limited. RCC staff will …

  • Happy Thanksgiving! RCC staff out Nov 27 - Dec 1.

    RCC staff will be off from Wed, Nov 27 through Sun, Dec 1.  We will return on Monday, December 2. During this break, our systems will remain online and available. RCC staff will respond to any critical support requests sent to support@rcc.fsu.edu as soon as we are able to. All non-critical …

  • RESOLVED: Research Archival System Issue (Globus affected)

    UPDATE (3pm) -- All archival volumes have been brought back online and the Globus fsurcc#archival endpoint has been reactivated. The issue was that a failed drive was being replaced, which is a pretty standard operation for a RAID configuration and usually does not impact operations. However, …

  • RESOLVED: Ongoing issues with the HPC login nodes

    UPDATE, Tuesday Oct 22 (8:40am) - The issues with the login nodes have been resolved.  Thanks for your patience. UPDATE, Monday Oct 21 (4pm) - We are experiencing ongoing issues with our virtualization cluster, which is affecting the HPC Login Nodes.  As soon as we have further …

  • Tropical Disturbance 16 - All Systems to Remain Online

    UPDATE - Friday Oct 18 12pm -  We are closely watching Potential Tropical Cyclone Sixteen, and at this time are planning to keep the Sliger server room in operation throughout the storm.  This includes all RCC systems (HPC, Spear, VMs, and storage). We will make a further announcement …

  • RESOLVED - Power Distribution Unit issue affecting HPC

    UPDATE - 4pm - All of the affected nodes (see list below) are back online and operational.  Unfortunately, due to the nature of the problem, all jobs running on the affected nodes were killed. We apologize for the inconvenience, and if we can do anything, please let us know (support@rcc…

  • RESOLVED - Archival Storage Issues

    UPDATE — 2:30pm - We believe all issues with the Archival Storage System are resolved.  Thank you again for your patience. We are experiencing some issues with our Archival Storage System.  The Systems Team is working to rapidly resolve the issue, and minimize downtime. Thanks for …

  • ALL CLEAR - Hurricane Dorian

    UPDATE - Tuesday, September 3 - 11am - ALL CLEAR - Dorian is no longer an immediate threat to Tallahassee. UPDATE - Thursday, August 29 - 2:10pm - As you know, Hurricane Dorian is threatening the greater Tallahassee area.  As a precautionary measure, the ITS Research Computing Center is …

  • RESOLVED - Emergency maintenance on D30 and D31 racks

    UPDATE - Friday Aug 16 - 9:20am: Good news!  The repairs have been completed. As it turns out, we did not have to interrupt any running jobs in any racks, including D30 and D31.  No HPC jobs were affected. The problem was that one of the electrical wires feeding our power supply was …

  • Storage issue on login nodes

    We are currently having an issue with our storage system (GPFS) on the login nodes.  We are working on it, and hope to have it resolved quickly. As far as we know, the issue doesn't affect compute nodes, or already submitted, currently running jobs. (2019-07-26 17:00:00)

  • HPC, Storage, and Spear Maintenance - May 6 through May 12

    This page provides updates on the Systems Maintenance occurring May 6 through May 12.  You can find a detailed overview of what we are doing in our official announcement.  Please direct questions/concerns to support@rcc.fsu.edu. May 8 - 4pm Most of our compute nodes are reinstalled …

  • Archival storage issue

    We are having a minor network issue with our archival system: the system is up and running, but not accessible through its virtual IP. We hope to fix the issue this morning.  Our apologies for any inconvenience. (2019-04-04 17:00:00)

  • PGI Compiler Issues

    There are currently issues with the license for the PGI Compiler.  We are aware of the issue and working to resolve it.  We will post an update as soon as we have fixed the problem.  The GNU and Intel compilers are not affected by this issue. (2019-01-18 00:00:00)

  • Holiday Break Notice - Dec 22 through Jan 2

    It's that time of year again!  The FSU Holiday Break is upon us. During this break, our systems will remain online and available. RCC staff will respond to any critical support requests sent to support@rcc.fsu.edu as soon as we are able to. All non-critical support requests will be answered …

  • RESOLVED - IB switch issues affecting approx 20 compute nodes

    UPDATE - Wed, Nov 28 - 12:30pm - This issue is now RESOLVED.  We appreciate your patience. Our vendor rush-shipped replacement parts, and our systems team installed them today.  All affected nodes and partitions are running at full capacity again. In addition, we have added automated …

  • Hurricane Michael: RCC Services online

    UPDATE - Monday, Oct 15, 5pm - We are happy to announce that our cluster is back online.  Bringing back a cluster as diverse as ours is a complicated task, so please report any unusual things you might encounter to support@rcc.fsu.edu. UPDATE - Monday, Oct 15 - 8:30am - We are currently …

  • System Maintenance (Residual issues: MATLAB, LAMMPS, engineering, nwchem)

    UPDATE - Tuesday, August 21 - 4:45pm - MATLAB is now working correctly on all nodes.  We are still working on the following known issues: Omnipath networking - this affects users in the engineering partitions on the HPC. Some jobs in these partitions may fail if they use cores …

  • Job failures on hpc-[d30/d31] nodes

    Over the past few days, we've noticed a large number of jobs failing on the HPC when they are assigned to a certain set of nodes. The affected nodes are all nodes in the D30 and D31 racks, and possibly the nodes in the D32 rack.  (hpc-d30..., hpc-d31..., hpc-d32...). Errors typically look …

  • Resolved: HPC Issues

    Starting last Friday, we experienced a problem with our authentication system which caused a number of the cluster nodes to fail.  As a result, some jobs wouldn't run, and other odd things happened.  In some cases, you may have seen a message stating "srun: error: slurm_receive_msgs: Socket timed …

  • Intermittent Panasas Performance Issues

    We have been having intermittent performance issues with the Panasas filesystem today.  This has caused logins and other file operations to hang for up to 30-40 seconds at a time. Our Systems Team is looking into the issue.  If the problem continues, or when we find the root cause of the …

  • Systems Maintenance Saturday (Feb 10) - License Manager, VPN, and Websites

    UPDATE February 12 - 9:30am - All maintenance is complete, and all services were reported online this weekend.  Let us know if you have any issues: support@rcc.fsu.edu. UPDATE February 10 - 4:30pm - VMs, website, and license manager are back online.  However, due to an expected scope …

  • Information regarding the 'meltdown' and 'spectre' Intel vulnerabilities

    UPDATE: Jan 17 - We are still waiting on vendors to release kernel patches and microcode patches that address the vulnerabilities without severely impacting performance. A preliminary kernel patch that we tested caused a significant performance hit on our benchmarks.  Consequently, we are …

  • Holiday Break Notice - Dec 22 thru Jan 2

    It's that time of year again!  The FSU Holiday Break is upon us. During this break, our systems will remain online and available. RCC staff will respond to any critical support requests sent to support@rcc.fsu.edu as soon as we are able to. All non-critical support requests will be answered when …

  • Sliger Network Maintenance Thursday - 6-7:30am

    ITS Networking has informed us that they will be performing switch maintenance on the building from 6-7:30am tomorrow (Thursday, December 14). We do not expect RCC Resources to be affected. However, if your jobs read or write data to Lustre, or your jobs are running on certain …

  • HPC Login Node Maintenance - Saturday, Dec 7 from 7am - 9am

    This Saturday at 7am, we will conduct maintenance on two HPC login nodes.  We expect this maintenance to last two hours.  If you are logged in to hpc-login.rcc.fsu.edu via SSH around 7am, your session may be disconnected. If you do get disconnected, simply re-connect to another login node in our …

  • Planned Globus Downtime - Saturday, Dec 9

    We've received notice from our storage partner, Globus, that there will be a brief downtime on Saturday, December 9 from 11am to 3pm (EST).  Here is the notice: As we recently announced, we are working towards making Globus data management solutions suitable for use with protected data and …

  • PanFS Performance Issues - Emergency Maintenance Tuesday, Nov 28

    UPDATE - Tuesday, Nov 28 - 4:50pm - The HPC and the Spear clusters are back online.   You may now submit jobs to the cluster. A few nodes need additional work, but the majority of the cluster is up and running jobs.  Thanks for your patience.  We ran into several residual issues today, which …

  • HPC Maintenance - HPC Rack 6 - Nov 27 & 28 (some compute nodes unavailable)

    Our next (and final) round of HPC maintenance begins on Monday, November 27, and will affect nodes in Rack 6.  The maintenance will last for two days (Monday and Tuesday).  A list of affected partitions and nodes is below.  We have already begun to drain jobs from these nodes in preparation for the …

  • COMPLETE - HPC Slurm Maintenance

    UPDATE - Nov 21 (10:40am) - Maintenance is complete.  Slurm is back online.  Thanks for your patience. We will conduct maintenance on the HPC Slurm Controller on Tuesday, November 21 from 9am - 11am.  During this time, job submission and control commands (sbatch, squeue, etc.) will not …

  • HPC Maintenance - Mon Nov 13 thru Thurs Nov 16 - HPC Racks 8, 12, and 20

    Our next round of HPC maintenance begins on Monday, November 13, and will affect nodes in Racks 8, 12, and 20.  The maintenance will last for four days (Monday through Thursday).  A list of affected partitions and nodes is below.  We have already begun to drain jobs from these nodes in preparation …

  • ITS Network Maintenance (Sun, Nov 5 from 12am to 5am)

    ITS is conducting network maintenance this weekend.  On Sunday (Nov 5) from 12am until 5am, ITS staff will perform a router repair that will affect intracampus network traffic.  This includes RCC resources, particularly those in Dirac (VMs, Lustre, some HPC compute nodes).  We do not expect any …

  • HPC Maintenance - HPC Rack 7 (some compute nodes unavailable)

    Our next round of HPC maintenance begins on Monday, November 6, and will affect nodes in Rack 7.  The maintenance will last for two days (Monday and Tuesday).  A list of affected partitions and nodes is below.  We have already begun to drain jobs from these nodes in preparation for the downtime. …

  • COMPLETED: Spear and HPC Maintenance

    UPDATE - Oct 26 (12pm)  - Maintenance is complete, and the Spear system is back online.  Thanks for your patience. UPDATE - Oct 24 (4:20pm)  - HPC Rack is back online, and we are currently adding nodes back to the Slurm scheduler.  The Spear system will remain offline through Thursday, Oct …

  • Intermittent PanFS / HPC Login Issues

    UPDATE: Oct 16 - 2:30pm --  We've opened a support request with our storage vendor, and we are working with them to come to a resolution. We are experiencing intermittent issues with our PanFS storage system.  This is causing timeouts when users attempt to connect to the HPC login nodes.  It …

  • HPC Maintenance - hpc-4-[1-40] (some compute nodes unavailable)

    Monday and Tuesday, October 16 and 17, we will upgrade networking in HPC Rack 4. Some HPC compute nodes will be unavailable during this time, but most of the HPC will be available. We anticipate that this maintenance will be complete no later than Tuesday, October 17 at 5pm. If you have …

  • Login Issues (VPN and Login Nodes)

    We are working on a few issues related to authentication today: If you are having trouble logging into hpc-login.rcc.fsu.edu, you can instead use hpc-login-35.rcc.fsu.edu.  A few other nodes in our login cluster are down today.  We anticipate that they will be back online within a few …

  • HPC Maintenance - hpc-3-[1-40] (some compute nodes unavailable)

    Today and tomorrow, we are upgrading networking in HPC Rack 3. Some HPC compute nodes will be unavailable during this time, but most of the HPC will be available. We anticipate that this maintenance will be complete no later than tomorrow, Tuesday, October 10 at 5pm. If you have access to an …

  • RCC Monitoring Tropical Storm Nate

    The Research Computing Center staff is monitoring the progress of Hurricane Nate. We do not expect a direct hit at this time, but we are preparing for any possible impact scenarios, including a potential landfall near Tallahassee on or about Sunday, October 8. Tomorrow, October 6, we will …

  • COMPLETE: Maintenance on Export Nodes and Globus

    Update Oct 5, 2:45pm -  Maintenance is now complete.  Thank you for your patience. The RCC export nodes are offline from Wednesday, October 4 at 9am until Thursday, October 5 at 5pm for planned maintenance.  This includes Globus and any NFS-mounted shares on virtual machines. We …

  • Lustre Issues - Data Loss Incident

    UPDATE - Sep 18 (11:50am) -  Lustre Data Loss:  https://rcc.fsu.edu/news/lustre-data-loss UPDATE - Sep 18 (9:30am) - Lustre and Spear back online, but there has been some data loss.  We are drafting a message now to send to users with details. We are working on issues related to the …

  • Hurricane Irma Update (9/9 5pm) - All RCC Services Offline

    UPDATE 9/9 - 5pm - Since the hurricane threat to Tallahassee has continued to increase over the past 24 hours, we are obligated to turn off all RCC services.  Please stay tuned to our Twitter feed for updates. UPDATE 9/9 - 12:30pm - Hurricane Irma continues to pose a greater threat to …

  • Slurm Controller Issues

    UPDATE (9:15am - Tue, July 25) -  Slurm issues are resolved.  We are continuing to monitor the system today in case we see any residual problems. UPDATE (7:35pm) - Slurm issues persist, and job submissions are currently not working.  Currently running jobs will continue to run, but you may …

  • RESOLVED: Slurm Controller Issues

    We have corrected the issue with the Slurm controller, and the system is back online.  Thank you for your patience. We are currently experiencing issues with the Slurm controller.  Submitting jobs and other Slurm commands are unavailable.  We are looking into the issue and will resolve it as …

  • HPC Issues with backfill and backfill2 partitions

    We are currently examining issues on the backfill and backfill2 partitions.  Users attempting to submit jobs to either of these partitions may see their jobs wait indefinitely (or for a very long time), with the reason being shown as "Priority". As soon as we have some updates on the …

  • We are upgrading MATLAB

    We are currently working on upgrading MATLAB to the latest version (R2017a).  You may experience some issues if you or your jobs attempt to use it while we are working on it. (2017-06-05 18:00:00)

  • SYSTEMS ONLINE - HPC, Spear, and Lustre Export Node

    We have fully completed all of the planned maintenance for the HPC, Spear, and Lustre Export nodes.  All of our services are back online, including: HPC, Spear, Globus, and Lustre Export Nodes. This upgrade includes a number of end-user changes on our systems.  The major …

  • HPC, Spear, and Lustre Export Node Maintenance

    UPDATE: May 17 @ 2:45pm Globus is now available.  If you use Globus to transfer data to and from our storage systems, you can resume operations. UPDATE: May 16 @ 8:50am We are making progress on the software upgrade, and we are on-track to restore HPC availability early next …

  • General Access Spear Service Restored

    UPDATE Monday, February 20, 2017 - The General Access Spear nodes have been restored.  Thanks for your patience while we worked to bring these back online.   We disabled our General Access Spear nodes yesterday (Spear 1...8, available via spear-login.rcc.fsu.edu) to perform some maintenan…

  • Brief (< 15 min) Lustre downtime - Fri at 7am

    There will be a brief service disruption for our Lustre storage system on Friday, December 16 from 7am until 7:15am. The storage system itself will not be affected, but we need to reconfigure a network switch attached to the service.  This will require disconnecting the main distributed …

  • RESOLVED - Lustre Issues

    UPDATE - Nov 18 - 11am - Most Lustre-based services are now resolved.  Please let us know if you have any continuing issues: support@rcc.fsu.edu. UPDATE - Nov 18 - 10:45am - We have discovered the cause of the issue, and are working on resolving it. --- We are having issues with …

  • Maintenance on core router in Dirac data center 11/11/2016

    We will perform a software upgrade on our core Nexus router in the Dirac data center on Friday, November 11th around 7AM. We performed a similar upgrade on another Nexus router and did not encounter any problems. The total upgrade can be performed in 10 - 30 minutes, with an anticipated unavailabil…

  • Off-campus access slow or unreliable

    The FSU campus network has been experiencing periodic bouts of slow or unreliable connectivity with the Internet for the past week or two. Some of our off-campus VPN users may experience slow connectivity to our systems, or may not be able to connect.  If you experience this, please try again …

  • Hurricane Matthew Alert

    Update - October 7 - 10am -  We don't expect any impact from Hurricane Matthew over the coming days, but RCC staff members will stay in standby mode in case the path of the hurricane changes. First Alert - October 4, 10am -  While it looks like Hurricane Matthew will not have any …

  • Latest Update - Lustre Restored, other items

    We have completed restoration of the Lustre filesystem, and the system is now operational. Nearly all data was recovered during the restoration. The copy of data from our backup went much slower than we anticipated, but completed without error. A very small number of files on the system that …

  • VMs in Virtual Cluster RESOLVED

    UPDATE 11:45AM -  The issues with the VM cluster are resolved.  Thank you very much for your patience. There was an issue with the underlying storage system.  Systems staff are meeting today to evaluate ways to mitigate future instances of this particular storage issue.  We will keep you …

  • VMs in Virtual Cluster

    There is a storage issue on our systems this morning affecting several VMs in the virtual machine cluster. (2016-09-19 13:00:00)

  • Hurricane Hermine Recovery - Spear Online; Lustre recovery proceeding

    UPDATE - Thurs, Sept 15, 3:20pm - Lustre at 36% recovered. At this time, only three services remain affected by Hermine: Lustre data -  We have recovered 36% of the data on Lustre that was affected by the loss of our OST.  This process is moving slower than expected, and will likely …

  • We're conducting HPC Maintenance July 13 - 20 #rccupgrade2015

    Our transition from MOAB to Slurm and upgrade occurs this week. Although the Login Nodes are available, the HPC scheduler will be offline during this period. (2015-07-20 00:00:00)