Lustre Data Loss

An issue on our Lustre storage system has caused some data to be irrevocably lost.  Approximately 12TB (~6.5% of total storage) was affected.

We regret to inform users that a system issue has caused the irrevocable loss of a large number of files on the Lustre storage system. Approximately 12TB (~6.5% of total storage) was affected.


The lost files were distributed randomly across the file system, which means that most RCC users with data on Lustre should expect to see missing files in home or shared directories. RCC does not keep backups of the Lustre file system, so any file that was lost cannot be restored. We did notice that most of the lost data had been written recently.


This problem occurred during the service restoration of our Lustre file system after Hurricane Irma. One of the 24 object storage targets (OSTs) entered an unconfigured state and reinitialized itself. This effectively deleted all objects on that part of the system, and it caused a large number of other files to become corrupt. RCC staff tried to recover this data, but we were unsuccessful.


To prevent storage issues like this from causing catastrophic data loss in the future, RCC is taking steps to ensure that we keep adequate backups of user data. To that end, we are in the process of purchasing a new high-capacity storage solution that will replace both Lustre and Panasas. We are committed to configuring and maintaining user data backups on this new consolidated storage system. We'll post more information about this new storage system and our data reliability plan as the project moves forward.


In the meantime, if we can assist you in any way with the data on Lustre, please let us know (support@rcc.fsu.edu).


We apologize for the issues that this failure has caused for you and your research.


P.S. Panasas and Archival Storage were not affected by this issue.