Why LizardFS: software-defined storage is a great choice for genomics and biotech facilities.
Technological advances in next-generation sequencing mean that genomics and bioinformatics organizations can generate more genomics data, faster than ever before. In the past 10 years, the amount of genomics data produced globally has almost doubled every 6 months. Michael Schatz, an expert in this area, produced a paper projecting the figures for data growth up to 2025, and they are just staggering. He said: “For a very long time, people have used the adjective ‘astronomical’ to talk about things that are really, truly huge, but in pointing out the incredible pace of growth of data-generation in the biological sciences, my colleagues and I are suggesting we may need to start calling truly immense things ‘genomical’ in the years just ahead.”
All of this valuable data accumulating in the field of genomics introduces the challenge of scalability: how can organizations give researchers access to data that is fast, distributed, long-term and, importantly, affordable?
LizardFS has already proven successful within the genomics field. But don’t take my word for it: continue reading and decide for yourself whether it fits your needs.
Let’s talk about Scalability:
With the projected figures for genomical data growth, storage infrastructures need to be able to scale. If you have terabytes of data now, you will probably have petabytes in the not too distant future. Several of our clients are already at the petabyte level. Most organizations can throw some more servers or drives at the infrastructure to increase capacity, but scalability is more than that. You need performance even when hundreds of scientists access the data at the same time, and you need to be able to share the data across large teams, even if they are in different buildings or countries. In our case, scalability also means flexibility: you can use multiple operating systems and grow your infrastructure on commodity platforms, without being constrained by a particular vendor or brand.
Yes, LizardFS is scalable in all of these ways.
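To put the jump from terabytes to petabytes into perspective, here is a small back-of-the-envelope sketch in Python. It assumes, purely for illustration, that a single organization's data doubles every 6 months, like the global figure quoted above. Your own growth rate may be quite different, and the function name and starting sizes are made up for the example.

```python
import math


def months_until(current_tb: float, target_tb: float,
                 doubling_months: float = 6.0) -> float:
    """Months needed to grow from current_tb to target_tb.

    Assumes steady exponential growth with a fixed doubling period.
    The 6-month default is an illustrative assumption, not a measurement.
    """
    if current_tb <= 0 or target_tb <= 0 or doubling_months <= 0:
        raise ValueError("sizes and doubling period must be positive")
    if target_tb <= current_tb:
        return 0.0  # already at or past the target
    # Number of doublings needed, times the length of one doubling.
    return doubling_months * math.log2(target_tb / current_tb)


if __name__ == "__main__":
    petabyte_tb = 1000.0  # 1 PB expressed in TB (decimal units)
    for start_tb in (10.0, 100.0, 500.0):
        months = months_until(start_tb, petabyte_tb)
        print(f"{start_tb:>6.0f} TB -> 1 PB in about {months:.0f} months")
```

Even with a slower doubling period, the point stands: capacity planning for genomics is a matter of a few years, not decades.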
Next is Data Integrity and Security:
In today’s world of data, almost everything has to be kept for a period of time: CCTV recordings for 3-5 years, official documentation for 10 years. In genomics, it’s not unusual to hear that the data retention period is forever! Organizations need an infrastructure that allows a researcher to access the valuable data in 4 years, 10 years or even 50 years.
When preserving data for decades, you need a storage solution that plays well with others, one that can interoperate with traditional NFS and CIFS applications as well as cloud protocols and S3 access. You also need something that maintains your data integrity and prevents errors from occurring over time, keeping your data valid and safe.
To have your data in 50 years, you need to make it secure now: safe from hurricanes, power outages, hardware problems and human stupidity!
This means a solution that takes your data and replicates it across different disks, storage nodes and racks, even across different data centers in multiple countries. A particular drive, node or data center can then fail and your data is still safe, because you have multiple copies stored in multiple places. A major plus, beyond data security, is that spreading the data across locations makes it easier to share between those same locations, without the overhead and bandwidth of traditional data replication technologies.
LizardFS has 2 ways of distributing the data: replication goals and erasure coding. You choose whichever suits you best.
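To help weigh the two options, the sketch below compares how much raw disk space each approach needs for the same amount of usable data. It is plain arithmetic and does not use any LizardFS API. The class and function names are made up for the example, and the failure-tolerance figures assume every copy or part sits on a different disk or node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Hypothetical summary of one data-protection layout."""
    name: str
    raw_per_usable: float    # raw TB consumed per TB of usable data
    failures_tolerated: int  # simultaneous losses survived without data loss


def replication(copies: int) -> Scheme:
    """Full copies: N copies cost N times the space and survive N-1 losses."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return Scheme(f"{copies} copies", float(copies), copies - 1)


def erasure_coding(data_parts: int, parity_parts: int) -> Scheme:
    """Erasure coding: data split into data parts plus parity parts."""
    if data_parts < 1 or parity_parts < 0:
        raise ValueError("need >= 1 data part and >= 0 parity parts")
    overhead = (data_parts + parity_parts) / data_parts
    return Scheme(f"EC {data_parts}+{parity_parts}", overhead, parity_parts)


def raw_needed(usable_tb: float, scheme: Scheme) -> float:
    """Raw capacity in TB required to store usable_tb under a scheme."""
    return usable_tb * scheme.raw_per_usable


if __name__ == "__main__":
    usable_tb = 1000.0  # illustrative: 1 PB of usable data
    for scheme in (replication(2), replication(3), erasure_coding(4, 2)):
        print(f"{scheme.name:>9}: {raw_needed(usable_tb, scheme):7.0f} TB raw, "
              f"survives {scheme.failures_tolerated} failure(s)")
```

In general, full copies are simpler and cheap to read from, while erasure coding gives similar protection with less raw capacity at the cost of extra computation. Which one suits you depends on your data and your hardware.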
Last but not least let’s talk price:
While genomics data has been growing, data storage budgets have not grown at the same rate, if at all. Thanks to its many features, LizardFS can actually reduce existing operating costs by reducing storage costs, reducing time spent on storage maintenance and eliminating storage-related downtime. Oh, and it’s open source, so it’s free to download.
Given the high value that research teams place on their data, and their wish to preserve it for future reuse and analysis, LizardFS is definitely worth taking for a spin.