Technological advances in next-generation sequencing mean that genomics and bioinformatics organizations can generate more genomics data, faster than ever before. In the past 10 years, the amount of genomics data produced globally has almost doubled every 6 months. Michael Schatz, an expert in this area, produced a paper projecting the figures for data growth up to 2025, and they are staggering. He said: “For a very long time, people have used the adjective ‘astronomical’ to talk about things that are really, truly huge, but in pointing out the incredible pace of growth of data-generation in the biological sciences, my colleagues and I are suggesting we may need to start calling truly immense things ‘genomical’ in the years just ahead.”
The scientific and research community tends to be a budget- and grant-driven environment, and the need to scale storage can often be sudden and unexpected. Traditional appliance-based scale-out storage vendors often take advantage of customers' predicaments by charging expansion prices far in excess of what the solution cost initially. This forces customers either to accept the ransom or to seek out alternative storage solutions that do not integrate with their existing infrastructure. The result is not only higher data management costs but also a complex and disjointed storage infrastructure.
So what are the solutions available on the market that can help in these situations?
Let’s first look at how it has been done over the past few decades.
In the field of data storage, the most popular solution until now has been the Disk Array. However, this technology has turned out to be costly, both to maintain and to scale. Once the storage capacity of a Disk Array was full, the only way to store more data was to buy another costly “shelf” from the same producer. When the possibility of adding shelves was exhausted, one had to migrate all the data to a larger and usually much more expensive Disk Array. One was then left with an old and now redundant Disk Array.
RAID groups are also a very common solution. Unfortunately, the time to rebuild a RAID group after a hard disk failure increases with the capacity of the disks. Currently, with the popular 4TB+ hard drives, the recovery of a RAID6 group can take between 2 and 20 hours. Moreover, there is a risk that another hard drive will fail during this process, which can lead to irreversible loss of all the data stored on the group.
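To see why rebuild time grows with disk capacity, here is a rough back-of-the-envelope sketch. It assumes the rebuild is limited by a single sustained rebuild rate onto the replacement drive; the function name and the example rates are illustrative assumptions, not measurements from any particular array.

```python
def raid_rebuild_hours(disk_tb: float, rebuild_mb_per_s: float) -> float:
    """Estimate the hours needed to rebuild one failed disk.

    Assumption: the whole disk must be rewritten and the replacement
    drive is the bottleneck, so time = capacity / rebuild rate.
    Uses decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).
    """
    if disk_tb <= 0 or rebuild_mb_per_s <= 0:
        raise ValueError("capacity and rebuild rate must be positive")
    seconds = (disk_tb * 1e12) / (rebuild_mb_per_s * 1e6)
    return seconds / 3600


# Hypothetical rebuild rates; a busy array usually rebuilds more slowly
# because production I/O competes with the rebuild.
for rate in (50, 100, 200):
    print(f"4 TB at {rate} MB/s: {raid_rebuild_hours(4, rate):.1f} h")
```

Under this simple model, doubling the disk capacity doubles the rebuild window, and with it the time during which a further drive failure can occur.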
These options come with some serious expense for maintenance and scaling, along with a risk of data loss.
Is this acceptable?
Well, for many years it has had to be, but now we have other options available.
Let’s look at software-defined storage.
This makes it possible to build storage on multiple servers, regardless of their manufacturer. Storage capacity is increased by adding one or more servers with hard drives to the cluster. This type of solution prepares you for exponential growth in storage while letting you use servers from your preferred vendor, with the specifications that matter most to you, whether that is performance or storage capacity. This flexibility can significantly decrease your overall data storage costs.
With a distributed software-defined storage solution, hard drive recovery works quite differently. Data from a damaged or failed hard drive is recovered by replicating it from copies held on other hard drives, so the system always keeps a previously set number of copies of the same data in different physical locations. Because the data is distributed across many drives, the recovery workload on each individual drive is lighter, which makes the whole operation quicker and more secure. As a result, recovery time actually decreases as the storage cluster grows.
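The effect can be sketched with an idealized model. It assumes the failed drive's data is re-replicated in parallel, with every surviving drive contributing the same recovery bandwidth and no network or throttling limits; the function name and figures are hypothetical and chosen only for illustration.

```python
def distributed_recovery_hours(
    failed_disk_tb: float,
    surviving_drives: int,
    per_drive_mb_per_s: float,
) -> float:
    """Estimate the hours needed to restore full redundancy after one drive fails.

    Assumption: the lost data is spread evenly over the surviving
    drives, which all work in parallel, so the aggregate recovery
    rate is surviving_drives * per_drive_mb_per_s.
    """
    if surviving_drives < 1:
        raise ValueError("need at least one surviving drive")
    if failed_disk_tb <= 0 or per_drive_mb_per_s <= 0:
        raise ValueError("capacity and rate must be positive")
    aggregate_bytes_per_s = surviving_drives * per_drive_mb_per_s * 1e6
    return (failed_disk_tb * 1e12) / aggregate_bytes_per_s / 3600


# The same 4 TB failure recovers faster as the cluster grows.
for drives in (10, 50, 100):
    hours = distributed_recovery_hours(4, drives, per_drive_mb_per_s=50)
    print(f"{drives} surviving drives: {hours:.2f} h")
```

Real clusters will not scale this cleanly, but the model shows why adding servers shortens recovery instead of lengthening it.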
Reduce:
- downtime of your storage to 0
- time required for maintenance tasks by 90%
- TCO by up to 50%
Get even more added value:
- exit vendor lock-in
- choose components for high performance or low cost/TB
- scale up to multiple exabytes in some cases
Servers instead of expensive Disk Arrays
Let’s take an installation with 500TB of data as an example to compare the costs of the old way and the new way.
A disk array with a capacity of 500TB would cost on average $600,000.00 plus a support contract.
A commodity hardware setup with the same capacity would cost $75,000.00 plus a support contract.
When you compare the support costs of the big disk array suppliers with those of the software-defined storage vendors, the possible savings of moving away from the old way become even clearer.
In my opinion, it is time to adapt to the new technologies that give you flexibility, scalability and security, and that free up your technical team to deal with more important issues than storage. If cutting costs is your key factor, then it is a no-brainer.
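Taking the two purchase prices above at face value, the comparison can be worked through as simple arithmetic. Support contracts are left out because no figures are given for them; the variable and function names are just illustrative.

```python
CAPACITY_TB = 500
DISK_ARRAY_USD = 600_000   # average disk array price quoted above
COMMODITY_USD = 75_000     # commodity hardware price quoted above


def cost_per_tb(total_usd: float, capacity_tb: float) -> float:
    """Purchase cost per terabyte, excluding support contracts."""
    return total_usd / capacity_tb


array_per_tb = cost_per_tb(DISK_ARRAY_USD, CAPACITY_TB)
commodity_per_tb = cost_per_tb(COMMODITY_USD, CAPACITY_TB)
price_ratio = DISK_ARRAY_USD / COMMODITY_USD
savings_fraction = 1 - COMMODITY_USD / DISK_ARRAY_USD

print(f"Disk array: ${array_per_tb:,.0f} per TB")
print(f"Commodity:  ${commodity_per_tb:,.0f} per TB")
print(f"Disk array costs {price_ratio:.0f}x more; saving {savings_fraction:.1%}")
```

On these figures the disk array works out at $1,200 per TB against $150 per TB for commodity hardware, an eightfold difference before support costs are considered.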