LizardFS is a software-defined storage (SDS) system: a distributed, scalable, fault-tolerant, and highly available file system. It combines disk space located on several servers into a single namespace that is visible on Unix-like and Windows systems in the same way as any other file system. LizardFS was inspired by the Google File System (GFS), which Google described publicly in 2003.
LizardFS keeps metadata and data separately. Metadata is kept on metadata servers, while data is kept on chunkservers. The scheme shows a typical installation.
LizardFS protects files by keeping multiple replicas of the data spread over the available servers. It can also be used to build affordable storage because it runs on commodity hardware. Disk and server failures are handled transparently, without downtime or data loss.
When storage requirements grow, you can scale a LizardFS installation simply by adding new servers, at any time and without downtime. Data chunks are moved to the new servers automatically, because LizardFS continuously balances disk usage across all connected nodes.
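The balancing idea can be illustrated with a toy model. This is a minimal sketch, not the LizardFS algorithm: it balances by chunk count rather than disk usage, ignores replication, and the function and server names are hypothetical.

```python
def rebalance(servers: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Move chunks from the fullest to the emptiest server.

    Stops when chunk counts differ by at most one. Returns the list of
    moves as (chunk, source_server, destination_server) tuples.
    """
    moves = []
    while True:
        fullest = max(servers, key=lambda name: len(servers[name]))
        emptiest = min(servers, key=lambda name: len(servers[name]))
        if len(servers[fullest]) - len(servers[emptiest]) <= 1:
            return moves
        chunk = servers[fullest].pop()
        servers[emptiest].append(chunk)
        moves.append((chunk, fullest, emptiest))


# Hypothetical cluster: "cs3" has just been added and is still empty.
cluster = {
    "cs1": ["c1", "c2", "c3", "c4"],
    "cs2": ["c5", "c6"],
    "cs3": [],
}
moves = rebalance(cluster)
```

After the call, each of the three servers in this example holds two chunks, and only the chunks that had to move were touched.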
Removing a server is just as simple as adding a new one.
Keeping several copies of each file is not space-efficient. As an alternative, erasure coding divides each chunk of data into parts and creates additional parts, called parity stripes, from which missing parts of the original data can be recovered. The cluster can therefore keep working when some parts are lost. Depending on the configuration, this feature can save up to 70% of your storage capacity.
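Both the space saving and the recovery idea can be checked with a small example. The sketch below assumes the simplest possible scheme, a single XOR parity part over three data parts. It is an illustration only: the part counts, function names, and sample payload are assumptions, and the encoding LizardFS actually uses is not described here.

```python
from functools import reduce


def raw_bytes_needed(logical_bytes: int, data_parts: int, parity_parts: int) -> float:
    """Raw capacity consumed when data is stored as data_parts plus parity_parts."""
    return logical_bytes * (data_parts + parity_parts) / data_parts


def xor_parts(parts: list[bytes]) -> bytes:
    """XOR equally sized byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*parts))


def split_chunk(chunk: bytes, data_parts: int) -> list[bytes]:
    """Split a chunk into data_parts equal pieces, zero-padding the tail."""
    part_len = -(-len(chunk) // data_parts)  # ceiling division
    padded = chunk.ljust(part_len * data_parts, b"\0")
    return [padded[i * part_len:(i + 1) * part_len] for i in range(data_parts)]


# Three full copies, modeled as one data part plus two extra copies,
# versus three data parts plus one parity part (illustrative values).
replicated = raw_bytes_needed(100, data_parts=1, parity_parts=2)
erasure_coded = raw_bytes_needed(100, data_parts=3, parity_parts=1)
saving = 1 - erasure_coded / replicated

chunk = b"LizardFS chunk payload"
parts = split_chunk(chunk, 3)
parity = xor_parts(parts)

# Simulate losing part 1 and rebuild it from the surviving parts and parity.
rebuilt = xor_parts([parts[0], parts[2], parity])
```

With these numbers, 100 bytes of data occupy 300 bytes as three copies but about 133 bytes with three data parts and one parity part, a saving of roughly 56%. Unlike three copies, this particular layout survives the loss of only one part; configurations with more parity parts trade some of the saving for more fault tolerance.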
Large files can be copied very efficiently with the snapshot feature. When a snapshot is created, only the metadata of the file is copied, which speeds up the operation. The chunks of the original and the duplicated file are shared until one of the files is modified.
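The sharing behavior described above is a form of copy-on-write. The following toy model is a sketch under that assumption; the `ChunkStore` and `File` classes are hypothetical and are not part of any LizardFS API.

```python
class ChunkStore:
    """Toy chunk store that tracks how many files reference each chunk."""

    def __init__(self) -> None:
        self.chunks: dict[int, bytes] = {}
        self.refs: dict[int, int] = {}
        self._next_id = 0

    def put(self, data: bytes) -> int:
        """Store new chunk data and return its id."""
        chunk_id = self._next_id
        self._next_id += 1
        self.chunks[chunk_id] = data
        self.refs[chunk_id] = 1
        return chunk_id

    def share(self, chunk_id: int) -> int:
        """Add one more reference to an existing chunk."""
        self.refs[chunk_id] += 1
        return chunk_id

    def release(self, chunk_id: int) -> None:
        """Drop one reference and free the chunk when nobody uses it."""
        self.refs[chunk_id] -= 1
        if self.refs[chunk_id] == 0:
            del self.refs[chunk_id]
            del self.chunks[chunk_id]


class File:
    """A file is only metadata: an ordered list of chunk ids."""

    def __init__(self, store: ChunkStore, chunk_ids: list[int]) -> None:
        self.store = store
        self.chunk_ids = chunk_ids

    def snapshot(self) -> "File":
        """Copy the metadata only; chunk data stays shared."""
        return File(self.store, [self.store.share(c) for c in self.chunk_ids])

    def write_chunk(self, index: int, data: bytes) -> None:
        """Overwrite one chunk, copying it first if it is shared."""
        old = self.chunk_ids[index]
        if self.store.refs[old] > 1:
            self.store.release(old)
            self.chunk_ids[index] = self.store.put(data)
        else:
            self.store.chunks[old] = data

    def read(self) -> bytes:
        return b"".join(self.store.chunks[c] for c in self.chunk_ids)


store = ChunkStore()
original = File(store, [store.put(b"aaaa"), store.put(b"bbbb")])
copy = original.snapshot()
chunks_after_snapshot = len(store.chunks)  # still 2: nothing was duplicated
copy.write_chunk(1, b"cccc")               # only now is a third chunk created
```

In this model the snapshot costs one list copy regardless of file size, and storage grows only for the chunks that are later modified.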
Georeplication lets you replicate data between two data centers in different geographical locations, and you can decide where the data is stored. The topology feature lets you suggest which copy a client should read when more than one copy is available.
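Choosing which copy to read can be pictured as picking the copy with the lowest distance to the client. The sketch below assumes a simple table of distances between locations; the location labels, server names, and the `pick_copy` function are hypothetical and do not reflect the actual LizardFS topology configuration format.

```python
def pick_copy(
    client_location: str,
    copy_locations: dict[str, str],
    distance: dict[tuple[str, str], int],
) -> str:
    """Return the server holding the copy closest to the client."""

    def cost(server: str) -> int:
        location = copy_locations[server]
        if location == client_location:
            return 0  # a copy in the client's own location is always preferred
        return distance[(client_location, location)]

    return min(copy_locations, key=cost)


# Hypothetical layout: one copy of a chunk in each of two data centers.
copies = {"chunkserver-a": "dc-east", "chunkserver-b": "dc-west"}
distances = {("dc-east", "dc-west"): 10, ("dc-west", "dc-east"): 10}

preferred = pick_copy("dc-west", copies, distances)
```

Here a client in `dc-west` reads from `chunkserver-b`, so the read does not cross the link between data centers.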
LizardFS offers mechanisms that allow administrators to set read/write bandwidth limits for all the traffic generated by a given mount point, as well as for a specific group of processes spread over multiple client machines and mount points.
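One common way to enforce a bandwidth limit is a token bucket. The sketch below shows that generic technique, not the mechanism LizardFS implements; the class name and parameters are assumptions, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Generic token-bucket limiter: tokens are bytes that may be sent."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # timestamp of the previous call, in seconds

    def allow(self, nbytes: int, now: float) -> bool:
        """Return True and consume tokens if nbytes may be sent at time now."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# Hypothetical limit: 100 bytes per second with bursts of up to 200 bytes.
bucket = TokenBucket(rate_bytes_per_s=100, burst_bytes=200)
first = bucket.allow(150, now=0.0)   # fits in the initial burst
second = bucket.allow(100, now=0.0)  # only 50 tokens left, so it is refused
third = bucket.allow(100, now=1.0)   # one second later the bucket has refilled
```

Sharing one bucket among several processes or mount points would give the group-wide limit described above, at the cost of coordinating the shared state.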
LizardFS supports the disk quota mechanism known from other POSIX file systems. You can set soft and hard limits on the number of files and on their total size for a specific user or group of users. A user who exceeds the hard limit cannot write new data to LizardFS.
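The difference between the two limits can be sketched as a simple check performed before a write. This is a minimal model with hypothetical names. It assumes that a write which would pass the hard limit is refused, and that passing the soft limit only produces a warning; the exact soft-limit behavior in LizardFS is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    SOFT_EXCEEDED = "soft limit exceeded"
    DENIED = "hard limit exceeded"


@dataclass
class Quota:
    soft_bytes: int
    hard_bytes: int
    soft_files: int
    hard_files: int


@dataclass
class Usage:
    bytes_used: int = 0
    files: int = 0


def check_write(quota: Quota, usage: Usage, new_bytes: int, new_files: int = 0) -> Verdict:
    """Decide whether a write of new_bytes (creating new_files files) is allowed."""
    total_bytes = usage.bytes_used + new_bytes
    total_files = usage.files + new_files
    if total_bytes > quota.hard_bytes or total_files > quota.hard_files:
        return Verdict.DENIED
    if total_bytes > quota.soft_bytes or total_files > quota.soft_files:
        return Verdict.SOFT_EXCEEDED  # assumed to be a warning only
    return Verdict.OK


# Hypothetical per-user quota and current usage.
quota = Quota(soft_bytes=800, hard_bytes=1000, soft_files=8, hard_files=10)
usage = Usage(bytes_used=700, files=5)

small = check_write(quota, usage, new_bytes=50, new_files=1)
medium = check_write(quota, usage, new_bytes=200, new_files=1)
large = check_write(quota, usage, new_bytes=400, new_files=1)
```

In this example the 50-byte write is accepted, the 200-byte write crosses the soft limit, and the 400-byte write is denied because it would exceed the hard limit.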
Another feature of LizardFS is a transparent and fully automatic trash bin. When a file is removed, it is moved to a trash bin that is visible only to the administrator. Any file in the trash bin can be restored or deleted permanently. Data stored on LizardFS is therefore better protected against accidental deletion than data stored on a hardware RAID.
The challenges of big data are threefold: data volume, data velocity, and data variety. Big data requires storage that not only scales but is also flexible enough to handle workloads ranging from real-time data processing and analytics to content delivery and “cold” data storage. Because it can scale to exabytes, can be distributed across a network, and supports programmability and automation, LizardFS is well suited to big data workloads.
LizardFS is used for storing and aggregating data from many applications before it reaches the ELK stack (Elasticsearch, Logstash, Kibana). With this configuration (Elasticsearch and LizardFS) we get storage with search capabilities that is compliant with GDPR.
As a security precaution, data is aggregated from flat files. These files are later processed by parsers in Logstash. The process is designed this way so that the data can be returned to its canonical log form.
LizardFS and Elasticsearch are open source products, so you can install them on existing infrastructure. They enable you to create a platform for content management and storage in a cost-effective way. High availability and extreme scalability are just a few of the added-value features provided by LizardFS.
Financial organizations require highly secure and highly available storage that can scale both up and out. Software-defined storage (SDS) offers tiered capacity by service level and the ability to provision storage on demand, which enables optimal capacity based on current business requirements. It also provides detailed metrics for reporting storage infrastructure usage.
LizardFS has proved itself as a storage solution for finance because of its scalability, performance, and automatic disaster recovery. LizardFS architectures can improve business continuity: in the event of a hardware failure, a LizardFS environment can automatically shift load and data to another available node. It also allows data to be delivered in a timely manner.
Put an end to time-consuming data copying: you need only a single storage system to provide media assets to every department and every application in your facility, from scanning to color grading, editing, compositing, VFX, GFX, and everything else.
LizardFS satisfies the most demanding requirements of media and entertainment businesses: high performance for sequential and multi-threaded workloads, integrity of media content, and reliable, scalable storage of large volumes of data.
LizardFS can be built on standard server hardware. It reduces capital expenditure through a simple scale-out design and can be integrated into existing IT environments. These properties make it possible to assess not only the current benefits of the solution but also its cost-efficiency over long-term use. Its fault-tolerance and data-reconstruction technologies minimize production downtime and shorten the time needed for video processing.
Health systems and scientific research institutions are benefiting from a rapid growth of information technologies that help them run their facilities, improve performance, and contain costs.
LizardFS can provide hospitals and research centers with uninterrupted data access, massive architectural scale-out, and high application performance. Software-defined storage is well suited to helping healthcare organizations address the challenges of consumerization: it forms a transparent virtualization layer across diverse storage systems to maximize the availability, scalability, and performance of all storage resources. It also reduces operating costs by lowering storage costs, the time spent on storage maintenance, and storage-related downtime.
For telecoms, maintaining a high pace of innovation requires the kind of IT agility that is available only in a software-defined computing environment. Telcos can no longer afford the delays typically associated with procuring hardware to accommodate new and rapid development.
Resources must be provisioned instantly to support the responsiveness needed to capitalize on new market opportunities. LizardFS offers a storage solution that can be distributed across a network, making it more accessible and faster than traditional appliance-based solutions. Additionally, SDS provides a new level of resilience and reliability through redundancy across many storage devices in different data centers, with no single point of failure.