Hadoop plugin for LizardFS is here!

Like everything we do, the Hadoop plugin for LizardFS is as simple as we could make it.

This is a Java-based solution that lets Hadoop use LizardFS storage by implementing an HDFS interface to LizardFS. It functions as a kind of file system abstraction layer, enabling Hadoop jobs to access the data on a LizardFS cluster directly. The plugin translates the LizardFS protocol and makes the metadata readable for YARN and MapReduce. For best performance, Hadoop nodes should run on the same machines as the LizardFS chunk servers.

A LizardFS mount gives direct access to stored files at the OS level. This allows you to use it as shared storage in your company and as computation storage for Hadoop at the same time. Unlike with HDFS, you do not need Hadoop tools to put files into or get files out of your storage. You can also take advantage of erasure coding and save a lot of disk space (HDFS recommends storing 3 copies).
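To make the disk space argument concrete, here is a minimal Java sketch of the arithmetic. The class and method names are made up for illustration, and the erasure coding layout of 4 data parts plus 2 parity parts is only an assumed example, not a recommended LizardFS setting. It was picked because, like 3 full copies, it survives the loss of any two parts.

```java
/** Illustrative arithmetic only: raw disk space needed per byte of user data. */
final class StorageOverhead {

    /** Replication keeps `copies` full copies, so raw usage is copies times the data size. */
    static double replicationFactor(int copies) {
        if (copies < 1) {
            throw new IllegalArgumentException("copies must be >= 1");
        }
        return copies;
    }

    /** Erasure coding with k data parts and m parity parts stores (k + m) / k times the data size. */
    static double erasureCodingFactor(int dataParts, int parityParts) {
        if (dataParts < 1 || parityParts < 0) {
            throw new IllegalArgumentException("need dataParts >= 1 and parityParts >= 0");
        }
        return (double) (dataParts + parityParts) / dataParts;
    }

    /** Raw capacity required to hold the given amount of user data. */
    static double rawTerabytes(double userTerabytes, double factor) {
        return userTerabytes * factor;
    }

    public static void main(String[] args) {
        double userData = 100.0; // hypothetical data set size in TB
        double replicated = rawTerabytes(userData, replicationFactor(3));
        double erasureCoded = rawTerabytes(userData, erasureCodingFactor(4, 2)); // assumed layout
        System.out.println("3 copies:       " + replicated + " TB raw");
        System.out.println("EC 4+2 (assumed): " + erasureCoded + " TB raw");
    }
}
```

With these assumed numbers the erasure coded layout needs half the raw capacity of triple replication. The actual saving depends on the layout you choose.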

The function:

public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)

returns information about where data blocks are held in your LizardFS installation. If Hadoop runs on the same machines, it can take advantage of data locality.
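The idea behind that call can be sketched without any Hadoop dependency. The code below is a simplified model, not the plugin's implementation: `Block` is a hypothetical stand-in for Hadoop's `BlockLocation`, and `blocksInRange` and `isLocal` are illustrative helpers that do not exist in Hadoop or LizardFS. It shows how a byte range maps to the blocks that cover it, and how a scheduler could check whether a given host already holds a block.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of block lookup and data locality (not the plugin's code). */
final class BlockLocality {

    /** Hypothetical stand-in for Hadoop's BlockLocation: a byte range plus the hosts storing it. */
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, List<String> hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Returns the blocks that overlap the byte range [start, start + len). */
    static List<Block> blocksInRange(List<Block> fileBlocks, long start, long len) {
        if (start < 0 || len < 0) {
            throw new IllegalArgumentException("start and len must be non-negative");
        }
        long end = start + len;
        List<Block> result = new ArrayList<>();
        for (Block block : fileBlocks) {
            long blockEnd = block.offset + block.length;
            // Two half-open ranges overlap when each one starts before the other ends.
            if (block.offset < end && blockEnd > start) {
                result.add(block);
            }
        }
        return result;
    }

    /** True if the host stores the block, so a task running there can read it locally. */
    static boolean isLocal(Block block, String host) {
        return block.hosts.contains(host);
    }
}
```

A scheduler that knows a task will read a given byte range can call `blocksInRange` and prefer a host for which `isLocal` is true. This is the benefit of running Hadoop nodes on the chunk server machines.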

To install Hadoop with LizardFS:

1) Install and set up a LizardFS cluster

2) Install Hadoop, but don’t start it

3) Install the LizardFS Hadoop plugin on all Hadoop nodes

4) Configure the LizardFS plugin in Hadoop (alongside HDFS or as a replacement for it)

5) Start Hadoop

Let us know what you think of it.

Enjoy!

LizardFS @ NABSHOW 2018

 

Come visit our stand at NABSHOW in Las Vegas.

North Hall Central Lobby in the Startup Loft, Booth number: N2936SUL-B

LizardFS@NABSHOW (how to find us)

LizardFS is entering the Big Data world by releasing a LizardFS plugin for Hadoop.

After many tests, we decided to release a pre-alpha, cutting-edge Hadoop connector for LizardFS.

You can download it from here.

We are waiting for your feedback.

At the moment you will be required to build the binaries yourself.

Please bear in mind that we are not Hadoop experts, so we might have missed some test scenarios.

We really need help from the community on this one. Any help is greatly appreciated.