Community Article #3


Hi, my name is Nick Coons. I was born in Phoenix, AZ (USA), and have lived here my whole life, though I do like to travel quite a lot. Everyone knows that our summers are hot and that once the temperature drops below 70°F (21°C) it's jacket weather. What a lot of people don't know is that many tech companies from California are opening offices here or relocating here entirely. We're becoming known as the Silicon Desert.

While I only speak English and American :), I've written code in many languages over the decades, including C/C++/C#, Assembly, Perl, PHP, JavaScript, and numerous varieties of BASIC.

Besides LizardFS, I’ve also worked with MooseFS and GlusterFS.

I went with LizardFS primarily for its simplicity and ease of management; it can be as simple or as complicated as one needs it to be. I've been using LizardFS for about five years.

We have a couple of clusters in our data center that we use as the storage pool for Proxmox-based VMs as well as user data.
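One straightforward way to wire that up (not necessarily exactly how our pool is configured; the hostname, mount point, and storage ID below are placeholders) is to mount the cluster on the Proxmox host with the FUSE client and register the mount as plain directory storage:

    # Mount the cluster on the Proxmox host via the LizardFS FUSE client
    mkdir -p /mnt/lizardfs
    mfsmount /mnt/lizardfs -H mfsmaster.example.com

    # Register the mount as directory storage for disk images and backups
    pvesm add dir lizardfs-pool --path /mnt/lizardfs --content images,backup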

The things I like most about LizardFS include:
The storage devices can be of differing sizes and unevenly distributed (unlike RAID). This makes it easy to upgrade a cluster by adding larger drives that are cost-effective today but may have been too expensive, or didn't even exist, when the cluster was put into service, without having to scrap the existing drives. (There's a quick mfshdd.cfg sketch after this list.)

While the web UI could use some polishing, I think the information that it provides at a glance is very valuable.

The performance out of the box is nearly as fast as direct writes to the storage devices. There appears to be very little performance loss by adding this system.

I like that it’s FUSE-based and detached from the kernel. I can do kernel updates and filesystem updates independently.
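In practice that independence looks something like this (package manager and package name will vary by distro; this assumes a Debian-style host and the stock client):

    # Unmount the FUSE client, upgrade only the client package, and remount.
    # No kernel module, no reboot, and the master/chunkservers aren't touched.
    umount /mnt/lizardfs
    apt-get install --only-upgrade lizardfs-client
    mfsmount /mnt/lizardfs -H mfsmaster.example.com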
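And on the point about mixed drive sizes above: each chunkserver just lists its data directories in mfshdd.cfg (file location varies by packaging), so a new, larger drive can sit alongside the originals, and an old one can be drained before it's pulled. The paths below are made up for illustration:

    # mfshdd.cfg -- one chunk directory per line; sizes don't need to match
    /mnt/hdd1-4tb
    /mnt/hdd2-4tb
    /mnt/hdd3-16tb
    # A leading * marks a disk for removal so its chunks replicate elsewhere first
    */mnt/hdd0-2tb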

Other than some things that I'd like to see added or changed (which I'll describe below), I can't think of anything that I really dislike about LizardFS.

There are a few changes I’d like to see made:
The biggest would probably be a multi-master setup rather than just high-availability failover, something along the lines of a MariaDB Galera Cluster rather than uRaft. It's one of the things I like about GlusterFS (though GlusterFS has too many other downsides for us to consider using it). I can see two benefits to this:
Right now, the uRaft daemon manages the master daemon. If the uRaft daemon fails for some reason, the cluster may continue to run normally because the master daemon is still running; then, if something happens to the master, the failover malfunctions. Or, if uRaft fails, a different shadow is promoted but the master with the failed uRaft isn't demoted, so you end up with two masters that don't know about each other. Both of these scenarios have happened to us.

A cluster distributed across a WAN in a multi-master setup would let each client talk to the master closest to it, and wouldn't require that masters on the other side of a WAN link be opened up to clients.

Right now, I can query a file to see how many chunks it has, whether it's undergoal, and so on, but I can't do the reverse. If I see in the UI that there are missing or undergoal chunks, I'd like to be able to get a list of the affected files. I believe there's a periodic process that updates a list of these, but it's not real-time. Real-time access to this information would be valuable. (There's a quick example of the existing per-file query after this list.)

The ability to have different types of storage within a single cluster and set goals to direct data to them. For instance, I might have a mix of HDDs and SSDs in a cluster and want to define what data goes where. We do this now by running two chunkserver instances on one physical machine and setting goals by chunkserver label (a rough sketch follows), but it feels like a hack, and it would be nice to see this configurable without spawning multiple chunkserver instances.
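For reference, the label workaround looks roughly like this (names are made up and the exact syntax is from memory, so treat it as a sketch and check the docs): each chunkserver instance gets a LABEL, custom goals are defined on the master in mfsgoals.cfg, and the goal is applied per directory.

    # mfschunkserver.cfg (SSD-backed instance)
    LABEL = ssd

    # mfschunkserver.cfg (HDD-backed instance)
    LABEL = hdd

    # mfsgoals.cfg on the master: goal id, name, and one label per desired copy
    # (the underscore is a wildcard meaning "any chunkserver")
    10 vm_fast : ssd ssd
    11 bulk    : hdd hdd _

    # Apply per directory from a client mount
    lizardfs setgoal -r vm_fast /mnt/lizardfs/vm-images
    lizardfs setgoal -r bulk /mnt/lizardfs/user-data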
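And to illustrate the chunk-query point above, this is the per-file direction that already works today (paths are placeholders); it's the reverse mapping, from a missing or undergoal chunk back to a file list, that I can't get in real time:

    # Per-file view: which chunks exist, how many copies, and where they live
    lizardfs fileinfo /mnt/lizardfs/user-data/report.pdf

    # Per-file goal check
    lizardfs getgoal /mnt/lizardfs/user-data/report.pdf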