LizardFS Community Article #4

My name is Navid Malek Ghaini.

I was born in Tehran, Iran. Iran is a very beautiful country, especially from a natural and historical perspective (have a look). I recently graduated from Sharif University of Technology, which is the highest-ranking university in the country and is also recognized internationally (most of its graduates apply to the top 50 universities around the world). My major was computer engineering; although it is called engineering, it is really closer to computer science (in Iran, CS and CE are largely the same). Currently, I’m a research fellow at INL-LAB, where we work on encrypted traffic analysis; I’m mainly on the network and OS side.

My mother tongue is Farsi (also known as Persian), and I’m also fluent in English. My preferred programming languages are Python, C, JavaScript, and Bash (if that counts as a programming language!).

About a year ago, I was a DevOps engineer at a company based in Iran called Nopayar; I was part of the Yumcoder project, where we developed a scalable and extensible cross-platform infrastructure with the help of open-source technologies. At some point in the project, we naturally needed a distributed network file system/storage. I considered many of the options available back then, but in the end I chose LizardFS. The other well-known options I considered were Ceph, BeeGFS, GlusterFS, and ZFS.

I considered LizardFS the best open-source solution available. Its main advantage was its simplicity compared with its open-source rivals. Configuring and deploying every node, and even the whole cluster of nodes, is not only easy but also relatively fast. The web GUI for cluster management is another great benefit that made the work much faster and easier (especially for team leaders and managers who want a GUI and consolidated information about the nodes and the cluster).

We needed an open-source platform that we could further configure and even modify for our scalable infrastructure. My main interaction with LizardFS was scaling it and tuning its performance, and I didn’t have any major problems along the way; the parallel file-serving system was extremely well coded, flexible, and easy to both tune and alter. The community and the developers were also responsive and friendly. Moreover, the replication strategies were not only quite easy to configure but also highly advanced; for instance, we used a slightly modified version of LizardFS’s erasure-coding (EC) replication, which was one of the most advanced replication strategies among open-source distributed file storage systems at the time.
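To give a rough idea of how EC replication is set up in stock LizardFS (a minimal sketch, not our modified setup: the goal ID, goal name, and paths are hypothetical, and the exact syntax should be checked against the mfsgoals.cfg and lizardfs man pages for your version):

```
# On the master, define an erasure-coded goal in /etc/lizardfs/mfsgoals.cfg,
# e.g. 3 data parts + 1 parity part per chunk:
#
#   10 ec_3_1 : $ec(3,1)
#
# After reloading the master configuration, assign the goal from any client:
lizardfs setgoal -r ec_3_1 /mnt/lizardfs/project-data

# Check which goal a file or directory currently uses:
lizardfs getgoal /mnt/lizardfs/project-data
```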

I used LizardFS for around four months, and it was a great experience. It had some minor performance issues with SSD storage back then (which are now history), but I believe it was by far the best option available.

The statement on LizardFS’s website, “Get Your Storage Up and Running in 28 Minutes”, captures what is great about it. On top of that, LizardFS is highly configurable and customizable, and genuinely fast and easy to scale.

With the new team and development strategy, I believe the minor issues it had, like the one I mentioned, will be eradicated, and it will become one of the best-known options, if not the best, in network-defined storage in the near future.

Finally, I would like to thank Mark Mulrainey for starting this exciting initiative of sharing community experiences with LizardFS.

LizardFS Community Article #3

Logo LizardFS

Hi, my name is Nick Coons. I was born in Phoenix, AZ (USA) and have lived here my whole life, though I do like to travel quite a lot. Everyone knows that our summers are hot, and that once the temperature drops below 70°F (21°C) it’s jacket weather. What a lot of people don’t know is that a lot of tech companies from California are opening offices here, or relocating here entirely. We’re becoming known as the Silicon Desert.

While I only speak English and American :), I’ve written code in many languages over the decades, including C/C++/C#, assembly, Perl, PHP, JavaScript, and numerous varieties of BASIC.

Besides LizardFS, I’ve also worked with MooseFS and GlusterFS.

I went with LizardFS primarily for its simplicity and ease of management. LizardFS can be as simple or as complicated as one needs it to be. I’ve been using LizardFS for about five years.

We have a couple of clusters in our data center that we use as the storage pool for Proxmox-based VMs as well as user data.

The things I like most about LizardFS include:

The storage devices can be of differing sizes and unevenly distributed (unlike RAID). This allows a cluster to be easily upgraded by adding larger drives that are cost-effective today but may have been too expensive or even non-existent when the cluster was put into service, without having to scrap the existing drives.

While the web UI could use some polishing, I think the information that it provides at a glance is very valuable.

The performance out of the box is nearly as fast as direct writes to the storage devices. There appears to be very little performance loss by adding this system.

I like that it’s FUSE-based and detached from the kernel. I can do kernel updates and filesystem updates independently.

Other than some things that I’d like to see added or changed (which I’ll describe later), I can’t think of anything that I really don’t like with LizardFS.

There are a few changes I’d like to see made:

The biggest would probably be a multi-master setup rather than just high-availability failover. I’m thinking of something along the lines of a MariaDB Galera Cluster rather than uRaft. It’s one of the things I like about GlusterFS (though it has too many other downsides to be worth using). I can see two benefits to this:

Right now, the uRaft daemon manages the master daemon. If the uRaft daemon fails for some reason, the cluster may continue to run normally because the master daemon is still running. Then, if something happens to the master, the failover malfunctions. Or, if uRaft fails, a different shadow is promoted but the master with the failed uRaft isn’t demoted, so you end up with two masters that don’t know about each other. Both of these scenarios have happened to us.

A cluster distributed across a WAN in a multi-master setup would allow the client accessing the cluster to access the master closest to it, and wouldn’t require that masters on the other side of a WAN link be opened up to clients.

Right now, I can query a file to see how many chunks it has, whether it’s undergoal, and so on. But I can’t do the reverse. If I see in the UI that there are missing or undergoal chunks, I’d like to be able to get a list of those files. I believe there’s a process that runs and updates a list of these, but it’s not real-time. Real-time access to this information would be valuable.
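For context, a hedged sketch of the per-file queries meant here, using the standard client tools, plus a brute-force approximation of the reverse lookup (the paths are hypothetical, and the exact strings printed by fileinfo vary between versions, so the grep pattern may need adjusting):

```
# Per-file view: chunks, their copies, and the goal assigned to the file
lizardfs fileinfo /mnt/lizardfs/vm-images/disk0.raw
lizardfs getgoal  /mnt/lizardfs/vm-images/disk0.raw

# Crude reverse lookup (slow on large trees): walk the mount and flag files
# whose fileinfo output reports chunks with no valid copies.
find /mnt/lizardfs -type f -print0 |
  while IFS= read -r -d '' f; do
    lizardfs fileinfo "$f" | grep -qi 'no valid copies' && printf '%s\n' "$f"
  done
```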

The ability to have different types of storage within a single cluster, with goals to direct data to them. For instance, I might have a mix of HDDs and SSDs in a cluster, and I would want to define what data goes where. We do this now by running two chunkserver instances on one physical machine and setting goals by chunkserver label, but it feels like a hack, and it would be nice to see this configurable without spawning multiple chunkserver instances.
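For readers unfamiliar with the label workaround described above, here is a rough sketch of how it can look (the label names, goal IDs, and paths are hypothetical; the second chunkserver instance needs its own data path and hdd.cfg, and the option names should be checked against the example config files shipped with your version):

```
# /etc/lizardfs/mfschunkserver.cfg for the SSD-backed instance:
#   LABEL = ssd
#
# Config for the second, HDD-backed instance on the same machine:
#   LABEL = hdd
#   CSSERV_LISTEN_PORT = 9522   # avoid clashing with the first instance
#
# /etc/lizardfs/mfsgoals.cfg on the master:
#   11 ssd_fast    : ssd ssd        # two copies, both on SSD-labelled chunkservers
#   12 hdd_archive : hdd hdd hdd    # three copies on HDD-labelled chunkservers

# Assign the goals per directory from a client:
lizardfs setgoal -r ssd_fast    /mnt/lizardfs/databases
lizardfs setgoal -r hdd_archive /mnt/lizardfs/backups
```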

LizardFS Community Article #2

Logo LizardFS

Hi, my name is Tony Travis.

I was born in Preston, Lancashire, UK, an old mill town that used to be famous for weaving cotton. It’s “up north”, as the southerners say, although when I moved to Aberdeen they would call me a southerner. Confused? Me too!

I’m a computer-literate biologist fluent in 6502 assembly, Forth, C/C++, Fortran, Ratfor, Java, Perl, Python, R, and awk, with more than thirty years’ experience of developing and using high-performance computing techniques for biological research, including analysis of NGS RNA-seq data, de novo assembly and annotation of first- and second-generation DNA sequence data, and the analysis of biological images.

I’m experienced in the construction and administration of Beowulf clusters for computationally demanding work, and in creating distributed data-sharing and bioinformatics infrastructure within geographically dispersed virtual organisations, for which I used Bio-Linux with auto-mounted SSHFS folders over the WAN between 32 hosts at 26 partner sites in different EU countries.

I originally used Sandia National Laboratories’ oneSIS with NFS, but I quickly became aware that the lack of cache coherence in NFS causes problems with multiple writers, and started to look for alternatives. Having tried almost all the FLOSS distributed filesystems at one time or another, I came across RozoFS, which uses the Mojette Transform instead of Reed-Solomon codes and is incredibly efficient. I built a six-node RozoFS cluster with 256TB of storage for my colleague Luca Beltrame at the non-profit Mario Negri Institute in Milan; it worked well but was very difficult to administer, despite very good support from the RozoFS developers. I decided to look for alternatives and found MooseFS/LizardFS, so I decided to evaluate it. I’ve been using LizardFS for three years now.

I built another six-node storage cluster at the Mario Negri Institute to evaluate LizardFS and compare it with RozoFS. Although RozoFS is an extremely interesting leading-edge SDS, on balance I found LizardFS much easier to configure and administer, so I installed LizardFS on both of the Mario Negri SDS clusters to provide a combined total of 400TB of storage, which could be upgraded to 1,056TB if all the servers were fully populated with disks.

What I like about LizardFS is that it’s easy to install and configure, has good admin tools including the web GUI, offers great resilience to hardware failures, and delivers good performance even on a 1Gb network.

What I dislike is the need for manual recovery from disk failures, which involves editing config files and reloading the service by hand; failed disks do not automatically have their volumes removed from the configuration. A cold start also requires a lot of manual intervention. There were bad vibes on the LizardFS blog about the future of the project, but that seems to have been addressed recently.

I would love to see admin tools in the web GUI for adding and removing disks and for other admin tasks, more automatic reconfiguration after disk or host failures, and automatic re-acceptance of hosts that are configured back into the storage cluster.

Community member article

DNA

My name is Luca Beltrame. I was born in Milan, the second-largest city in Italy, which also hosts a large number of scientific institutions. I got a degree in biotechnology in 2002 and moved into scientific research: I initially worked as a bench scientist, but in 2005, during the course of my Ph.D., I moved into computational biology, and in particular genomics (the study of large-scale changes in DNA and RNA), and I’ve been working in this field ever since.

Currently I work as a senior scientist in the Department of Oncology at the Mario Negri Institute for Pharmacological Research (a non-profit research institute).

I speak Italian, Japanese, and English, and I program mainly in Python for my main job and in C++ for side projects.

In 2013, the group I work in bought some new instruments for very high-throughput DNA and RNA sequencing, which can generate very large amounts of data. Our first objective was to build an analysis infrastructure that would allow us to deal with the data (which can be massive) and to produce output that could be properly interpreted in a biological sense.

For this reason, we built two HPC clusters (as of today, we have three) as a side activity to my main research projects, using resources graciously donated to the institute, and we started thinking about how we could store the data reliably and efficiently. I say efficiently because we would not only need to archive the data but also be able to access it with at least acceptable performance.

The data is also precious: the experiments that produce it are quite expensive, costing up to $4000 for a single run of the instruments that generate it. Therefore, consistency and integrity were also top priorities.

For this reason we tried a number of different solutions. Since exporting with NFS was neither reliable nor efficient, we looked at distributed file systems, and so we evaluated and used GlusterFS, MooseFS (prior to the LizardFS fork), and RozoFS. We ultimately settled on LizardFS, which we have been using for about two and a half years now.

The reasons we chose LizardFS are its ease of administration, ease of setup, robustness (it survived several power outages without a single file getting lost), EC(2,2) erasure coding (not all distributed file systems offer it), and high resilience to heavy I/O. We set up about 400TB of storage with minimal hassle. It has many features that we like: administration via the web CGI, the fact that you can create a storage cluster relatively easily, and the use of standard tools for accessing it in a POSIX-compliant way (FUSE). The right FUSE options can also considerably increase performance.
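As a rough illustration of the kind of FUSE-side tuning meant here (the option names and values below are examples rather than exact production settings; check the mfsmount man page for your LizardFS version):

```
# Mount a LizardFS filesystem with a few client-side performance options
# (cacheexpirationtime is in milliseconds, readaheadmaxwindowsize in KiB):
mfsmount /mnt/lizardfs \
  -o mfsmaster=lizard-master.example.org \
  -o big_writes \
  -o cacheexpirationtime=500 \
  -o readaheadmaxwindowsize=4096
```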

Of course nothing is perfect, and what I’d like to see in the future are ways to automatically detect disks being replaced (as opposed to editing configuration files and restarting services), or automatic removal of failed disks (somewhat like what md-raid does in Linux). Lastly, NFS support in LizardFS could use an updated Ganesha plugin to work in more modern setups.

Any other community members who would like to take part, let me know.

mark.mulrainey@lizardfs.com

LizardFS developer article

Lizard server

Hi, my name is Krzysztof (pronounced “shishtof”).

I was born in Wołomin and have been living in Kobyłka almost my entire life. They are both pretty small cities close to Warsaw where not much ever happens, which is why I’m currently moving to Warsaw.

I speak English, Polish, and a little bit of German, which I had recently been trying to improve, but since joining the LizardFS team I haven’t had much time for that and focus more on thousands of lines of code. At various times in my life I also studied Spanish, Russian, and Italian, so I’d probably understand some basic phrases, but nothing more than that.

Most of my experience is in C++, which I’ve been programming in for more than 10 years, since secondary school, where I started participating in algorithmic contests. I’ve also programmed professionally in Python and JavaScript and their web frameworks. Besides that, I’ve done a couple of medium-sized projects in C and Java and single projects in several other languages.

I attended the same secondary and high school in Warsaw, where I spent six years. Later I studied computer science and maths at the University of Warsaw and got a BSc in both.

During my studies I had a couple of internships and part-time jobs on various types of projects, from web development to machine-learning programming on GPUs.

I also worked at Intel but didn’t like it there very much, so I didn’t stay long. Thus, LizardFS is the first job I’d like to stay in for longer than a couple of months 🙂

Although I have worked on a couple of bigger projects in my professional career, they weren’t extraordinarily interesting. Of them, LizardFS definitely looks the best.

Previously, the best project I had worked on was quite a small one: something I wrote with three other guys as our bachelor’s thesis. Coding-wise it wasn’t big, but we needed to do lots of research and implement complicated compression algorithms. The biggest value for me was the people on our team. They were all extraordinarily talented developers; after university none of them had any problem getting jobs at the companies most sought after by developers, and during that year of working with them I learned a lot (really) about coding, but most importantly about work ethic and the way one should think while developing software.

So far in my career I haven’t really experienced anything bad. Only my job at Intel was a bit disappointing: there was a lot of corporate stuff I didn’t like, and the tasks I was given were quite different from what I was promised during the recruitment process.

LizardFS used to be very, very stable, which was its biggest strength. The latest release, which added lots of functionality, compromised that a little, which is why our biggest priority is to fix the bugs introduced at that time and bring back the stability. Afterwards, besides adding new functionality, I see lots of possibilities to enhance Lizard’s speed and throughput, even though it’s not bad right now in comparison with our competitors. More is always better, right?

I think it’s definitely possible for LizardFS to become one of the most renowned solutions of its kind, not inferior to its competitors in any way, and better in some. That, together with it being open source, makes me hope it will be widely used, including by large companies at scale, with thousands of nodes. That would be very rewarding, and I’d feel very proud to be a member of the team.

LinkedIn

GitHub

Although I’ve written a lot of code in various projects, especially during my CS studies, unfortunately I never really felt the need to use git, which is why there’s not much on my GitHub. I really regret that now, because as a result I’ve lost most of the code I wrote. You live and learn!

LizardFS developer profile

LizardFS server

Hi, my name is Przemysław (pronounced “pshemyswav”), Przemek for short. I come from Poland.

 

I was born in Lublin. I could give you details, like it being the ninth-largest city in Poland, but it’s just another ordinary Polish city. Nothing particularly interesting about it, to be honest.

 

After high school in Lublin I moved to Warsaw (the capital of Poland) to study IT at Warsaw University of Technology, where I’m doing my bachelor’s. During my studies I worked as an intern at Samsung, where I was part of a team developing and maintaining a C/C++ library that IoT devices use to communicate with cloud services. There I also created a simple but versatile cloud-service-mocking framework that helps with testing this library.

 

I speak English and Polish. I know C, C++, Java, Python, and Kotlin (Android). I have the most experience programming in C++, and I use it on a daily basis, especially nowadays as part of the LizardFS team.

 

LizardFS: it’s open source and interesting, intimidating at first but very exciting to work on. It makes use of low-level APIs, clever optimizations, exotic high-performance data structures, and various open-source third-party libraries. There is always something new to discover in its source code. On the other hand, that’s also why it is so difficult and time-consuming to become familiar with. But it surely is a rewarding process, and there are many things I have learned from its code already.

 

Right now we are focusing mainly on fixing bugs, most of which have been detected by the project’s community (thanks for that). While doing so, we try to put some effort into improving the code quality of the parts affected by those bugs, where necessary. After that, one idea I have is to parallelize the master implementation, which is currently the main bottleneck of LizardFS, to speed up filesystem operations performed on Lizard.

 

I envisage LizardFS in the not-too-distant future as a well-known solution that people trust, are satisfied with, and willingly use in their environments. I would love it to become one of the top players in the field of storage solutions, one identified with high availability and being open source.

 

More about me:

GitHub

LinkedIn

We have had some interest from the community in taking part in this series of articles; those articles will follow shortly.

Anybody else who would like to take part, drop me an email:

mark.mulrainey@lizardfs.com

Introduction to the community

My name is Patryk. I was born and raised in Warsaw, Poland (not the one in Indiana).

Warsaw is a fantastic city to live in and has a large number of very talented programmers.

I speak Polish and English fluently, and I dabbled in German, Spanish, and Italian for a few years; now I could probably manage ordering in a restaurant (maybe). I concentrate more on other types of languages: C++, C, Python, Perl, Bash, and a few others.

I did my Master of Science in engineering and computer science at Warsaw University of Technology, along with my bachelor’s degree in the same field. Most of the creators of LizardFS come from there.

During my studies I gained experience working for two Polish companies, Comarch and Gemius, mainly doing C++ work. After my studies I moved to Intel as a Linux kernel developer working on the Intel Puma project (Google it if you like; it’s quite interesting stuff). Since March this year I have been working on the LizardFS project.

LizardFS is by far the most interesting and exciting project I have been involved with. Really! It’s a great Polish project (yes, I am patriotic), started in Poland by talented MIM UW postgraduates. Unfortunately, all of those brilliant minds left the project to pursue new challenges. Now our new team and I are trying our best to learn the code base, fix the bugs left in 3.13-rc1, and develop new features. As most of you understand, learning someone else’s code is never an easy task, but we are coping. We have the benefit of some sessions with the original developers to help us get acquainted with the code faster and push the project back to its former glory.

Currently, as you will have seen from the last press release, we are working on 3.13-rc2. We succeeded in getting the first parts of that done on schedule, and now we are pushing to finish it and have a bulletproof 3.13 release by the end of the year. LizardFS was always known for its stability, and we want that back. After that we will work on improving performance by adding the Agama project mount, as well as Kubernetes support (which seems to be the most frequently requested direction for us to take). Besides that, I would like to add a security layer to the LizardFS protocol so that it can be used even on untrusted subnets.

In the future, I would like LizardFS to be an open-source standard for storing files on corporate servers and in small businesses, but also on nerds’ homemade server racks. 😉

If you’re interested, you can see more of me below:

StackOverflow

GitHub

LinkedIn

Several members of the community have already stepped forward and shown interest in taking part in these articles; it would be great to have many more.

Remember, the idea is to try to pull the community together and make the project stronger, hopefully growing along the way.

Drop me an email if you would like to take part: mark.mulrainey@lizardfs.com

Version 3.13.0-rc2

Logo LizardFS

A Promise is a Promise!

 

The first two major bug fixes for 3.13.0-rc1 have been completed!

https://lizardfs.com/download/

 

Next we move on to finishing 3.13 by the end of the year. I hope this starts to restore some faith in the project from the community. We need you to make it work!

 

We have an idea to try and involve the community more, to make you more integrated with the whole project, hopefully gaining more and more members along the way.

 

First step: we would like to introduce you to our team. We will publish a series of articles profiling our developers: where they came from, what they expect to achieve, and so on.

It would be really good if we could do the same with the community: some simple questions to answer that will tell us a bit about you, your experience, and your love/hate relationship with LizardFS.

 

How does it sound?

 

Drop me an email if you are interested and I will send you the 10 questions to help create an article from. You will have total control over what is published!

mark.mulrainey@lizardfs.com

 

Update on LizardFS project

As most of you will have noticed, the LizardFS project has been lacking commitment to the community, and to development, for the past two years or so. It’s time for a change, and I think you will agree. Very recently the project and company got new owners, new management, and new developers. I hope the community still has a glimmer of faith in LizardFS. We have a lot of plans and ideas for how to get LizardFS back to its former glory and well beyond that.

First, I hope you will understand the mess we have taken over, including a large financial burden that we need to work our way out of. With a new team, we will need a little patience from the community while they get up to speed. 3.13.0-rc1 was a total disaster, but it has lots of good stuff in it alongside essentially four major bugs plus lots of minor ones, so we plan to continue forward with it (remember, 3.12 is rock solid for any production clusters in the meantime), and we will make 3.13 what it should have been in the first place.

What I propose is that we be totally transparent with our roadmap and flexible enough to listen to the community’s needs. Below is our plan until the end of the year and for the first quarter of 2020.

End of October: release of 3.13.0-rc2, with two major bug fixes (issues #780 and #662), plus lots of minor bug fixes and several enhancements. End of December: release of a full, bulletproof version 3.13, with everything that was promised before, but working this time.

In the second quarter we will have the first installment of the Agama project, a new mount that will give you at least three times better performance than 3.12 ever produced.

Going any further at the moment would just be dreaming and blah blah blah. I prefer to keep it realistic, follow through on what is promised, and allow the community to get behind the project again; then we can plan the rest of 2020 together.

I hope this shows a commitment from us to the project.

If any members of the community would like to share knowledge and experience with our devs, it can only help their learning process and also help build a stronger community; we’ve opened up a Gitter room for direct communication with the team to support this. As always, any help with bug fixing, enhancements, new functionality, or testing that the community would like to take part in will be gratefully accepted.

Let’s make LizardFS great again!