Community member article

My name is Luca Beltrame. I was born in Milan, the second-largest city in Italy, which also hosts a large number of scientific institutions. I got a degree in Biotechnology in 2002 and moved into scientific research. I started out as a bench scientist, but in 2005, during my Ph.D., I moved to computational biology, in particular genomics (the study of large-scale changes in DNA and RNA), and I’ve been working in this field ever since.

Currently I work as a senior scientist in the Department of Oncology at the Mario Negri Institute for Pharmacological Research, a non-profit research institute.

I speak Italian, Japanese and English, and I program mainly in Python for my day job and in C++ for side projects.

In 2013, the group I work in bought some new very-high-throughput instruments for DNA and RNA sequencing, which can generate very large amounts of data. Our first objective was to build an analysis infrastructure that would let us handle the data and produce output that could be properly interpreted in a biological sense.

For this reason, we built two HPC clusters (as of today, we have three) as a side activity to my main research projects, using resources graciously donated to the institute. We also started thinking about how to store the data reliably and efficiently. I say efficiently because we needed not only to archive the data but also to access it with at least acceptable performance.

The data is also precious: the experiments that produce it are quite expensive, costing up to $4000 for a single instrument run. Consistency and integrity were therefore also a top priority.

So we tried a number of different solutions. Since exporting with NFS was neither reliable nor efficient, we looked at distributed file systems, evaluating and using GlusterFS, MooseFS (prior to the LizardFS fork) and RozoFS. We ultimately settled on LizardFS, which we have been using for about two and a half years now.

We chose LizardFS for its ease of setup and administration, its robustness (it survived several power outages without a single file getting lost), EC22 erasure coding (not all distributed file systems offer it), and its high resilience to heavy I/O. We have set up about 400TB of storage with minimal hassle. It has many features that we like: administration via Web CGI, the fact that you can create a storage cluster relatively easily, and the use of standard tools (FUSE) for accessing it in a POSIX-compliant way. The right FUSE options can also increase performance considerably.
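
As a rough illustration of why erasure coding matters for a storage budget, the sketch below compares the space overhead of an erasure-coded layout with plain replication. It assumes that "EC22" means 2 data parts plus 2 parity parts, which the article does not spell out, and the function names are hypothetical helpers, not part of LizardFS. The article also does not say whether the 400TB figure is raw or usable space, so the last line is only an example calculation.

```python
def storage_overhead(data_parts: int, parity_parts: int) -> float:
    """Raw bytes written per byte of user data for a data+parity layout."""
    if data_parts < 1 or parity_parts < 0:
        raise ValueError("need at least one data part and non-negative parity")
    return (data_parts + parity_parts) / data_parts


def usable_capacity_tb(raw_tb: float, data_parts: int, parity_parts: int) -> float:
    """Usable space on raw_tb of disks after paying the redundancy overhead."""
    return raw_tb / storage_overhead(data_parts, parity_parts)


if __name__ == "__main__":
    # Assumption: "EC22" = 2 data parts + 2 parity parts.
    print(storage_overhead(2, 2))          # 2.0
    # Three full copies, modeled as 1 data part + 2 extra copies.
    print(storage_overhead(1, 2))          # 3.0
    # Hypothetical: if 400 TB were raw disk space under the 2+2 layout.
    print(usable_capacity_tb(400, 2, 2))   # 200.0
```

Under these assumptions, both layouts are designed to tolerate the loss of two parts, but the erasure-coded one stores twice the data size rather than three times.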

Of course nothing is perfect. In the future I’d like to see ways to automatically detect replaced disks (as opposed to editing configuration files and restarting services), and automatic removal of failed disks (much as md-raid does in Linux). Lastly, NFS support in LizardFS could use an updated Ganesha plugin to work in more modern setups.

Any other community members who would like to take part, please let me know.

mark.mulrainey@lizardfs.com

LizardFS developer article

Hi, my name is Krzysztof (pronounced shishtof).

I was born in Wołomin and have lived in Kobyłka almost my entire life. Both are fairly small towns close to Warsaw where not much ever happens, which is why I’m currently moving to Warsaw.

I speak English, Polish and a little German. I was recently trying to improve my German, but since joining the LizardFS team I haven’t had much time for it, and I focus more on thousands of lines of code. At various times in my life I also learned Spanish, Russian and Italian, so I’d probably understand some basic phrases, but nothing more than that.

Most of my experience is in C++, which I’ve been programming in for more than 10 years, since secondary school, where I started participating in algorithmic contests. I’ve also programmed professionally in Python and JavaScript and their web frameworks. Besides that, I’ve done a couple of medium-sized projects in C and Java, and single ones in several other languages.

I attended the same secondary and high school in Warsaw, where I spent 6 years. Later I studied Computer Science and Maths at the University of Warsaw and got a BSc in both.

During my studies I had a couple of internships and part-time jobs on various types of projects, from web development to machine learning programming on GPUs.

I also worked at Intel but didn’t like it there very much, so I didn’t stay long. LizardFS is thus the first job in which I’d like to stay for longer than a couple of months 🙂

Although I’ve worked on a couple of bigger projects in my professional career, I don’t think they were extraordinarily interesting. Of them, LizardFS definitely looks the best.

Previously, the best project I’d worked on was quite a small one, which I wrote with three other guys as our bachelor’s thesis. Coding-wise it wasn’t big, but we needed to do lots of research and implement complicated compression algorithms. The biggest value for me was the people on our team. They were all extraordinarily talented developers, and after university none of them had any problem getting jobs at the companies most sought after by developers. During that year of work with them I learned a lot (really) about coding but, most importantly, about work ethic and the way one should think while developing software.

So far in my career I haven’t really experienced anything bad. Only my job at Intel was a bit disappointing: there was lots of corporate stuff I didn’t like, and the tasks I was given were quite different from what I was promised during the recruitment process.

LizardFS used to be very stable, which was its biggest strength. The latest release, which added lots of functionality, compromised that a little, which is why our biggest priority is to fix a couple of bugs introduced at that time and bring back the stability. Afterwards, besides adding new functionality, I see lots of possibilities to improve Lizard’s speed and throughput, even though it’s not bad right now in comparison to our competitors. More is always better, right?

I think it’s definitely possible for LizardFS to become one of the most renowned solutions in its genre, not inferior to its competitors in any way, and better in some. That, together with it being open source, makes me hope it will be commonly used, also by large companies on a big scale, with thousands of nodes. It would be very rewarding and I’d feel very proud of being a member of the team.

LinkedIn

GitHub

Although I’ve written a lot of code, especially during my CS studies, in various projects containing hundreds of lines of code, I unfortunately never really felt the need to use git, which is why there’s not much on my GitHub. I really regret it now, because I lost most of the code I wrote. You live and learn!

LizardFS developer profile

Hi, my name is Przemysław (pronounced pshemyswav), Przemek for short. I come from Poland.

I was born in Lublin. I could give you details, like the fact that it’s the ninth-largest city in Poland, but it’s just another ordinary city. Nothing particularly interesting about it, to be honest.

After high school in Lublin I moved to Warsaw (the capital city of Poland) to study IT at Warsaw University of Technology, where I’m doing my bachelor’s. During my studies I worked as an intern at Samsung, where I was part of a team developing and maintaining a C/C++ library that lets IoT devices communicate with cloud services. There I also created a simple but versatile cloud-service-mocking framework that helps in testing this library.

I speak English and Polish. I know C, C++, Java, Python and Kotlin (Android). I have the most experience programming in C++ and I use it on a daily basis, especially nowadays as I work as part of the LizardFS team.

LizardFS: It’s open-source, it’s interesting, intimidating at first, but very exciting to work on. It makes use of low-level APIs, clever optimizations, exotic high-performance data structures and various open-source third-party libraries. There is always something new to discover in its source code. On the other hand, that’s also why it is so difficult and time-consuming to become familiar with. But it surely is a rewarding process. There are many things I have learned from its code already.

Right now we focus mainly on fixing bugs, most of which have been detected by the project’s community (thanks for that). While doing it, we try to put some effort into improving the quality of the code affected by the bugs, where necessary. After that, one idea I have is to parallelize the master implementation, which is currently the main bottleneck of LizardFS, to speed up filesystem operations performed on Lizard.

I envisage LizardFS in the not-too-distant future as being a well-known solution that people trust, are satisfied with and willingly use in their environments. I would love it to become one of the top players in the field of storage solutions, one that is identified with high availability and being open-source.

More about me:

GitHub

LinkedIn

Some members of the community have shown interest in taking part in this series of articles; their contributions will follow shortly.

Anybody else who would like to take part, drop me an email:

mark.mulrainey@lizardfs.com

Introduction to the community

My name is Patryk. I was born and raised in Warsaw, Poland, not Indiana.

Warsaw is a fantastic city to live in and has a large number of very talented programmers.

I speak Polish and English fluently, and I dabbled in German, Spanish and Italian for a few years; now I could probably manage ordering in a restaurant (maybe). I concentrate more on other types of languages: C++, C, Python, Perl, Bash and a few others.

I did my Master of Science in engineering and computer science at Warsaw University of Technology, along with my Bachelor’s degree in engineering and computer science. Most of the creators of LizardFS come from there.

During my studies I gained experience working for two Polish companies, Comarch and Gemius, mainly doing C++ stuff. After my studies I moved to Intel as a Linux kernel developer working on the Intel Puma project (google it if you like, it’s quite interesting stuff). Since March this year I have been working on the LizardFS project.

LizardFS is by far the most interesting and exciting project I have been involved with. Really! It’s a great Polish project (yes, I am patriotic), started in Poland by talented MIM UW postgraduates. Unfortunately, all of those brilliant minds have left the project to pursue new challenges. Now our new team and I are doing our best to learn the code base, fix the bugs left in 3.13-rc1 and develop new features. As most of you understand, learning someone else’s code is never an easy task, but we are coping. We have the benefit of some sessions with the original developers to help us get acquainted with the code faster and push the project back to its former glory.

Currently, as you have seen from the last press release, we are working on fixing 3.13-rc2. We succeeded in getting the first parts of that done on schedule, and now we are pushing to finish it and have a bulletproof new 3.13 version by the end of the year. LizardFS was always known for its stability, and we want that back. After that we will work on improving performance by adding the Agama project mount and Kubernetes support (this seems to be the most frequently requested direction for us to take). Besides that, I would like to add a security layer to the LizardFS protocol, so that it can be used even in untrusted subnets.

In the future, I would like LizardFS to be an open-source standard for storing files on corporate servers and in small businesses, but also on nerds’ homemade server racks. 😉

If you’re interested, you can see more of me below:

StackOverflow

GitHub

LinkedIn

Several members of the community have already stepped forward and shown interest in taking part in these articles, and it would be great to have many more.

Remember, the idea is to pull the community together and make the project stronger, hopefully growing along the way.

Drop me an email if you would like to take part: mark.mulrainey@lizardfs.com