zfsonlinux and gluster so far….

Recently I started to revisit the idea of using zfs on linux (zfsonlinux) as the basis for a server that will eventually be the foundation of our gluster storage infrastructure.  At this point we are using the OpenSolaris version of zfs and an older (but stable) version of gluster (3.0.5).

The problem with staying on OpenSolaris (besides the fact that the OS itself is no longer actively supported) is that we would be unable to upgrade gluster, and thus unable to take advantage of some of the new and upcoming features in later versions (such as geo-replication, snapshots, active-active geo-replication, and various other bugfixes and performance enhancements).


Here are the specs for the current hardware I am using to test:

  • CPU: 2 x Intel Xeon E5410 @ 2.33GHz
  • Hard drives: 48 x 2TB Western Digital SATA II
  • Ubuntu 11.10
  • Glusterfs version 3.2.5
  • 1 Gbps interconnects (LAN)

ZFS installation:

I decided to use Ubuntu 11.10 for this round of testing. Currently the daily ppa has a lot of bugfixes and performance improvements that do not exist in the latest stable release (0.6.0-rc6), so the daily ppa is the version that should be used until either v0.6.0-rc7 or v0.6.0 final is released.

Here is what you will need to get zfs installed and running:

# apt-add-repository ppa:zfs-native/daily
# apt-get update
# apt-get install debootstrap ubuntu-zfs
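Before creating any pools, it is worth confirming that the spl and zfs modules actually built and loaded (the daily ppa builds them via dkms, so a failed build can otherwise go unnoticed). A quick sanity check might look like this:

```shell
# Load the zfs kernel module (pulls in spl as a dependency)
modprobe zfs

# Confirm both modules are registered with the kernel
lsmod | grep -e ^spl -e ^zfs

# The module versions are logged at load time
dmesg | grep -i -e SPL: -e ZFS: | tail -n 5
```

If `lsmod` shows nothing, check the dkms build logs before going any further.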

At this point we can create our first zpool. Here is the syntax used to create a 6 disk raidz2 vdev:

# zpool create -f tank raidz2 sdc sdd sde sdf sdg sdh

Now let’s check the status of the zpool:

# zpool status tank
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0

errors: No known data errors
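With the pool online, I like to carve out a dedicated child filesystem rather than writing straight into the pool root, since that makes it easy to set properties or snapshot the data independently later on. The filesystem name below ("tank/brick1") is just an example, not something from my actual setup:

```shell
# Create a child filesystem to hold the data (example name)
zfs create tank/brick1

# By default it inherits its mountpoint from the pool (/tank/brick1),
# but it can be pointed anywhere
zfs set mountpoint=/tank/brick1 tank/brick1

# Confirm the filesystem exists and where it is mounted
zfs list
```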

ZFS Benchmarks:

I ran a few tests to see what kind of performance I could expect out of zfs first, before I added gluster on top; that way I would have a better idea about where the bottleneck (if any) existed.

linux 3.3-rc5 kernel untar:

single ext4 disk: 3.277s
zfs 2 disk mirror: 19.338s
zfs 6 disk raidz2: 8.256s
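For reference, the untar numbers above came from timing an extraction of the kernel tarball onto each target filesystem; the exact invocation I used was along these lines (the tarball path is illustrative):

```shell
# cd onto the filesystem under test (tank mounts at /tank by default)
cd /tank

# Time a full kernel source extraction: lots of small files,
# which stresses metadata performance
time tar xjf ~/linux-3.3-rc5.tar.bz2
```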

dd using block size of 4096:

single ext4 disk: 204 MB/s
zfs 2 disk mirror: 7.5 MB/s
zfs 6 disk raidz2: 174 MB/s

dd using block size of 1M:

single ext4 disk: 153.0 MB/s
zfs 2 disk mirror: 99.7 MB/s
zfs 6 disk raidz2: 381.2 MB/s
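The dd tests were straightforward sequential writes from /dev/zero; the counts below are examples (the exact file sizes I wrote are not recorded here), chosen so the file comfortably exceeds RAM and caching effects are reduced. Note that if you ever enable compression on the filesystem, /dev/zero compresses to almost nothing and the numbers become meaningless:

```shell
# Sequential write with a 4096-byte block size (~4GB total)
dd if=/dev/zero of=/tank/ddtest bs=4096 count=1000000

# Sequential write with a 1M block size (~4GB total)
dd if=/dev/zero of=/tank/ddtest bs=1M count=4000
```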

Gluster + ZFS Benchmarks

Next I added gluster (version 3.2.5) to the mix to see how they performed together:
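Setting up the replicated volume on top of the zfs filesystems follows the usual gluster workflow; a sketch of the 3.2-era commands is below. The hostnames, volume name, and brick paths are placeholders, not my actual configuration:

```shell
# Two-node replicated volume, one zfs-backed brick per server
gluster volume create testvol replica 2 \
    server1:/tank/brick1 server2:/tank/brick1

gluster volume start testvol

# Mount via the native FUSE client for testing
mkdir -p /mnt/gluster
mount -t glusterfs server1:/testvol /mnt/gluster
```

Geo-replication in 3.2 is configured per volume with `gluster volume geo-replication <MASTER> <SLAVE> start`, where the slave can be a remote directory reached over ssh.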

linux 3.3-rc5 kernel untar:

zfs 6 disk raidz2 + gluster (replication): 4m10.093s
zfs 6 disk raidz2 + gluster (geo replication): 1m12.054s

dd using block size of 4096:

zfs 6 disk raidz2 + gluster (replication): 53.6 MB/s
zfs 6 disk raidz2 + gluster (geo replication): 53.7 MB/s

dd using block size of 1M:

zfs 6 disk raidz2 + gluster (replication): 45.7 MB/s
zfs 6 disk raidz2 + gluster (geo replication): 155 MB/s


Well so far so good, I have been running the zfsonlinux port for two weeks now without any real issues. From what I understand there is still a decent amount of work left to do around dedup and compression (neither of which I necessarily require for this particular setup).

The good news is that the zfsonlinux developers have not yet really started looking into improving performance, since their main focus thus far has been overall stability, so there should be plenty of headroom for improvement in future releases.

A good deal of development is also taking place in order to allow linux to boot using a zfs ‘/boot’ partition.  This is currently an option on several distros including Ubuntu and Gentoo, however the setup requires a fair amount of effort to get going, so it will be nice when this style of setup is supported out of the box.

In terms of Gluster specifically, it performs quite well using geo-replication with larger file sizes. I am really looking forward to the active-active geo-replication feature currently planned for v3.4 becoming fully implemented and available. Our current production setup (currently using two node replication) has a T3 (WAN) interconnect, so having the option to use geo-replication in the future should really speed up our write throughput, which is currently hampered by the throughput of the T3 itself.

14 thoughts on “zfsonlinux and gluster so far….”

  1. Pingback: Native Linux ZFS kernel module and stability. - ShainMiley.com

  2. Pingback: Native Linux ZFS kernel module goes GA. - ShainMiley.com

  3. Pingback: ZFS kernel module for Linux - ShainMiley.com

  4. witalis


    Could you write up some performance benchmarks comparing the Linux zfs port vs native zfs on OpenSolaris?

    best regards,

  5. shainmiley Post author

    I guess my answer to the question about production use would be this:

    After following the mailing list and lurking around on the irc channel it appears as though there are a decent number of people who are using zfsonlinux as an everyday filesystem without too many problems.

    What does this mean exactly? Well it means that for the most part people are not having any issues related to data loss at this point; most of the work is going into features such as dedup, compression, using zfs on root partitions, performance enhancements, etc. A lot of people seem to complain about performance issues on systems with a limited amount of RAM (especially when trying to run low-memory systems with dedup enabled), but we are currently running with 32GB per node so I don’t expect this to be an issue for us on these systems.

    With all this being said, no matter what filesystem you are using, if you have data on it that you care about you should take care to have some sort of backups available, just in case something goes wrong.

    We are using Gluster to achieve this level of redundancy (replication or geo-replication), however I have seen references to people using zfs send/receive to do this as well.
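For anyone curious about the zfs send/receive approach mentioned above, the basic pattern is snapshot-based: take a snapshot, stream it to another pool (often over ssh), then send only the incremental deltas on subsequent runs. The pool, host, and snapshot names here are illustrative:

```shell
# Take a point-in-time snapshot of the filesystem
zfs snapshot tank/data@2012-03-01

# Stream the full snapshot to a pool on a backup host
zfs send tank/data@2012-03-01 | ssh backuphost zfs receive backuppool/data

# On later runs, send only the delta between two snapshots
zfs snapshot tank/data@2012-03-02
zfs send -i tank/data@2012-03-01 tank/data@2012-03-02 | \
    ssh backuphost zfs receive backuppool/data
```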

    I am currently trying to run a bunch of tests using linux+zfs+gluster and I am hopeful that we will be using this in production by year’s end. There is a chance that I would be willing to switch even sooner, however we currently have a decent amount of space left on our OpenSolaris/zfs setup, so I am in no real rush to switch until that storage infrastructure reaches about the 80% capacity threshold.

  6. dirk adamsky

    Hi Shain,

    Thank you for your extensive answer.
    I will also settle for linux+zfs+gluster(+kvm).
    I have looked at illumian/opensolaris based alternatives:
    nexenta – too expensive
    openindiana – small user base
    clearos – small user base

    The purpose is running windows vm’s for my customers.
    Still have to make some choices now:
    1. which linux distro (centos, ubuntu)
    2. which hardware (normally use hp but supermicro has more flexibility)
    3. which vm management software (openqrm, ovirt, etc.)
    I will let you know when everything is up and running.

    Best regards,

    dirk adamsky

  7. shainmiley Post author

    In terms of which distro to use, as a general rule I use whichever one I am most comfortable with, in this case that would be Debian or Ubuntu. After asking around, it became clear that Debian was not an option for me (zfs packages were not up to date, small number of core devs using it as a primary OS, etc).

    There is a lot of work being done on Gentoo and there are developers available on irc if issues arise, however I have not used Gentoo much at all, and although it is really flexible, I often prefer the simplicity of a distro such as Ubuntu, so that is what I chose to go with.

    In terms of hardware, we are using the NAS50 from e-racks, however I did see this blog post here, which sparked my interest, and I may be looking at supermicro in the future.

    I have not used openqrm or oVirt, however Proxmox offers a nice set of features and you can use it to create and manage both kvm and openvz vm’s.

  8. shainmiley Post author

    I have been trying to find time to come up with some. I can say that I ran a quick set of tests and the performance seemed about even on both systems (using dd and tar as benchmarks), however I would have to do a bit more work before I posted any specific numbers.


  9. Dietrich T. Schmitz

    Hi Shain,

    Thanks for the great article.
    Clearly, ZFS is the best.

    What I’d like to try is making my /home partition zfs

    What’s the best way to do that, given I am currently running 12.04 beta 2 with KDE 4.8.1 and /home is ext4 with encryptfs?


  10. shainmiley Post author

    Well I am not aware of your specific setup, however if you have 2 or more free drives available on that system, I would suggest that you:

    1) Install the zfs kernel modules using the daily ppa.
    2) Create a zpool mirror volume and mount it someplace such as ‘/new_home’.
    3) Copy over all the necessary files and directories from ‘/home’.
    4) Unmount the new and the old location; mount the new zfs location as ‘/home’.

    If you are going to use it for ‘/home’ you should also make sure that the zfs filesystems are mounted on startup, this is controlled in ‘/etc/default/zfs’.
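The steps above can be sketched as the following commands; the pool name and disk names are examples, and rsync is suggested here (rather than a plain cp) simply because it preserves permissions and extended attributes. Unmounting a live ‘/home’ requires that no users have files open on it:

```shell
# Build a two-disk mirror and mount it at the staging location
zpool create homepool mirror sdb sdc     # example pool/disk names
zfs set mountpoint=/new_home homepool

# Copy everything across, preserving permissions, hardlinks and xattrs
rsync -aHAX /home/ /new_home/

# Swap the mountpoints: free up /home, then point the zfs fs at it
umount /home    # only if /home was its own partition
zfs set mountpoint=/home homepool
```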

    If you do not have full disks available, you can use space on existing free partitions (or files even, although this is only really good for testing purposes).

    I hope this helps…if you have any specific questions feel free to let me know.


  11. shainmiley Post author

    There are a number of reasons, I am sure:

    1) Adding another software layer (Gluster) on top of anything adds some overhead, so things are generally slower, even if only marginally.

    2) It sounds like people who are really looking to get maximum performance out of Gluster are using things like 10Gig networks or even infiniband interconnects. I have found in other tests (testing zfs with iscsi for vm storage, for example) that yes, the 1Gig network does become the bottleneck at some point.

    3) The folks at Gluster have released several versions (as have the zfsonlinux folks) since this writeup; I suspect that running these tests again with updated software might yield better results.
