Category Archives: ZFS

Native Linux ZFS kernel module goes GA.

UPDATE: If you are interested in ZFS on Linux, you have two options at this point: the native zfsonlinux kernel port and the FUSE-based zfs-fuse project.

I have been actively following the zfsonlinux project because, once stable and ready, it should offer superior performance by avoiding the extra overhead incurred by FUSE in the zfs-fuse project.

You can read more about using zfsonlinux in another one of my posts here.

————————————————————————————————————————————————————-
Earlier this week KQ Infotech released the latest build of their ZFS kernel modules for Linux. This version has been labeled GA and is considered ready for wider testing (and possibly for production use).

KQStor has been set up as the place where you can sign up for an account, download the software, and get additional support.

The source code for the module can be found here:

https://github.com/zfs-linux

Currently, mounting the root filesystem is not supported; however, a post here describes a procedure that can be used to do it.

The user's guide also hints at possible problems using ‘zfs rollback’ under certain circumstances. I have asked for more specific information on this issue and will pass along anything I uncover.

After looking around the various mailing lists, I believe this is an issue that exists in zfs-fuse, and thus in the current version of the kernel module as well, since the two share a lot of the same code.
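For context, ‘zfs rollback’ reverts a dataset to a previously taken snapshot. A minimal sketch of the workflow in question (the dataset and snapshot names are just examples):

zfs snapshot tank/data@before-change   # take a point-in-time snapshot of the dataset
zfs rollback tank/data@before-change   # revert the dataset to that snapshot, discarding later changes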

Installation and usage:

Installation of the module is fairly simple; I downloaded the pre-packaged .deb packages for Ubuntu 10.10 server.

root@server1:/root/Deb_Package_Ubuntu10.10_2.6.35-22-server# dpkg -i *.deb

If all goes well you should be able to list the loaded modules:

root@server1:/root/Deb_Package_Ubuntu10.10_2.6.35-22-server# lsmod |grep zfs
lzfs                   36377  3
zfs                   968234  1 lzfs
zcommon                42172  1 zfs
znvpair                47541  2 zfs,zcommon
zavl                    6915  1 zfs
zlib_deflate           21866  1 zfs
zunicode              323430  1 zfs
spl                   116684  6 lzfs,zfs,zcommon,znvpair,zavl,zunicode

Now I can create a test pool:

root@server1:/root# zpool create test-mirror mirror sdc sdd

Now check the status of the zpool:

root@server1:/root# zpool status
  pool: test-mirror
 state: ONLINE
 scan: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        test-mirror    ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            sdc1       ONLINE       0     0     0
            sdd1       ONLINE       0     0     0
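With the pool online, you can carve it into datasets just as you would on Solaris. A quick sketch (the dataset name and property are only examples, not commands I ran above):

zfs create test-mirror/data              # create a child dataset, mounted at /test-mirror/data by default
zfs set compression=on test-mirror/data  # dataset properties work as usual
zfs list                                 # confirm the new dataset and its mountpoint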

Gluster on OpenSolaris so far…part 1.

We have been running Gluster in our production environment for about 1 month now, so I figured I would post some details about our setup and our experiences with Gluster and OpenSolaris so far.

Overview:

Currently we have a two-node Gluster cluster using the replicate translator to provide RAID-1 style mirroring of the filesystem. The initial requirements called for a solution that would house our digital media archive (audio, video, etc.), scale up to around 150TB, support exports such as CIFS and NFS, and be extremely stable.

It was decided that we would use ZFS as our underlying filesystem due to its data integrity features as well as its support for filesystem snapshots, both of which were also very high on the requirement list for this project.

Although FreeBSD has had ZFS support for quite some time, there were some known issues (with 32- vs 64-bit inode numbers) at the time of my research that prevented us from going that route.

Just this week KQ Infotech released their native ZFS kernel module for Linux, which as of this latest release is supposed to fully support extended filesystem attributes; these are required for Gluster to function properly. At the time of our research the software was still in beta and did not support extended attributes, so we were unable to consider or test this configuration either.
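If you want to verify extended attribute support on a Linux filesystem yourself, a quick check with the standard attr tools looks something like this (the path is just an example):

setfattr -n user.test -v hello /tank/somefile   # set a user extended attribute on a file
getfattr -d /tank/somefile                      # dump attributes; user.test="hello" should appear if xattrs work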

The choice was then made to go with ZFS on OpenSolaris (2008.11 specifically, due to the 3ware drivers available at the time). There is currently no FUSE support under Solaris, so while a Solaris variant works fine for the storage nodes on the server side, you will need a head node running an OS that does support FUSE to act as the client.

The latest version of Gluster to be fully supported on the Solaris platform is 3.0.5. The 3.1.x series introduced some nice new features; however, to use it we would have to either port our storage nodes to Linux or wait until the folks at Gluster release 3.1.x for Solaris (which I am not sure will happen anytime soon).

Here is the current hardware/software configuration:

  • CPU: 2 x Intel Xeon E5410 @ 2.33GHz
  • RAM: 32 GB DDR2 DIMMs
  • Hard drives: 48 x 2TB Western Digital SATA II
  • RAID controller: 2 x 3ware 9650SE-24M8 PCIe
  • OpenSolaris version 2008.11
  • GlusterFS version 3.0.5
  • Samba version 3.2.5 (Gluster1)

ZFS Setup:

Setup for the two OS drives was straightforward: we created a two-disk mirrored rpool, which lets us lose a disk in the root pool and still be able to boot the system.
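If you are building the mirror after installation rather than in the installer, attaching a second disk to an existing root pool looks roughly like this on OpenSolaris (the device names are placeholders, not our actual disks):

zpool attach rpool c5t0d0s0 c5t1d0s0                                # mirror the existing root disk onto the new one
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c5t1d0s0  # make the second disk bootable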

Since we have 48 disks to work with for our data pool, we created a total of 6 raidz2 vdevs, each consisting of 7 physical disks. This setup gives us roughly 75TB of raw space (53TB usable) per node, while leaving 6 disks available to use as spares.

user@server1:/# zpool list
NAME       SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool     1.81T  19.6G  1.79T     1%  ONLINE  -
datapool  75.8T  9.01T  66.7T    11%  ONLINE  -
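For reference, creating a pool with that layout would look something like the following. The cXtYdZ device names are placeholders rather than our actual controller numbering, and adding the leftover disks as hot spares is optional:

zpool create datapool \
  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
  raidz2 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0 c1t12d0 c1t13d0 \
  raidz2 c1t14d0 c1t15d0 c1t16d0 c1t17d0 c1t18d0 c1t19d0 c1t20d0 \
  raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
  raidz2 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0 c2t12d0 c2t13d0 \
  raidz2 c2t14d0 c2t15d0 c2t16d0 c2t17d0 c2t18d0 c2t19d0 c2t20d0 \
  spare c1t21d0 c1t22d0 c1t23d0 c2t21d0 c2t22d0 c2t23d0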

Gluster setup:

Creating the Gluster .vol configuration files is easily done via the glusterfs-volgen command:

user1@host1:/# glusterfs-volgen --name cluster01 --raid 1 server1.hostname.com:/data/path server2.hostname.com:/data/path

That command produces two volume files: ‘glusterfsd.vol’, used on the server side, and ‘glusterfs.vol’, used on the client.

Starting the glusterfsd daemon on the server side is straightforward:

user1@host1:/# /usr/glusterfs/sbin/glusterfsd

Starting gluster on the client side is straightforward as well:

user1@host2:/# /usr/glusterfs/sbin/glusterfs --volfile=/usr/glusterfs/etc/glusterfs/glusterfs.vol /mnt/glusterfs/
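Once mounted, the volume behaves like any local filesystem, so one way to provide the CIFS export is simply a Samba share on the head node pointing at the Gluster mountpoint. A minimal sketch of such a share (the share name and settings are made up, not our actual smb.conf):

[media-archive]
    path = /mnt/glusterfs
    read only = no
    browseable = yes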

In a later blog post I plan to talk more about issues that we have encountered running this specific setup in a production environment.

More native Linux ZFS benchmarks

Phoronix has published a nice five-page article that includes some in-depth filesystem benchmarks. They tested filesystems such as Btrfs, ext4, XFS, ZFS-FUSE, and the ZFS kernel module from KQ Infotech.

Here is an excerpt taken from the conclusion section of the article:

“In terms of our ZFS on Linux benchmarks if you have desired this Sun-created file-system on Linux, hopefully it is not because of the performance expectations for this file-system. As these results illustrate, this ZFS file-system implementation for Linux is not superior to the Linux popular file-systems like EXT4, Btrfs, and XFS. There are a few areas where the ZFS Linux disk performance was competitive, but overall it was noticeably slower than the big three Linux file-systems in a common single disk configuration. That though is not to say ZFS on Linux will be useless as the performance is at least acceptable and clearly superior to that of ZFS-FUSE. More importantly, there are a number of technical merits to the ZFS file-system that makes it one of the most interesting file-systems around.”

With that being said, I believe that when people choose ZFS as the underlying filesystem for a project, they are usually not doing so because of its reputation as a wonderfully fast filesystem. Features such as data integrity, large capacity, snapshots, and deduplication are more likely to drive your rationale for using ZFS as part of your backend storage solution.

Another thing to note is that these tests were run on the beta version of the kernel module. I assume that once the GA version (and its source code) is released, there will be plenty of opportunities to mitigate some of these concerns; on the other hand, you are going to have to live with some of the overhead that comes with ZFS if you want to take advantage of its large feature set.

Ext4 vs ZFS kernel module: benchmarks so far.

Well, I have finally set aside some time to test performance using the ZFS kernel module that I blogged about a little while ago.

Overall the ZFS kernel module produced results similar to the ones I saw with ext4. However, most real-world ZFS setups are not limited to a single disk, so it will be very interesting to see what kind of numbers we get once we start benchmarking setups with many disks.

Although the ZFS results were slower in almost every case, ext4 was not much faster in most of them, and I suspect there are plenty of people out there who would be more than willing to take a small hit in speed in order to gain the substantial benefits that come with having ZFS as their underlying filesystem.

Here are some of the benchmarks I ran (a rough sketch of the commands is included after the list):

a) create 10,000 files using touch
b) create 10,000 directories using mkdir
c) untar the latest stable Linux kernel
d) create a 1GB file using dd
e) find the 10,000 files
f) delete the 10,000 files
g) find the 10,000 directories
h) delete the 10,000 directories
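This is roughly how those operations were timed. Treat it as a sketch rather than my actual test script; the working directory and kernel tarball version are just examples:

cd /test-mirror/bench                                       # example working directory on the filesystem under test
time for i in $(seq 1 10000); do touch "file$i"; done       # a) create 10,000 files
time for i in $(seq 1 10000); do mkdir "dir$i"; done        # b) create 10,000 directories
time tar xjf /root/linux-2.6.37.tar.bz2                     # c) untar a kernel source tarball
time dd if=/dev/zero of=bigfile bs=1M count=1024            # d) create a 1GB file
time find . -maxdepth 1 -type f -name 'file*' > /dev/null   # e) find the files
time rm -f file*                                            # f) delete the files
time find . -maxdepth 1 -type d -name 'dir*' > /dev/null    # g) find the directories
time rmdir dir*                                             # h) delete the directories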

At some point soon I plan to add numbers for raidz2, Btrfs, iozone results, etc.

Various file operations, in seconds:

Operation            Ext4     Zfs      Zfs-mirror
Touch x 10000        12.669   13.009   13.044
Mkdir x 10000        14.276   13.015   13.352
Untar kernel          4.997    6.577    9.787
Create 1 GB file      1.110    6.084   12.208
Delete files          0.122    0.577    0.526
Find files            0.036    0.096    0.141
Delete directories    0.163    0.247    0.261
Find directories      0.295    0.764    0.690
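For the planned iozone runs, the invocation will probably look something like this (the file path and maximum size are placeholders I have not settled on yet):

iozone -a -g 4G -f /test-mirror/bench/iozone.tmp   # automatic mode, testing record/file sizes up to 4GB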

ZFS kernel module for Linux

UPDATE: If you are interested in ZFS on Linux, you have two options at this point: the native zfsonlinux kernel port and the FUSE-based zfs-fuse project.

I have been actively following the zfsonlinux project because, once stable and ready, it should offer superior performance by avoiding the extra overhead incurred by FUSE in the zfs-fuse project.

You can see another one of my posts concerning zfsonlinux here.

————————————————————————————————————————————————————-

KQ Infotech has released (currently in closed beta) code that brings ZFS to Linux via a loadable kernel module.

Here is a link to the current and future feature set. The reason this is exciting is that although other ZFS implementations for Linux have existed for a while, each of the available options has significant drawbacks. For example, ZFS-FUSE is implemented in userspace using FUSE, which adds overhead due to the context switching required to move back and forth between kernel space and user space.

Another option is the ZFS on Linux port, which provides a stable SPA, DMU, and ZVOL layer but does not provide the POSIX layer (ZPL) that would let you actually mount a ZFS filesystem from inside Linux. From what I understand, KQ Infotech has basically taken the ZFS on Linux code developed at Lawrence Livermore National Laboratory (LLNL) and implemented the missing ZPL layer.

NPR was recently accepted into the closed beta program, and I took some time last week to get the module installed on a Dell PowerEdge 2950 running a 64-bit version of Ubuntu 10.04. We are currently testing ZFS under kernel version 2.6.32-24. I have not had a ton of time to test things out, but so far so good. I plan on posting some ZFS and Btrfs benchmarks in the next few weeks, once I have had time to better test performance, throughput, etc.

ZFS Resources

Recently I was given the task of putting together a storage solution to house a large amount of our digital assets, with enough room to meet our needs over the next few years. The project called for a solution that could scale up to around 120TB of usable space. Depending on the price, it might also be used to store the majority of our digital archive (audio and video).

I will go into the specific hardware and software details of the project in another post; after about a month of research, however, we decided to go with a solution built around the ZFS filesystem.

Here are a few documents that I found invaluable during my setup and overall planning:

ZFS Best Practices Guide

ZFS Configuration Guide

ZFS Troubleshooting Guide

ZFS Troubleshooting and Cheatsheet Guide

These links can be a starting point for anyone who wants to gain a better overall understanding of how to best administer a server running ZFS.  The ‘best practices guide’ is also a great resource to consult during the initial project planning stages.

Nexenta

I came across an interesting project last week while doing some research on OpenSolaris and ZFS. The distribution is called Nexenta. Its kernel is based on OpenSolaris, while the userspace tools are based on Debian/Ubuntu.

There is also a commercial offshoot called the Nexenta storage appliance, which is the Nexenta distribution packaged as a ZFS-based storage server. Pricing is dependent on the maximum size of the storage pool.

I have downloaded the free version and am currently planning to test this distro with Gluster as well. FUSE (which is required by a Gluster client to mount the filesystem) is currently not stable on OpenSolaris, so I plan on using Nexenta for the server bricks of the Gluster cluster and Linux for the client, since FUSE has no issues running on Linux.