Monthly Archives: January 2011

Updated Native Linux ZFS benchmarks just released some updated numbers from benchmarks they took using the recently released GA version of the native ZFS kernel module for Linux. They conducted a total of 10 tests using the ZFS kernel module, Ext4, Btrfs and XFS.

The tests were performed using Ubuntu 10.10 and kernel version 2.6.35 for the ZFS tests,  kernel version 2.6.37 was used when testing the other three filesystems.

It appears that these tests were all run using single disk setups, I think it would be really great if Phornix would also look into providing benchmarks on multi-disk setups such as ZFS mirrored disks vs hardware or software RAID1 on Linux. I would also like to see benchmarks comparing RAID5 on Linux vs RAIDZ on ZFS.  I think these kinds of tests might provide a more realistic comparison of real world enterprise level storage configurations.

SUNWattr_ro error:Permission denied on OpenSolaris using Gluster 3.0.5

Last week I noticed an apparently obscure error message in my glusterfsd logfile. I was getting errors similar to this:

[2011-01-15 18:59:45] E [compat.c:206:solaris_setxattr] libglusterfs: Couldn’t set extended attribute for /datapool/glusterfs/other_files (13)
[2011-01-15 18:59:45] E [posix.c:3056:handle_pair] posix1: /datapool/glusterfs/other_files: key:SUNWattr_ro error:Permission denied

on several directories as well as on the files that resided underneath those directories. These errors only occurred when an attempt was made by Gluster to stat the file or directory (ls -l vs ls) in question.

After reviewing the entire logfile, I was unable to see any real pattern to the error messages, the errors were not very widespread given that I was only seeing these one maybe 75 or so files out of our total 3TB of data.

A google search yielded very few results on the topic, with or without Gluster as a search term. What I was able to find out was this:

SUNWattr_ro and SUNWattr_rw are Solaris ‘system extended attributes’, these attributes cannot be removed from a file or directory, you can however prevent users from being able to set them at all, by setting xattr=off, either during the creation of the zpool or changing the parameter after the fact.

This was not a viable solution for me due to the fact that several of Gluster’s translators require extended attributes be enabled on the underling filesystem.

I was able to list the extended attributes using the following command:

user@solaris1# touch test.file
user@solaris1# runat test.file ls -l
total 2
-r–r–r– 1 root root 84 Jan 15 11:58 SUNWattr_ro
-rw-r–r– 1 root root 408 Jan 15 11:58 SUNWattr_rw

I also learned that some people were having problems with these attributes on Solaris 10 systems, this is due to the fact that the kernels that are used by those versions of Solaris do not include, nor do they understand how to translate these ‘system extended attributes’, that were introduced in new versions of Solaris . This has caused a headache for some people who have been trying to share files between Solaris 10 and Solaris 11 based servers.

In the end, the solution was not overly complex, I had to recursively copy the directories to a temporary location, delete the original folder and rename the new one:

(cp -r folder;rm -rf folder;mv folder)

These commands must be done from a Gluster client mount point, so that Gluster can set or reset the necessary extended attributes.

Native Linux ZFS kernel module and stability.

UPDATE: If you are interested in ZFS on linux you have two options at this point:

I have been actively following the  zfsonlinux project because once stable and ready it should offer surperior performance due to the extra overhead that would be incurred by using fuse with the zfs-fuse project.

You can see another one of my posts concerning zfsonlinux here.


There was a question posted in response to my previous blog post found here, about the stability of the native Linux ZFS kernel module release. I thought I would just make a post out of my response:

So far I have been able to perform some limited testing (given that the GA code was just released earlier this week), some time ago I had been given access to the beta builds,  so I had done some initial testing using those, I configured two mirrored vdevs consisting of two drives each. It seemed relatively stable as far as I was concerned, as I stated in my previous post…there is a known issue with the ‘zfs rollback’ command…which I tested using the GA release,  and I did in fact have problems with.

The work around at this point seems to be to perform a reboot after the rollback and then a ‘zfs scrub’ on the pool after the reboot. Personally I am hoping this gets fixed soon, because not everyone has the same level of flexibility, when it comes to rebooting their servers and storage nodes.

As far as I understand it, this module really consists of three pieces:

1)SPL -  a Linux kernel module which provides many of the Solaris kernel APIs. This layer makes it possible to run Solaris kernel code in the Linux kernel with relatively minimal modification.
2)ZFS – a Linux kernel module which provides a fully functional and stable SPA, DMU, and ZVOL layer.
3)LZFS – a Linux kernel module which provides the necessary POSIX layer.

Pieces #1 and #2 have been available for a while and are derived from code taken from the ZFS on Linux project found here. The folks at KQ Infotech are really building on that and providing piece #3, the missing POSIX layer.

Only time will tell how stable the code really is, my opinion at this point is that most software projects have some number of known bugs that exist (and even more have some unknown number of bugs as well), I know I am going to continue to test in a non production environment for the next few months.  At this point I have not experienced any instability (other then what was discussed above) or crashing, all the commands seem to work as advertised, there are a lot of features I have not been able to test yet, such as dedup, compression, etc, so there is lots more to look at in the upcoming weeks.

KQStor’s business model seems to be one where the source code is provided and support is charged for.  So far I have been able to have an open and productive dialog with their developers, and they have been very responsive to my inquiries, however it does not appear that they are going to be setting public tools such as mailing lists or forums, due to their current business model.  I am hoping that this will change in the near future, as I truly believe that everyone will be able to benefit from those kinds of public repositories, and there is no doubt in my mind that such tools will only lead to a more stable product in the long run.

Native Linux ZFS kernel module goes GA.

UPDATE: If you are interested in ZFS on linux you have two options at this point:

I have been actively following the  zfsonlinux project because once stable and ready it should offer superior performance due to the extra overhead that would be incurred by using fuse with the zfs-fuse project.

You can read more about using zfsonlinux in another one of my posts here.

Earlier this week  KQInfotech released the latest latest build of their ZFS kernel modules for Linux. This version has been labeled GA and ready for wider testing (and maybe ready for production).

KQStor has been setup as a place where you can go to sign-up for an account, download the software and get additional support.

The source code for the module can be found here:

Currently mounting of the root filesystem is not supported, however a post here, describes a procedure that can be used to do it.

The users guide also hints at possible problems using ‘zfs rollback’ under certain circumstances.  I have asked for more specific information on this issue, and I will pass along any other information I can uncover.

After looking around the various mailing lists, this looks like it might be an issue that exists with zfs-fuse, and thus the current version of the kernel module as well, since they share a lot of the same code.

Installation and usage:

Installation of the module is fairly simple, I downloaded the pre-packaged .deb packages for Ubuntu 10.10 server.

root@server1:/root/Deb_Package_Ubuntu10.10_2.6.35-22-server# dpkg -i *.deb

If all goes well you should be able to list the loaded modules:

root@server1:/root/Deb_Package_Ubuntu10.10_2.6.35-22-server# lsmod |grep zfs
lzfs                   36377  3
zfs                   968234  1 lzfs
zcommon                42172  1 zfs
znvpair                47541  2 zfs,zcommon
zavl                    6915  1 zfs
zlib_deflate           21866  1 zfs
zunicode              323430  1 zfs
spl                   116684  6 lzfs,zfs,zcommon,znvpair,zavl,zunicode

Now I can create a test pool:

root@server1:/root#zpool create test-mirror mirror sdc sdd

Now check the status of the zpool:

root@server1:/root# zpool status
pool: test-mirror
state: ONLINE
scan: none requested

test-mirror  ONLINE    0     0     0
mirror-0  ONLINE       0     0     0
sdc1   ONLINE          0     0     0
sdd1   ONLINE          0     0     0

Gluster on OpenSolaris so far…part 1.

We have been running Gluster in our production environment for about 1 month now, so I figured I would post some details about our setup and our experiences with Gluster and OpenSolaris so far.


Currently we have a 2 node Gluster cluster, we are using the replicate translator in order to provide Raid-1 type mirroring of the filesystem.  The initial requirements involved providing  a solution that would house our digital media archive (audio, video, etc), would scale up to around 150TB, support exports such as CIFS and NFS, and be extremely stable.

It was decided that we would use ZFS as our underlying filesystem, due to it’s data integerity features as well as it’s support for taking filesystem snapshots, both considered very high on the requirement list for this project as well.

Although FreeBSD has had ZFS support for quite some time, there were some known issues (with 32 vs 64 bit inode numbers) at the time of my research that prevented us from going that route.

Just this week  KQstor released their native ZFS kernel module for Linux, which as of this latest release is supposed to fully support extended filesystem attributes, these are requirement in order for Gluster to function properly.  This software was Beta at the time,  and did not support extended attributes, so we were unable to consider and/or test this configuration either.

The choice was then made to go with ZFS on OpenSolaris (2008.11 specifically due to the 3ware drivers available at the time).  Currently there is no FUSE support under Solaris, so although you can use it without a problem on the server side,  if you choose to use a Solaris variant for your storage nodes,  you will be required to use a head node with an OS that does support it on the client side.

The latest version of Gluster to be fully supported on the Solaris platform is version 3.0.5. 3.1.x introduced some nice new features, however we will have to either port our storage nodes to Linux, or wait until the folks at Gluster decide to release 3.1.x for Solaris (which I am not sure will happen anytime soon).

Here is the current hardware/software configuration:

  • 2 x Intel Xeon E5410 @ 2.33GHz:CPU
  • 48 X 2TB Western Digital SATA II:HARD DRIVES
  • Opensolaris version 2008.11
  • Glusterfs version 3.0.5
  • Samba version 3.2.5 (Gluster1)

ZFS Setup:

Setup for the two OS drives was pretty straight forward, we created a two disk mirrored rpool.  This will allow us to have a disk failure in the root pool and still be able to boot the system.

Since we have 48 disks to work with for our data pool, we created a total of 6 Raid-z2 vdevs, each consisting of 7 physical disks.  This setup gives up 75TB of space (53TB usable) per node, while leaving 6 disks available to use as spares.

user@server1:/# zpool list
rpool     1.81T  19.6G  1.79T     1%  ONLINE  -
datapool  75.8T  9.01T  66.7T    11%  ONLINE  -

Gluster setup:

Creating the Gluster .vol configuration files is easily done via the glusterfs-volgen command:

user1@host1:/#glusterfs-volgen --name cluster01 --raid 1

That command will produce 2 volume files, one is called ‘glusterfsd.vol’ used on the server side and one called ‘glusterfs.vol’ used on the client.

Starting glusterd on the serverside is straightforward:

user1@host1:/# /usr/glusterfs/sbin/glusterfsd

Starting gluster on the client side is straightforward as well:

user1@host2:/#/usr/glusterfs/sbin/glusterfs --volfile=/usr/glusterfs/etc/glusterfs/glusterfs.vol /mnt/glusterfs/

In a later blog post I plan to talk more about issues that we have encountered running this specific setup in a production environment.