Monthly Archives: June 2011

btrfs: Overview and Performance video

Here is a link to a nice video presentation that was given by Douglas Fuller at LUG 2011. The 25-minute video provides an updated overview of the btrfs feature set, notes areas in which btrfs still needs further development, and then goes on to discuss in detail various benchmarks Doug was able to gather during his time spent with the file system.

The second half of the video features Johann Lombardi going into even further detail about btrfs internals.

Overall a very good talk for anyone who is new to btrfs and wants an overview of its features, or anyone who is simply looking for some specifics in terms of btrfs stability and performance.

Finding and installing pre-created images on OpenStack

Once you have your OpenStack cluster up and running you will need to either find some pre-created image templates or roll your own.  I’ll leave the details of creating images from scratch for a different post; this post will focus on providing links to both image files and instructions for installing pre-created Linux templates on OpenStack infrastructure.

First, if you are looking to install any version of Ubuntu, you should visit

http://uec-images.ubuntu.com/releases/

and download the file that corresponds to your desired version and architecture.

Once you have that file, you can follow the instructions here.
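In case it helps, the fetch-and-publish step can be sketched roughly as follows. This is a sketch rather than a recipe: it assumes the cloud-utils and euca2ools packages are installed and your EC2-style credentials are already sourced, and the release, architecture and bucket names below are placeholders you would swap for your own.

```shell
# Grab a UEC tarball (release/arch chosen purely for illustration)...
wget http://uec-images.ubuntu.com/releases/10.04/release/ubuntu-10.04-server-uec-amd64.tar.gz

# ...then publish it to the cloud; 'ubuntu-bucket' is a placeholder.
uec-publish-tarball ubuntu-10.04-server-uec-amd64.tar.gz ubuntu-bucket amd64
```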

If you are looking to install a version of Debian, CentOS or Fedora, you should visit

http://open.eucalyptus.com/wiki/EucalyptusUserImageCreatorGuide_v1.6,

and download one of the pre-created images that the folks over at Eucalyptus have provided.

Once you are ready to install one of those files, you can follow the instructions here.
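The Eucalyptus downloads typically arrive as a tarball containing a root filesystem image, and the bundle/upload/register steps can be sketched like this. Every file, bucket and manifest name below is a placeholder for whatever your tarball actually contains, and your credentials must already be sourced.

```shell
# Unpack the downloaded image tarball (name is a placeholder).
tar xzf euca-centos-5.3-x86_64.tar.gz

# Bundle, upload and register the image so the cloud can launch it.
euca-bundle-image -i centos.5-3.x86-64.img
euca-upload-bundle -b centos-bucket -m /tmp/centos.5-3.x86-64.img.manifest.xml
euca-register centos-bucket/centos.5-3.x86-64.img.manifest.xml
```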

OpenStack made easy

On the heels of my previous post on StackOps and OpenStack, I thought I would quickly share two of the most valuable links that I came across in my search for good ‘getting started’ documentation.

First, this link provides an excellent architectural overview of OpenStack, which can be quite confusing initially if you are not a regular user of Amazon EC2-style cloud services.

Secondly, CSS Corp’s beginner’s guide is an almost invaluable resource for anyone who is getting started and wants easy-to-read, well-written documentation on the subject.

Meet StackOps

While looking into what it would take to set up a development instance of OpenStack, I came across a bare-metal distro that makes it much easier to set up OpenStack nodes, especially (but not only) if you are simply looking to set up a single-node environment for dev or testing.

This distribution is called StackOps.

According to their wiki, StackOps is:

a complete, ready to use Openstack Nova distribution verified, tested and designed to reach as many users as possible thanks to a new and simple installation process. Stackops democratizes the cloud computing technology to companies of all sizes and sectors. You only need to download the ISO image with the distro from our site and install it on as many servers as you require. In a few minutes you will be able to enjoy the power of the Cloud for your own!’

Now let’s take a closer look at what OpenStack is exactly.

According to their wiki, OpenStack is:

open source software to build public and private clouds. OpenStack is a community and a project as well as a stack of open source software to help organizations run clouds for virtual computing or storage. OpenStack contains a collection of open source projects that are community-maintained including OpenStack Compute (code-named Nova), OpenStack Object Storage (code-named Swift), and OpenStack Imaging Service (code-named Glance). OpenStack provides an operating platform, or toolkit, for orchestrating clouds.

OpenStack is more easily defined once the concepts of cloud computing become apparent, but we are on a mission: to provide scalable, elastic cloud computing for both public and private clouds, large and small. At the heart of our mission is a pair of basic requirements: clouds must be simple to implement and massively scalable.

Here is a link to the StackOps confluence page, which provides all the documentation you need to get started.  At this point I do not have enough first-hand experience to comment much more, except to say that after burning the .iso, I was able to have a single-node installation set up and running virtual machines within a couple of hours.

I do think that the beauty of this product is that once you go through the install process, which simply involves filling in a series of answers about your architectural preferences, you are then free to focus almost completely on learning the ins and outs of OpenStack without having to spend too much time worrying about the StackOps side of things.

Slow ZFS resilvering on OpenSolaris

Two weeks ago I started receiving automated messages from one of our 3ware 9650SE RAID cards concerning an increase in the number of SMART errors on one of the 2TB hard drives attached to the card. Within a few days of the RAID card starting to generate these messages, ZFS was nice enough to take the drive in question out of service and replace it with one of the drives we had set aside as an ‘online spare’ for that specific pool.

So far so good.

Two terabytes of data is a decent amount, so I assumed that the resilvering might take some time, and I was able to confirm that after logging in and looking at the output from the ‘zpool status’ command. The output indicated that it was going to take several more hours before the resilvering process would be complete.
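For anyone following along at home, the progress check is just the following, with ‘tank’ standing in here for the actual pool name:

```shell
# Show pool health and resilver progress; 'tank' is a placeholder
# for your real pool name.
zpool status tank

# Or ask only about pools that currently have problems:
zpool status -x
```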

So far so good.

The next day I logged into the server to check on the progress. Not only did I find that the job had not yet completed, but I also discovered that the ‘zpool status’ command had almost doubled its estimate of the time required to fully resilver the drive.

It was at this point that I started to suspect that our automated snapshotting policy (which runs hourly, daily, weekly and monthly via cron) might be hampering the resilvering progress. A quick Google search indicated that at some point in the past, bug number ‘6343667‘ had in fact been associated with degraded scrub and resilvering performance during periods in which snapshots were being taken. It appears that some older versions of ZFS required a restart of the entire resilvering process after a snapshot was initiated.

According to bug number ‘6343667‘, this issue was resolved with the release of ZFS pool version 11. I double checked the version we are running on the server in question and discovered that we were running version 13.

At this point I am unsure whether the problem I experienced had anything to do with that specific bug; what I do know is that after commenting out the automated snapshot entries from the crontab on that server, the drive resilvering finished quickly and without error, and I have not had any problems since.
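The crontab entries in question looked something like the following. The dataset names and schedule here are made up for illustration, and yours will certainly differ; the point is simply that each ‘zfs snapshot’ line gets a leading ‘#’ for the duration of the resilver.

```shell
# Hypothetical automated snapshot entries, commented out while the
# resilver runs ('%' must be escaped as '\%' inside a crontab):
#0 * * * *  /usr/sbin/zfs snapshot tank/data@hourly-$(date +\%H)
#15 0 * * * /usr/sbin/zfs snapshot tank/data@daily-$(date +\%d)
```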

Just remember to re-enable to snapshots after the resilver and you should be all set.

GlusterFS and Extended Attributes

Jeff Darcy over at CloudFS.org has posted a very informative and in-depth writeup detailing GlusterFS and its use of extended filesystem attributes.  One area in which many Gluster community members would like to see some improvement is documentation.

You do not need to spend too much time on the Gluster-users mailing list in order to find frequent calls for more in depth and up to date Gluster documentation.

There is no doubt that the situation has improved greatly over the last 6 to 12 months; however, room for improvement still exists, and this is an example of someone outside the inner circle at Gluster providing such a document.

For more information on CloudFS and how it relates to GlusterFS, you can check out these two postings: Why? and What?

Slow backup on Proxmox using vzdump

I was recently attempting to back up one of our Proxmox VEs using OpenVZ’s backup tool ‘vzdump’. In the past when using vzdump, a complete backup of a 100GB VE, for example, could be obtained in under an hour or so. This time, however, after leaving the process running and returning several hours later, the .tar file was a mere 2.3GB in size.
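For context, the invocations look roughly like this; the container ID and dump directories below are placeholders, and the exact flags vary a bit between vzdump versions.

```shell
# Back up container 101 to the usual (shared-storage) dump directory;
# both the ID and the path are placeholders.
vzdump --compress --dumpdir /mnt/backup 101

# Retargeted at a local partition while troubleshooting:
vzdump --compress --dumpdir /var/lib/vz/dump 101
```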

At first I thought that there might be an issue with one or more nodes in the shared storage cluster, so I decided I would direct vzdump to store the .tar file on one of the server’s local partitions instead. Once again I started the backup, returned several hours later, only to find a file similar in size to the previous one.

Next I decided to ‘tar up’ the contents of the VE manually; that, combined with the ‘nohup’ command, would allow me to find out at what point this whole process was stalling.

As it turns out, I had thousands of files in my ‘/var/spool/postfix/incoming/’ directory on that VE, and although almost every single file in that directory was small, and the overall directory size was not large at all, file operations inside that folder had come to a screeching halt.
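A quick way to spot this kind of directory is to count its entries without letting ‘ls’ sort them, since sorting is part of what crawls in huge directories. Here is a small self-contained demo; the throwaway directory stands in for the real spool path.

```shell
# Create a throwaway directory holding many small files.
mkdir -p /tmp/spool-demo
for i in $(seq 1 500); do : > "/tmp/spool-demo/msg$i"; done

# 'ls -f' skips sorting and includes the '.' and '..' entries,
# so this counts 500 files plus those two.
ls -f /tmp/spool-demo | wc -l
```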

Luckily for me, I knew for a fact that we did not need any of these particular email messages, so I was simply able to delete the ‘incoming’ folder and recreate it once all the files had been removed. After that, vzdump was once again functioning as expected.