5 Feb, 2012  |  Written by  |  under BTRFS, Linux

I have been waiting for the video presentation of a talk given by Chris Mason at this year’s Scale 10x to finally be posted online.  The original Scale 10x talks were streamed live, and the website claims that the videos will be posted online soon; however, no date has been provided yet.

In the meantime, however, I found a link to another talk given by Chris, this time hosted at linuxfoundation.org. In order to view the full video you do need to provide your name and email address, but the process is painless and well worth the 30 seconds it takes to fill in the form.

It appears as though this was put together in December 2011, so it is relatively new and up to date.  It provides a nice introduction to btrfs, a look at the upcoming feature set, and a list of work that still needs to be done in order to make btrfs production ready.

Here is a link to the first few minutes of the talk:

5 Feb, 2012  |  Written by  |  under BTRFS, Linux, Xfs

There was another file system talk to come out of the recent Linux.conf.au conference.  This one was given by Dave Chinner and was entitled ‘XFS: Recent and Future Adventures in Filesystem Scalability’.

Here Dave discusses some of the historical roadblocks which prevented XFS from scaling as well as it could have, provides some in-depth details about how these issues were eventually overcome, and shows off some benchmarks comparing throughput and overall scaling using XFS, EXT4 and BTRFS.

Dave finishes up the talk with some discussion about what you can expect next from XFS and then takes some questions from the audience.

30 Jan, 2012  |  Written by  |  under BTRFS, Linux

Here is a YouTube video of a presentation from this year’s Linux.conf.au conference given by Avi Miller.  The video talks about the current state of btrfs, some of the upcoming features, and Avi also provides a demonstration of one of the filesystem recovery tools in action.

Here are a few of the highlights:

  • Lots of performance and stability fixes
  • Lots of code cleanup
  • New compression options (LZO and snappy)
  • Auto file defrag
  • Kernel 3.3 will allow larger block sizes (4k,8k,16k) for better meta-data throughput
  • A ZFS like send/receive is in the works
  • New filesystem checker (btrfsck) should be released by Feb 14th
  • Raid 5/6 code (from Intel) will go into mainline kernel after the release of btrfsck
  • Options exist/will exist to do mixed raid modes for data and meta-data
  • Btrfs will be production filesystem in next version of Oracle Unbreakable Linux

No doubt about it, if you are interested in the current state of btrfs you should check out this talk.

17 Jan, 2012  |  Written by  |  under Debian, Linux

I have spent some time over the last few weeks getting familiar with mdadm and software RAID on Linux, so I thought I would write down some of the commands and example syntax that I have used while getting started.

1) If we would like to create a new RAID array from scratch, we can use the following example commands (mdadm requires --raid-devices when creating an array, and it must match the number of active devices listed):
RAID1 with 2 drives:
mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

RAID5 with 5 drives:
mdadm --create --verbose /dev/md0 --level=5 --raid-devices=5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

RAID6 with 4 drives and 1 spare:
mdadm --create --verbose /dev/md0 --level=6 --raid-devices=4 --spare-devices=1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
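
Since mdadm checks that the number of devices listed equals --raid-devices plus --spare-devices, it can help to let a small function do the counting.  This is only a sketch: mdadm_create_cmd is a made-up helper name, and it prints the command rather than running it so you can review it first.

```shell
# Build (but do not run) an mdadm --create command with matching device counts.
# $1 = md device, $2 = RAID level, $3 = number of spares, rest = member devices
mdadm_create_cmd() {
    md=$1
    level=$2
    spares=$3
    shift 3
    # Whatever is not a spare counts as an active member.
    active=$(( $# - spares ))
    if [ "$active" -lt 1 ]; then
        echo "error: need at least one active device" >&2
        return 1
    fi
    cmd="mdadm --create --verbose $md --level=$level --raid-devices=$active"
    if [ "$spares" -gt 0 ]; then
        cmd="$cmd --spare-devices=$spares"
    fi
    echo "$cmd $*"
}
```

For example, ‘mdadm_create_cmd /dev/md0 6 1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1’ prints a RAID6 create command with --raid-devices=4 and --spare-devices=1.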

2) If we would like to add a disk to an existing array:
mdadm --add /dev/md0 /dev/sdf1    # the new disk is only added as a spare

mdadm --grow /dev/md0 --raid-devices=N    # grow the array; N = new number of active disks, not counting spares

3)If we would like to remove a disk from an existing array:
First we need to ‘fail’ the drive:
mdadm --fail /dev/md0 /dev/sdc1

Next it can be safely removed from the array:
mdadm --remove /dev/md0 /dev/sdc1

4) In order to make the array survive a reboot, you need to append its details to the mdadm configuration file:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf    # Debian
mdadm --detail --scan >> /etc/mdadm.conf          # most other distributions

5)In order to delete and remove the entire array:
First we need to ‘stop’ the array:
mdadm --stop /dev/md0

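Next it can be removed:
mdadm --remove /dev/md0

To stop the old member partitions from being detected as part of the array again, also wipe the RAID superblock on each of them (adjust the device names to your own members):
mdadm --zero-superblock /dev/sda1 /dev/sdb1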

6)Examining the status of your RAID array:
There are two options here:
a)cat /proc/mdstat
b)mdadm --detail /dev/md0
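
If you want to check the status from a script, the member map at the end of each status line in /proc/mdstat (for example [UU] or [UU_UU]) is the part to look at: an underscore marks a missing or failed member.  Here is a minimal sketch, assuming the usual mdstat layout; degraded_arrays is a made-up helper name.

```shell
# Print the name of every md array whose member map contains a "_".
# Reads /proc/mdstat-style text on stdin.
degraded_arrays() {
    awk '
        # Remember which array the following status line belongs to.
        /^md[0-9]+ :/ { name = $1 }
        # The status line ends in a map such as [UU] or [U_].
        /\[[U_]+\]/ {
            if ($NF ~ /_/) print name
        }
    '
}
```

Typical use would be ‘degraded_arrays < /proc/mdstat’, which prints nothing when every array is healthy.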

Here is a quick tip for anyone who needs to access files that exist underneath an already mounted filesystem mount point.  For example, suppose that you have some files located in a directory called ‘/tmp/docs’.

At some point, someone might accidentally mount an NFS or CIFS share on top of that same directory.  If you need to access the original files that existed before the new mount was put into place, you have two options.

  1. Unmount the NFS or CIFS filesystem and access your files and then remount.
  2. However, you may find yourself in a situation (as I did) where it is extremely inconvenient or impossible to have the downtime associated with the umount/remount process.  In that case you have another option: you can use a ‘bind’ mount.

All you need to do is something like the following:

mount --bind /tmp /tmp/new_location

Now you should be able to access the original files here:

‘/tmp/new_location/docs’
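
The whole sequence, including creating the target directory and cleaning up afterwards, can be sketched as follows.  The trick works because --bind is non-recursive: it exposes the parent filesystem without the mounts stacked on top of it.  The helper names (run, peek_under_mount) are made up for this example, and by default the script only prints the commands so nothing is mounted until you set DRY_RUN=0 as root.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# $1 = directory hidden by the NFS/CIFS mount, $2 = empty directory for the view
peek_under_mount() {
    hidden=$1
    view=$2
    parent=$(dirname "$hidden")
    run mkdir -p "$view"
    # --bind is non-recursive, so mounts stacked below $parent are not carried
    # over and the original directory contents become visible under $view.
    run mount --bind "$parent" "$view"
    run ls -la "$view/$(basename "$hidden")"
    # Drop the extra view again once you are done with the files.
    run umount "$view"
}
```

Calling ‘peek_under_mount /tmp/docs /tmp/new_location’ in dry-run mode prints the mkdir, mount --bind, ls and umount commands for the example paths used above.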

**UPDATE**

Dane (see comments) pointed out that ATI has in fact released the 11.10 version of their drivers.  I went ahead and gave them a try, and using them broke most things for me.

Once I booted back in to Gnome…I had some of the Gnome3 look and feel…but everything else (menus, icons, etc) were clearly from Gnome2.  I reinstalled version 11.9 and everything was back to normal.  This update might work for some other setups…but for now I’ll just stick with the version that is working 95% of the time.

——————————————————————————————————————————————————————

I was finally able to get a working desktop using Ubuntu 11.10, Gnome Shell, Gnome 3.2 along with my Radeon HD 2400 XT video card.  The adventure started a few weeks ago when I tried to set up my existing Ubuntu 11.04 desktop using some PPA repositories I found online.

I was able to successfully upgrade from Ubuntu 11.04 to 11.10 beta, and  since the 11.10 final release was right around the corner I figured it was safe to go ahead and give it a try.  The upgrade went well, but I spent the next day fighting to try and get gnome-shell to play nicely with my Radeon card using the existing ATI drivers.

I ended up starting from scratch a few days later, by backing up some important files in my home directory and doing a clean install of 11.10 once the final version was released.

After doing an update and installing some other packages such as ubuntu-restricted-extras, vlc, pidgin, etc., installing gnome-shell was painless:

apt-get install gnome-shell

After rebooting, I logged in to find some of the same problems as before with this desktop install (screen tearing, blurry icons, multicolored menus, etc). I found some posts around the net that alluded to the fact that I might be able to solve some of my problems if I used the latest drivers (version 11.9) off the ATI website.

On the other hand, I found other posts by people claiming that even using the latest drivers had not completely solved all their problems and that ATI would be releasing version 11.10 sometime within the next 2 to 3 weeks, and that this new version would be specifically tested against Gnome 3.x (and fix the remaining bugs).

Anyway, I decided that I had nothing to lose at this point and grabbed the latest version from the web:

mkdir ati-11.9; cd ati-11.9
wget http://www2.ati.com/drivers/linux/ati-driver-installer-11-9-x86.x86_64.run
sh ati-driver-installer-11-9-x86.x86_64.run --buildpkg Ubuntu/oneiric
dpkg -i fglrx*.deb
aticonfig --initial -f

After rebooting my machine again, I was pleasantly surprised to see that everything was looking good, no more problems with screen tearing and all my icons and menus were seemingly in order.

The only thing I needed to do now was to set up my multiple monitors correctly, since at that point I was staring at two cloned spaces instead of one large desktop spread across my two 24″ monitors.

First I launched the Catalyst control panel:

gksu amdcccle

Under the ‘Display Manager’ page I had to select ‘Multi-display desktop with display’ for EACH of my two monitors.

After a reboot I went into the Gnome ‘System Settings’ and chose ‘Displays’.  I was finally able to uncheck ‘Mirror displays’ and hit ‘Apply’ without error.

The final two steps required to get everything working 100% correctly were to install the gnome-tweak-tool:

apt-get install gnome-tweak-tool

and disable the ‘Have file manager handle the desktop’ option in the ‘Desktop’ section (that did away with the extra menu I was seeing).

The final step in the process involved installing a new theme.  I really liked the Elementary theme found here, so that is the one I chose.  Now everything is working as it should be!

12 Oct, 2011  |  Written by  |  under Linux, OpenVZ, Proxmox

Ever since we upgraded from Proxmox 1.8 to version 1.9 we have had users who have periodically complained about receiving out of memory errors when attempting to start or restart their java apps.

The following two threads contain a little bit more information about the problems people are seeing:

1)Proxmox mailing list thread
2)Openvz mailing list thread

At least one of the threads suggests that you allocate a minimum of 2 CPUs per VM in order to remedy the issue.  We already have 2 CPUs per VM, so that was not a possible workaround for us.

Another suggestion made by one of the posters was to  revert back to using a previous version of the kernel, or downgrade Proxmox 1.9 to Proxmox 1.8 altogether.

I decided I would try to figure out a workaround that did not involve downgrading software versions.

At first I tried to allocate additional memory to the VM’s and that seemed to resolve the issue for a short period of time, however after several days I once again started to hear about out of memory errors with Java.

After checking ‘/proc/user_beancounters’ on several of the VM’s, I noticed that the failcnt value for the ‘privvmpages’ parameter was increasing steadily over time.

The solution so far for us has been to increase the ‘privvmpages’ parameter (in my case I simply doubled it) to such a level that these errors are no longer incrementing the ‘failcnt’ counter.
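
To spot this kind of problem quickly, you can filter /proc/user_beancounters for any counter with a non-zero failcnt (the last column) and work out what a doubled privvmpages setting would look like.  This is a rough sketch that assumes the usual column layout (resource, held, maxheld, barrier, limit, failcnt); the function names are made up, and the second one only prints a vzctl command for review instead of running it.

```shell
# Print "resource failcnt" for every counter whose failcnt is non-zero.
# Reads /proc/user_beancounters-style text on stdin.
failing_counters() {
    awk 'NF >= 6 && $NF ~ /^[0-9]+$/ && $NF > 0 { print $(NF-5), $NF }'
}

# Print a vzctl command that doubles the privvmpages barrier and limit.
# $1 = container ID; reads beancounters text on stdin.
suggest_privvmpages() {
    awk -v ctid="$1" 'NF >= 6 && $(NF-5) == "privvmpages" {
        # Columns from the right: barrier is NF-2, limit is NF-1.
        printf "vzctl set %s --privvmpages %d:%d --save\n", ctid, $(NF-2) * 2, $(NF-1) * 2
    }'
}
```

For example, ‘failing_counters < /proc/user_beancounters’ run inside a VM lists only the parameters that have hit their limit.  Doubling is simply what worked for me; it is not a general recommendation.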

If you would like to learn more about the various UBC parameters that can be modified inside openvz you can check out this link.

4 Oct, 2011  |  Written by  |  under Debian, Linux

After spending the last two weeks upgrading various versions of Debian to Squeeze, I figured I would post the details of how to upgrade each version, starting from Debian 3.1 to Debian 6.0.

The safest way to upgrade to Debian Squeeze is to upgrade one release at a time until you reach version 6.x.  In other words, if you are upgrading from Debian 4.x, you need to upgrade to Debian 5.x and THEN to Debian 6.x.  Direct upgrades that skip a release are not recommended.

Here are the steps that I took when upgrading between the various versions.
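
One step the commands below take for granted is that /etc/apt/sources.list already names the release you are moving to.  Here is a minimal sketch of switching the codename, assuming the file refers to releases by codename (sarge, etch, lenny, squeeze) rather than by ‘stable’.  retarget_sources is a made-up helper name, and it is worth trying on a copy of the file first.

```shell
# Rewrite one release codename to another in an APT sources file.
# $1 = sources file, $2 = old codename, $3 = new codename
# A backup of the original is kept next to it as <file>.bak.
retarget_sources() {
    file=$1
    old=$2
    new=$3
    cp "$file" "$file.bak" &&
    # Only touch the codename when it stands alone as the suite field,
    # e.g. " lenny main", " lenny/updates main" or " lenny-proposed-updates".
    sed "s|\([[:space:]]\)${old}\([[:space:]/-]\)|\1${new}\2|g" "$file.bak" > "$file"
}
```

For the Lenny to Squeeze hop this would be ‘retarget_sources /etc/apt/sources.list lenny squeeze’, followed by the update and upgrade commands for that hop.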

Sarge to Etch:

I was able to upgrade all of our Debian 3.1 machines to Debian 4.0 using the following commands.  I did not encounter any real surprises when I upgraded any of our physical or virtual machines.

You can upgrade using apt and the following commands:

apt-get update
apt-get dist-upgrade

Etch to Lenny:

The only real issue to note when upgrading from Debian 4.0 to 5.0 is that Lenny does not ship, by default, the firmware needed by the Broadcom network adapters used in the majority of our Dell servers.  This caused some stress for me since I was doing the upgrades without physical access to the servers: after I completed the upgrade to 5.0 and rebooted, I was not able to reach the server because the NICs were no longer recognised by Debian.

In order to resolve this issue you will need to install the ‘firmware-bnx2’ package after you do the upgrade but BEFORE you reboot the server.

The Debian team does not include this firmware by default because of license restrictions placed on it.  If you want to read more about this issue you can view the very short bug report here.

The best tool for upgrading to Debian 5 is aptitude:

aptitude update
aptitude install apt dpkg aptitude
aptitude full-upgrade

Lenny to Squeeze:

Upgrading Debian 5.0 to 6.0 was relatively painless as well.  One issue that I did run into revolved around the new version of udev and kernel versions prior to 2.6.26.  We had a few servers that were using kernel versions in the 2.6.18 range, and if you don’t upgrade the kernel before you reboot, certain devices may not be recognized or named correctly, which can prevent a successful bootup.
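
A quick way to spot the machines at risk before rebooting is to compare the running kernel against 2.6.26.  This is only a rough sketch: kernel_older_than is a made-up helper, and it compares just the first three numeric components of the version string.

```shell
# Succeed (exit 0) when the kernel version in $1 is older than the one in $2.
# $1 = running version as reported by uname -r, $2 = minimum version (x.y.z)
kernel_older_than() {
    # Strip the packaging suffix, e.g. "2.6.18-6-amd64" becomes "2.6.18".
    cur=${1%%-*}
    min=$2
    if [ "$cur" = "$min" ]; then
        return 1
    fi
    # Sort the two versions numerically, field by field, and keep the lowest.
    oldest=$(printf '%s\n%s\n' "$cur" "$min" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$oldest" = "$cur" ]
}
```

On a server you could run ‘kernel_older_than "$(uname -r)" 2.6.26 && echo "upgrade the kernel before rebooting"’.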

You can use the following apt commands to complete the upgrade process:

apt-get update
apt-get dist-upgrade -u

4 Oct, 2011  |  Written by  |  under Gluster, Linux, Redhat

Redhat released a statement today in which they announced their plans to acquire Gluster, the company behind the open source scalable filesystem GlusterFS.

Only time will tell exactly what this means for the project, community, etc, but based on the fact that Redhat has a fairly good track record with the open source community, and given the statements they made in their FAQ, I can only assume that we will continue to see GlusterFS grow and mature into a tool that extends reliably into the enterprise environment.

Gluster also provided several statements via their website today.  You can read a statement from the founders here, as well as an additional Gluster press release here.

Recently one of our 3ware 9650SE raid cards started spitting out errors indicating that the unit was repeatedly issuing a bunch of soft resets. The lines in the log look similar to this:

WARNING: tw1: tw_aen_task AEN 0x0039 Buffer ECC error corrected address=0xDF420
WARNING: tw1: tw_aen_task AEN 0x005f Cache synchronization failed; some data lost unit=22
WARNING: tw1: tw_aen_task AEN 0x0001 Controller reset occurred resets=13

I downloaded and installed the latest firmware for the card (version 4.10.00.021), which the release notes claimed had several fixes for cards experiencing soft resets.  Much to my disappointment the resets continued to occur despite the new revised firmware.

The card was under warranty, so I contacted 3ware support and had a new one sent overnight.  The new card seemed to resolve the issues associated with random soft resets; however, the resets and the downtime had left this node a little out of sync with the other Gluster server.

Two disks had also gone bad.  At this point I am still unsure whether the bad drives were a symptom or the cause of the issues with the raid card; what I do know is that the Western Digital Caviar Green drives that are populating this card have a very high error rate, and we are currently in the process of replacing all 24 drives with Hitachi ones.  After doing a ‘zpool replace’ on both bad disks, I set about trying to initiate a ‘self-heal’ on the known up to date node using the following command:

server2:/zpool/glusterfs# ls -laR *

After some time I decided to tail the log file to see if there were any errors that might indicate a problem with the self heal. Once again the Gluster error log began to fill up with errors associated with setting extended attributes on SUNWattr_ro.

At that point I began to worry whether or not the AFR (Automatic File Replication) portion of the Replicate/AFR translator was actually working correctly or not.  I started running some tests to determine what exactly was going on.  I began by copying over a few files to test replication.  All the files showed up on both nodes, so far so good.

Next it was time to test AFR so I began deleting a few files off one node and then attempting to self heal those same deleted files.  After a couple of minutes, I re-listed the files and the deleted files had in fact been restored. Despite the successful copy, the errors continued to show up every single time the file/directory was accessed (via stat).  It seemed that even though AFR was able to copy all the files to the new node correctly, gluster for some reason continued to want to self heal the files over and over again.

After finding the function that sets the extended attributes on Solaris, the following patch was created:

--- compat.c    Tue Aug 23 13:24:33 2011
+++ compat_new.c        Tue Aug 23 13:24:49 2011
@@ -193,7 +193,7 @@
 {
        int attrfd = -1;
        int ret = 0;
-
+
        attrfd = attropen (path, key, flags|O_CREAT|O_WRONLY, 0777);
        if (attrfd >= 0) {
                ftruncate (attrfd, 0);
@@ -200,13 +200,16 @@
                ret = write (attrfd, value, size);
                close (attrfd);
        } else {
-               if (errno != ENOENT)
-                       gf_log ("libglusterfs", GF_LOG_ERROR,
+               if (strcmp (key, "SUNWattr_ro") != 0 && strcmp (key, "SUNWattr_rw") != 0) {
+
+                       if (errno != ENOENT)
+                               gf_log ("libglusterfs", GF_LOG_ERROR,
                                "Couldn't set extended attribute for %s (%d)",
                                path, errno);
-               return -1;
+                       return -1;
+               }
+               return 0;
        }
-
        return 0;
 }

 

The patch simply ignores the two Solaris specific extended attributes (SUNWattr_ro and SUNWattr_rw), and returns a ‘0’ to the posix layer instead of a ‘-1’ if either of these is encountered.

We’ve been running this code change on both Solaris nodes for several days and so far so good, the errors are gone and replicate and AFR both seem to be working very well.