4 Oct, 2011  |  Written by  |  under Debian, Linux

After spending the last two weeks upgrading various versions of Debian to Squeeze, I figured I would post the details of how to upgrade each version, starting from Debian 3.1 to Debian 6.0.

The safest way to upgrade to Debian Squeeze is to upgrade from the prior version until you reach version 6.x.  In order words, if you are upgrading from Debian 4.x, need to upgrade to Debian 5.x and THEN to Debian 6.x.  Direct upgrades are not at all recommended.

Here are the steps that I took when I upgrading between various versions.

Sarge to Etch:

I was able to upgrade all of our Debian 3.1 machines to Debian 4.0 using the following commands.  I did not encounter any real surprises when I upgraded any of our physical of virtual machines.

You can upgrade using apt and the following commands:

# apt-get update
# apt-get dist-upgrade

Etch to Lenny:

The only real issue to note when upgrading from Debian 4.0 to 5.0, is that Lenny does not provide the drivers by default for any of the Broadcom network adapter drivers used by a majority of our Dell servers.  This caused some stress for me since I was doing the upgrades without physical access to the servers, so after I completed the upgrade to 5.0 and rebooted the server, of course I was not able to access the server because the NIC cards were no longer recognised by Debian.

In order to resolve this issue you will need to install the ‘firmware-bnx2‘ package after you do the upgrade but BEFORE you reboot the server.

The reason that the Debian team does not include these drivers by default is due to license restrictions placed on the firmware.  If you want to read more about this issue you can view the very short bug report here.

The best tool for upgrading to Debian 5 is aptitude:

# aptitude update
# aptitude install apt dpkg aptitude
# aptitude full-upgrade

Lenny to Squeeze:

Upgrading Debian 5.o to 6.0 was also relatively painless as well.  One issue that I did run into revolved around the new version of udev and kernel versions prior to 2.6.26.  We had a few servers that were using kernel versions in the 2.6.18 range and if don’t upgrade the kernel version before you reboot, you may have issues with certain devices not being recognized or named correctly and thus you may have issues that prevent a successful bootup.

You can use the following apt commands to complete the upgrade process:

# apt-get update
# apt-get dist-upgrade -u
4 Oct, 2011  |  Written by  |  under Gluster, Linux, Redhat

Redhat released a statement today in which they announced their plans to acquire Gluster, the company behind the open source scalable filesystem GlusterFS.

Only time will tell exactly what this means for the project, community, etc, but based on the fact that Redhat has a fairly good track record with the open source community, and given the statements they made in their FAQ, I can only assume that we will continue to see GlusterFS grow and mature into a tool that extends reliably into the enterprise environment.

Gluster also provided several statements via their website today as well, you can read a statement from the founders here, as well as an additional Gluster press release here.

30 Sep, 2011  |  Written by  |  under Debian, Proxmox

Martin Maurer sent an email to the Proxmox-users mailing list this morning announcing that a version 2.0 beta ISO had been made available for download.

Here are some links that will provide further information on this latest release:

Roadmap and feature overview:
http://pve.proxmox.com/wiki/Roadmap#Roadmap_for_2.x

Preliminary 2.0 documentation:
http://pve.proxmox.com/wiki/Category:Proxmox_VE_2.0

Community tools (Bugzilla, Git, etc):
http://www.proxmox.com/products/proxmox-ve/get-involved

Proxmox VE 2.0 beta forum:
http://forum.proxmox.com/forums/16-Proxmox-VE-2.0-beta

Downloads:
http://www.proxmox.com/downloads/proxmox-ve/17-iso-images

I have not had a chance to install a test node using this latest 2.0 beta codebase, however I expect to have a two node cluster up and running in the next week or so, and after I do I will will follow up with another blog post detailing my thoughts.

Thanks again to Martin and Dietmar for all their hard work so far on this great open source project!

27 Aug, 2011  |  Written by  |  under Opensolaris, Zfs

After successfully completing a ‘zfs replace’ I was not so pleased to get the following error message back from ‘zfs detach’:

cannot detach c5t17d0: no valid replicas

I decided that I would upgrade this OpenSolaris 2008.11 instance to OpenSolaris 2009.06 in order to see if the obvious bug that I was encountering was resolved in the newest version. Since upgrading in OpenSolaris supports automatic boot environment creation, there really is not much danger at all in updating because you can always boot back into the other environment at any time.

The upgrade was a success, and after I booted into 2009.06 I was able to simply detach the failed drive from the pool and thus remove it from the system.

I recompiled gluster and I ran 2009.06 for a couple of days, until I started noticing that the server was rebooting during times of high I/O. A peek inside ‘/var/adm/messages’ revealed the following errors:

Aug 15 22:33:04 cybertron unix: [ID 836849 kern.notice]
Aug 15 22:33:04 cybertron ^Mpanic[cpu0]/thread=ffffff091060c900:
Aug 15 22:33:04 cybertron genunix: [ID 783603 kern.notice] Deadlock: cycle in blocking chain
Aug 15 22:33:04 cybertron unix: [ID 100000 kern.notice]
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9651f0 genunix:turnstile_block+795 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965250 unix:mutex_vector_enter+261 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9652f0 zfs:zfs_zget+be ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965380 zfs:zfs_zaccess+7c ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965400 zfs:zfs_lookup+333 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9654a0 genunix:fop_lookup+ed ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965550 genunix:xattr_dir_realdir+8b ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9655a0 genunix:xattr_dir_realvp+5e ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9655f0 genunix:fop_realvp+32 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965640 genunix:vn_compare+31 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965860 genunix:lookuppnvp+94c ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965900 genunix:lookuppnatcred+11b ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965990 genunix:lookuppnat+69 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965b30 genunix:vn_createat+13a ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965cf0 genunix:vn_openat+1fb ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965e50 genunix:copen+435 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965e80 genunix:openat64+25 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965ec0 genunix:fsat32+f5 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965f10 unix:brand_sys_sysenter+1e0 ()
Aug 15 22:33:04 cybertron unix: [ID 100000 kern.notice]
Aug 15 22:33:04 cybertron genunix: [ID 672855 kern.notice] syncing file systems…
Aug 15 22:33:04 cybertron genunix: [ID 904073 kern.notice] done

My efforts to find any further detailes about this bug are ongoing, so at this point I have booted back into 2008.11 and I will be running that until a fix or a workaround is found.

Recently one of our 3ware 9650SE raid cards started spitting out errors indicating that the unit was repeatedly issuing a bunch of soft resets. The lines in the log look similar to this:

WARNING: tw1: tw_aen_task AEN 0×0039 Buffer ECC error corrected address=0xDF420
WARNING: tw1: tw_aen_task AEN 0x005f Cache synchronization failed; some data lost unit=22
WARNING: tw1: tw_aen_task AEN 0×0001 Controller reset occurred resets=13

I downloaded and installed the latest firmware for the card (version 4.10.00.021), which the release notes claimed had several fixes for cards experiencing soft resets.  Much to my disappointment the resets continued to occur despite the new revised firmware.

The card was under warranty, so I contacted 3ware support and had a new one sent overnight.  The new card seemed to resolve the issues associated with random soft resets, however the resets and the downtime had left this node little out of sync with the other Gluster server.

After doing a ‘zfs replace’ on two bad disks (at this point I am still unsure whether the bad drives where a symptom or the cause of the issues with the raid card, however what I do know is that the Cavier Geen Western Digital drives that are populating this card have a very high error rate, and we are currently in the process of replacing all 24 drives with hitachi ones), I set about trying to initiate a ‘self-heal’ on the known up to date node using the following command:

server2:/zpool/glusterfs# ls -laR *

After some time I decided to tail the log file to see if there were any errors that might indicate a problem with the self heal. Once again the Gluster error log begun to fill up with errors associated with setting extended attributes on SUNWattr_ro.

At that point I began to worry whether or not the AFR (Automatic File Replication) portion of the Replicate/AFR translator was actually working correctly or not.  I started running some tests to determine what exactly was going on.  I began by copying over a few files to test replication.  All the files showed up on both nodes, so far so good.

Next it was time to test AFR so I began deleting a few files off one node and then attempting to self heal those same deleted files.  After a couple of minutes, I re-listed the files and the deleted files had in fact been restored. Despite the successful copy, the errors continued to show up every single time the file/directory was accessed (via stat).  It seemed that even though AFR was able to copy all the files to the new node correctly, gluster for some reason continued to want to self heal the files over and over again.

After finding the function that sets the extended attributes on Solaris, the following patch was created:

— compat.c Tue Aug 23 13:24:33 2011
+++ compat_new.c Tue Aug 23 13:24:49 2011
@@ -193,7 +193,7 @@
{
int attrfd = -1;
int ret = 0;
-
+
attrfd = attropen (path, key, flags|O_CREAT|O_WRONLY, 0777);
if (attrfd >= 0) {
ftruncate (attrfd, 0);
@@ -200,13 +200,16 @@
ret = write (attrfd, value, size);
close (attrfd);
} else {
- if (errno != ENOENT)
- gf_log (“libglusterfs”, GF_LOG_ERROR,
+ if(!strcmp(key,”SUNWattr_ro”)&&!strcmp(key,”SUNWattr_rw”)) {
+
+ if (errno != ENOENT)
+ gf_log (“libglusterfs”, GF_LOG_ERROR,
“Couldn’t set extended attribute for %s (%d)”,
path, errno);
- return -1;
+ return -1;
+ }
+ return 0;
}
-
return 0;
}

The patch simply ignores the two Solaris specific extended attributes (SUNWattr_ro and SUNWattr_rw), and returns a ’0′ to the posix layer instead of a ‘-1′ if either of these is encountered.

We’ve been running this code change on both Solaris nodes for several days and so far so good, the errors are gone and replicate and AFR both seem to be working very well.

30 Jul, 2011  |  Written by  |  under Netflix, NPR

Former Director of Application Development at NPR and current Director of Engineering for the Netflix API Daniel Jacobson gave a presentation at this years OSCON,  entitled ‘Redesigning the Netflix API’ and the slides have been posted here, for anyone who is interested in learning more about their API efforts.

Adrian Cockcroft also gave a talk at this years OSCON entitled ‘Data Flow at Netflix’.   The video for that presentation can be found here on the OSCON Youtube channel.

27 Jul, 2011  |  Written by  |  under Linux, Opensource

Here is a link to the OSCON YouTube channel for 2011 video presentations.  There are still two days of conference activities left, but there are currently 25 videos available to watch today.

25 Jul, 2011  |  Written by  |  under Opensource

Here is a link to the section of OSCON site that provides access to the 2011 live streaming keynote videos.  They have also provided a nice schedule that lists the various speakers, their affiliations and  topics that they will be discussing over the course of the entire week.

At some point in the future when the conference is over, I will hopefully be able to provide links to other interesting presentations given at this years conference as well.

27 Jun, 2011  |  Written by  |  under BTRFS, Linux

Here is a link to a nice video presentation that was given by Douglas Fuller at LUG 2011. The 25 minute video provides an updated overview of the btrfs feature set, areas in which btrfs still needs some further development, and then goes on to provide some detailed discussion on various benchmarks Doug was able to gather during his time spent with the file system.

The second half of the video features Johann Lombardi going into even further detail about btfrs internals.

Overall a very good talk for anyone who is new to btrfs and wants to get an overview of the features, or anyone who is simply is looking for some specifics in terms of btrfs stability and performance.

Once you have your OpenStack cluster up and running you will need to either find some pre-created image templates or you may decide that you want to roll your own.  I’ll leave the details of creating images from scratch for a different post, this post will focus on providing links to both image files and instructions for installing pre-created Linux templates on OpenStack infrastructure.

First, if you are looking to install any version of Ubuntu, you should visit

http://uec-images.ubuntu.com/releases/

and download the file that corresponds to your desired version and architecture.

Once you have that file, you can follow the instructions here.

If you are looking to install a version of Debian, CentOS or Fedora, you should visit

http://open.eucalyptus.com/wiki/EucalyptusUserImageCreatorGuide_v1.6,

and download one of pre-created images that the folks over at Eucalyptus have provided.

Once you have are ready to install one of those files, you can follow the instructions here.