Category Archives: Debian

All things Debian.

Syslog broken after upgrade to Debian Wheezy

We have run into an issue on several of of our Debian servers after upgradeing to Wheezy. The issue is one that tends to go unnoticed for a while, until you are looking through the files in ‘/var/log’ and notice that none of the files have updated entries since the date of the last upgrade.

The solution to the issue is to install the following package:

‘apt-get install inetutils-syslogd’

After installing this missing package, syslogd should once again be running, and you should start to see new entries show up in your messages, syslog, etc files.

Openmanage 7.3 on Proxmox 3.0 (Debian Wheezy)

I ran in to a few issues while trying to install Dell Openmanage on the latest version of Proxmox (3.0).

In order to get things working correctly on Proxmox 3.x, here are the steps that are required:

#echo “deb http://linux.dell.com/repo/community/ubuntu wheezy openmanage” > /etc/apt/sources.list.d/linux.dell.com.sources.list
#gpg –keyserver pool.sks-keyservers.net –recv-key 1285491434D8786F
#gpg -a –export 1285491434D8786F | sudo apt-key add –
#sudo apt-get update
#apt-get install libcurl3
#sudo apt-get install srvadmin-all
#sudo service dataeng start
#sudo service dsm_om_connsvc start

Once you get everything installed correctly you will be able to log in to the Openmanage web interface here:

https://<hostname or ip address>:1311

The first time you log in you should use the ‘root’ username and associated password.

Using strace to debug issues with apache

Today I had to track down the cause of an issue we were having with a server where shortly after restarting the server, requests would start to hang, and the number of Apache processes seemed to be growing rather large, rather quickly.

I started out using Apache’s mod_status to get some details about the state of each process.

I noticed that many of the processes ended up  in a ‘”W”  or “Sending Reply” state.  I choose a random Apache process and fired up ‘strace’ to try to get some more information:

server7:/root# strace -p 11574
Process 11574 attached – interrupt to quit
flock(26, LOCK_EX <unfinished …>

This process was stuck waiting for an exclusive lock on some file.  I used ‘readlink’ to find out the name of the file in question:

server7:/root# readlink /proc/11574/fd/26
/mnt/Pages/xml/0/1/list1055.xml

Once I had the name of the file I used ‘lsof’ to see if there were any other processes trying to access that file as well:

server7:/root#lsof |grep list1055.xml
httpd 11574 nobody 26w REG 0,31 4232 925874559 /mnt/Pages/xml/0/1/list1055.xml (storage1.npr.org:/files/data)
httpd 11579 nobody 26w REG 0,31 4232 925874559 /mnt/Pages/xml/0/1/list1055.xml (storage1.npr.org:/files/data)
httpd 11629 nobody 26w REG 0,31 4232 925874559 /mnt/Pages/xml/0/1/list1055.xml (storage1.npr.org:/files/data)

Here we have several other process waiting for an exclusive lock on the file as well.

At this point it appears as though a recent code change maybe the cause of this issue…however a closer look at the recent source code commits will be required to know for sure.

What’s new in GlusterFS 3.3?

Here is a link to a talk given by John Mark Walker at this year’s LinuxCon Japan, in which he discusses some of the internal details of the Gluster 3.3 release.

A few of the new features discussed during the presentation are:

  • UFO (universal file and object storage)
  • HDFS compatibility
  • Proactive self-heal
  • Granular locking
  • Quorum enforcement (for resolving split-brain scenarios)

Mdadm cheat sheet

I have spent some time over the last few weeks getting familiar with mdadm and software RAID on Linux, so I thought I would write down some of the commands and example syntax that I have used while getting started.

1)If we would like to create a new RAID array from scratch we can use the following example commands:

RAID1-with 2 Drives:

# mdadm –create –verbose /dev/md0 –level=1 /dev/sda1 /dev/sdb1

RAID5-with 5 Drives:

# mdadm –create –verbose /dev/md0 –level=5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

RAID6-with 4 Drives with 1 spare:

# mdadm –create –verbose /dev/md0 –level=6 –raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

2)If we would like to add a disk to an existing array:

# mdadm –add /dev/md0 /dev/sdf1 (only added as a spare)
# mdadm –grow /dev/md0 -n [new number of active disks – spares] (grow the size of the array)

3)If we would like to remove a disk from an existing array:

First we need to ‘fail’ the drive:

# mdadm –fail /dev/md0 /dev/sdc1

Next it can be safely removed from the array:

# mdadm –remove /dev/md0 /dev/sdc1

4)In order to make the array survive a reboot, you need to add the details to ‘/etc/mdadm/mdadm.conf’

# mdadm –detail –scan >> /etc/mdadm/mdadm.conf (Debian)
# mdadm –detail –scan >> /etc/mdadm.conf (Everyone else)

5)In order to delete and remove the entire array:

First we need to ‘stop’ the array:

# mdadm –stop /dev/md0

Next it can be removed:

# mdadm –remove /dev/md0

6)Examining the status of your RAID array:

There are two options here:

# cat /proc/mdstat
or
# mdadm –detail /dev/md0

Access files underneath an already mounted partition in Linux

Here is a quick tip for anyone who needs to access files that exists underneath an already mounted filesystem mount point.  For example suppose that you have some files located in a directory called ‘/tmp/docs’.

At some point someone might decide to accidentally take that same directory, and create an NFS or CIFS mount, if you need to access the original files that existed before the new mount point was put into place, you have two options.

  1. Unmount the NFS or CIFS filesystem and access your files and then remount.
  2. However, you may find yourself in a situation (such as I did), where it is extremely inconvenient or impossible for you have the downtime associated with the umount/remount process. In that case you have another option…you can use a ‘bind’ mount.

All you need to do is something like the following:

mount --bind /tmp /tmp/new_location

Now you should be able to access the original files here:

‘/tmp/new_location/docs’

Upgrading Debian

After spending the last two weeks upgrading various versions of Debian to Squeeze, I figured I would post the details of how to upgrade each version, starting from Debian 3.1 to Debian 6.0.

The safest way to upgrade to Debian Squeeze is to upgrade from the prior version until you reach version 6.x.  In order words, if you are upgrading from Debian 4.x, need to upgrade to Debian 5.x and THEN to Debian 6.x.  Direct upgrades are not at all recommended.

Here are the steps that I took when I upgrading between various versions.

Sarge to Etch:

I was able to upgrade all of our Debian 3.1 machines to Debian 4.0 using the following commands.  I did not encounter any real surprises when I upgraded any of our physical of virtual machines.

You can upgrade using apt and the following commands:

# apt-get update
# apt-get dist-upgrade

Etch to Lenny:

The only real issue to note when upgrading from Debian 4.0 to 5.0, is that Lenny does not provide the drivers by default for any of the Broadcom network adapter drivers used by a majority of our Dell servers.  This caused some stress for me since I was doing the upgrades without physical access to the servers, so after I completed the upgrade to 5.0 and rebooted the server, of course I was not able to access the server because the NIC cards were no longer recognised by Debian.

In order to resolve this issue you will need to install the ‘firmware-bnx2‘ package after you do the upgrade but BEFORE you reboot the server.

The reason that the Debian team does not include these drivers by default is due to license restrictions placed on the firmware.  If you want to read more about this issue you can view the very short bug report here.

The best tool for upgrading to Debian 5 is aptitude:

# aptitude update
# aptitude install apt dpkg aptitude
# aptitude full-upgrade

Lenny to Squeeze:

Upgrading Debian 5.o to 6.0 was also relatively painless as well.  One issue that I did run into revolved around the new version of udev and kernel versions prior to 2.6.26.  We had a few servers that were using kernel versions in the 2.6.18 range and if don’t upgrade the kernel version before you reboot, you may have issues with certain devices not being recognized or named correctly and thus you may have issues that prevent a successful bootup.

You can use the following apt commands to complete the upgrade process:

# apt-get update
# apt-get dist-upgrade -u

Here are the repo’s that used while doing the upgrades:

#Debian Etch-4deb http://archive.debian.org/debian/ etch main non-free contrib
deb-src http://archive.debian.org/debian/ etch main non-free contrib

deb http://archive.debian.org/debian-security/ etch/updates main non-free contrib
deb-src http://archive.debian.org/debian-security/ etch/updates main non-free contrib

# Debian Lenny-5
deb http://archive.debian.org/debian/ lenny main contrib non-free
deb-src http://archive.debian.org/debian/ lenny main contrib non-free

deb http://archive.debian.org/debian-security lenny/updates main contrib non-free
deb-src http://archive.debian.org/debian-security lenny/updates main contrib non-free

deb http://archive.debian.org/debian-volatile lenny/volatile main contrib non-free
deb-src http://archive.debian.org/debian-volatile lenny/volatile main contrib non-free

# Debian Squeeze-6
deb http://ftp.us.debian.org/debian squeeze main contrib non-free

deb http://ftp.debian.org/debian/ squeeze-updates main contrib non-free
deb http://security.debian.org/ squeeze/updates main contrib non-free

Proxmox 2.0 beta released

Martin Maurer sent an email to the Proxmox-users mailing list this morning announcing that a version 2.0 beta ISO had been made available for download.

Here are some links that will provide further information on this latest release:

Roadmap and feature overview:
http://pve.proxmox.com/wiki/Roadmap#Roadmap_for_2.x

Preliminary 2.0 documentation:
http://pve.proxmox.com/wiki/Category:Proxmox_VE_2.0

Community tools (Bugzilla, Git, etc):
http://www.proxmox.com/products/proxmox-ve/get-involved

Proxmox VE 2.0 beta forum:
http://forum.proxmox.com/forums/16-Proxmox-VE-2.0-beta

Downloads:
http://www.proxmox.com/downloads/proxmox-ve/17-iso-images

I have not had a chance to install a test node using this latest 2.0 beta codebase, however I expect to have a two node cluster up and running in the next week or so, and after I do I will will follow up with another blog post detailing my thoughts.

Thanks again to Martin and Dietmar for all their hard work so far on this great open source project!

SUNWattr_ro error:Permission denied on OpenSolaris using Gluster 3.0.5–PartII

Recently one of our 3ware 9650SE raid cards started spitting out errors indicating that the unit was repeatedly issuing a bunch of soft resets. The lines in the log look similar to this:

WARNING: tw1: tw_aen_task AEN 0x0039 Buffer ECC error corrected address=0xDF420
WARNING: tw1: tw_aen_task AEN 0x005f Cache synchronization failed; some data lost unit=22
WARNING: tw1: tw_aen_task AEN 0x0001 Controller reset occurred resets=13

I downloaded and installed the latest firmware for the card (version 4.10.00.021), which the release notes claimed had several fixes for cards experiencing soft resets.  Much to my disappointment the resets continued to occur despite the new revised firmware.

The card was under warranty, so I contacted 3ware support and had a new one sent overnight.  The new card seemed to resolve the issues associated with random soft resets, however the resets and the downtime had left this node little out of sync with the other Gluster server.

After doing a ‘zfs replace’ on two bad disks (at this point I am still unsure whether the bad drives where a symptom or the cause of the issues with the raid card, however what I do know is that the Cavier Geen Western Digital drives that are populating this card have a very high error rate, and we are currently in the process of replacing all 24 drives with hitachi ones), I set about trying to initiate a ‘self-heal’ on the known up to date node using the following command:

server2:/zpool/glusterfs# ls -laR *

After some time I decided to tail the log file to see if there were any errors that might indicate a problem with the self heal. Once again the Gluster error log begun to fill up with errors associated with setting extended attributes on SUNWattr_ro.

At that point I began to worry whether or not the AFR (Automatic File Replication) portion of the Replicate/AFR translator was actually working correctly or not.  I started running some tests to determine what exactly was going on.  I began by copying over a few files to test replication.  All the files showed up on both nodes, so far so good.

Next it was time to test AFR so I began deleting a few files off one node and then attempting to self heal those same deleted files.  After a couple of minutes, I re-listed the files and the deleted files had in fact been restored. Despite the successful copy, the errors continued to show up every single time the file/directory was accessed (via stat).  It seemed that even though AFR was able to copy all the files to the new node correctly, gluster for some reason continued to want to self heal the files over and over again.

After finding the function that sets the extended attributes on Solaris, the following patch was created:

— compat.c Tue Aug 23 13:24:33 2011
+++ compat_new.c Tue Aug 23 13:24:49 2011
@@ -193,7 +193,7 @@
{
int attrfd = -1;
int ret = 0;

+
attrfd = attropen (path, key, flags|O_CREAT|O_WRONLY, 0777);
if (attrfd >= 0) {
ftruncate (attrfd, 0);
@@ -200,13 +200,16 @@
ret = write (attrfd, value, size);
close (attrfd);
} else {
– if (errno != ENOENT)
– gf_log (“libglusterfs”, GF_LOG_ERROR,
+ if(!strcmp(key,”SUNWattr_ro”)&&!strcmp(key,”SUNWattr_rw”)) {
+
+ if (errno != ENOENT)
+ gf_log (“libglusterfs”, GF_LOG_ERROR,
“Couldn’t set extended attribute for %s (%d)”,
path, errno);
– return -1;
+ return -1;
+ }
+ return 0;
}

return 0;
}

The patch simply ignores the two Solaris specific extended attributes (SUNWattr_ro and SUNWattr_rw), and returns a ‘0’ to the posix layer instead of a ‘-1’ if either of these is encountered.

We’ve been running this code change on both Solaris nodes for several days and so far so good, the errors are gone and replicate and AFR both seem to be working very well.

Finding and installing pre-created images on OpenStack

Once you have your OpenStack cluster up and running you will need to either find some pre-created image templates or you may decide that you want to roll your own.  I’ll leave the details of creating images from scratch for a different post, this post will focus on providing links to both image files and instructions for installing pre-created Linux templates on OpenStack infrastructure.

First, if you are looking to install any version of Ubuntu, you should visit

http://uec-images.ubuntu.com/releases/

and download the file that corresponds to your desired version and architecture.

Once you have that file, you can follow the instructions here.

If you are looking to install a version of Debian, CentOS or Fedora, you should visit

http://open.eucalyptus.com/wiki/EucalyptusUserImageCreatorGuide_v1.6,

and download one of pre-created images that the folks over at Eucalyptus have provided.

Once you have are ready to install one of those files, you can follow the instructions here.