Ted Neykov, currently a hacker at Rackspace, pointed me in the direction of the new OpenStack Operations Guide. I have only had a chance to browse the PDF so far, but I believe it will end up being a very informative and useful book for me going forward.
Taken from the guide’s summary:
‘This book offers hard-earned experience from OpenStack operators who have run OpenStack in production for six months or longer. They’ve gathered their notes, shared their stories, and learned from each other in the room. We invite you to join in the quest for best practices in OpenStack cloud operations’
Here is a quick video, released along with the guide, that briefly describes the process used during its creation:
Today I had to track down the cause of an issue with one of our servers: shortly after a restart, requests would start to hang, and the number of Apache processes would grow large very quickly.
I started out using Apache’s mod_status to get some details about the state of each process.
I noticed that many of the processes ended up in a “W” or “Sending Reply” state. I chose a random Apache process and fired up ‘strace’ to try to get some more information:
server7:/root# strace -p 11574
Process 11574 attached -- interrupt to quit
flock(26, LOCK_EX <unfinished …>
This process was stuck waiting for an exclusive lock on some file. I used ‘readlink’ to find out the name of the file in question:
server7:/root# readlink /proc/11574/fd/26
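The same lookup can be scripted. The sketch below is a minimal illustration, assuming a Linux /proc filesystem and permission to inspect the target process. The helper names fd_target and open_fds are made up for this example, and the demo inspects its own process rather than the Apache PID above.

```python
import os
import tempfile


def fd_target(pid, fd):
    """Return what /proc/<pid>/fd/<fd> points at (same as readlink)."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def open_fds(pid):
    """Map every open descriptor number of a process to its target."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


if __name__ == "__main__":
    # Demonstrate on our own process with a throwaway file.
    with tempfile.NamedTemporaryFile() as tmp:
        print(fd_target(os.getpid(), tmp.fileno()))
```

For another user's process, such as an httpd worker, this generally needs to run as root, just like the readlink call above.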
Once I had the name of the file I used ‘lsof’ to see if there were any other processes trying to access that file as well:
server7:/root# lsof | grep list1055.xml
httpd 11574 nobody 26w REG 0,31 4232 925874559 /mnt/Pages/xml/0/1/list1055.xml (storage1.npr.org:/files/data)
httpd 11579 nobody 26w REG 0,31 4232 925874559 /mnt/Pages/xml/0/1/list1055.xml (storage1.npr.org:/files/data)
httpd 11629 nobody 26w REG 0,31 4232 925874559 /mnt/Pages/xml/0/1/list1055.xml (storage1.npr.org:/files/data)
Here we have several other processes waiting for an exclusive lock on the file as well.
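This kind of contention is easy to reproduce on a small scale. The sketch below is only an illustration, not the code that was running on the server: it uses a local temporary file (the file above lives on an NFS mount, where flock behavior can differ), and the function names are hypothetical. It takes an exclusive flock on one open handle and then shows that a second handle cannot get the lock until the first is released. With LOCK_NB the second attempt fails immediately; without it, the call would simply block, which is what the strace output shows.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Try to take an exclusive flock on path without blocking.

    Returns the open file object on success, or None if another open
    file description already holds a conflicting lock.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def demo():
    """Return (second_attempt_blocked, lock_free_after_release)."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        holder = try_exclusive_lock(path)   # first opener gets the lock
        waiter = try_exclusive_lock(path)   # second opener is refused
        blocked = waiter is None
        holder.close()                      # closing the file releases the lock
        retry = try_exclusive_lock(path)
        freed = retry is not None
        if retry is not None:
            retry.close()
        return blocked, freed
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

If every worker takes a blocking LOCK_EX on the same file and one of them holds it for a long time, the rest queue up behind it, which would match the growing number of Apache processes.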
At this point it appears as though a recent code change may be the cause of this issue; however, a closer look at the recent source code commits will be required to know for sure.
On the heels of my previous post on StackOps and OpenStack, I thought I would quickly share two of the most valuable links that I came across in my search for good ‘getting started’ documentation.
First, this link provides an excellent architectural overview of OpenStack, which can be quite confusing initially if you are not a regular user of Amazon EC2-type cloud services.
Secondly, CSS Corp’s beginner’s guide is an invaluable resource for anyone who is getting started and wants easy-to-read, well-written documentation on the subject.
OK, I just found this video of Chris Mason giving a talk on Btrfs at LinuxCon 2010. It appears to be very similar to the webcast I linked to a few days ago, hosted on Oracle.com. This video, however, is hosted on linuxfoundation.org and requires no registration, which is nice.
UPDATE: If you are interested in ZFS on Linux you have two options at this point:
I have been actively following the zfsonlinux project because, once it is stable and ready, it should offer superior performance, since it avoids the extra overhead incurred by using FUSE in the zfs-fuse project.
You can see another one of my posts concerning zfsonlinux here.
KQ Infotech has released (currently in closed beta) code that brings ZFS to Linux via a loadable kernel module.
Here is a link to the current and future feature set. This is exciting because, although other ZFS implementations for Linux exist, each of the available options has significant drawbacks. For example, ZFS-FUSE is implemented in userspace using FUSE, which adds overhead due to the context switching required when moving back and forth between kernel space and user space.
Another option is ZFS on Linux, which provides a stable SPA, DMU and ZVOL layer but does not provide a POSIX layer (ZPL) that would enable you to actually mount a ZFS filesystem from inside Linux. From what I understand, KQ Infotech has basically taken some of the ZFS on Linux code that was developed by the Lawrence Livermore National Laboratory (LLNL) and implemented the missing ZPL layer.
NPR was recently accepted into the closed beta program, and I took some time last week to get this module installed on a Dell PowerEdge 2950 running a 64-bit version of Ubuntu 10.04. We are currently testing ZFS under kernel version 2.6.32-24. I have not had a ton of time to test things out, but I would say so far so good. I plan on posting some ZFS and Btrfs benchmarks in the next few weeks after I get some time to better test performance, throughput, etc.
While researching poor write performance with Oracle, I discovered that the server was using the LSI SAS1068E. We had a RAID1 setup with 300GB 10K RPM SAS drives. Google provided some possible insight into why the write performance was so bad (1 2). The main problem with this card is that it has no battery-backed write cache, which means the write cache is disabled by default. I was able to turn on the write cache using the LSI utility.
This change, however, did not seem to make any difference in performance. At this point I came to the conclusion that the card itself is to blame. I believe that this is an inexpensive RAID card that is good for general use of RAID0 and RAID1; however, for anything where write throughput is important, it might be better to spring for something a little bit more expensive.
When all was said and done, we ended up replacing all of these LSI cards with Dell PERC 6/i cards. These cards did come battery backed, which allowed us to enable the write cache. Needless to say, the performance improved significantly.
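For a rough before-and-after comparison of this kind of change, a small synchronous-write test can help. The sketch below is not the test we ran; it is a minimal illustration with a hypothetical function name and arbitrary defaults. It writes small blocks and calls fsync after each one, the pattern where a controller write cache tends to matter most, and reports writes per second.

```python
import os
import tempfile
import time


def fsync_write_rate(directory, count=200, size=4096):
    """Return fsync'ed writes per second for small blocks in directory."""
    payload = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # force the block out to the storage stack
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


if __name__ == "__main__":
    # Point this at a directory on the volume behind the RAID controller.
    print(f"{fsync_write_rate(tempfile.gettempdir()):.0f} fsync writes/sec")
```

The numbers are only meaningful relative to each other, for example the same directory before and after enabling the write cache. They are no substitute for a real database benchmark.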
Welcome to shainmiley.com. I plan to use this blog to discuss some of the technological issues that I encounter on a day-to-day basis. Topics will include Linux, scaling infrastructure, cloud computing, MySQL, open source, storage, etc.