Using strace to debug issues with Apache

Today I had to track down the cause of an issue on one of our servers: shortly after a restart, requests would start to hang, and the number of Apache processes would grow large very quickly.

I started out using Apache’s mod_status to get some details about the state of each process.

I noticed that many of the processes ended up in the “W” (“Sending Reply”) state. I chose an Apache process at random and fired up ‘strace’ to try to get some more information:

server7:/root# strace -p 11574
Process 11574 attached - interrupt to quit
flock(26, LOCK_EX <unfinished ...>
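
The unfinished flock() call above means the kernel has not returned from it yet: an exclusive lock is only granted once no other open file holds a lock on the same file. For anyone who has not run into flock() before, here is a minimal Python sketch of that behaviour. It is my own illustration, not the application's code; the helper name try_exclusive_lock is made up, and it uses LOCK_NB so that it reports contention instead of hanging the way the Apache process did.

```python
import fcntl
import os


def try_exclusive_lock(path):
    """Open `path` and attempt a non-blocking exclusive flock().

    Returns the locked file descriptor, or None if some other open
    file already holds a lock on the same file.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Without LOCK_NB this call would block, which is what strace showed.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

Calling it twice on the same path without closing the first descriptor returns None the second time, because each call opens the file separately. Closing the first descriptor releases the lock.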

This process was stuck waiting for an exclusive lock on some file.  I used ‘readlink’ to find out the name of the file in question:

server7:/root# readlink /proc/11574/fd/26
/mnt/Pages/xml/0/1/list1055.xml
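
If you want to see every descriptor a process has open rather than just one, the same lookup can be scripted. This is a rough sketch, assuming a Linux-style /proc layout; the function name open_files and the proc_root parameter are my own additions for illustration.

```python
import os


def open_files(pid, proc_root="/proc"):
    """Map each open descriptor number of `pid` to the path it points at."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result
```

With the process above, open_files(11574)[26] would give the same path that readlink printed. Like readlink, it needs to run as root or as the owner of the process.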

Once I had the name of the file I used ‘lsof’ to see if there were any other processes trying to access that file as well:

server7:/root# lsof | grep list1055.xml
httpd 11574 nobody 26w REG 0,31 4232 925874559 /mnt/Pages/xml/0/1/list1055.xml (storage1.npr.org:/files/data)
httpd 11579 nobody 26w REG 0,31 4232 925874559 /mnt/Pages/xml/0/1/list1055.xml (storage1.npr.org:/files/data)
httpd 11629 nobody 26w REG 0,31 4232 925874559 /mnt/Pages/xml/0/1/list1055.xml (storage1.npr.org:/files/data)

Here we have several other processes with the same file open for writing, most likely waiting for an exclusive lock on it as well.
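
On a box without lsof, a similar list can be put together by walking /proc directly. The sketch below is only an approximation of what lsof does, again assuming a Linux-style /proc; pids_with_file_open and proc_root are hypothetical names of mine.

```python
import os


def pids_with_file_open(target, proc_root="/proc"):
    """Return the sorted PIDs that have `target` open on some descriptor."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            names = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we are not allowed to look
        for name in names:
            try:
                if os.readlink(os.path.join(fd_dir, name)) == target:
                    pids.append(int(entry))
                    break
            except OSError:
                continue  # descriptor closed while we were scanning
    return sorted(pids)
```

Just like the lsof output, this only tells you who has the file open. It does not tell you which process currently holds the lock and which ones are queued up behind it.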

At this point it appears as though a recent code change may be the cause of this issue. However, a closer look at the recent source code commits will be required to know for sure.
