Category Archives: Linux

All things Linux.

GlusterFS and Extended Attributes

Jeff Darcy over at CloudFS.org has posted a very informative, in-depth writeup detailing GlusterFS and its use of extended filesystem attributes.  One area in which I believe many Gluster community members would like to see some improvement is documentation.

You do not need to spend much time on the Gluster-users mailing list to find frequent calls for more in-depth and up-to-date Gluster documentation.

There is no doubt that the situation has improved greatly over the last 6 to 12 months; however, room for improvement still exists, and this writeup is an example of someone outside the inner circle at Gluster providing exactly that kind of document.

For more information on CloudFS and how it relates to GlusterFS, you can check out these two postings: Why? and What?

Slow backup on Proxmox using vzdump

I was recently attempting to back up one of our Proxmox VEs using OpenVZ's backup tool 'vzdump'. In the past, a complete backup of a 100GB VE, for example, could be obtained in under an hour or so. This time, however, after leaving the process running and returning several hours later, the .tar file was a mere 2.3GB in size.

At first I thought that there might be an issue with one or more nodes in the shared storage cluster, so I decided to direct vzdump to store the .tar file on one of the server's local partitions instead. Once again I started the backup and returned several hours later, only to find a file similar in size to the previous one.

Next I decided to attempt to 'tar up' the contents of the VE manually; that, combined with the 'nohup' command, would allow me to find out at what point the whole process was stalling.

As it turns out, I had thousands of files in the '/var/spool/postfix/incoming/' directory on that VE. Although almost every file in that directory was small, and the overall directory size was not large at all, the result was that file operations inside that folder had come to a screeching halt.

Luckily for me, I knew for a fact that we did not need any of these particular email messages, so I was able to simply delete the 'incoming' folder and then recreate it. After that, vzdump was once again functioning as expected.
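The manual diagnosis described above can be sketched as follows. This is only an illustration: a temporary directory stands in for the real VE root (on a Proxmox host that would be something like /var/lib/vz/private/&lt;veid&gt;), and the file names are made up.

```shell
#!/bin/sh
# Stand-in for the VE's private area; adjust to your real VE root.
VE_ROOT=$(mktemp -d)
mkdir -p "$VE_ROOT/var/spool/postfix/incoming"
for i in 1 2 3; do : > "$VE_ROOT/var/spool/postfix/incoming/msg$i"; done

# Run a verbose tar in the background under nohup; the log then shows
# exactly which file tar was processing if the backup stalls.
nohup tar -cvf /tmp/ve-backup.tar -C "$VE_ROOT" . > /tmp/tar-progress.log 2>&1 &
wait

# A directory containing thousands of entries is a likely culprit; count them.
find "$VE_ROOT/var/spool/postfix/incoming" -type f | wc -l
```

Watching the log with 'tail -f /tmp/tar-progress.log' while the tar runs is what pinpoints the directory where progress grinds to a halt.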

Connecting to Mssql database servers using PHP on Linux

I recently had the pleasure(!) of trying to get PHP on Debian working correctly with a Microsoft SQL server so that the data could be migrated from a Mssql instance into a Mysql one.

Prior to this attempt, the developers were using a Windows machine as a 'broker' between the two databases. This setup was much too slow for importing and exporting large amounts of data, so we decided to cut out the middle man (the Windows machine) and do all the processing on a single server.

First I needed to install a few prerequisite packages:

apt-get install unixodbc-dev
apt-get install libmysqlclient15-dev

Next we need to download and uncompress the FreeTDS source code:

wget ftp://ftp.linuxforum.hu/mirrors/frugalware/pub/frugalware/frugalware-testing/source/lib-extra/freetds/freetds-0.82.tar.gz

Next we use configure and install FreeTDS with the following options:

./configure --enable-msdblib --prefix=/usr/local/freetds --with-tdsver=7.0 --with-unixodbc=/usr
make
make install
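Before moving on to PHP, it can be worth confirming that FreeTDS itself can reach the SQL server. Here is a hypothetical server entry for /usr/local/freetds/etc/freetds.conf; the alias, hostname, and port below are placeholders for your environment:

```
[mssql-server]
        host = sqlserver.example.com
        port = 1433
        tds version = 7.0
```

You can then test the connection using the 'tsql' utility that ships with FreeTDS, for example: tsql -S mssql-server -U username -P password. If you get a prompt, the TDS layer is working and any remaining problems are on the PHP side.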

Next we need to download and uncompress the PHP source code:

wget http://us.php.net/get/php-5.3.6.tar.bz2/from/www.php.net/mirror

Next we use configure and install PHP with the following options:

./configure  --with-mssql=/usr/local/freetds --with-mysql --with-mysqli
make
make install

Lastly we will need to create and install the mssql module for PHP:

cd ext/mssql
phpize
./configure --with-mssql=/usr/local/freetds
make
make install

If you built the mssql extension as a shared module (as in the last step), you will also need to enable it by adding extension=mssql.so to your php.ini and restarting your web server. After that, you should be able to connect to any Microsoft SQL (and Mysql) server from PHP using the functions found here.

MP3 and H.264 playback with chromium

Recently I noticed that I was unable to play certain types of audio and video files directly from within Chromium (I am using Chromium version 10.0.648.133 (77742) on Ubuntu 10.10). It seems that, due to the various licensing issues surrounding the codecs required to play back some of these media types, they are not supported without installing some extra packages.

In order to get MP3 playback support up and running you will need to install the necessary software package using the following command:

 apt-get install chromium-codecs-ffmpeg-extra

After a quick browser restart, you should be able to enjoy MP3 playback on Chromium.

A similar process is required to enable MP4 and H.264 playback; this time you will need to install the following instead:

 apt-get install chromium-codecs-ffmpeg-nonfree

Once again after a quick browser restart, you should be able to enjoy MP4 and H.264 playback on Chromium.

Proxmox 2.0 feature list

Martin Maurer sent an email to the Proxmox users mailing list detailing some of the features that we can expect from the next iteration of Proxmox VE. Martin expects that the first public beta release of the 2.x branch will be ready for use sometime around the second quarter of this year.

Here are some of the highlights currently slated for this release:

  • Complete new GUI
    • based on Ext JS 4 JavaScript framework
    • fast, search-driven interface, capable of handling hundreds and probably thousands of VMs
    • secure VNC console, supporting external VNC viewers with SSL support
    • role-based permission management for all objects (VMs, storages, nodes, etc.)
    • support for multiple authentication sources (e.g. local, MS ADS, LDAP, …)
  • Based on Debian 6.0 Squeeze
    • long-term 2.6.32 kernel with KVM and OpenVZ as default
    • second kernel branch with 2.6.x, KVM only
  • New cluster communication based on corosync, including:
    • Proxmox Cluster file system (pmcfs): database-driven file system for storing configuration files, replicated in real time on all nodes using corosync
    • creates multi-master clusters (no single master anymore!)
    • cluster-wide logging
    • basis for HA setups with KVM guests
  • RESTful web API
    • Resource Oriented Architecture (ROA)
    • declarative API definition using JSON Schema
    • enables easy integration for third-party management tools
  • Planned technology previews (CLI only)
    • SPICE protocol (remote display system for virtualized desktops)
    • Sheepdog (distributed storage system)
  • Commitment to Free Software (FOSS): public code repository and bug tracker for the 2.x code base
  • Topics for future releases
    • better resource monitoring
    • IO limits for VMs
    • extended pre-built Virtual Appliance downloads, including KVM appliances

    Recursive search and copy while keeping the directory structure intact

    I recently needed to write a script that would search for a certain pattern in a file name and then copy that file from one directory to another. If you use the 'find' command with the standard parameters, you will end up with all the files matching the pattern being placed into a single folder.

    In this case I needed the command to maintain the directory structure (and create the folders if necessary) once a file matching the pattern was found.

    The key to making this happen was to use cp's '--parents' flag inside find's '-exec'. Here is an example of the command I ended up using:

     find . -wholename "*search/pattern*" -exec cp -p --parents '{}' /new/folder/ ';'
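A quick, self-contained demonstration of what '--parents' does, using throwaway temporary directories (the paths and file names here are invented for the example):

```shell
#!/bin/sh
# Build a small source tree containing one file that matches the pattern.
SRC=$(mktemp -d)
DEST=$(mktemp -d)
mkdir -p "$SRC/search/pattern"
echo data > "$SRC/search/pattern/file.txt"

# Run find from inside the source tree so the matched paths are relative;
# cp --parents then recreates search/pattern/ underneath the destination.
cd "$SRC"
find . -wholename "*search/pattern*" -type f -exec cp -p --parents '{}' "$DEST/" ';'

find "$DEST" -type f    # the copy lands at $DEST/search/pattern/file.txt
```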

    Native Linux ZFS kernel module and stability

    UPDATE: If you are interested in ZFS on Linux, you have two options at this point: the native zfsonlinux kernel module and the FUSE-based zfs-fuse project.

    I have been actively following the zfsonlinux project because, once stable and ready, it should offer superior performance by avoiding the extra overhead incurred by the FUSE layer in the zfs-fuse project.

    You can see another one of my posts concerning zfsonlinux here.

    ----------------------------------------

    There was a question posted in response to my previous blog post (found here) about the stability of the native Linux ZFS kernel module release. I thought I would make a post out of my response:

    So far I have been able to perform only limited testing, given that the GA code was just released earlier this week. Some time ago I was given access to the beta builds, so I had already done some initial testing using those: I configured two mirrored vdevs consisting of two drives each, and it seemed relatively stable as far as I was concerned. As I stated in my previous post, there is a known issue with the 'zfs rollback' command, which I tested using the GA release and did in fact have problems with.

    The workaround at this point seems to be to perform a reboot after the rollback and then a 'zpool scrub' on the pool after the reboot. Personally, I am hoping this gets fixed soon, because not everyone has the same level of flexibility when it comes to rebooting their servers and storage nodes.

    As far as I understand it, this module really consists of three pieces:

    1) SPL – a Linux kernel module which provides many of the Solaris kernel APIs. This layer makes it possible to run Solaris kernel code in the Linux kernel with relatively minimal modification.
    2) ZFS – a Linux kernel module which provides a fully functional and stable SPA, DMU, and ZVOL layer.
    3) LZFS – a Linux kernel module which provides the necessary POSIX layer.

    Pieces #1 and #2 have been available for a while and are derived from code taken from the ZFS on Linux project found here. The folks at KQ Infotech are really building on that and providing piece #3, the missing POSIX layer.

    Only time will tell how stable the code really is. My opinion at this point is that most software projects have some number of known bugs (and even more have some unknown number of bugs as well), so I am going to continue testing in a non-production environment for the next few months. At this point I have not experienced any instability (other than what was discussed above) or crashing, and all the commands seem to work as advertised. There are a lot of features I have not been able to test yet, such as dedup and compression, so there is lots more to look at in the upcoming weeks.

    KQStor's business model seems to be one where the source code is provided and support is charged for. So far I have been able to have an open and productive dialog with their developers, and they have been very responsive to my inquiries; however, it does not appear that they are going to be setting up public tools such as mailing lists or forums, due to their current business model. I am hoping that this will change in the near future, as I truly believe that everyone would benefit from those kinds of public resources, and there is no doubt in my mind that such tools would only lead to a more stable product in the long run.

    More native Linux ZFS benchmarks

    Phoronix has published a nice five-page article, which includes some in-depth file system benchmarks. They tested file systems such as Btrfs, EXT4, XFS, ZFS-FUSE, and the ZFS kernel module from KQ Infotech.

    Here is an excerpt taken from the conclusion section of the article:

    “In terms of our ZFS on Linux benchmarks if you have desired this Sun-created file-system on Linux, hopefully it is not because of the performance expectations for this file-system. As these results illustrate, this ZFS file-system implementation for Linux is not superior to the Linux popular file-systems like EXT4, Btrfs, and XFS. There are a few areas where the ZFS Linux disk performance was competitive, but overall it was noticeably slower than the big three Linux file-systems in a common single disk configuration. That though is not to say ZFS on Linux will be useless as the performance is at least acceptable and clearly superior to that of ZFS-FUSE. More importantly, there are a number of technical merits to the ZFS file-system that makes it one of the most interesting file-systems around.”

    With that being said, I believe that a lot of the time, when people choose ZFS as the underlying filesystem for a project, they are not doing so because of its reputation as a wonderfully fast file system. ZFS features such as data integrity, large capacity, snapshots, and deduplication are more likely to drive your rationale for using ZFS as part of your backend storage solution.

    Another thing to note is that these tests were run on the beta version of the kernel module; I assume that once the GA version (and source code) is released, there will be plenty of opportunity to mitigate some of these concerns. On the other hand, you are going to have to live with some of the overhead that comes with ZFS if you want to take advantage of its large feature set.

    Replication improvements in Mysql 5.5

    As promised, Rob Young over at Oracle's Mysql blog has provided us with one more Mysql 5.5 writeup. This time the focus is on some of the new replication features that you can expect from Mysql 5.5.

    Here are the topics covered by Rob’s post:

    • Semi-synchronous Replication
    • Replication Heartbeat
    • Automatic Relay Log Recovery
    • Replication Per Server Filtering
    • Replication Slave Side Data Type Conversions

    These are exciting changes that many people have been looking forward to for quite some time. Including these features in 5.5 will help ensure that replication is even more reliable and more manageable in the future.
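As a concrete taste of the first item, semi-synchronous replication in 5.5 ships as a plugin pair that is loaded and then enabled on each side. A hypothetical my.cnf sketch (the plugin and variable names are MySQL 5.5's stock semisync names, but the timeout value is just an example):

```
# On the master:
[mysqld]
plugin-load = rpl_semi_sync_master=semisync_master.so
rpl_semi_sync_master_enabled = 1
# Milliseconds to wait for a slave acknowledgement before falling back
# to plain asynchronous replication.
rpl_semi_sync_master_timeout = 1000

# On each slave:
[mysqld]
plugin-load = rpl_semi_sync_slave=semisync_slave.so
rpl_semi_sync_slave_enabled = 1
```

The plugins can also be installed on a running server with INSTALL PLUGIN rather than plugin-load, if you prefer not to restart mysqld.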

    Calculating overall database size in Mysql

    Recently I had a server that was running low on free disk space; after a bit of digging around, I found that the Mysql database on that particular machine was taking up the bulk of the usable disk space.

    Given that this was a shared Mysql instance, I needed to determine which databases were consuming the most space. To calculate the total amount of space being used, we need to take both the size of the data and all of the indexes into account.

    I used the following SELECT query, which will return the size of all databases (data + indexes) in MB:

    SELECT table_schema "Database Name", sum( data_length + index_length) / 1024 / 1024
    "Database Size(MB)" FROM information_schema.TABLES GROUP BY table_schema ;

    Running the query above will result in output similar to this:

    +--------------------+-------------------+
    | Database Name      | Database Size(MB) |
    +--------------------+-------------------+
    | movies             |     3772.06922913 |
    | tmp                |      101.08132978 |
    | bikes              |       57.04234117 |
    | information_schema |        0.00781250 |
    | mysql              |        0.60790825 |
    +--------------------+-------------------+
    

    In this case we can clearly see that the ‘movies’ database is consuming the most space. At this point we may want to dig a little deeper and look at the size of each table within the ‘movies’ database, to see where in particular the space is being used.
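When many databases share an instance, it can also help to sort the first query by size so the biggest consumers float to the top. This is the same information_schema query as above, just rounded and ordered:

```sql
SELECT table_schema "Database Name",
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) "Database Size(MB)"
FROM information_schema.TABLES
GROUP BY table_schema
ORDER BY SUM(data_length + index_length) DESC;
```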

    In order to get some more detail we can use the following SELECT query:

    SELECT table_name, table_rows, data_length, index_length,  round(((data_length + index_length) / 1024 / 1024),2) "Size(MB)" FROM information_schema.TABLES WHERE table_schema = "movies";

    Running the query above will result in output similar to this:

    +-----------------------------+------------+-------------+--------------+----------+
    | table_name                  | table_rows | data_length | index_length | Size(MB) |
    +-----------------------------+------------+-------------+--------------+----------+
    | Id                          |          1 |       16384 |            0 |     0.02 |
    | Teaser                      |          1 |       16384 |            0 |     0.02 |
    | TeaserLog                   |      21767 |  3177586576 |       392192 |  3030.76 |
    | TeaserChild                 |     912602 |    48873472 |     33112064 |    78.19 |
    | Director1                   |     460722 |    57229312 |     13156352 |    67.13 |
    | Director2                   |    2044044 |    87801856 |            0 |    83.73 |
    | City                        |     286134 |    17367040 |     17858560 |    33.59 |
    | City_alt_spelling           |       1086 |       65536 |        65536 |     0.13 |
    | City_backup                 |     148811 |    13123584 |            0 |    12.52 |
    | City_misspelling_log        |     166589 |     9977856 |            0 |     9.52 |
    | City_save                   |     148618 |    13123584 |            0 |    12.52 |
    +-----------------------------+------------+-------------+--------------+----------+
    11 rows in set (0.14 sec)
    

    Based on the output of this SQL query, we can see that the 'TeaserLog' table is using up the majority of the space within the 'movies' database.