Monthly Archives: November 2010

More native Linux ZFS benchmarks

Phoronix has published a nice 5 page article, which includes some in-depth file system benchmarks. They tested file systems such as Btrfs, Ext4, Xfs, ZFS-Fuse and the ZFS kernel module from KQ Infotech.

Here is an excerpt taken from the conclusion section of the article:

“In terms of our ZFS on Linux benchmarks if you have desired this Sun-created file-system on Linux, hopefully it is not because of the performance expectations for this file-system. As these results illustrate, this ZFS file-system implementation for Linux is not superior to the Linux popular file-systems like EXT4, Btrfs, and XFS. There are a few areas where the ZFS Linux disk performance was competitive, but overall it was noticeably slower than the big three Linux file-systems in a common single disk configuration. That though is not to say ZFS on Linux will be useless as the performance is at least acceptable and clearly superior to that of ZFS-FUSE. More importantly, there are a number of technical merits to the ZFS file-system that makes it one of the most interesting file-systems around.”

With that being said…I believe that a lot of times when people are choosing to use ZFS as an underlying filesystem for a project, they are not doing so due to it’s reputation as a wonderfully fast file system.  ZFS features such as data integrity, large capacity, snapshotting and deduplication are more likely going to drive your rational for using ZFS as part of your backend storage solution.

Another thing to note about these benchmarks  is that these tests were run on the beta version of the kernel module, and I assume that once the GA version (and source code) is released, there will be plenty of opportunities to try and mitigate some of these concerns as much as possible, however on the other hand you are going to have to live with some of the overhead that comes with using ZFS if you want to take advantage of it’s large feature set.

Replication improvements in Mysql 5.5

As promised Rob Young over at Oracle’s Mysql blog has provided us with one more Mysql 5.5 writeup. This time the focus is on some of the new features that you can expect from Mysql 5.5.

Here are the topics covered by Rob’s post:

  • Semi-synchronous Replication
  • Replication Heartbeat
  • Automatic Relay Log Recovery
  • Replication Per Server Filtering
  • Replication Slave Side Data Type Conversions

These are exciting changes, that many people have been looking forward to for quite some time. Including these features in 5.5 will help make sure that replication is even more reliable and more manageable in the future.

Calculating overall database size in Mysql

Recently I had a server that was running low on free disk space, after a bit of digging around I found out that the Mysql database on that particular machine was taking up the bulk of the usable disk space.

Given the fact that this was a shared Mysql instance, I needed to determine which databases were consuming the most amount of space. In order to calculate the total amount of space being used we need to take both the size of the data and all the indexes into account.

I used the following SELECT query, which will return the size of all databases(data + indexes) in MB.

SELECT table_schema "Database Name", sum( data_length + index_length) / 1024 / 1024
"Database Size(MB)" FROM information_schema.TABLES GROUP BY table_schema ;

Running the query above will result in output similar to this:

+--------------------+-------------------+
| Database Name      | Database Size(MB) |
+--------------------+-------------------+
| movies             |     3772.06922913 |
| tmp                |      101.08132978 |
| bikes              |       57.04234117 |
| information_schema |        0.00781250 |
| mysql              |        0.60790825 |
+--------------------+-------------------+

In this case we can clearly see that the ‘movies’ database is consuming the most space. At this point we may want to dig a little deeper and look at the size of each table within the ‘movies’ database, to see where in particular the space is being used.

In order to get some more detail we can use the following SELECT query:

SELECT table_name, table_rows, data_length, index_length,  round(((data_length + index_length) / 1024 / 1024),2) "Size(MB)" FROM information_schema.TABLES WHERE table_schema = "movies";

Running the query above will result in output similar to this:

+-----------------------------+------------+-------------+--------------+----------+
| table_name                  | table_rows | data_length | index_length | Size(MB) |
+-----------------------------+------------+-------------+--------------+----------+
| Id                          |          1 |       16384 |            0 |     0.02 |
| Teaser                      |          1 |       16384 |            0 |     0.02 |
| TeaserLog                   |      21767 |  3177586576 |       392192 |  3030.76 |
| TeaserChild                 |     912602 |    48873472 |     33112064 |    78.19 |
| Director1                   |     460722 |    57229312 |     13156352 |    67.13 |
| Director2                   |    2044044 |    87801856 |            0 |    83.73 |
| City                        |     286134 |    17367040 |     17858560 |    33.59 |
| City_alt_spelling           |       1086 |       65536 |        65536 |     0.13 |
| City_backup                 |     148811 |    13123584 |            0 |    12.52 |
| City_misspelling_log        |     166589 |     9977856 |            0 |     9.52 |
| City_save                   |     148618 |    13123584 |            0 |    12.52 |
+-----------------------------+------------+-------------+--------------+----------+
11 rows in set (0.14 sec)

Based on the output from this SQL query we are able to see that the ‘TeaserLog’ table is using up the majority of space within the ‘movies’ database.

Performance and scalability improvements in Mysql 5.5

Oracle’s Mysql Blog has a very good post that provides an overview of some of the improvements that you can expect in the upcoming Mysql 5.5 release.

This writeup focuses mainly on the changes as they relate to performance and scalability, however the author (Rob Young) expresses his plans to discuss other aspects as well, sometime in the near future.

Here are just a few of the topics covered by Rob:

  • Improved Default Thread Concurrency
  • Improved Recovery Performance
  • Multiple Buffer Pool Instances
  • Native Asynchronous I/O for Linux
  • Improved Metadata Locking Within Transactions
  • Better performance on Windows based installs

At some point I hope to continue my testing and benchmarking of several different versions of Mysql such as MariaDB, Percona and Mysql 5.5. However for production databases we will be sticking with the Mysql 5.1.x code branch for the foreseeable feature.