Sunday, December 02, 2007

Wishlist for partitioning

I love the way partitioning works in MySQL. I remember how many past projects I implemented with application logic just to parallelize I/O; partitioning makes that seamless now. But it's not without its share of problems and workarounds, so I compiled my own wishlist of improvements that I hope will make it into a future version of MySQL.

1. Partition-level table locking. Partitions should be treated like tables and locked individually, rather than locking the whole table and all of its partitions.
2. Ability to add partitions from existing tables. This is very useful, especially when performing bulk maintenance operations.
3. Ability to convert a partition to a table.
4. Ability to mix and match storage engines for partitions and subpartitions. How cool would it be to have an archive partition for older data use the ARCHIVE engine while the remaining partitions are InnoDB or MyISAM?
5. More usable datatypes for partition pruning. How many times can you use DATETIME when TIMESTAMP is available? Also, when will functions other than TO_DAYS() and YEAR() be supported? (See the sketch after this list.)
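
For context on item 5, here is a minimal sketch of what pruning-friendly partitioning looks like today, assuming a hypothetical access_log table; the optimizer can only prune when the partitioning expression uses one of a few blessed functions such as TO_DAYS() or YEAR():

CREATE TABLE access_log (
  id INT NOT NULL,
  logged_at DATETIME NOT NULL
)
PARTITION BY RANGE (TO_DAYS(logged_at)) (
  PARTITION p2006 VALUES LESS THAN (TO_DAYS('2007-01-01')),
  PARTITION p2007 VALUES LESS THAN (TO_DAYS('2008-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- EXPLAIN PARTITIONS shows whether a query prunes down to a single partition:
EXPLAIN PARTITIONS
SELECT COUNT(*) FROM access_log
WHERE logged_at >= '2007-06-01' AND logged_at < '2007-07-01';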

Tuesday, June 26, 2007

Version 3 of mysqlbackup - small bug fix

I just posted version 3 of mysqlbackup to MySQL Forge.

Small bugfix: added the --add-drop-table option to the default options for mysqldump. Its absence was causing a failure when restoring views.
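
If you're curious what the flag actually changes in the dump output: --add-drop-table makes mysqldump emit a DROP guard before each object it recreates. For a hypothetical table t1, the dump would contain something like:

DROP TABLE IF EXISTS `t1`;
CREATE TABLE `t1` (
  id INT NOT NULL
);

Without that guard, restoring over an existing schema can fail, which is what bit the view restore here.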

Sunday, June 17, 2007

Not all MySQL errors are visible to replication

This probably warrants a bug report to MySQL, but I want to let other people know about this first. There are situations where MySQL receives incomplete statements from replication relay logs, but does not trigger a replication error. Case in point is exceeding max_allowed_packet.

I recently had a situation where one of my machines was incorrectly configured with a different value for max_allowed_packet. What happened was not what I expected. Rather than a replication error (which we monitor for using Nagios), the MySQL error log was spewing messages about exceeding max_allowed_packet. The only problem visible through our monitoring framework was that replication had fallen behind, and was continuing to fall behind.

Fixing the problem was rather easy: stop the slave, change the max_allowed_packet variable globally in the db server and in the configuration file, and then start the slave.
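
In SQL terms, the fix was roughly the following (the 64M value here is just an illustration; match it to the master's setting):

STOP SLAVE;
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;
START SLAVE;

-- and the matching line in my.cnf so the change survives a restart:
-- max_allowed_packet = 64M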

This is one of those things that falls under the category of "MySQL annoyances and one-offs". Shouldn't this really trigger a true replication error, rather than spewage in log files? I will have to reproduce this and then file a bug report to MySQL, but I really shouldn't have to if there were some consistency in error reporting.

Script to backup binary logs on a master

I have recently posted a script on MySQL Forge to back up MySQL binary logs. One of the ideas I had when I originally wrote the script was to take into account all of the slaves and which master log file & position each one has executed. This way, only the relevant binary logs get archived and then subsequently purged. You can find the script here.
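
The core idea, expressed as the SQL a DBA would run by hand, looks roughly like this (the log file name below is hypothetical):

-- On each slave: find the oldest master binary log still being executed
SHOW SLAVE STATUS;
-- (note the Relay_Master_Log_File and Exec_Master_Log_Pos columns)

-- On the master, after archiving the older files, purge up to the
-- minimum Relay_Master_Log_File seen across all slaves
PURGE BINARY LOGS TO 'mysql-bin.000042';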

Reload data quickly into MySQL InnoDB tables

As DBAs who manage large numbers of database servers, we are always looking for the fastest or most efficient way to load data into the database. Some DBAs have quarterly maintenance periods during which they reload data into a database to refresh the indexes.

If you primarily use InnoDB tables in your MySQL database server, then this set of tricks will help make the reload process a bit faster than a straight dump & reload.

my.cnf configuration

innodb_flush_log_at_trx_commit = 0
innodb_support_xa = 0
skip-innodb-doublewrite
# also disable log-bin and log_slow_queries for the duration of the reload

Since the goal is to reload data quickly, we need to eliminate any potential bottlenecks. Setting innodb_flush_log_at_trx_commit = 0 reduces the amount of disk I/O by avoiding a flush to disk on each commit. If you are not using XA-compliant transactions (multi-system two-phase commits), then you don't need innodb_support_xa; disabling it avoids an extra disk flush per transaction. The skip-innodb-doublewrite option turns off the doublewrite buffer for inserts, which ekes out a little more performance. Also, if you don't need the binary log, turn it off during your reload period. Remember, any excess disk I/O that is not needed will hurt the performance of reloading the database.
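
Before kicking off the reload, it's worth a quick sanity check that these settings actually took effect after the restart; something along these lines:

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'innodb_support_xa';
SHOW VARIABLES LIKE 'innodb_doublewrite';  -- should report OFF
SHOW VARIABLES LIKE 'log_bin';             -- should report OFF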


Unloading the data
There are many ways to unload & reload the data, using either the standard MySQL tools or your own crafted toolset. Again, the main idea is efficiency. The best advice here: when selecting the data to be unloaded, make sure the select is in primary key order. If the data is sorted ahead of time, it loads back into InnoDB quickly, because the primary key is a clustered index, meaning the data is stored sorted by primary key as it is inserted into the database. Use the --order-by-primary option of the mysqldump utility when selecting the data.
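
If you roll your own unload/reload instead of using mysqldump, the same principle applies. A minimal sketch, assuming a hypothetical table orders with primary key id:

-- Unload in primary key order
SELECT * INTO OUTFILE '/tmp/orders.txt'
FROM orders
ORDER BY id;

-- Reload: rows arrive pre-sorted for the clustered primary key index
LOAD DATA INFILE '/tmp/orders.txt' INTO TABLE orders;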

I hope these small tips help you make the process a bit less painful.

Thursday, May 10, 2007

Why is QA so important?



Someone should really check the label before mass-producing. Doesn't this look like SQL from SQL Server?

Wednesday, February 28, 2007

A case of no research before product rollout

I am digressing a bit from my usual topics, but this is pretty hilarious....

Wanted to get a bit of an after dinner snack and I find some corn dogs in the freezer. Yum!

Pop one out, read the instructions on the box (yes, RTFM!!!), and what do I see? Instructions that read "Heat for 70 seconds."



When was the last time you ever set the timer on a microwave oven for 70 seconds?! Would it really have taken too much time to do some product research and realize that you should say to set the microwave for 1 minute 10 seconds?

So is there a moral to this story? Kind of. Documentation is not to be taken lightly, no matter how trivial it may seem to be.

Sunday, February 04, 2007

MySQL and iSCSI - a winning combo!

So it's been a very long time again between posts. So much has happened. Let me first begin by saying that I am very impressed with iSCSI performance and I believe that it is mature enough to actually run production workloads (but it really depends on the type of workload).

After all of the benchmarking and analysis, we finally decided to move forward with the purchase of an iSCSI storage solution. For the types of queries we run (a large number of records to scan, a small result set returned), we had to tweak the schema just a bit in order to realize the performance we desired (that, plus good-quality fibre-channel drives for the extra oomph that's needed).

Bottom line is we had to make a significant investment in hardware in order to realize the benefits of having a proper storage solution in place. The benefits though outweigh the overwhelming maintenance required to keep all of the machines running. Backups using the storage provider's snapshot mechanism will be extremely beneficial as well. All in all, a good decision to ease our minds.