Saturday, April 23, 2011

Covering indexes in MySQL - revisited (with benchmark)

In the process of building a new benchmark tool for Yahoo, I needed a good "guinea pig." I think I found the one by showing how much more powerful covering indexes can be with InnoDB. A covering index is one where the index itself contains all of the necessary data field(s). In other words, no need to access the data page!

Here's a sample table:
CREATE TABLE entity (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`entity_type` enum('Advertiser','Publisher','Network') NOT NULL DEFAULT 'Advertiser',
`managing_entity_id` int(10) unsigned DEFAULT NULL,
....
PRIMARY KEY (`id`),
KEY `ix_entity_managing_entity_id` (`managing_entity_id`),
....
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

It's a wide table with 88 some odd columns with an average row length of 240 bytes.

Now to test the QPS difference between a query like this:
SELECT * FROM entity WHERE entity.managing_entity_id = X;

and a query that uses a covering index:
SELECT entity.managing_entity_id FROM entity WHERE entity.managing_entity_id = X;

I have an SQL file that contains 2000 SQL statements with fairly random ids, and I'll run my benchmark program with 4 clients a few times to get a good QPS reading (throwing out the low & high numbers).

Results with SELECT *:
Tot Q=2000,Tot T=1.757571,QPS=1137.93411475269,Min=0.0006665,Max=0.027022,90th Pctl=0.000946
Tot Q=2000,Tot T=1.770522,QPS=1129.61036349732,Min=0.0006635,Max=0.026858,90th Pctl=0.00096675
Tot Q=2000,Tot T=1.753147,QPS=1140.80564835693,Min=0.00066325,Max=0.026966,90th Pctl=0.00095175

Results with covering index:
Tot Q=2000,Tot T=0.703581,QPS=2842.60092299252,Min=0.00026625,Max=0.014821,90th Pctl=0.00035325
Tot Q=2000,Tot T=0.897792,QPS=2227.6874821785,Min=0.00026625,Max=0.04546425,90th Pctl=0.0003595
Tot Q=2000,Tot T=0.720616,QPS=2775.403266094,Min=0.000257,Max=0.01566475,90th Pctl=0.0003452

Fairly significant results. I've seen many times where SQL has been hastily written just to perform a simple check if a record exists or not. I've always tried to rationalize this with the developers I work with, but I never had the raw numbers to show the exact impact. If the query happens only a handful of times, not a huge deal. But, if it happens hundreds of thousands or millions of times per day, then that's a different story, and making a very simple rewrite of the SQL helps immensely, and could even save on cost, if you're thinking about adding another slave to spread out load.

4 comments:

Dimitri said...

Your test is not proper.. - to compare apples to apples, you have not to use "SELECT * ..." but rather a SELECT with a set of fields, and then run the test with and without an index including teh whole set of fields!..

On the current test it looks more likely overhead of "SELECT * " fetching rather covering index.

Rgds,
-Dimitri

Andy said...

Did the table fit within memory or did MySQL have to go to disk to fetch the data?

Anonymous said...

the correct and fastest way for checking existing is below. It works for all vendors

SELECT COUNT(*) or SELECT 1
FROM ...
WHERE ...

Partha Dutta said...

Dmitri,

The idea and point I'm trying to show is that there is overhead in SELECT *. You are right. This is not really a test, and not meant to do comparison, but more education.

Partha