Partha's Blog: Covering indexes in MySQL - revisited (with benchmark)

In the process of building a new benchmark tool for Yahoo, I needed a good "guinea pig." I think I found the one by showing how much more powerful covering indexes can be with InnoDB. A covering index is one where the index itself contains all of the necessary data field(s). In other words, no need to access the data page!

Here's a sample table:

CREATE TABLE entity (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`entity_type` enum('Advertiser','Publisher','Network') NOT NULL DEFAULT 'Advertiser',
`managing_entity_id` int(10) unsigned DEFAULT NULL,
....
PRIMARY KEY (`id`),
KEY `ix_entity_managing_entity_id` (`managing_entity_id`),
....
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

It's a wide table with 88 some odd columns with an average row length of 240 bytes.

Now to test the QPS difference between a query like this:

SELECT * FROM entity WHERE entity.managing_entity_id = X;

and a query that uses a covering index:

SELECT entity.managing_entity_id FROM entity WHERE entity.managing_entity_id = X;

I have an SQL file that contains 2000 SQL statements with fairly random ids, and I'll run my benchmark program with 4 clients a few times to get a good QPS reading (throwing out the low & high numbers).

Results with SELECT *:

Tot Q=2000,Tot T=1.757571,QPS=1137.93411475269,Min=0.0006665,Max=0.027022,90th Pctl=0.000946
Tot Q=2000,Tot T=1.770522,QPS=1129.61036349732,Min=0.0006635,Max=0.026858,90th Pctl=0.00096675
Tot Q=2000,Tot T=1.753147,QPS=1140.80564835693,Min=0.00066325,Max=0.026966,90th Pctl=0.00095175

Results with covering index:

Tot Q=2000,Tot T=0.703581,QPS=2842.60092299252,Min=0.00026625,Max=0.014821,90th Pctl=0.00035325
Tot Q=2000,Tot T=0.897792,QPS=2227.6874821785,Min=0.00026625,Max=0.04546425,90th Pctl=0.0003595
Tot Q=2000,Tot T=0.720616,QPS=2775.403266094,Min=0.000257,Max=0.01566475,90th Pctl=0.0003452

Fairly significant results. I've seen many times where SQL has been hastily written just to perform a simple check if a record exists or not. I've always tried to rationalize this with the developers I work with, but I never had the raw numbers to show the exact impact. If the query happens only a handful of times, not a huge deal. But, if it happens hundreds of thousands or millions of times per day, then that's a different story, and making a very simple rewrite of the SQL helps immensely, and could even save on cost, if you're thinking about adding another slave to spread out load.

4 comments:

Dimitri said...: Your test is not proper.. - to compare apples to apples, you have not to use "SELECT * ..." but rather a SELECT with a set of fields, and then run the test with and without an index including teh whole set of fields!..

On the current test it looks more likely overhead of "SELECT * " fetching rather covering index.

Rgds,
-Dimitri; 4/23/2011 4:12 AM
Andy said...: Did the table fit within memory or did MySQL have to go to disk to fetch the data?; 4/23/2011 5:46 AM
Anonymous said...: the correct and fastest way for checking existing is below. It works for all vendors

SELECT COUNT(*) or SELECT 1
FROM ...
WHERE ...; 4/23/2011 8:43 AM
Partha Dutta said...: Dmitri,

The idea and point I'm trying to show is that there is overhead in SELECT *. You are right. This is not really a test, and not meant to do comparison, but more education.

Partha; 4/24/2011 9:12 AM

Partha's Blog

Saturday, April 23, 2011

Covering indexes in MySQL - revisited (with benchmark)

4 comments:

Contributors

Blog Archive

Links