Sunday, December 17, 2006

Primary Key Order Does Matter!

There have been a few posts on PlanetMySQL regarding primary keys and the importance of choosing the right one. This is even more important when the table uses InnoDB. You've read different posts on why it is so important. Now, I'm all about benchmarks and showing the details, so I'll take a table from my previous posts about MySQL 5.1 partitioning and show what I found.

This table was created under MySQL 5.1.12-beta:

CREATE TABLE `big_table_test1` (
`entity_id` int(11) NOT NULL DEFAULT '0',
`col1` int(11) NOT NULL DEFAULT '0',
`col2` int(11) NOT NULL DEFAULT '0',
`col3` int(11) NOT NULL DEFAULT '0',
`col4` int(11) NOT NULL DEFAULT '0',
`col5` int(11) NOT NULL DEFAULT '0',
`col6` int(11) NOT NULL DEFAULT '0',
`ymdh` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`imps` bigint(20) NOT NULL DEFAULT '0',
`clicks` int(11) NOT NULL DEFAULT '0',
`convs` int(11) NOT NULL DEFAULT '0',
`id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`,`ymdh`),
KEY `ix_big1` (`ymdh`,`entity_id`,`col3`) USING BTREE,
KEY `ix_big2` (`ymdh`,`entity_id`,`col4`) USING BTREE,
KEY `ix_big3` (`ymdh`,`entity_id`,`col2`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT


I loaded about 180 million records into this table (a small set of data for us!) and ran one of our most common types of queries:



SELECT col1,col2,col3,SUM(imps),SUM(clicks),SUM(convs)
FROM big_table_test1
WHERE ymdh IN ('2006-10-01 00:00:00','2006-10-02 00:00:00','2006-10-03 00:00:00',
'2006-10-04 00:00:00','2006-10-05 00:00:00','2006-10-06 00:00:00',
'2006-10-07 00:00:00')
AND entity_id = 2
GROUP BY col1, col2, col3
ORDER BY col1, col2, col3;

Doesn't look terribly nasty, does it? This query takes about 7 MINUTES to run!!! EXPLAIN on the query shows nothing out of the ordinary, as it uses one of the secondary indexes on the table. The cardinality of entity_id is not very high, so forcing one of the secondary indexes over another wouldn't yield any performance benefit. The id column is basically a numerical hash of the table's "real" primary key, which is entity_id plus col1 through col6, and is used for uniqueness. What's interesting is that throughout our application, there are no direct queries against this id column. It just exists. But it can't be removed.

If this column serves no significant purpose in our queries, what if we swapped the order of the columns in the primary key definition? The primary key would then look like:

PRIMARY KEY (`ymdh`,`id`)

Logically there is no difference: the same pair of columns is still unique, so we do not break any uniqueness constraints in the application. If we run the query again, 4 SECONDS!!!! Wow! How do we explain this massive performance increase?

Remember that InnoDB uses a clustered index for the primary key. A clustered index is an index built on the same key by which the row data is physically ordered on disk. Clustered indexes are very efficient to scan, but they have performance implications when inserting new data, as some re-ordering may need to be done. All of our data is inserted in ymdh order, so it makes sense for the primary key to lead with this column. With the rows for each ymdh value stored together, there are a lot of efficiencies to be gained, such as sequential disk read-ahead. With the previous primary key, rows for a given ymdh were scattered across the table in id order, so reading the data portion of the table needed lots of random disk I/O.
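
A minimal sketch of the change, assuming the table definition shown earlier. Changing the primary key rebuilds the whole InnoDB table, so on 180 million rows it is best tried on a copy first. The date list in the EXPLAIN is shortened here purely for readability.

```sql
-- Swap the column order of the primary key.
-- This rebuilds the clustered index, i.e. the table itself.
ALTER TABLE big_table_test1
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`ymdh`, `id`);

-- Compare the plan for the query before and after the change.
EXPLAIN
SELECT col1, col2, col3, SUM(imps), SUM(clicks), SUM(convs)
FROM big_table_test1
WHERE ymdh IN ('2006-10-01 00:00:00', '2006-10-02 00:00:00')
  AND entity_id = 2
GROUP BY col1, col2, col3
ORDER BY col1, col2, col3;
```

Which index the optimizer picks after the change depends on its statistics, so the timings above should be re-measured rather than assumed.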

9 comments:

Santosh said...

Remember that InnoDB uses a clustered index for the primary key. Clustered indexes are indexes that are built based on the same key by which the data is ordered on disk.

So, in the first case, the data on the disk would be ordered by "id" in a monotonically increasing/decreasing order? Would that be correct? If true, INSERTs would also take longer and involve more work.

Or have I misunderstood?

ctx2002 said...

Hi,

I am not an expert on MySQL, I just use it on the job.

Your article surprised me too, since changing the order of the primary key has such a big performance impact.

But since you did not write out how you actually tested the query, I have to ask: have you checked other things before claiming that changing the order of the primary key gives such a big improvement?

By "other things" I mean: do you have any cache system running on the same machine as the MySQL server? The query result probably comes from the cache system, or some configuration setting affects query speed.

If changing the order of the primary key has that big an impact, I think this is definitely a bug. You should report it to the InnoDB team.

regards,

ctx2002

Chip Turner said...

I think you're misunderstanding what is going on here. The problem is the indexes on ymdh do not include all of the columns you are querying. This means, for every row innodb fetches from the secondary index, it then must probe the primary key index to find the other columns of data. So you are doing scans on an index, but then random IO inside of the table itself.

In the case where you have the PK on (ymdh, id) then you trigger the innodb behavior of index organized tables, that is, the PK index *is* the table, and in scanning it, you get all of the columns for "free". In this case, you access the table by scanning ranges, which is very efficient.

You would see the same performance improvement if you made a secondary index leading with ymdh and containing all of the columns you're querying.
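
A minimal sketch of such a covering index, assuming the table definition from the post. The index name ix_big_cover is hypothetical, and the column list is simply every column the example query reads.

```sql
-- Hypothetical covering index: leads with ymdh and entity_id (the WHERE columns)
-- and carries every other column the query selects or groups by.
ALTER TABLE big_table_test1
  ADD KEY `ix_big_cover`
    (`ymdh`, `entity_id`, `col1`, `col2`, `col3`, `imps`, `clicks`, `convs`);
```

If the optimizer chooses this index, the Extra column of EXPLAIN should report "Using index", meaning the query is answered from the index alone without probing the primary key. The trade-off is a much larger secondary index that has to be maintained on every insert.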

Santosh said...

Perhaps this might help:
Multi-column indexes in MySQL

MySQL can create composite indexes (that is, indexes on multiple columns). An index may consist of up to 15 columns. For certain data types, you can index a prefix of the column (see Section 7.4.3, “Column Indexes”).

A multiple-column index can be considered a sorted array containing values that are created by concatenating the values of the indexed columns.

MySQL uses multiple-column indexes in such a way that queries are fast when you specify a known quantity for the first column of the index in a WHERE clause, even if you do not specify values for the other columns.

Santosh said...

Here is another section from the same manual,

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL cannot use a partial index if the columns do not form a leftmost prefix of the index.

How MySQL uses indexes
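
Applied to the table in the post, the leftmost-prefix rule can be sketched as follows. This assumes the original definition, where ix_big1 is (ymdh, entity_id, col3); because ix_big2 and ix_big3 share the same two leading columns, the optimizer may pick any of the three.

```sql
-- Index lookup possible: ymdh is the leftmost column of ix_big1.
EXPLAIN SELECT imps FROM big_table_test1
WHERE ymdh = '2006-10-01 00:00:00';

-- Index lookup possible: (ymdh, entity_id) is a leftmost prefix.
EXPLAIN SELECT imps FROM big_table_test1
WHERE ymdh = '2006-10-01 00:00:00' AND entity_id = 2;

-- No index lookup possible on ix_big1: entity_id alone is not a leftmost prefix.
EXPLAIN SELECT imps FROM big_table_test1
WHERE entity_id = 2;
```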
