Sunday, December 03, 2006

MySQL 5.1 Partitioning - Part 3

Finally, it's time to start putting MySQL 5.1.12-beta through the wringer. First order of business, convert the existing table schema to one that supports partitioning...

I made some minor changes to the configuration for partitioning, namely innodb_file_per_table and innodb_open_files. I set innodb_open_files to 1000 based on the tables and partitions I plan on supporting.

This is what the new table schema looks like with partitioning:

CREATE TABLE `network_daily` (
`entity_id` int(11) NOT NULL default '0',
`buyer_entity_id` int(11) NOT NULL default '0',
`buyer_line_item_id` int(11) NOT NULL default '0',
`seller_entity_id` int(11) NOT NULL default '0',
`seller_line_item_id` int(11) NOT NULL default '0',
`size_id` int(11) NOT NULL default '0',
`pop_type_id` int(11) NOT NULL default '0',
`country_group_id` int(11) NOT NULL default '0',
`is_adjustment` tinyint(4) NOT NULL default '0',
`adv_learn_type` char(1) NOT NULL default '',
`pub_learn_type` char(1) NOT NULL default '',
`frequency` smallint(6) NOT NULL default '0',
`ymdh` datetime NOT NULL default '0000-00-00 00:00:00',
`imps` bigint(20) NOT NULL default '0',
`clicks` int(11) NOT NULL default '0',
`convs` int(11) NOT NULL default '0',
`id` int(10) unsigned NOT NULL default '0',
`checkpoint` int(11) default NULL,
PRIMARY KEY (`id`,`ymdh`),
KEY `ix_nsl_ymdh_buyerli` (`ymdh`,`buyer_line_item_id`),
KEY `ix_nsdl_ymdh_entity_buyer` (`ymdh`,`entity_id`,`buyer_entity_id`),
KEY `ix_nsdl_ymdh_entity_seller` (`ymdh`,`entity_id`,`seller_entity_id`)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(`ymdh`))
(
PARTITION p2005 VALUES LESS THAN (TO_DAYS('2006-01-01')),
PARTITION p200601 VALUES LESS THAN (TO_DAYS('2006-02-01')),
PARTITION P200602 VALUES LESS THAN (TO_DAYS('2006-03-01')),

...

PARTITION P200609 VALUES LESS THAN (TO_DAYS('2006-10-01')),
PARTITION P20061001 VALUES LESS THAN (TO_DAYS('2006-10-02')),
PARTITION P20061002 VALUES LESS THAN (TO_DAYS('2006-10-03')),
PARTITION P20061003 VALUES LESS THAN (TO_DAYS('2006-10-04')),

...

PARTITION P20061130 VALUES LESS THAN (TO_DAYS('2006-12-01'))
);

There are a lot of partitions that I have defined. I need to keep a rolling 60 days of daily partitions active. The plan is to use the ALTER TABLE REORGANIZE PARTITION statement to merge the older partitions together once per day, and to add a new partition once per day.

After the data was reloaded, it was time to test the performance, and the ability to add partitions and reorganize them (this was broken in version 5.1.11-beta).

Performance, surprisingly wasn't what I expected. Queries that ran on the partitions were about the same performance wise as those on the unpartitioned table. Hmm, I have to double check my results. I'll post all of the performance data.

The real good news is that the partition maintenance commands all worked with InnoDB! Dropping & Reorganizing partitions worked perfectly. I'll have to redo my testing and to see what happened to the performance.

7 comments:

Giuseppe Maxia said...

Hi,
very intriguing.
I would like to see the data and the actual code you used to get your results.
Can you post them?

Thanks

Giuseppe

Partha Dutta said...

Giuseppe, I will post Part 4 with all of the details on the testing and how it was done. You'll find very interesting results.

Partha

Mikael Ronstrom said...

Hi Partha,
Very interesting to read.
I see no problems why you shouldn't
get a boost on performance when there
are queries involving ranges on
ymdh. However EXPLAIN PARTITIONS will
show you whether you actually get
a boost. Also remember that the first
query against the table will be fairly
slow given that it has to open all
partitions.

Mikael Ronstrom

Alexandru Toth said...

Hi,

There is a new Open Source OLAP project, called Cubulus OLAP. It is in early-alpha, none the less it can already show aggregated measures across hierarchical dimensions.

OLAP aggregation engine uses "hierarchical range-clustering of keys" (read "V. Markl, F.Ramsak, R. Bayer: Improving OLAP Performance by Multidimensional Hierarchical Clustering. Proc. of the Intl. Database Engineering and Applications Symposium, pp. 165-177, 1999") . Dr Volker Markl expressed in mail his consent to use the range clustering in an OSS project.
The main difference to the closest Open Source OLAP project - Mondrian a.k.a Pentaho, is the fact that Cubulus does all the aggregations inside the relational database (by generating multiple queries).
Even if there are multiple tables, there is no "star schema", no GROUP BY and no JOINS. Due to the "Multidimensional Hierarchical Clustering" all SQL statements are simple SELECT SUM(measure) with a range WHERE clause for each dimension. There is always a full-table scan, so indexes provide only little help.

But partitioning gives good performance boost, especially when partitioning over 3 dimensions.

Theoretical background for simple SELECTS with WHERE clauses can be derived from the article "Anastassia Ailamaki , David J. DeWitt , Mark D. Hill , David A. Wood, DBMSs on a Modern Processor: Where Does Time Go?, Proceedings of the 25th International Conference on Very Large Data Bases", p.266-277, September 07-10, 1999 . A C language case study is presented "Conjunctive Selection Conditions in Main Memory", Kenneth A. Ross, 2002 (?)

Cheers, Alex
http://sourceforge.net/projects/cubulus
alxtoth at users . sourceforge . net

Alexandru Toth said...

Hi, there is a brief presentation material at http://cubulus.sourceforge.net/

I think it is relevant in this forum because Cubulus uses MySQL partitions (hundreds of them) and parallel query.

Cheers, Alexandru

Unknown said...

Hi,

I already have a huge database table which would benefit from partitition. I think so, as i expect this to avoid having full table scan.

By the way , no where i find how to alter a table to have partition. Please help me with a link if any.

Regards,

harish

Anonymous said...

So to have a good attack, you should level your skill in cabal online alz. Also, get the highest level jewelry that you can equip in thecabal alz. This step depends on your skill rank, for lower magic ranks get 4 fast notice arrow skills and your cabal gold. Of course, if you have more cabal money, you can get more in the game. By the way, you can buy pet from other players in buy cabal alz.
Many people play the game maybe feel that they were too poor, some of my friend often ask, why you play games so there is ghost gen.
Say good pull so much I do not pull hedge with the US to talk about the game experience to make ghost online gen. I want to see if there is nothing worth ghost gold things and a very low price. The most important thing is you have ghost online gold. Also we have a good time happy in the game and hope earned more ghost money in the game.