Table Type | Elapsed Time |
---|---|
Normal | 7 minutes, 41 seconds |
Partitioned | 5 minutes, 51 seconds |
By partitioning, we gain about 25% improvement in performance. Not bad, but not great. I would have expected more than 25%, given that I loaded approximately 110 million records for the test, and there are approximately 650,000 records per day for the range for the test.
Here are the details of testing:
- Machine is an HP DL385 with 16GB of memory, 1 73GB 15k RPM SCSI drive for boot, root and /var partitions. 3 146GB 15k RPM SCSI drives in RAID-0 stripe (Hardware RAID).
- Using CentOS 4.3 as operating system
- MySQL version 5.1.12-beta (Compiled from SRPM)
- InnoDB Parameters
innodb_buffer_pool_size = 11G
innodb_file_per_table
innodb_open_files = 1000
innodb_log_file_size = 2000M
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1
innodb_support_xa = 0
- Prior to each test run, the database server was restarted, and mysql was run with the database name (It takes approximately 40 seconds for mysql to run the first time as it caches the table information)
- The test query was run 3 times (each with a db server restart) and the average of the 3 runs taken.
- SQL (Non-partitioned):
select entity_id,buyer_entity_id,buyer_line_item_id,sum(roi_cost) from network_daily_local_nonpart force index (ix_nsdl_ymdh_entity_seller) where ymdh in ('2006-11-01 00:00:00', '2006-11-02 00:00:00', '2006-11-03 00:00:00', '2006-11-04 00:00:00', '2006-11-05 00:00:00', '2006-11-06 00:00:00', '2006-11-07 00:00:00', '2006-11-08 00:00:00', '2006-11-09 00:00:00', '2006-11-10 00:00:00', '2006-11-11 00:00:00', '2006-11-12 00:00:00')and entity_id = 2 group by entity_id,buyer_entity_id,buyer_line_item_id order by entity_id,buyer_entity_id,buyer_line_item_id
- SQL (Partitioned):
select entity_id,buyer_entity_id,buyer_line_item_id,sum(roi_cost) from network_daily_local_part force index (ix_nsdl_ymdh_entity_seller) where ymdh in ('2006-11-01 00:00:00', '2006-11-02 00:00:00', '2006-11-03 00:00:00', '2006-11-04 00:00:00', '2006-11-05 00:00:00', '2006-11-06 00:00:00', '2006-11-07 00:00:00', '2006-11-08 00:00:00', '2006-11-09 00:00:00', '2006-11-10 00:00:00', '2006-11-11 00:00:00', '2006-11-12 00:00:00')and entity_id = 2 group by entity_id,buyer_entity_id,buyer_line_item_id order by entity_id,buyer_entity_id,buyer_line_item_id
Now, one thing at RightMedia, is we have LOTS of data. In fact there is some skew on the
entity_id
column and the majority of the records have the value listed in the SQL above. That still should not explain huge increases in performance, should it?Okay, one last experiment (It's 3:46AM EST!!) I am going to subpartition the partitioned table into 7 subpartitions, hash by the
entity_id
column.Result: 14.32 seconds! Holy friggin' turbo mode Batman! It's way too late to be trying to quantify performance gain, so I'm quantifying it by WTF units. I'd really like to get an explanation of what is really going on behind the scenes that yields this type of performance gain, and why I don't see at least 1 or 2 WTF units of gain on the original partitioned table, given that the amount of data is much less for the date range compared to the whole table.
3 comments:
It's hard to give an explanation without knowing the structure of
network_daily_local_nonpart and network_daily_local_part. In your previous post you showed only a partial structure.
Moreover, you are using the "force index" clause, which will override the optimizer.
To see if you are taking advantage of partitions, try
EXPLAIN PARTITIONS SELECT ....
If you see a partition pruning, then something good is going to happen.
About subpartitions boost, remember that partitioning by hash is beneficial when your query uses the "=" or "IN" operators on a partitioned column.
Please contact me by e-mail (my first name at mysql dot com) if you want to discuss this issue.
Cheers
Giuseppe
hey pratha, nice to see your blog on mysql and partitioning.
I am facing a problem with mysql partitions. Am using mysql-5.1.14-beta. I have a table having around 1.45 crore records. The structure of the table is
CREATE TABLE `SNEW` (
`ID` int(10) DEFAULT NULL,
`U1` varchar(250) NOT NULL,
`U2` varchar(250) NOT NULL,
`NO` int(10) NOT NULL,
`START` date NOT NULL,
`END` date NOT NULL,
KEY `U1` (`U1`),
KEY `U2` (`U2`),
KEY `NO` (`NO`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PARTITION BY RANGE ( `NO`) (PARTITION p0 VALUES LESS THAN (2) ENGINE = MyISAM, PARTITION p1 VALUES LESS THAN MAXVALUE ENGINE = MyISAM)
When i explain a query :
explain select * from SNEW WHERE (U1= binary 'jayant' or U2= binary 'kumar') and `NO`>=(2)
I see that none of the indexes are being used. Can you help me out on this
wholesale jewelry
handmade jewelry
jewelry wholesale
discount jewelry
handcrafted jewelry
wholesale beads
cheap jewelry
Post a Comment