Monday, December 04, 2006

MySQL 5.1 Partitioning - Part 4 (Results)

In my previous post I started out by setting up MySQL 5.1.12 on a box in order to test the performance of the new partitioning option. During testing, I noted that I did not see any noticeable performance improvements from using partitioning. So I spent some time Saturday and Sunday (I guess I don't have anything else better to do!) to build the testing environment and perform the tests. So I was wrong, but only slightly. Partitioning does show better performance than standard tables, but not by as much as you would think. But, wait, there is light at the end of the tunnel (as well as a WTF). The numbers...














Table Type Elapsed Time
Normal 7 minutes, 41 seconds
Partitioned 5 minutes, 51 seconds


By partitioning, we gain about 25% improvement in performance. Not bad, but not great. I would have expected more than 25%, given that I loaded approximately 110 million records for the test, and there are approximately 650,000 records per day for the range for the test.

Here are the details of testing:

  • Machine is an HP DL385 with 16GB of memory, 1 73GB 15k RPM SCSI drive for boot, root and /var partitions. 3 146GB 15k RPM SCSI drives in RAID-0 stripe (Hardware RAID).
  • Using CentOS 4.3 as operating system
  • MySQL version 5.1.12-beta (Compiled from SRPM)
  • InnoDB Parameters

    • innodb_buffer_pool_size = 11G
    • innodb_file_per_table
    • innodb_open_files = 1000
    • innodb_log_file_size = 2000M
    • innodb_flush_method = O_DIRECT
    • innodb_flush_log_at_trx_commit = 1
    • innodb_support_xa = 0

  • Prior to each test run, the database server was restarted, and mysql was run with the database name (It takes approximately 40 seconds for mysql to run the first time as it caches the table information)
  • The test query was run 3 times (each with a db server restart) and the average of the 3 runs taken.
  • SQL (Non-partitioned): select entity_id,buyer_entity_id,buyer_line_item_id,sum(roi_cost) from network_daily_local_nonpart force index (ix_nsdl_ymdh_entity_seller) where ymdh in ('2006-11-01 00:00:00', '2006-11-02 00:00:00', '2006-11-03 00:00:00', '2006-11-04 00:00:00', '2006-11-05 00:00:00', '2006-11-06 00:00:00', '2006-11-07 00:00:00', '2006-11-08 00:00:00', '2006-11-09 00:00:00', '2006-11-10 00:00:00', '2006-11-11 00:00:00', '2006-11-12 00:00:00')and entity_id = 2 group by entity_id,buyer_entity_id,buyer_line_item_id order by entity_id,buyer_entity_id,buyer_line_item_id
  • SQL (Partitioned): select entity_id,buyer_entity_id,buyer_line_item_id,sum(roi_cost) from network_daily_local_part force index (ix_nsdl_ymdh_entity_seller) where ymdh in ('2006-11-01 00:00:00', '2006-11-02 00:00:00', '2006-11-03 00:00:00', '2006-11-04 00:00:00', '2006-11-05 00:00:00', '2006-11-06 00:00:00', '2006-11-07 00:00:00', '2006-11-08 00:00:00', '2006-11-09 00:00:00', '2006-11-10 00:00:00', '2006-11-11 00:00:00', '2006-11-12 00:00:00')and entity_id = 2 group by entity_id,buyer_entity_id,buyer_line_item_id order by entity_id,buyer_entity_id,buyer_line_item_id


Now, one thing at RightMedia, is we have LOTS of data. In fact there is some skew on the entity_id column and the majority of the records have the value listed in the SQL above. That still should not explain huge increases in performance, should it?

Okay, one last experiment (It's 3:46AM EST!!) I am going to subpartition the partitioned table into 7 subpartitions, hash by the entity_id column.

Result: 14.32 seconds! Holy friggin' turbo mode Batman! It's way too late to be trying to quantify performance gain, so I'm quantifying it by WTF units. I'd really like to get an explanation of what is really going on behind the scenes that yields this type of performance gain, and why I don't see at least 1 or 2 WTF units of gain on the original partitioned table, given that the amount of data is much less for the date range compared to the whole table.

3 comments:

Giuseppe Maxia said...

It's hard to give an explanation without knowing the structure of
network_daily_local_nonpart and network_daily_local_part. In your previous post you showed only a partial structure.

Moreover, you are using the "force index" clause, which will override the optimizer.
To see if you are taking advantage of partitions, try
EXPLAIN PARTITIONS SELECT ....
If you see a partition pruning, then something good is going to happen.

About subpartitions boost, remember that partitioning by hash is beneficial when your query uses the "=" or "IN" operators on a partitioned column.

Please contact me by e-mail (my first name at mysql dot com) if you want to discuss this issue.

Cheers
Giuseppe

gamegeek said...

hey pratha, nice to see your blog on mysql and partitioning.

I am facing a problem with mysql partitions. Am using mysql-5.1.14-beta. I have a table having around 1.45 crore records. The structure of the table is

CREATE TABLE `SNEW` (
`ID` int(10) DEFAULT NULL,
`U1` varchar(250) NOT NULL,
`U2` varchar(250) NOT NULL,
`NO` int(10) NOT NULL,
`START` date NOT NULL,
`END` date NOT NULL,
KEY `U1` (`U1`),
KEY `U2` (`U2`),
KEY `NO` (`NO`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PARTITION BY RANGE ( `NO`) (PARTITION p0 VALUES LESS THAN (2) ENGINE = MyISAM, PARTITION p1 VALUES LESS THAN MAXVALUE ENGINE = MyISAM)

When i explain a query :

explain select * from SNEW WHERE (U1= binary 'jayant' or U2= binary 'kumar') and `NO`>=(2)

I see that none of the indexes are being used. Can you help me out on this

Anonymous said...

wholesale jewelry
handmade jewelry
jewelry wholesale
discount jewelry
handcrafted jewelry
wholesale beads
cheap jewelry