All of our adserving data is aggregated into our reporting database, which runs MySQL version 5.0.22. The database is so large, that we split it across 6 machines. Each of these machines has a similar configuration: 2 Dual core Opteron CPUs, 16GB of memory and 3 146GB 15k RPM SCSI drives in a RAID-0 set.
Here is what one of our typical tables looks like:
CREATE TABLE `network_daily` (
`entity_id` int(11) NOT NULL default '0',
`buyer_entity_id` int(11) NOT NULL default '0',
`buyer_line_item_id` int(11) NOT NULL default '0',
`seller_entity_id` int(11) NOT NULL default '0',
`seller_line_item_id` int(11) NOT NULL default '0',
`size_id` int(11) NOT NULL default '0',
`pop_type_id` int(11) NOT NULL default '0',
`country_group_id` int(11) NOT NULL default '0',
`is_adjustment` tinyint(4) NOT NULL default '0',
`adv_learn_type` char(1) NOT NULL default '',
`pub_learn_type` char(1) NOT NULL default '',
`frequency` smallint(6) NOT NULL default '0',
`ymdh` datetime NOT NULL default '0000-00-00 00:00:00',
`imps` bigint(20) NOT NULL default '0',
`clicks` int(11) NOT NULL default '0',
`convs` int(11) NOT NULL default '0',
`id` int(10) unsigned NOT NULL default '0',
`checkpoint` int(11) default NULL,
PRIMARY KEY (`id`,`ymdh`),
KEY `ix_nsl_ymdh_buyerli` (`ymdh`,`buyer_line_item_id`),
KEY `ix_nsdl_ymdh_entity_buyer` (`ymdh`,`entity_id`,`buyer_entity_id`),
KEY `ix_nsdl_ymdh_entity_seller` (`ymdh`,`entity_id`,`seller_entity_id`)
Each of the 6 machines holds a portiton of the data for this table (and others) Also, there are 2 other slaves per machine for queries. This is our biggest table, which each machine holding about 170 million records.
This set up was fine for a while, but started to degrade in performance over the past 3 months. Every tweak to InnoDB was done, but still we need more performance. Here is an example of one of these killer queries against this table:
select entity_id, ymdh, buyer_entity_id, buyer_line_item_id, sum(imps),sum(clicks)
where ymdh IN ('2006-11-01 00:00:00', '2006-11-02 00:00:00', '2006-11-03 00:00:00',
'2006-11-04 00:00:00', '2006-11-05 00:00:00', '2006-11-06 00:00:00',
'2006-11-07 00:00:00', '2006-11-08 00:00:00', '2006-11-09 00:00:00',
'2006-11-10 00:00:00', '2006-11-11 00:00:00', '2006-11-12 00:00:00')
and entity_id = 2
group by entity_id, ymdh, buyer_entity_id, buyer_line_item_id
order by entity_id, ymdh, buyer_entity_id, buyer_line_item_id
I ran this query against an idle machine, and the query took 2 minutes and 31 seconds to run. Yuck! If I were a customer waiting for this report (a month-to-date report by day) I would be hitting the reload button quite a few times!
How can we look at optimizing this query? It's hitting the right indexes. The problem is that each day, we load over 700,000 records into this particular table, so that equates to 8.4 million records for this date range. The
entity_idcolumn does help with the index scan, so the average number of records per day to look at goes down to 100,000, but that's still 1.2 million records. While it should not take too long to scan through 1.2 million records the rest of the time could be spent building the temporary table for the group by operation. How have we been solving this? Buying more machines and splitting the database! While this has been working, it has come tgo a point that the growth rate of the database will soon outweigh the amount of splitting that can be done. So it's now time for some creative alternatives.
2 that come to mind: Materialized Views and MySQL 5.1 with Partitioning. I'll talk about Materialized Views in another blog (This is one feature that I wish MySQL would implement! This is fairly crucial to large scale reporting). Partitioning in the upcoming MySQL 5.1 release looks very promising. By splitting the underlying storage of the table into smaller tables, the size of the index becomes smaller, and it takes less time and more importantly disk I/O to scan the index to retrieve the data. The previous release, 5.1.11-beta was pretty horrible, especially with InnoDB. About the only thing I could use this for was some minor testing. Any SQL that was used to modify partitions resulted in a server crash and a corrupted database. I just finished compiling the 5.1.12-beta version and installed the RPMS on one of my test systems (exactly the same configuration as that of a production machine). In the next part of this blog, I'll reveal my results.