Saturday, November 25, 2006

The joys of MySQL 5.1 (Part 2- The Current Problem)

I've been very busy testing and testing MySQL 5.1.12-beta and on the surface, it passes my tests. But first, let me set the stage as far as what we are trying to accomplish at RightMedia...

All of our adserving data is aggregated into our reporting database, which runs MySQL version 5.0.22. The database is so large, that we split it across 6 machines. Each of these machines has a similar configuration: 2 Dual core Opteron CPUs, 16GB of memory and 3 146GB 15k RPM SCSI drives in a RAID-0 set.

Here is what one of our typical tables looks like:

CREATE TABLE `network_daily` (
`entity_id` int(11) NOT NULL default '0',
`buyer_entity_id` int(11) NOT NULL default '0',
`buyer_line_item_id` int(11) NOT NULL default '0',
`seller_entity_id` int(11) NOT NULL default '0',
`seller_line_item_id` int(11) NOT NULL default '0',
`size_id` int(11) NOT NULL default '0',
`pop_type_id` int(11) NOT NULL default '0',
`country_group_id` int(11) NOT NULL default '0',
`is_adjustment` tinyint(4) NOT NULL default '0',
`adv_learn_type` char(1) NOT NULL default '',
`pub_learn_type` char(1) NOT NULL default '',
`frequency` smallint(6) NOT NULL default '0',
`ymdh` datetime NOT NULL default '0000-00-00 00:00:00',
`imps` bigint(20) NOT NULL default '0',
`clicks` int(11) NOT NULL default '0',
`convs` int(11) NOT NULL default '0',
`id` int(10) unsigned NOT NULL default '0',
`checkpoint` int(11) default NULL,
PRIMARY KEY (`id`,`ymdh`),
KEY `ix_nsl_ymdh_buyerli` (`ymdh`,`buyer_line_item_id`),
KEY `ix_nsdl_ymdh_entity_buyer` (`ymdh`,`entity_id`,`buyer_entity_id`),
KEY `ix_nsdl_ymdh_entity_seller` (`ymdh`,`entity_id`,`seller_entity_id`)

Each of the 6 machines holds a portiton of the data for this table (and others) Also, there are 2 other slaves per machine for queries. This is our biggest table, which each machine holding about 170 million records.

This set up was fine for a while, but started to degrade in performance over the past 3 months. Every tweak to InnoDB was done, but still we need more performance. Here is an example of one of these killer queries against this table:

select entity_id, ymdh, buyer_entity_id, buyer_line_item_id, sum(imps),sum(clicks)
from network_daily
where ymdh IN ('2006-11-01 00:00:00', '2006-11-02 00:00:00', '2006-11-03 00:00:00',
'2006-11-04 00:00:00', '2006-11-05 00:00:00', '2006-11-06 00:00:00',
'2006-11-07 00:00:00', '2006-11-08 00:00:00', '2006-11-09 00:00:00',
'2006-11-10 00:00:00', '2006-11-11 00:00:00', '2006-11-12 00:00:00')
and entity_id = 2
group by entity_id, ymdh, buyer_entity_id, buyer_line_item_id
order by entity_id, ymdh, buyer_entity_id, buyer_line_item_id

I ran this query against an idle machine, and the query took 2 minutes and 31 seconds to run. Yuck! If I were a customer waiting for this report (a month-to-date report by day) I would be hitting the reload button quite a few times!

How can we look at optimizing this query? It's hitting the right indexes. The problem is that each day, we load over 700,000 records into this particular table, so that equates to 8.4 million records for this date range. The entity_id column does help with the index scan, so the average number of records per day to look at goes down to 100,000, but that's still 1.2 million records. While it should not take too long to scan through 1.2 million records the rest of the time could be spent building the temporary table for the group by operation. How have we been solving this? Buying more machines and splitting the database! While this has been working, it has come tgo a point that the growth rate of the database will soon outweigh the amount of splitting that can be done. So it's now time for some creative alternatives.

2 that come to mind: Materialized Views and MySQL 5.1 with Partitioning. I'll talk about Materialized Views in another blog (This is one feature that I wish MySQL would implement! This is fairly crucial to large scale reporting). Partitioning in the upcoming MySQL 5.1 release looks very promising. By splitting the underlying storage of the table into smaller tables, the size of the index becomes smaller, and it takes less time and more importantly disk I/O to scan the index to retrieve the data. The previous release, 5.1.11-beta was pretty horrible, especially with InnoDB. About the only thing I could use this for was some minor testing. Any SQL that was used to modify partitions resulted in a server crash and a corrupted database. I just finished compiling the 5.1.12-beta version and installed the RPMS on one of my test systems (exactly the same configuration as that of a production machine). In the next part of this blog, I'll reveal my results.


chesterthetester said...

Isn't the IN (...) killing you? I had a similar situation a few months back. The dataset size was generally the same and was querying based on a date range.

Changing to "ymdh BETWEEN (date1, date2)" took me down from about 90 seconds to 3 or so per query.

Anonymous said...

Good post.
I was searching the internet for cheap internet phone service (Voip) and found a company called Via Talk. They are cheaper and just as good as Vonage.

They are now offering 1 year phone service Free when you purchase 1 year – for a limited time. Check it out at Via Talk

Anonymous said...

Nice blog.
If you’re interested in free weight loss tips please visit this site.

John Tiniakos said...

Hey you have a very good blog.

If by any chance you need weight loss help check out by blog: Weight Loss Tips

Anonymous said...

Car Auctions Nice blog. Check out my blog for the #1 source for car auctions.

Anonymous said...

FREE Business Advertising Tips The Most Powerful Internet Classified Advertising Methods On The Web! "TOP" Rated Money Making Website! A Must See!!!

Anonymous said...

Hand Crafted Dog Art Hi, Love dogs? if so I found this for you. I Tried the dog feeders.

Anonymous said...

This is a great Blog! Pardon if I landed on the wrong blog.
As a newbie I just can’t help sharing the following
url with you. It made al the difference to my financial
Situation. For only $7.00 you get your own
ATM Machine. Just follow the link below!

Anonymous said...

Domin8 MySpace with unique myspace layouts. Check out

Anonymous said...

myspace tutorials and huge profile customizations.

Anonymous said...

看房子,買房子,建商自售,自售,台北新成屋,台北豪宅,新成屋,豪宅,美髮儀器,美髮,儀器,髮型,EMBA,MBA,學位,EMBA,專業認證,認證課程,博士學位,DBA,PHD,在職進修,碩士學位,推廣教育,DBA,進修課程,碩士學位,網路廣告,關鍵字廣告,關鍵字,廣告,課程介紹,學分班,文憑,牛樟芝,段木,牛樟菇,日式料理, 台北居酒屋,燒肉,結婚,婚宴場地,推車飲茶,港式點心,尾牙春酒,台北住宿,國內訂房,台北HOTEL,台北婚宴,飯店優惠,台北結婚,婚宴場地,推車飲茶,港式點心,尾牙春酒,住宿,訂房,HOTEL,飯店,造型系列,學位,牛樟芝,腦磷脂,磷脂絲胺酸,SEO,婚宴,捷運,學區,美髮,儀器,髮型,牛樟芝,腦磷脂,磷脂絲胺酸,看房子,買房子,建商自售,自售,房子,捷運,學區,台北新成屋,台北豪宅,新成屋,豪宅,學位,碩士學位,進修,在職進修, 課程,教育,學位,證照,mba,文憑,學分班,網路廣告,關鍵字廣告,關鍵字,SEO,关键词,网络广告,关键词广告,SEO,关键词,网络广告,关键词广告,SEO,台北住宿,國內訂房,台北HOTEL,台北婚宴,飯店優惠,住宿,訂房,HOTEL,飯店,婚宴,台北住宿,國內訂房,台北HOTEL,台北婚宴,飯店優惠,住宿,訂房,HOTEL,飯店,婚宴,台北住宿,國內訂房,台北HOTEL,台北婚宴,飯店優惠,住宿,訂房,HOTEL,飯店,婚宴,結婚,婚宴場地,推車飲茶,港式點心,尾牙春酒,台北結婚,婚宴場地,推車飲茶,港式點心,尾牙春酒,結婚,婚宴場地,推車飲茶,港式點心,尾牙春酒,台北結婚,婚宴場地,推車飲茶,港式點心,尾牙春酒,結婚,婚宴場地,推車飲茶,港式點心,尾牙春酒,台北結婚,婚宴場地,推車飲茶,港式點心,尾牙春酒,居酒屋,燒烤,美髮,儀器,髮型,美髮,儀器,髮型,美髮,儀器,髮型,美髮,儀器,髮型,小套房,小套房,進修,在職進修,留學,證照,MBA,EMBA,留學,MBA,EMBA,留學,進修,在職進修,牛樟芝,段木,牛樟菇,住宿,民宿,飯宿,旅遊,住宿,民宿,飯宿,旅遊,住宿,民宿,飯宿,旅遊,住宿,民宿,飯宿,旅遊,住宿,民宿,飯宿,旅遊,住宿,民宿,飯宿,旅遊,住宿,民宿,飯宿,旅遊,美容,美髮,整形,造型,美容,美髮,整形,造型,美容,美髮,整形,造型,美容,美髮,整形,造型,美容,美髮,整形,造型,美容,美髮,整形,造型,美容,美髮,整形,造型,設計,室內設計,裝潢,房地產,設計,室內設計,裝潢,房地產,設計,室內設計,裝潢,房地產,設計,室內設計,裝潢,房地產,設計,室內設計,裝潢,房地產,設計,室內設計,裝潢,房地產,設計,室內設計,裝潢,房地產,設計,室內設計,裝潢,房地產,進修,在職進修,MBA,EMBA,進修,在職進修,MBA,EMBA,進修,在職進修,MBA,EMBA,進修,在職進修,MBA,EMBA,進修,在職進修,MBA,EMBA,進修,在職進修,MBA,EMBA,進修,在職進修,MBA,EMBA,住宿,民宿,飯店,旅遊,美容,美髮,整形,造型,設計,室內設計,裝潢,房地產,進修,在職進修,MBA,EMBA,關鍵字排名,網路行銷,关键词排名,网络营销,網路行銷,關鍵字排名,关键词排名,网络营销,羅志祥,周杰倫,五月天,蔡依林,林志玲,羅志祥,周杰倫,五月天,蔡依林,林志玲,PMP,在職專班,研究所在職專班,碩士在職專班,PMP,證照,在職專班,研究所在職專班,碩士在職專班

Anonymous said...

Such was the case when second life linden labs decided to pull the plug on several banks in the game. Some of these banks offered the system to buy lindens what real life banks would an interest rate. Some people get their secondlife money out of the banks says that Second Life depositors may have US$ 750,000 in real life money. The new rules state that only banks with proof of an applicable government registration statement or financial institution character will be able to operate with cheap linden in the game. Second Life has proven to be a popular place for companies to set up virtual shop where many players can exchange their favorite something with all kinds of linden dollars.
Sometimes in the Shadow of Legend game, after winning the enemies, they will understand to share the shadow of legend Gold as their trophy. A beautiful and rich sol gold environment coupled with an engrossing storyline in this Shadow of Legend game. A once beautiful and lush world and the first process you have to buy shadow of legend Gold, the game in fact is full of creatures and inhabitants. He constant warring left the fate of the inhabitants with cheap shadow of legend Gold of Agnes at the hands of the gods. Shadow of legend and the necessary or useful shadow of legend money is the answer for you.