Sunday, December 17, 2006

Primary Key Order Does Matter!

There have been a few posts on PlanetMySQL regarding primary keys and the importance of choosing the right one. This is even more important when the table uses InnoDB. You've read various posts about why it is so important. Now, I'm all about benchmarks and showing the details, so I'll take a table from my previous posts about MySQL 5.1 partitioning and show what I found.

This table was created under MySQL 5.1.12-beta:

CREATE TABLE `big_table_test1` (
`entity_id` int(11) NOT NULL DEFAULT '0',
`col1` int(11) NOT NULL DEFAULT '0',
`col2` int(11) NOT NULL DEFAULT '0',
`col3` int(11) NOT NULL DEFAULT '0',
`col4` int(11) NOT NULL DEFAULT '0',
`col5` int(11) NOT NULL DEFAULT '0',
`col6` int(11) NOT NULL DEFAULT '0',
`ymdh` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`imps` bigint(20) NOT NULL DEFAULT '0',
`clicks` int(11) NOT NULL DEFAULT '0',
`convs` int(11) NOT NULL DEFAULT '0',
`id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`,`ymdh`),
KEY `ix_big1` (`ymdh`,`entity_id`,`col3`) USING BTREE,
KEY `ix_big2` (`ymdh`,`entity_id`,`col4`) USING BTREE,
KEY `ix_big3` (`ymdh`,`entity_id`,`col2`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT


I loaded about 180 million records into this table (a small set of data for us!) and ran one of our really popular types of queries:



SELECT col1,col2,col3,SUM(imps),SUM(clicks),SUM(convs)
FROM big_table_test1
WHERE ymdh IN ('2006-10-01 00:00:00','2006-10-02 00:00:00','2006-10-03 00:00:00',
'2006-10-04 00:00:00','2006-10-05 00:00:00','2006-10-06 00:00:00',
'2006-10-07 00:00:00')
AND entity_id = 2
GROUP BY col1, col2, col3
ORDER BY col1, col2, col3;

Doesn't look terribly nasty, does it? Yet this query takes about 7 MINUTES to run! EXPLAIN shows nothing out of the ordinary: the query uses one of the secondary indexes on the table. The cardinality of entity_id is not very high, so forcing one secondary index over another wouldn't yield any performance benefit. The id column is basically a numerical hash of the table's "real" primary key, which is entity_id plus col1 through col6, and is used for uniqueness. What's interesting is that throughout our application, there are no direct queries against this id column. It just exists. But it can't be removed.

If this column serves no real purpose on its own, what if we simply swapped the order of the columns in the primary key? The definition would then look like:

PRIMARY KEY (`ymdh`,`id`)

Logically, there is no difference, so we don't break any uniqueness constraints in the application. Running the query again: 4 SECONDS! Wow! How do we explain this massive performance increase?
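For reference, the change can be applied in place with something like the following (a sketch against the table above; note that on 180 million rows this ALTER rebuilds the entire table and takes a long time):

ALTER TABLE big_table_test1
DROP PRIMARY KEY,
ADD PRIMARY KEY (`ymdh`, `id`);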

Remember that InnoDB uses a clustered index for the primary key. A clustered index is an index built on the same key by which the data is ordered on disk. Clustered indexes are very efficient for scanning, but they have performance implications when inserting new data, since some re-ordering may need to be done. All of our data is inserted in ymdh order, so it makes sense for the primary key to lead with that column: a lot of efficiencies can be obtained, such as sequential disk read-ahead. The previous primary key, by contrast, required lots of random disk I/O to read the data portion of the table.

Monday, December 04, 2006

MySQL Partitioning 5.1 - Part 5 Slowdown Problem Solved!

Finally figured out what was causing the lack of performance on the partitions with the query. The use of FORCE INDEX was causing the slowdown with the partitioned table. Once I removed the clause, the query ran in 1 minute 19 seconds, which is more in line with expectations. Sorry for any inconvenience!
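For reference, the partitioned query now reads as follows (this is the same statement shown in the Part 4 post below, just with the FORCE INDEX clause removed):

select entity_id, buyer_entity_id, buyer_line_item_id, sum(roi_cost)
from network_daily_local_part
where ymdh in ('2006-11-01 00:00:00', '2006-11-02 00:00:00', '2006-11-03 00:00:00',
               '2006-11-04 00:00:00', '2006-11-05 00:00:00', '2006-11-06 00:00:00',
               '2006-11-07 00:00:00', '2006-11-08 00:00:00', '2006-11-09 00:00:00',
               '2006-11-10 00:00:00', '2006-11-11 00:00:00', '2006-11-12 00:00:00')
and entity_id = 2
group by entity_id, buyer_entity_id, buyer_line_item_id
order by entity_id, buyer_entity_id, buyer_line_item_id;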

MySQL 5.1 Partitioning - Part 4 (Results)

In my previous post I set up MySQL 5.1.12 on a box in order to test the performance of the new partitioning option. During testing, I noted that I did not see any noticeable performance improvements from using partitioning. So I spent some time Saturday and Sunday (I guess I don't have anything better to do!) building the testing environment and performing the tests. It turns out I was wrong, but only slightly: partitioning does show better performance than standard tables, but not by as much as you would think. But wait, there is light at the end of the tunnel (as well as a WTF). The numbers...

Table Type       Elapsed Time
Normal           7 minutes, 41 seconds
Partitioned      5 minutes, 51 seconds


By partitioning, we gain about a 25% improvement in performance (7 minutes 41 seconds is 461 seconds, versus 351 seconds for the partitioned table). Not bad, but not great. I would have expected more than 25%, given that I loaded approximately 110 million records for the test and there are only approximately 650,000 records per day in the date range being queried.

Here are the details of testing:

  • Machine is an HP DL385 with 16GB of memory, one 73GB 15k RPM SCSI drive for the boot, root, and /var partitions, and three 146GB 15k RPM SCSI drives in a RAID-0 stripe (hardware RAID).
  • CentOS 4.3 as the operating system
  • MySQL version 5.1.12-beta (Compiled from SRPM)
  • InnoDB Parameters

    • innodb_buffer_pool_size = 11G
    • innodb_file_per_table
    • innodb_open_files = 1000
    • innodb_log_file_size = 2000M
    • innodb_flush_method = O_DIRECT
    • innodb_flush_log_at_trx_commit = 1
    • innodb_support_xa = 0

  • Prior to each test run, the database server was restarted and the mysql client was started with the database name (the first run takes approximately 40 seconds, as mysql caches the table information)
  • The test query was run 3 times (each with a db server restart) and the average of the 3 runs taken.
  • SQL (Non-partitioned): select entity_id,buyer_entity_id,buyer_line_item_id,sum(roi_cost) from network_daily_local_nonpart force index (ix_nsdl_ymdh_entity_seller) where ymdh in ('2006-11-01 00:00:00', '2006-11-02 00:00:00', '2006-11-03 00:00:00', '2006-11-04 00:00:00', '2006-11-05 00:00:00', '2006-11-06 00:00:00', '2006-11-07 00:00:00', '2006-11-08 00:00:00', '2006-11-09 00:00:00', '2006-11-10 00:00:00', '2006-11-11 00:00:00', '2006-11-12 00:00:00') and entity_id = 2 group by entity_id,buyer_entity_id,buyer_line_item_id order by entity_id,buyer_entity_id,buyer_line_item_id
  • SQL (Partitioned): select entity_id,buyer_entity_id,buyer_line_item_id,sum(roi_cost) from network_daily_local_part force index (ix_nsdl_ymdh_entity_seller) where ymdh in ('2006-11-01 00:00:00', '2006-11-02 00:00:00', '2006-11-03 00:00:00', '2006-11-04 00:00:00', '2006-11-05 00:00:00', '2006-11-06 00:00:00', '2006-11-07 00:00:00', '2006-11-08 00:00:00', '2006-11-09 00:00:00', '2006-11-10 00:00:00', '2006-11-11 00:00:00', '2006-11-12 00:00:00') and entity_id = 2 group by entity_id,buyer_entity_id,buyer_line_item_id order by entity_id,buyer_entity_id,buyer_line_item_id
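As an aside, one quick way to check which partitions the optimizer actually touches for a given date list is EXPLAIN PARTITIONS, which 5.1 supports. A sketch against the partitioned table, with a shortened date list; the partitions column of the output should list only the daily partitions covering those dates:

EXPLAIN PARTITIONS
select entity_id, buyer_entity_id, buyer_line_item_id, sum(roi_cost)
from network_daily_local_part
where ymdh in ('2006-11-01 00:00:00', '2006-11-02 00:00:00')
and entity_id = 2
group by entity_id, buyer_entity_id, buyer_line_item_id;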


Now, one thing about RightMedia is that we have LOTS of data. In fact, there is some skew on the entity_id column, and the majority of the records have the value listed in the SQL above. Still, that should not explain huge differences in performance, should it?

Okay, one last experiment (it's 3:46 AM EST!!): I am going to subpartition the partitioned table into 7 subpartitions, hashed by the entity_id column.
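The repartitioning statement is roughly of the following shape (a sketch with an abbreviated partition list; note also that MySQL requires every unique key to include all partitioning columns, so the primary key may need to pick up entity_id first):

ALTER TABLE network_daily_local_part
DROP PRIMARY KEY,
ADD PRIMARY KEY (`id`, `ymdh`, `entity_id`);

ALTER TABLE network_daily_local_part
PARTITION BY RANGE (TO_DAYS(`ymdh`))
SUBPARTITION BY HASH (`entity_id`)
SUBPARTITIONS 7
(
PARTITION p2005 VALUES LESS THAN (TO_DAYS('2006-01-01')),
PARTITION p200601 VALUES LESS THAN (TO_DAYS('2006-02-01')),
-- remaining monthly and daily partitions as defined in Part 3
PARTITION P20061130 VALUES LESS THAN (TO_DAYS('2006-12-01'))
);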

Result: 14.32 seconds! Holy friggin' turbo mode Batman! It's way too late to be quantifying performance gains, so I'm measuring this one in WTF units. I'd really like an explanation of what is going on behind the scenes to yield this kind of gain, and why I don't see at least 1 or 2 WTF units of improvement on the original partitioned table, given that the amount of data in the date range is so much smaller than the whole table.

Sunday, December 03, 2006

MySQL 5.1 Partitioning - Part 3

Finally, it's time to start putting MySQL 5.1.12-beta through the wringer. First order of business, convert the existing table schema to one that supports partitioning...

I made some minor changes to the configuration for partitioning, namely innodb_file_per_table and innodb_open_files. I set innodb_open_files to 1000 based on the tables and partitions I plan on supporting.

This is what the new table schema looks like with partitioning:

CREATE TABLE `network_daily` (
`entity_id` int(11) NOT NULL default '0',
`buyer_entity_id` int(11) NOT NULL default '0',
`buyer_line_item_id` int(11) NOT NULL default '0',
`seller_entity_id` int(11) NOT NULL default '0',
`seller_line_item_id` int(11) NOT NULL default '0',
`size_id` int(11) NOT NULL default '0',
`pop_type_id` int(11) NOT NULL default '0',
`country_group_id` int(11) NOT NULL default '0',
`is_adjustment` tinyint(4) NOT NULL default '0',
`adv_learn_type` char(1) NOT NULL default '',
`pub_learn_type` char(1) NOT NULL default '',
`frequency` smallint(6) NOT NULL default '0',
`ymdh` datetime NOT NULL default '0000-00-00 00:00:00',
`imps` bigint(20) NOT NULL default '0',
`clicks` int(11) NOT NULL default '0',
`convs` int(11) NOT NULL default '0',
`id` int(10) unsigned NOT NULL default '0',
`checkpoint` int(11) default NULL,
PRIMARY KEY (`id`,`ymdh`),
KEY `ix_nsl_ymdh_buyerli` (`ymdh`,`buyer_line_item_id`),
KEY `ix_nsdl_ymdh_entity_buyer` (`ymdh`,`entity_id`,`buyer_entity_id`),
KEY `ix_nsdl_ymdh_entity_seller` (`ymdh`,`entity_id`,`seller_entity_id`)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(`ymdh`))
(
PARTITION p2005 VALUES LESS THAN (TO_DAYS('2006-01-01')),
PARTITION p200601 VALUES LESS THAN (TO_DAYS('2006-02-01')),
PARTITION P200602 VALUES LESS THAN (TO_DAYS('2006-03-01')),

...

PARTITION P200609 VALUES LESS THAN (TO_DAYS('2006-10-01')),
PARTITION P20061001 VALUES LESS THAN (TO_DAYS('2006-10-02')),
PARTITION P20061002 VALUES LESS THAN (TO_DAYS('2006-10-03')),
PARTITION P20061003 VALUES LESS THAN (TO_DAYS('2006-10-04')),

...

PARTITION P20061130 VALUES LESS THAN (TO_DAYS('2006-12-01'))
);

I have defined a lot of partitions. I need to keep a rolling 60 days of daily partitions active, so the plan is to add a new partition once per day and to use the ALTER TABLE REORGANIZE PARTITION statement to merge the older daily partitions together.
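A sketch of that daily maintenance, using partition names consistent with the scheme above (the exact names and boundaries would come from the rollover script):

-- Add a partition for the next day at the top of the range
ALTER TABLE network_daily
ADD PARTITION (PARTITION P20061201 VALUES LESS THAN (TO_DAYS('2006-12-02')));

-- Merge a contiguous run of old daily partitions into one partition covering the same range
ALTER TABLE network_daily
REORGANIZE PARTITION P20061001, P20061002, P20061003 INTO
(PARTITION P2006100103 VALUES LESS THAN (TO_DAYS('2006-10-04')));

-- Drop a partition (and its data) once it has aged out completely
ALTER TABLE network_daily
DROP PARTITION p2005;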

After the data was reloaded, it was time to test the performance, and the ability to add partitions and reorganize them (this was broken in version 5.1.11-beta).

Performance, surprisingly, wasn't what I expected. Queries that ran against the partitioned table were about the same, performance-wise, as those against the unpartitioned table. Hmm, I'll have to double-check my results. I'll post all of the performance data.

The really good news is that the partition maintenance commands all worked with InnoDB! Dropping and reorganizing partitions worked perfectly. I'll have to redo my testing to see what happened to the performance.

Saturday, November 25, 2006

The joys of MySQL 5.1 (Part 2- The Current Problem)

I've been very busy testing and retesting MySQL 5.1.12-beta, and on the surface it passes my tests. But first, let me set the stage as far as what we are trying to accomplish at RightMedia...

All of our adserving data is aggregated into our reporting database, which runs MySQL version 5.0.22. The database is so large that we split it across 6 machines. Each of these machines has a similar configuration: 2 dual-core Opteron CPUs, 16GB of memory, and 3 146GB 15k RPM SCSI drives in a RAID-0 set.

Here is what one of our typical tables looks like:

CREATE TABLE `network_daily` (
`entity_id` int(11) NOT NULL default '0',
`buyer_entity_id` int(11) NOT NULL default '0',
`buyer_line_item_id` int(11) NOT NULL default '0',
`seller_entity_id` int(11) NOT NULL default '0',
`seller_line_item_id` int(11) NOT NULL default '0',
`size_id` int(11) NOT NULL default '0',
`pop_type_id` int(11) NOT NULL default '0',
`country_group_id` int(11) NOT NULL default '0',
`is_adjustment` tinyint(4) NOT NULL default '0',
`adv_learn_type` char(1) NOT NULL default '',
`pub_learn_type` char(1) NOT NULL default '',
`frequency` smallint(6) NOT NULL default '0',
`ymdh` datetime NOT NULL default '0000-00-00 00:00:00',
`imps` bigint(20) NOT NULL default '0',
`clicks` int(11) NOT NULL default '0',
`convs` int(11) NOT NULL default '0',
`id` int(10) unsigned NOT NULL default '0',
`checkpoint` int(11) default NULL,
PRIMARY KEY (`id`,`ymdh`),
KEY `ix_nsl_ymdh_buyerli` (`ymdh`,`buyer_line_item_id`),
KEY `ix_nsdl_ymdh_entity_buyer` (`ymdh`,`entity_id`,`buyer_entity_id`),
KEY `ix_nsdl_ymdh_entity_seller` (`ymdh`,`entity_id`,`seller_entity_id`)
) ENGINE=InnoDB

Each of the 6 machines holds a portion of the data for this table (and others). Also, there are 2 additional slaves per machine for queries. This is our biggest table, with each machine holding about 170 million records.

This setup was fine for a while, but performance started to degrade over the past 3 months. Every InnoDB tweak has been applied, but we still need more performance. Here is an example of one of these killer queries against this table:

select entity_id, ymdh, buyer_entity_id, buyer_line_item_id, sum(imps),sum(clicks)
from network_daily
where ymdh IN ('2006-11-01 00:00:00', '2006-11-02 00:00:00', '2006-11-03 00:00:00',
'2006-11-04 00:00:00', '2006-11-05 00:00:00', '2006-11-06 00:00:00',
'2006-11-07 00:00:00', '2006-11-08 00:00:00', '2006-11-09 00:00:00',
'2006-11-10 00:00:00', '2006-11-11 00:00:00', '2006-11-12 00:00:00')
and entity_id = 2
group by entity_id, ymdh, buyer_entity_id, buyer_line_item_id
order by entity_id, ymdh, buyer_entity_id, buyer_line_item_id


I ran this query against an idle machine, and the query took 2 minutes and 31 seconds to run. Yuck! If I were a customer waiting for this report (a month-to-date report by day) I would be hitting the reload button quite a few times!

How can we go about optimizing this query? It's hitting the right indexes. The problem is that each day we load over 700,000 records into this particular table, which equates to 8.4 million records for this date range. The entity_id column does help narrow the index scan, bringing the average number of records to examine down to about 100,000 per day, but that's still 1.2 million records. While it should not take too long to scan through 1.2 million records, the rest of the time could be spent building the temporary table for the GROUP BY operation. How have we been solving this? Buying more machines and splitting the database! While this has been working, it has come to a point where the growth rate of the database will soon outweigh the amount of splitting that can be done. So it's now time for some creative alternatives.

Two come to mind: Materialized Views and MySQL 5.1 with partitioning. I'll talk about Materialized Views in another blog post (this is one feature I wish MySQL would implement; it's fairly crucial to large-scale reporting). Partitioning in the upcoming MySQL 5.1 release looks very promising. By splitting the underlying storage of the table into smaller tables, the indexes become smaller, and it takes less time and, more importantly, less disk I/O to scan an index and retrieve the data. The previous release, 5.1.11-beta, was pretty horrible, especially with InnoDB. About the only thing I could use it for was some minor testing; any SQL that modified partitions resulted in a server crash and a corrupted database. I just finished compiling the 5.1.12-beta version and installed the RPMs on one of my test systems (exactly the same configuration as a production machine). In the next part of this blog, I'll reveal my results.
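For the curious, MySQL has no native materialized views, so in practice this means a summary table that the load process keeps current. A minimal, hypothetical sketch (the table name and column choice here are made up for illustration):

CREATE TABLE network_daily_buyer_summary (
entity_id int(11) NOT NULL,
ymdh datetime NOT NULL,
buyer_entity_id int(11) NOT NULL,
buyer_line_item_id int(11) NOT NULL,
imps bigint(20) NOT NULL default 0,
clicks int(11) NOT NULL default 0,
PRIMARY KEY (ymdh, entity_id, buyer_entity_id, buyer_line_item_id)
) ENGINE=InnoDB;

-- Refresh one day's worth of summary rows after that day's load completes
INSERT INTO network_daily_buyer_summary
(entity_id, ymdh, buyer_entity_id, buyer_line_item_id, imps, clicks)
SELECT entity_id, ymdh, buyer_entity_id, buyer_line_item_id, SUM(imps), SUM(clicks)
FROM network_daily
WHERE ymdh = '2006-11-12 00:00:00'
GROUP BY entity_id, ymdh, buyer_entity_id, buyer_line_item_id
ON DUPLICATE KEY UPDATE imps = VALUES(imps), clicks = VALUES(clicks);

Month-to-date reports like the one above could then read from the summary table instead of scanning millions of detail rows.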

Saturday, November 11, 2006

The joys of MySQL 5.1

It's been a while since I blogged. I've been busier than you can imagine.
Anyways, I finally got the opportunity to start playing with MySQL 5.1 beta and installed 5.1.12-beta on one of our beta database servers to start testing functionality. I'm really excited about this version, particularly partitioning with InnoDB. 5.1.11-beta didn't work at all. I'm about to put it through its tests and really start measuring the performance of this new version. Stay tuned!

Saturday, July 22, 2006

Looking at running queries SHOW FULL PROCESSLIST

One of the most common commands a MySQL DBA runs is SHOW FULL PROCESSLIST. I can't tell you how many thousands of times I've run it. The problem is that there is too much information I need to filter out before I can focus on the queries that matter. For example, I don't want to see any idle connections, nor do I want to see my own connection. The other problem is that if most queries run in under a second, I miss a lot of them.

Problem solved!

I have posted a very handy script named proclist.pl to MySQL Forge. This script can be used to periodically poll the server for queries, with millisecond precision. Moreover, the output can be imported into a MySQL database for further analysis.
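As an aside, on MySQL 5.1 some of this filtering can also be done in plain SQL against INFORMATION_SCHEMA.PROCESSLIST, though that still can't poll at sub-second intervals the way the script does. A sketch:

SELECT id, user, host, db, time, state, info
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE command <> 'Sleep'        -- skip idle connections
AND id <> CONNECTION_ID()       -- skip this session
ORDER BY time DESC;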

Saturday, July 08, 2006

More info on iSCSI benchmark

Looks like a lot of interest was generated from the results of RightMedia's iSCSI benchmarking. Peter Zaitsev is curious about the environment of the benchmark. So here are all the gory details:
MySQL version : 5.0.21
System : Gentoo Linux, kernel version 2.6.16-gentoo-r2-1
Hardware : 4GB memory, 2 dual-core AMD Opteron 275 chips
MySQL configuration file (abridged version):

[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

[mysqld]
basedir = /usr
datadir = /var/lib/mysql/data
port = 3306
pid-file = /var/run/mysqld/mysqld.pid
log-error = /var/log/mysql/mysqld.err
socket = /var/run/mysqld/mysqld.sock
user = mysql
max_connections = 1024
table_cache = 128
thread_cache_size = 32
sort_buffer_size = 64M
innodb_additional_mem_pool_size = 20M
innodb_buffer_pool_size = 3000M
innodb_data_home_dir = /var/lib/mysql/inno
innodb_log_group_home_dir = /var/lib/mysql/log/inno
innodb_log_file_size = 2000M
innodb_log_files_in_group = 2
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 2
innodb_autoextend_increment = 32
innodb_status_file
innodb_thread_concurrency = 4
innodb_flush_method = O_DIRECT
innodb_support_xa = 0

sysbench v0.4.3
Command used to prepare 100,000,000 record sbtest database:
sysbench --num-threads=1 --test=oltp --mysql-user=root --oltp-table-size=100000000 --mysql-table-type=innodb prepare

For each SysBench run:
sysbench --num-threads=# --test=oltp --mysql-user=root --oltp-table-size=100000000 --mysql-table-type=innodb --max-requests=10000 --oltp-test-mode=complex run

Before each SysBench run, the kernel pagecache was also cleaned out:
sync ; echo 1 > /proc/sys/vm/drop_caches

This was done to provide a fair and clean test. For each batch of runs by thread count, 4 runs were made, with the best TPS used as the final number for the batch.

I am pretty happy with the results. I plan on posting more iSCSI benchmark results soon, as I think a lot of folks would be interested, especially if the iSCSI device scales well.

Tuesday, July 04, 2006

MySQL sysbench benchmarking - iSCSI

Our company, RightMedia, is in the process of evaluating a storage subsystem for our ever-growing MySQL databases. It seems that our data is growing at an alarming rate (this is a good thing) and our internal SCSI disks aren't handling the load of massive inserts plus large queries. So the time is now to evaluate a storage subsystem that will grow with us and scale for our performance needs.

I was really reluctant when I first heard iSCSI proposed as a potential solution. I am familiar with large storage subsystems hooked in through fibre channel; it's an expensive solution, but performance is great. There are a few iSCSI vendors, and we had lots of problems with kernel and driver versions. But we persevered, and now have systems on which we can benchmark the performance of iSCSI.

Now, before I get to the actual numbers, let me give you the environment for the testing. I tried to use as much identical hardware as possible. All of the tests were done with Gentoo Linux, kernel version 2.6.16. The machines have at least 8GB of memory (one has 16GB), and all have 2 dual-core Opteron CPUs. MySQL version 5.0.21 was used for testing, with InnoDB as the storage engine. The following drive configurations were benchmarked:

  • Single 7200 RPM 200GB SATA drive
  • 3 15,000 RPM 73GB SCSI drives in a RAID-0 configuration (128k stripe size)
  • 4-disk RAID-10 configuration on an iSCSI volume (actually using 8 disks)
  • 8-disk RAID-10 configuration on an iSCSI volume (16 disks, 1M stripe size)

Without further ado, here is a chart of the results:

[Chart: sysbench results (TPS) for each of the four drive configurations]
Shocking, isn't it? The iSCSI volume outperforms even a local SCSI array? I found this hard to believe myself, but it gives me more hope that this might be a viable solution. I am still conducting lots of tests on the device to make sure that it scales, and I'll be posting results soon.

Friday, April 28, 2006

Fun at the MySQL UC

What a blast! It was great to see most of my old colleagues from MySQL when I worked there as a Professional Consultant. Lots of cool sessions and great turnout. By far, the best session was Jim Starkey's intro to Falcon. Even Heikki was there listening intently! I am looking forward to all of the great things MySQL has in store in upcoming versions.