Tuesday, June 28, 2011

Effects of ALL_O_DIRECT on a TPCC workload

Recently Mark Callaghan blogged about using O_DIRECT for the InnoDB transaction log. He noted that there was not a significant performance gain if the number of concurrent connections increases. I've done my share of testing and retesting over the past months to determine how useful this is. Based on a small TPCC workload (100 warehouses, 64 connections, 1 hour test, 5 minute rampup), I've seen huge performance gains by setting ALL_O_DIRECT for the variable innodb_flush_method using Percona XtraDB.

Without using Direct I/O, the benchmark generated a TpmC score of approximately 24,500 (HP DL160 G6, 2 x Xeon E5620 2.40GHz, 16GB mem, 4x300 GB SAS, RAID-10). After setting the variable to ALL_O_DIRECT, TpmC score went up to 48,000. Huge increase. This deserves some more investigation and some more testing. I want to also try this out on some older hardware to see if similar performance gains can be achieved.

Friday, May 20, 2011

Effect of changing InnoDB Page Size

Was bored so I started playing with what the performance impact of changing the innodb page size was. Interesting results from XtraDB (which makes it oh so easy to change the page size). I won't go into the gory details but the testing performed was against a machine with 6x146GB SAS drives and a read-only sysbench test (50 million records, 10GB innodb buffer pool):


The graph shows that the good old defaults work quite nicely. I'll have to get my hands on some machines with SSD drives and try this again :)

Saturday, April 23, 2011

Covering indexes in MySQL - revisited (with benchmark)

In the process of building a new benchmark tool for Yahoo, I needed a good "guinea pig." I think I found the one by showing how much more powerful covering indexes can be with InnoDB. A covering index is one where the index itself contains all of the necessary data field(s). In other words, no need to access the data page!

Here's a sample table:
CREATE TABLE entity (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`entity_type` enum('Advertiser','Publisher','Network') NOT NULL DEFAULT 'Advertiser',
`managing_entity_id` int(10) unsigned DEFAULT NULL,
....
PRIMARY KEY (`id`),
KEY `ix_entity_managing_entity_id` (`managing_entity_id`),
....
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

It's a wide table with 88 some odd columns with an average row length of 240 bytes.

Now to test the QPS difference between a query like this:
SELECT * FROM entity WHERE entity.managing_entity_id = X;

and a query that uses a covering index:
SELECT entity.managing_entity_id FROM entity WHERE entity.managing_entity_id = X;

I have an SQL file that contains 2000 SQL statements with fairly random ids, and I'll run my benchmark program with 4 clients a few times to get a good QPS reading (throwing out the low & high numbers).

Results with SELECT *:
Tot Q=2000,Tot T=1.757571,QPS=1137.93411475269,Min=0.0006665,Max=0.027022,90th Pctl=0.000946
Tot Q=2000,Tot T=1.770522,QPS=1129.61036349732,Min=0.0006635,Max=0.026858,90th Pctl=0.00096675
Tot Q=2000,Tot T=1.753147,QPS=1140.80564835693,Min=0.00066325,Max=0.026966,90th Pctl=0.00095175

Results with covering index:
Tot Q=2000,Tot T=0.703581,QPS=2842.60092299252,Min=0.00026625,Max=0.014821,90th Pctl=0.00035325
Tot Q=2000,Tot T=0.897792,QPS=2227.6874821785,Min=0.00026625,Max=0.04546425,90th Pctl=0.0003595
Tot Q=2000,Tot T=0.720616,QPS=2775.403266094,Min=0.000257,Max=0.01566475,90th Pctl=0.0003452

Fairly significant results. I've seen many times where SQL has been hastily written just to perform a simple check if a record exists or not. I've always tried to rationalize this with the developers I work with, but I never had the raw numbers to show the exact impact. If the query happens only a handful of times, not a huge deal. But, if it happens hundreds of thousands or millions of times per day, then that's a different story, and making a very simple rewrite of the SQL helps immensely, and could even save on cost, if you're thinking about adding another slave to spread out load.

Thursday, April 07, 2011

Slamming the door shut on mysql server access

Has there been a situation when you needed to perform some serious maintenance work on your db server, and you just don't want anyone at all to connect to the db server?

Here's a one liner:
set global init_connect="kill connection_id();"

(I should have posted this 6 days ago)

Friday, March 18, 2011

Planet MySQL RSS feeds not working?

Not sure if its just me, but has anyone else been having trouble with the RSS feeds? From Safari, it looks like there are no new posts for the last 3 days. I thought it was Safari (after upgrading to 5.0.4) but I see the same issue with Google Reader?? Or have I suddenly developed a case of stupid-itus?

Saturday, February 26, 2011

Why the Facebook/Percona versions of MySQL are so much better

One of my roles at Yahoo is to provide a rock solid infrastructure for MySQL-based projects. In order to provide that infrastructure, I have to satisfy many groups at Yahoo: Developers need an easy way of deploying the db server in their environment. Software Engineering (Operations) is looking for features to help scale MySQL, and DBAs are looking for more knobs and gauges to help tune the server. After about one year of attempting to manage the infrastructure, and following the MySQL sagas, my job is getting easier thanks to companies like Facebook and Percona as well as one of my predecessors (jcole) for releasing quality code that extends MySQL in so many ways than just raw performance. I've been able to take the best code from each company and mold together an internal version that satisfies all crowds. MySQL/Oracle, you should do the same.

Tuesday, February 15, 2011

Are you using the deadline scheduler? (Part 2)

The deadline scheduler has many advantages over the cfq scheduler, which is the default in operating systems like RedHat Enterprise Linux. In my previous post, I quickly showed how much of a performance gain can be had by switching to the deadline scheduler. Now I will show some real performance numbers for different RAID configurations.

2 disk RAID-0 4 disk RAID-0 6 disk RAID-5 6 disk RAID-10
CFQ 461.467 947.067 845.943 862.763
Deadline 851.067 2876.933 2145.866 2580.430

All of the tests were performed on a Dell PowerEdge 2950 with 2xQuad Core Xeon,16GB of memory and 6x146Gb SAS drives on a Perc/5 RAID controller, and all filesystems were standard EXT3. The TPCC benchmarks were conducted with a smallish buffer pool (2GB) and a 1GB log file size. The database is approximately 7GB in size (100 warehouses). I wanted to show what performance an I/O bound test would yield. The numbers here show that its possible just to get that added boost without resorting to re-creating the entire database with a different filesystem (like XFS). I will come back to XFS later as it provides the best performance.