Recently Mark Callaghan blogged about using O_DIRECT for the InnoDB transaction log. He noted that the performance gain was not significant as the number of concurrent connections increased. I've done my share of testing and retesting over the past months to determine how useful this is. Based on a small TPCC workload (100 warehouses, 64 connections, 1-hour test, 5-minute ramp-up), I've seen huge performance gains by setting ALL_O_DIRECT for the variable innodb_flush_method using Percona XtraDB.
Without using Direct I/O, the benchmark generated a TpmC score of approximately 24,500 (HP DL160 G6, 2 x Xeon E5620 2.40GHz, 16GB mem, 4x300 GB SAS, RAID-10). After setting the variable to ALL_O_DIRECT, the TpmC score went up to 48,000. Huge increase. This deserves some more investigation and some more testing. I also want to try this out on some older hardware to see if similar performance gains can be achieved.
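For reference, the change itself is a one-line my.cnf setting. A minimal sketch (note that ALL_O_DIRECT is specific to Percona XtraDB; stock InnoDB only accepts values like O_DIRECT, which covers data files but not the transaction log):

```ini
[mysqld]
# Percona XtraDB only: open both the data files and the transaction
# log with O_DIRECT, bypassing the OS page cache for both.
innodb_flush_method = ALL_O_DIRECT
```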
7 comments:
I am happy to learn this makes this faster but I too want to understand why. Can you provide more details? What were the sizes of the innodb buffer pool? How many InnoDB transaction log files and what size? With 16GB of RAM it is more likely that your setup benefits from not wasting RAM for transaction log files courtesy of ALL_O_DIRECT. Do you have HW RAID with battery backed write cache?
innodb_buffer_pool_size=12G
innodb_log_file_size=2G
innodb_log_files_in_group=2
With only 100 warehouses, the db fits in memory.
The HW RAID has battery backed write cache enabled, 256K stripe size, 100% memory for write cache. Underlying filesystem is XFS.
Sounds strange to me anyway... did you really observe 2x more writes on the device where your REDO logs are placed?
Also, what exactly are the rules for Direct I/O on XFS? Are there any conditions, such as write operations being aligned to the block size, for them to be accepted as direct? I mean, since REDO log records may vary in size, your writes will then mostly not be aligned to the block size and the DIRECT flag will be ignored; and since O_DIRECT is not supposed to involve fsync(), you may simply have classic buffered I/O writes with background flushing (likely you're using innodb_flush_log_at_trx_commit=0), which may explain the gain.
Anyway, it'll be great to see more details about this test.
Thank you!
Rgds,
-Dimitri
Dimitri, I'm not using innodb_flush_log_at_trx_commit=0. It's set to 1. Also no binary logs, so all disk access on the XFS filesystem (.ibd & tx log) is using Direct I/O.
I'll have to gather all of the details once I get a free moment (likely very very late this evening).
Partha, you've missed my point :-)
my supposition is:
- when innodb_flush_log_at_trx_commit=1 is used, InnoDB will do:
write();
fsync();
for each transaction.
Now, when you're using O_DIRECT, fsync() is no longer required, as with O_DIRECT write() is supposed to really write to the storage, bypassing the FS cache.
Then, if XFS requires block size alignment for O_DIRECT operations, it will in this case use buffered I/O for any non-aligned write requests. And since REDO log writes are not aligned to any block size, your DIRECT writes to log files will be transformed into buffered (cached) writes, exactly the same as when the innodb_flush_log_at_trx_commit=0 option is used.
That was my supposition, as most filesystems require block size alignment for Direct I/O operations.
Rgds,
-Dimitri
In the Facebook patch and XtraDB we don't fsync after O_DIRECT writes unless they extend the file. Transaction log writes are multiples of 512 bytes. Our assumption, which is still being tested, is that as long as the filesystem has been set up properly, such writes on XFS won't require a read prior to the write, which also implies that the buffered I/O path won't be used. The easy way to confirm that is to put the transaction log on its own filesystem and look at iostat.
Almost forgot: after more testing, this behavior is not the same across CPU types. The Xeon E5620 shows this performance boost, while the older L5320 does not. Trying tests with an L5420 soon.