To assess the performance impact of this, I developed a simple benchmark that attempts to initialize and flush many pages.
--source include/have_innodb.inc
--source include/have_sequence.inc

CREATE TABLE t1 (
  a BIGINT PRIMARY KEY,
  b CHAR(255) NOT NULL DEFAULT '',
  c CHAR(255) NOT NULL DEFAULT '',
  d CHAR(255) NOT NULL DEFAULT '',
  e CHAR(255) NOT NULL DEFAULT '',
  f CHAR(255) NOT NULL DEFAULT '',
  g CHAR(255) NOT NULL DEFAULT '',
  h CHAR(255) NOT NULL DEFAULT ''
) ENGINE=InnoDB;
INSERT INTO t1 (a) SELECT seq FROM seq_1_to_1000000;
FLUSH TABLES t1 FOR EXPORT;
UNLOCK TABLES;
DROP TABLE t1;
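As a rough back-of-the-envelope estimate of why this initializes and flushes many pages (assuming the default single-byte character set, so each CHAR(255) column occupies 255 bytes): a row is about 7 × 255 + 8 ≈ 1.8 KiB, so with innodb_page_size=4k only about two rows fit on a page, and the 1,000,000-row INSERT creates on the order of 500,000 data pages (roughly 2 GiB) that FLUSH TABLES t1 FOR EXPORT must then write out.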
Notes:
- Without the FLUSH TABLES t1 FOR EXPORT, the test would complete very fast, because no pages would have been flushed before the DROP TABLE executes.
- The first run of the test is slower, because undo log pages must be allocated for the huge transaction. On subsequent runs, the undo pages are reused.
- Due to MDEV-12288, the pages of t1.ibd will be modified twice: first by the initial INSERT, and then by purge, which will reset the DB_TRX_ID.
I made mysql-test/var a symlink to a directory on a Linux ext4 file system residing on an Intel Optane 960 NVMe SSD and ran the test:
for opt in '' --mysqld=--skip-innodb-{log-optimize-ddl,doublewrite}
do ./mtr --repeat=10 $opt --mysqld=--innodb-page-size=4k innodb.writebench
done
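For reference, the var directory setup and what the loop actually expands to look roughly like this; the mount point is hypothetical, and only the symlinking of mysql-test/var and the three resulting configurations are taken from the description above:

# hypothetical mount point on the ext4 file system on the NVMe SSD
mkdir -p /mnt/optane/mtr-var
ln -s /mnt/optane/mtr-var mysql-test/var

# the brace expansion makes the loop run three times, one iteration per row
# of the result table below:
#   opt=''                                          (baseline)
#   opt='--mysqld=--skip-innodb-log-optimize-ddl'   (optimization enabled)
#   opt='--mysqld=--skip-innodb-doublewrite'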
Here are the results:
10.5 d46db415cef910c65b0e550a9d00fe188cfd8910

configuration | first round | minimum | maximum | average
baseline (doublewrite buffer enabled) | 36175ms | 26611ms | 28623ms | 27673ms
--skip-log-optimize-ddl (optimization enabled) | 35855ms | 26415ms | 28976ms | 27508ms
--skip-innodb-doublewrite | 33331ms | 24392ms | 25865ms | 25166ms
Because the difference is so small, I repeated the test. In that repeated run, the average times over the 9 subsequent iterations were 27975ms, 26353ms and 25786ms for the three configurations. So, the patch does seem to give some benefit. I got similar numbers on /dev/shm (only a little faster in the --skip-innodb-doublewrite case).
Based on some additional instrumentation, it turned out that this test case causes pages to be flushed twice, both before and after the purge:
diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
index aa44a1c5e4c..3534f846211 100644
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -1043,6 +1043,8 @@ buf_flush_write_block_low(
 	const bool use_doublewrite = !bpage->init_on_flush
 		&& space->use_doublewrite();
 
+	ib::info() << "flush " << use_doublewrite << bpage->id;
+
 	if (!use_doublewrite) {
 		ulint type = IORequest::WRITE | IORequest::DO_NOT_WAKE;
 
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index e6bddd32438..5a5682236ed 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -15462,6 +15462,7 @@ ha_innobase::external_lock(
 			DBUG_RETURN(HA_ERR_NO_SUCH_TABLE);
 		}
 
+		ib::info() << "flush " << m_prebuilt->table->name;
 		row_quiesce_table_start(m_prebuilt->table, trx);
 
 		/* Use the transaction instance to track UNLOCK
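To see the duplicate flushes in numbers, one could aggregate the instrumented messages from the server error log. This is only a sketch: the log path (mysql-test/var/log/mysqld.1.err under ./mtr) and the exact textual format of the printed page id are assumptions, not verified:

# count how many times each page id appears in the instrumented output;
# pages that are flushed both before and after purge show a count of 2
grep 'InnoDB: flush ' mysql-test/var/log/mysqld.1.err |
  sed 's/.*flush [01]//' |
  sort | uniq -c | sort -rn | head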
I revised the test case accordingly (explicitly waiting for purge before initiating the page flush) and reverted the instrumentation patch before running the tests.
--source include/have_innodb.inc
--source include/have_sequence.inc

SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
SET GLOBAL innodb_purge_rseg_truncate_frequency=1;

CREATE TABLE t1 (
  a BIGINT PRIMARY KEY,
  b CHAR(255) NOT NULL DEFAULT '',
  c CHAR(255) NOT NULL DEFAULT '',
  d CHAR(255) NOT NULL DEFAULT '',
  e CHAR(255) NOT NULL DEFAULT '',
  f CHAR(255) NOT NULL DEFAULT '',
  g CHAR(255) NOT NULL DEFAULT '',
  h CHAR(255) NOT NULL DEFAULT ''
) ENGINE=InnoDB;
INSERT INTO t1 (a) SELECT seq FROM seq_1_to_1000000;
--source include/wait_all_purged.inc

FLUSH TABLES t1 FOR EXPORT;
UNLOCK TABLES;
DROP TABLE t1;
SET GLOBAL innodb_purge_rseg_truncate_frequency=@saved_frequency;
I also ensured that the buffer pool is large enough:
for opt in '' --mysqld=--skip-innodb-{log-optimize-ddl,doublewrite}
do ./mtr --repeat=5 $opt --mysqld=--innodb-buffer-pool-size=32m \
  --mysqld=--skip-innodb-buffer-pool-{load-at-startup,dump-at-shutdown} \
  --mysqld=--innodb-page-size=4k innodb.writebench
done
With this test, we are seeing a similar difference:
10.5 d46db415cef910c65b0e550a9d00fe188cfd8910

configuration | first round | minimum | maximum | average
baseline (doublewrite buffer enabled) | 68613ms | 38739ms | 39035ms | 38871ms
--skip-log-optimize-ddl (optimization enabled) | 67784ms | 35784ms | 38682ms | 37828ms
--skip-innodb-doublewrite | 64150ms | 33085ms | 34142ms | 33524ms
Apparently, the test time is dominated by something other than file writes. Finally, I gathered some counters to confirm that the time savings are due to I/O. I changed the interesting part of the test to
INSERT INTO t1 (a) SELECT seq FROM seq_1_to_1000;
--source include/wait_all_purged.inc

show status like 'innodb_d%_writes';
FLUSH TABLES t1 FOR EXPORT;
show status like 'innodb_d%_writes';
(yes, I also reduced the number of rows to make it run faster) and got the following numbers when running each configuration with --repeat=2:
configuration | innodb_data_writes+innodb_dblwr_writes
baseline (doublewrite buffer enabled) | 7+0, 1024+508, 1032+508, 2049+1016
--skip-log-optimize-ddl (optimization enabled) | 7+0, 516+0, 524+0, 1033+0
--skip-innodb-doublewrite | 7+0, 516+0, 524+0, 1033+0
The numbers are cumulative from the start of the server instance. So, in this artificial example, the doublewrite buffer is being skipped altogether thanks to our optimization. (Apparently we do not get to flush any pages of the system tablespace, because the test runs for such a short duration that no log checkpoint is executed.)
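Reading the cumulative values as before/after snapshots around each FLUSH (an interpretation of the four values, since the counters are printed twice per run and --repeat=2 gives two runs): in the baseline, each FLUSH adds 1024-7 = 1017 and 2049-1032 = 1017 data writes plus 508 doublewrite writes, while with the optimization (and with --skip-innodb-doublewrite) each FLUSH adds only 516-7 = 509 and 1033-524 = 509 data writes and no doublewrite writes, roughly halving the number of write calls.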
The reason why disabling the doublewrite buffer altogether improves performance further could be that writes can be combined in that case. This can be improved further in MDEV-16526.
It seems that this revealed a problem in MDEV-12699 (we are unnecessarily reading a data page that is being completely initialized by redo log records). I will analyze a dataset of the innodb.innodb-64k-crash failure in more depth tomorrow. I repeated it with this patch on top of both 10.4 and 10.5.