Details
-
Bug
-
Status: Closed (View Workflow)
-
Blocker
-
Resolution: Fixed
-
10.2.33, 10.3.24, 10.4.14
Description
Write-heavy benchmarks show a regression in latency and throughput for latest releases in the 10.2, 10.3 and 10.4 series.
Example:
----------------------------------------------------------------------------
|
Test 't_oltp-innodb-multi' - sysbench OLTP read/write
|
32 InnoDB tables with 10 mio rows total
|
numbers are queries per second
|
|
#thread count 1 8 16 32 64 128 256
|
mariadb-10.2.32 5262.8 28886 53535 95712 158049 189909 192668
|
mariadb-10.2.33 4866.7 22651 42352 78750 133580 173461 191125
|
|
mariadb-10.3.23 5126.4 28716 52785 95466 156936 185473 187136
|
mariadb-10.3.24 4772.0 22959 42713 80442 133492 171252 186200
|
|
mariadb-10.4.13 5015.7 28263 52033 93771 154690 178179 178076
|
mariadb-10.4.14 4980.9 23496 43145 79795 128977 166470 179285
|
I examined this more closely for 10.2, bisecting the commits between 10.2.32 and 10.2.33. The first faulty commit is
commit fe39d02f51b96536dccca7ff89faf05e13548877
|
Author: Thirunarayanan Balathandayuthapani <thiru@mariadb.com>
|
Date: Thu Jul 23 16:23:20 2020 +0530
|
|
MDEV-20638 Remove the deadcode from srv_master_thread() and srv_active_wake_master_thread_low()
|
Attachments
- 10.2.bad.svg
- 700 kB
- 10.2.good.svg
- 684 kB
- 10.2.pdf
- 21 kB
- 10.2-2nd.pdf
- 22 kB
- 10.2-binlog.png
- 11 kB
- 10.2-binlog4.png
- 9 kB
- 10.2-revert.pdf
- 23 kB
- 10.3.pdf
- 20 kB
- 10.4.pdf
- 21 kB
- 10.5-2nd.pdf
- 20 kB
- 10.5-4th.pdf
- 21 kB
- MDEV-23475.png
- 6 kB
- MDEV-23475-final.pdf
- 86 kB
Issue Links
- is caused by
-
MDEV-20638 Remove the deadcode from srv_master_thread() and srv_active_wake_master_thread_low()
-
- Closed
-
Activity
If I remember correctly, MDEV-20638 included also a small functional change.
Because the master thread is what triggers periodic log checkpoints in InnoDB, a change to how often the master thread is woken up could affect the page flushing activity.
thiru, please try to revert the parts that affect the way how often the master thread is woken up.
MDEV-20638 removed many calls to innobase_active_small() and srv_active_wake_master_thread() without any replacement. The calls to srv_wake_master_thread() were replaced with the equivalent srv_inc_activity_count().
We must try to see if we can avoid introducing a replacement for the innobase_active_small() calls, and only add something in place of some srv_active_master_thread() calls. I think that a minimal change could be the following:
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
|
index 0dbd985b6c3..cd2d381760a 100644
|
--- a/storage/innobase/trx/trx0trx.cc
|
+++ b/storage/innobase/trx/trx0trx.cc
|
@@ -1804,6 +1804,7 @@ trx_commit_in_memory(
|
}
|
|
trx->commit_lsn = lsn;
|
+ srv_inc_activity_count();
|
}
|
|
ut_ad(!trx->rsegs.m_noredo.undo); |
marko I just tried that patch ontop of 10.2 HEAD. It restores the original performance. See attached MDEV-23475.png
I just tried that patch ontop of 10.2 HEAD. It restores the original performance.
Unfortunately not for all tests. It has no, or even the opposite effect for the test with synchronous binlog. All tests run with full durability (innodb_flush_log_at_trx_commit = 1) and have plenty of buffer pool and redo log space. The writes-binlog-multi test additionally enables the binlog with sync-binlog = 1. This probably takes another route in the redo-log flushing.
Attached results: 10.2.pdf
axel, thank you. I think that I will next try removing all other invocations of srv_inc_activity_count() and ensure that we will invoke it once for each commit of a transaction that modifies persistent tables.
In 10.4 this area of the code has changed a lot. I'm wondering why we see the same behavior, though the bad commit from 10.2 cannot be in there. Does it make sense to look for the bad commit in 10.4 too?
axel, in 10.3 or 10.4 you would probably find a larger merge that included many other changes than the 10.2 commit that caused the regression. Maybe the merge commit is canceling out some effects? I slightly revised the patch. I hope that it will work also with sync_binlog=1, but I did not test it. The patch should apply to 10.2, 10.3, 10.4, 10.5 without problems:
diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
|
index 7c61ad9b45b..7544f120284 100644
|
--- a/storage/innobase/row/row0mysql.cc
|
+++ b/storage/innobase/row/row0mysql.cc
|
@@ -3820,8 +3820,6 @@ row_drop_table_for_mysql(
|
|
trx->op_info = "";
|
|
- srv_inc_activity_count();
|
-
|
DBUG_RETURN(err);
|
}
|
|
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
|
index 0dbd985b6c3..117ff64761c 100644
|
--- a/storage/innobase/trx/trx0trx.cc
|
+++ b/storage/innobase/trx/trx0trx.cc
|
@@ -1537,6 +1537,7 @@ trx_flush_log_if_needed_low(
|
case 1:
|
/* Write the log and optionally flush it to disk */
|
log_write_up_to(lsn, flush);
|
+ srv_inc_activity_count();
|
return;
|
case 0:
|
/* Do nothing */ |
I think that the call during DROP TABLE is unnecessary, because in that function we did invoke a transaction commit a little earlier. Other invocations did not seem to be that safe to remove.
Hi marko. Indeed that new patch applies to both 10.2 and 10.5 (I didn't test the others). For 10.2 it behaves very similar to the first version. See 10.2-2nd.pdf
On the positive side: the new build (green line) is fastest for all tests except t_writes-binlog-multi. On the other hand it's not better than the first patch for t_writes-binlog-multi and does not match the performance before commit fe39d02f51 (purple line) for medium thread counts.
The graph shows data points for 1, 8, 16, 32, 64, 128 and 256 threads (on 32-thread capable hardware). The middle 3 data points is where I expect customer systems to run.
I think that the remaining regression should go away if we additionally apply a subset of the following patch:
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
|
index 8b58ca57a74..d0d739f4ae5 100644
|
--- a/storage/innobase/handler/ha_innodb.cc
|
+++ b/storage/innobase/handler/ha_innodb.cc
|
@@ -6607,6 +6607,8 @@ ha_innobase::close()
|
|
MONITOR_INC(MONITOR_TABLE_CLOSE);
|
|
+ srv_inc_activity_count();
|
+
|
DBUG_RETURN(0);
|
}
|
|
@@ -8302,6 +8304,8 @@ ha_innobase::write_row(
|
error_result = convert_error_code_to_mysql(
|
error, m_prebuilt->table->flags, m_user_thd);
|
|
+ srv_inc_activity_count();
|
+
|
#ifdef WITH_WSREP
|
if (!error_result && trx->is_wsrep()
|
&& wsrep_thd_exec_mode(m_user_thd) == LOCAL_STATE
|
@@ -8991,6 +8995,8 @@ ha_innobase::update_row(
|
error, m_prebuilt->table->flags, m_user_thd);
|
}
|
|
+ srv_inc_activity_count();
|
+
|
#ifdef WITH_WSREP
|
if (error == DB_SUCCESS && trx->is_wsrep() &&
|
wsrep_thd_exec_mode(m_user_thd) == LOCAL_STATE &&
|
@@ -9045,6 +9051,7 @@ ha_innobase::delete_row(
|
error = row_update_for_mysql(m_prebuilt);
|
|
innobase_srv_conc_exit_innodb(m_prebuilt);
|
+ srv_inc_activity_count();
|
|
#ifdef WITH_WSREP
|
if (error == DB_SUCCESS && trx->is_wsrep() |
The ha_innobase::close() should be executed rather rarely if at all, because the SQL layer is caching table handles. FLUSH TABLES should empty that cache. I do not know why that call was there in the first place. Maybe it was more meaningful before MySQL 5.5, when the master thread was responsible for more tasks, such as running the purge of history?
The other functions are invoked by INSERT, UPDATE, DELETE. Maybe we should instrument only those functions, and not add the call to trx_flush_log_if_needed_low(). That is, maybe we should apply only this patch (I would recommend to exclude ha_innobase::close()) and nothing else?
For 10.2 the second patch seems to give the best performance in the t_writes-binlog-multi test. See 10.2-binlog.png
Reverting the bad commit in 10.2 gives good resuts in all tests. And it's the only change that I have tested so far, that restores performance for t_writes-binlog-multi
Patch 4 looks okeyish on 10.5. It's very similar to patch 2, but for the binlog test patch 2 still gives the best result. I think for 10.5 we should just apply patch 2.
For the record, patch 4 was similar to this fix, with 2 differences:
- In ha_innobase::write_row(), patch 4 added the call innobase_active_small() before and not after the func_exit: label
- In row_drop_table_for_mysql(), patch 4 (similar to patch 2) removed the call to srv_inc_activity_count(). (This call, executed during DDL operations only, should be redundant due to the call in normal transaction commit, and I would prefer to remove it.)
On a closer look, there only is one goto func_exit in ha_innobase::write_row(), which can only be executed if calc_row_difference() is returning an error. Errors can only be returned when using explicit FTS_DOC_ID column for FULLTEXT INDEX or when attempting to update a geometry column to length 0. I do not think that either case would apply in the benchmark.
I maintain that patch 4 should revert the functional changes that were made in MDEV-20638. But, because that patch did not improve performance in the t_writes-binlog-multi benchmark, we must conclude that some of the dead code is helping the performance in the 10.2 series due to some unknown reason.
I would propose that we revert MDEV-20638 in 10.2 to 10.4, but keep it in 10.5, and apply patch 2 to 10.5.
I reverted MDEV-20638 in 10.2. For 10.3 and 10.4, I think that it would be good to evaluate whether patch 2 would restore the performance.
For 10.3 the situation is very much like for 10.2. Patch 2 restores performance for all tests except t_writes-binlog-multi. If we want to fix performance for that scenario, we would have to rollback MDEV-20638
10.4 is again very similar to 10.2 and 10.3. In addition both patch 1 and 2 fix the 90:10 performance only partly. I suggest to rollback MDEV-20638 for 10.4 as well.
I reverted MDEV-20638 in 10.2 and merged that up to 10.4. In 10.5, instead of reverting the changes, I applied a simple refinement that was called Patch 2:
diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
|
index 7c61ad9b45b..7544f120284 100644
|
--- a/storage/innobase/row/row0mysql.cc
|
+++ b/storage/innobase/row/row0mysql.cc
|
@@ -3820,8 +3820,6 @@ row_drop_table_for_mysql(
|
|
trx->op_info = "";
|
|
- srv_inc_activity_count();
|
-
|
DBUG_RETURN(err);
|
}
|
|
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
|
index 0dbd985b6c3..117ff64761c 100644
|
--- a/storage/innobase/trx/trx0trx.cc
|
+++ b/storage/innobase/trx/trx0trx.cc
|
@@ -1537,6 +1537,7 @@ trx_flush_log_if_needed_low(
|
case 1:
|
/* Write the log and optionally flush it to disk */
|
log_write_up_to(lsn, flush);
|
+ srv_inc_activity_count();
|
return;
|
case 0:
|
/* Do nothing */ |
I tested the final fix for this issue on 10.2 .. 10.5. The fix looks very good for most test cases. In 10.2 there is still a slight regression for t_writes-binlog-multi. But all in all the problem is solved. Details: MDEV-23475-final.pdf
Attached: flame graphs for OLTP read/write @ 32 threads.
bad=fe39d02f51b96536dccca7ff89faf05e13548877
good=bad~