[MDEV-23475] InnoDB performance regression for write-heavy workloads Created: 2020-08-13  Updated: 2021-03-10  Resolved: 2020-08-20

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.33, 10.3.24, 10.4.14
Fix Version/s: 10.2.35, 10.3.26, 10.4.16

Type: Bug Priority: Blocker
Reporter: Axel Schwenke Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: not-10.5, performance, regression

Attachments: PDF File 10.2-2nd.pdf     PNG File 10.2-binlog-revert.png     PNG File 10.2-binlog.png     PNG File 10.2-binlog4.png     PDF File 10.2-revert.pdf     File 10.2.bad.svg     File 10.2.good.svg     PDF File 10.2.pdf     PDF File 10.3.pdf     PDF File 10.4.pdf     PDF File 10.5-2nd.pdf     PDF File 10.5-4th.pdf     PDF File MDEV-23475-final.pdf     PNG File MDEV-23475.png    
Issue Links:
Blocks
Problem/Incident
is caused by MDEV-20638 Remove the deadcode from srv_master_t... Closed

 Description   

Write-heavy benchmarks show a regression in latency and throughput for latest releases in the 10.2, 10.3 and 10.4 series.

Example:

----------------------------------------------------------------------------
Test 't_oltp-innodb-multi' - sysbench OLTP read/write
32 InnoDB tables with 10 mio rows total
numbers are queries per second
 
#thread count           1       8       16      32      64      128     256
mariadb-10.2.32         5262.8  28886   53535   95712   158049  189909  192668
mariadb-10.2.33         4866.7  22651   42352   78750   133580  173461  191125
 
mariadb-10.3.23         5126.4  28716   52785   95466   156936  185473  187136
mariadb-10.3.24         4772.0  22959   42713   80442   133492  171252  186200
 
mariadb-10.4.13         5015.7  28263   52033   93771   154690  178179  178076
mariadb-10.4.14         4980.9  23496   43145   79795   128977  166470  179285

I examined this more closely for 10.2, bisecting the commits between 10.2.32 and 10.2.33. The first faulty commit is

commit fe39d02f51b96536dccca7ff89faf05e13548877
Author: Thirunarayanan Balathandayuthapani <thiru@mariadb.com>
Date:   Thu Jul 23 16:23:20 2020 +0530
 
 MDEV-20638 Remove the deadcode from srv_master_thread() and srv_active_wake_master_thread_low()



 Comments   
Comment by Axel Schwenke [ 2020-08-13 ]

Attached: flame graphs for OLTP read/write @ 32 threads.

bad=fe39d02f51b96536dccca7ff89faf05e13548877
good=bad~

Comment by Marko Mäkelä [ 2020-08-14 ]

If I remember correctly, MDEV-20638 included also a small functional change.

Because the master thread is what triggers periodic log checkpoints in InnoDB, a change to how often the master thread is woken up could affect the page flushing activity.

thiru, please try to revert the parts that affect the way how often the master thread is woken up.

Comment by Marko Mäkelä [ 2020-08-14 ]

MDEV-20638 removed many calls to innobase_active_small() and srv_active_wake_master_thread() without any replacement. The calls to srv_wake_master_thread() were replaced with the equivalent srv_inc_activity_count().

We must try to see if we can avoid introducing a replacement for the innobase_active_small() calls, and only add something in place of some srv_active_master_thread() calls. I think that a minimal change could be the following:

diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index 0dbd985b6c3..cd2d381760a 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1804,6 +1804,7 @@ trx_commit_in_memory(
 		}
 
 		trx->commit_lsn = lsn;
+		srv_inc_activity_count();
 	}
 
 	ut_ad(!trx->rsegs.m_noredo.undo);

Comment by Axel Schwenke [ 2020-08-14 ]

marko I just tried that patch ontop of 10.2 HEAD. It restores the original performance. See attached MDEV-23475.png

Comment by Axel Schwenke [ 2020-08-15 ]

I just tried that patch ontop of 10.2 HEAD. It restores the original performance.

Unfortunately not for all tests. It has no, or even the opposite effect for the test with synchronous binlog. All tests run with full durability (innodb_flush_log_at_trx_commit = 1) and have plenty of buffer pool and redo log space. The writes-binlog-multi test additionally enables the binlog with sync-binlog = 1. This probably takes another route in the redo-log flushing.

Attached results: 10.2.pdf

Comment by Marko Mäkelä [ 2020-08-17 ]

axel, thank you. I think that I will next try removing all other invocations of srv_inc_activity_count() and ensure that we will invoke it once for each commit of a transaction that modifies persistent tables.

Comment by Axel Schwenke [ 2020-08-17 ]

In 10.4 this area of the code has changed a lot. I'm wondering why we see the same behavior, though the bad commit from 10.2 cannot be in there. Does it make sense to look for the bad commit in 10.4 too?

Comment by Marko Mäkelä [ 2020-08-17 ]

axel, in 10.3 or 10.4 you would probably find a larger merge that included many other changes than the 10.2 commit that caused the regression. Maybe the merge commit is canceling out some effects? I slightly revised the patch. I hope that it will work also with sync_binlog=1, but I did not test it. The patch should apply to 10.2, 10.3, 10.4, 10.5 without problems:

diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
index 7c61ad9b45b..7544f120284 100644
--- a/storage/innobase/row/row0mysql.cc
+++ b/storage/innobase/row/row0mysql.cc
@@ -3820,8 +3820,6 @@ row_drop_table_for_mysql(
 
 	trx->op_info = "";
 
-	srv_inc_activity_count();
-
 	DBUG_RETURN(err);
 }
 
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index 0dbd985b6c3..117ff64761c 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1537,6 +1537,7 @@ trx_flush_log_if_needed_low(
 	case 1:
 		/* Write the log and optionally flush it to disk */
 		log_write_up_to(lsn, flush);
+		srv_inc_activity_count();
 		return;
 	case 0:
 		/* Do nothing */

I think that the call during DROP TABLE is unnecessary, because in that function we did invoke a transaction commit a little earlier. Other invocations did not seem to be that safe to remove.

Comment by Axel Schwenke [ 2020-08-17 ]

Hi marko. Indeed that new patch applies to both 10.2 and 10.5 (I didn't test the others). For 10.2 it behaves very similar to the first version. See 10.2-2nd.pdf

On the positive side: the new build (green line) is fastest for all tests except t_writes-binlog-multi. On the other hand it's not better than the first patch for t_writes-binlog-multi and does not match the performance before commit fe39d02f51 (purple line) for medium thread counts.

The graph shows data points for 1, 8, 16, 32, 64, 128 and 256 threads (on 32-thread capable hardware). The middle 3 data points is where I expect customer systems to run.

Comment by Marko Mäkelä [ 2020-08-18 ]

I think that the remaining regression should go away if we additionally apply a subset of the following patch:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 8b58ca57a74..d0d739f4ae5 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -6607,6 +6607,8 @@ ha_innobase::close()
 
 	MONITOR_INC(MONITOR_TABLE_CLOSE);
 
+	srv_inc_activity_count();
+
 	DBUG_RETURN(0);
 }
 
@@ -8302,6 +8304,8 @@ ha_innobase::write_row(
 	error_result = convert_error_code_to_mysql(
 		error, m_prebuilt->table->flags, m_user_thd);
 
+	srv_inc_activity_count();
+
 #ifdef WITH_WSREP
 	if (!error_result && trx->is_wsrep()
 	    && wsrep_thd_exec_mode(m_user_thd) == LOCAL_STATE
@@ -8991,6 +8995,8 @@ ha_innobase::update_row(
 			error, m_prebuilt->table->flags, m_user_thd);
 	}
 
+	srv_inc_activity_count();
+
 #ifdef WITH_WSREP
 	if (error == DB_SUCCESS && trx->is_wsrep() &&
 	    wsrep_thd_exec_mode(m_user_thd) == LOCAL_STATE &&
@@ -9045,6 +9051,7 @@ ha_innobase::delete_row(
 	error = row_update_for_mysql(m_prebuilt);
 
 	innobase_srv_conc_exit_innodb(m_prebuilt);
+	srv_inc_activity_count();
 
 #ifdef WITH_WSREP
 	if (error == DB_SUCCESS && trx->is_wsrep()

The ha_innobase::close() should be executed rather rarely if at all, because the SQL layer is caching table handles. FLUSH TABLES should empty that cache. I do not know why that call was there in the first place. Maybe it was more meaningful before MySQL 5.5, when the master thread was responsible for more tasks, such as running the purge of history?

The other functions are invoked by INSERT, UPDATE, DELETE. Maybe we should instrument only those functions, and not add the call to trx_flush_log_if_needed_low(). That is, maybe we should apply only this patch (I would recommend to exclude ha_innobase::close()) and nothing else?

Comment by Axel Schwenke [ 2020-08-18 ]

The second patch looks good with 10.5: 10.5-2nd.pdf

Comment by Axel Schwenke [ 2020-08-18 ]

For 10.2 the second patch seems to give the best performance in the t_writes-binlog-multi test. See 10.2-binlog.png

Comment by Axel Schwenke [ 2020-08-18 ]

Patch 4 does not help at all: 10.2-binlog4.png

Comment by Axel Schwenke [ 2020-08-18 ]

Reverting the bad commit in 10.2 gives good resuts in all tests. And it's the only change that I have tested so far, that restores performance for t_writes-binlog-multi

10.2-revert.pdf

Comment by Axel Schwenke [ 2020-08-18 ]

Patch 4 looks okeyish on 10.5. It's very similar to patch 2, but for the binlog test patch 2 still gives the best result. I think for 10.5 we should just apply patch 2.

10.5-4th.pdf

Comment by Marko Mäkelä [ 2020-08-19 ]

For the record, patch 4 was similar to this fix, with 2 differences:

  • In ha_innobase::write_row(), patch 4 added the call innobase_active_small() before and not after the func_exit: label
  • In row_drop_table_for_mysql(), patch 4 (similar to patch 2) removed the call to srv_inc_activity_count(). (This call, executed during DDL operations only, should be redundant due to the call in normal transaction commit, and I would prefer to remove it.)

On a closer look, there only is one goto func_exit in ha_innobase::write_row(), which can only be executed if calc_row_difference() is returning an error. Errors can only be returned when using explicit FTS_DOC_ID column for FULLTEXT INDEX or when attempting to update a geometry column to length 0. I do not think that either case would apply in the benchmark.

I maintain that patch 4 should revert the functional changes that were made in MDEV-20638. But, because that patch did not improve performance in the t_writes-binlog-multi benchmark, we must conclude that some of the dead code is helping the performance in the 10.2 series due to some unknown reason.

I would propose that we revert MDEV-20638 in 10.2 to 10.4, but keep it in 10.5, and apply patch 2 to 10.5.

Comment by Marko Mäkelä [ 2020-08-19 ]

I reverted MDEV-20638 in 10.2. For 10.3 and 10.4, I think that it would be good to evaluate whether patch 2 would restore the performance.

Comment by Axel Schwenke [ 2020-08-19 ]

For 10.3 the situation is very much like for 10.2. Patch 2 restores performance for all tests except t_writes-binlog-multi. If we want to fix performance for that scenario, we would have to rollback MDEV-20638

10.3.pdf

Comment by Axel Schwenke [ 2020-08-20 ]

10.4 is again very similar to 10.2 and 10.3. In addition both patch 1 and 2 fix the 90:10 performance only partly. I suggest to rollback MDEV-20638 for 10.4 as well.

10.4.pdf

Comment by Marko Mäkelä [ 2020-08-20 ]

I reverted MDEV-20638 in 10.2 and merged that up to 10.4. In 10.5, instead of reverting the changes, I applied a simple refinement that was called Patch 2:

diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
index 7c61ad9b45b..7544f120284 100644
--- a/storage/innobase/row/row0mysql.cc
+++ b/storage/innobase/row/row0mysql.cc
@@ -3820,8 +3820,6 @@ row_drop_table_for_mysql(
 
 	trx->op_info = "";
 
-	srv_inc_activity_count();
-
 	DBUG_RETURN(err);
 }
 
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index 0dbd985b6c3..117ff64761c 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1537,6 +1537,7 @@ trx_flush_log_if_needed_low(
 	case 1:
 		/* Write the log and optionally flush it to disk */
 		log_write_up_to(lsn, flush);
+		srv_inc_activity_count();
 		return;
 	case 0:
 		/* Do nothing */

Comment by Axel Schwenke [ 2020-08-25 ]

I tested the final fix for this issue on 10.2 .. 10.5. The fix looks very good for most test cases. In 10.2 there is still a slight regression for t_writes-binlog-multi. But all in all the problem is solved. Details: MDEV-23475-final.pdf

Comment by jaehyun1148.lee [ 2021-03-10 ]

Could that bug occur in 10.3.23?

Generated at Thu Feb 08 09:22:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.