[MDEV-33213] History list is not shrunk unless there is a pause in the workload Created: 2024-01-10  Updated: 2024-02-07  Resolved: 2024-01-17

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6, 10.11, 11.0, 11.1, 11.2, 11.3, 11.4
Fix Version/s: 10.10.7, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2, 11.4.1

Type: Bug Priority: Blocker
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: performance

Attachments: PDF File 10.6-MDEV-33213.pdf     PDF File 12x5.pdf     PDF File 24x5.pdf     PDF File 24x5_high_threads.pdf     PDF File 24x5_high_threads_pausing.pdf     PNG File no_pause.png     PNG File with_pause.png    
Issue Links:
Relates
relates to MDEV-30628 10.6 performance regression with sust... Needs Feedback
relates to MDEV-31676 Innodb history length keeps growing Closed
relates to MDEV-32050 UNDO logs still growing for write-int... Closed
relates to MDEV-33009 Server hangs for a long time with inn... Closed
relates to MDEV-33112 innodb_undo_log_truncate=ON is blocki... Closed
relates to MDEV-33315 InnoDB history length and undo tables... Needs Feedback

 Description   

While testing MDEV-32050 and its follow-up changes MDEV-33009 and MDEV-33112, axel pointed out that the InnoDB history list is not being shrunk during a workload.

Some initial debugging indicates that the purge_truncation_task (which was added in MDEV-32050) is being invoked but it is not doing anything. Also, the purge_sys.view is advancing; there are no old read views that would block the purge of history. Throttling the Sysbench workload for a few seconds would allow the history list to shrink immediately.



 Comments   
Comment by Axel Schwenke [ 2024-01-10 ]

Attached:

This is the same build from branch 10.6-MDEV-33112. The first run is sysbench-tpcc for 1x 60min and shows the history list growing to 12 mio trx (and according UNDO log size). The other run is sysbench-tpcc for 6x 10min with only 5 seconds pause between. The 5 second pause is enough to clean up the history list and shrink the UNDO logs.

Comment by Marko Mäkelä [ 2024-01-10 ]

I instrumented a 11.x server with the following patch:

diff --git a/storage/innobase/srv/srv0srv.cc b/storage/innobase/srv/srv0srv.cc
index d1ea843e521..1018d2f880d 100644
--- a/storage/innobase/srv/srv0srv.cc
+++ b/storage/innobase/srv/srv0srv.cc
@@ -1098,7 +1098,8 @@ static void purge_truncation_callback(void*)
   purge_sys.latch.rd_lock(SRW_LOCK_CALL);
   const purge_sys_t::iterator head= purge_sys.head;
   purge_sys.latch.rd_unlock();
-  head.free_history();
+  dberr_t err= head.free_history();
+  sql_print_information("InnoDB purge truncation: %s", ut_strerr(err));
 }
 
 static tpool::task_group purge_task_group;
@@ -1415,6 +1416,7 @@ inline void purge_coordinator_state::do_purge()
       break;
     }
 
+    sql_print_information("InnoDB purge: %zu", history_size);
     ulint n_pages_handled= trx_purge(n_threads, history_size);
     if (!trx_sys.history_exists())
       goto no_history;

I see output like this:

2024-01-10 16:42:46 0 [Note] InnoDB purge: 34
2024-01-10 16:42:46 0 [Note] InnoDB purge: 36
2024-01-10 16:42:46 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:42:46 0 [Note] InnoDB purge: 3
2024-01-10 16:42:46 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:42:46 0 [Note] InnoDB purge: 5
2024-01-10 16:42:46 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:42:47 0 [Note] InnoDB purge: 114649
2024-01-10 16:42:54 0 [Note] InnoDB purge: 1444953
2024-01-10 16:42:54 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:42:58 0 [Note] InnoDB purge: 2199290
2024-01-10 16:42:58 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:43:02 0 [Note] InnoDB purge: 2924550
2024-01-10 16:43:02 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:43:24 0 [Note] InnoDB purge: 5928129
2024-01-10 16:43:24 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:43:47 0 [Note] InnoDB purge: 7893229
2024-01-10 16:43:47 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:44:23 0 [Note] InnoDB purge: 10563812
2024-01-10 16:44:23 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:44:49 0 [Note] InnoDB: Shutdown completed; log sequence number 2550911428; transaction id 23745949

The workload script is based on something that I used for MDEV-32050. It consisted of two 60-second sysbench oltp_update_index, separated by a sleep 1. The purge message with a history_size of 7893229 was output around the switching point. During the second run, there was only one more attempt of purging.

I tried again oltp_update_non_index, which should allow purge to work around MDEV-17598. Same thing there, despite a 3-second sleep between two 60-second workloads:

2024-01-10 16:47:48 0 [Note] InnoDB purge: 34
2024-01-10 16:47:48 0 [Note] InnoDB purge: 36
2024-01-10 16:47:48 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:47:48 0 [Note] InnoDB purge: 3
2024-01-10 16:47:48 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:47:48 0 [Note] InnoDB purge: 5
2024-01-10 16:47:48 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:47:49 0 [Note] InnoDB purge: 141988
2024-01-10 16:47:50 0 [Note] InnoDB purge: 345948
2024-01-10 16:47:50 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:47:50 0 [Note] InnoDB purge: 450133
2024-01-10 16:47:50 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:47:51 0 [Note] InnoDB purge: 551898
2024-01-10 16:47:51 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:47:51 0 [Note] InnoDB purge: 669964
2024-01-10 16:47:51 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:47:52 0 [Note] InnoDB purge: 763708
2024-01-10 16:47:52 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:47:53 0 [Note] InnoDB purge: 905287
2024-01-10 16:47:53 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:47:53 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:47:53 0 [Note] InnoDB purge: 1026138
2024-01-10 16:48:47 0 [Note] InnoDB purge: 12705454
2024-01-10 16:48:47 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:48:48 0 [Note] InnoDB purge: 12865213
2024-01-10 16:48:48 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:48:48 0 [Note] InnoDB purge: 12897164
2024-01-10 16:48:48 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:48:48 0 [Note] InnoDB purge: 12897164
2024-01-10 16:48:50 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:48:50 0 [Note] InnoDB purge: 10519866
2024-01-10 16:48:51 0 [Note] InnoDB purge: 10181678
2024-01-10 16:48:51 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:48:51 0 [Note] InnoDB purge: 10153547
2024-01-10 16:48:51 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:48:52 0 [Note] InnoDB purge: 10262769
2024-01-10 16:48:52 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:48:52 0 [Note] InnoDB purge: 10363969
2024-01-10 16:48:52 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:49:51 0 [Note] InnoDB purge: 23166576
2024-01-10 16:49:51 0 [Note] InnoDB purge truncation: Success
2024-01-10 16:49:51 0 [Note] InnoDB purge: 23166576

Roughly coinciding with the 3-second sleep, we see some reduction of the history list, followed by a steady growth to the end of the run.

Comment by Marko Mäkelä [ 2024-01-11 ]

During the busy part of my test workload, purge_sys.sees(rseg.needs_purge) would never hold in purge_sys_t::iterator::free_history(). Occasionally, rseg.is_referenced() would hold. Only when trx_purge_truncate_rseg_history() is invoked with all=true, the history would be shrunk. When all=false, I observed the difference trx_sys.get_max_trx_id()-rseg.needs_purge to be between 0 and 3379. The type of the field trx_rseg_t::needs_purge was changed from a Boolean to a transaction identifier in MDEV-30671 in order to prevent the premature freeing of undo log pages.

Comment by Marko Mäkelä [ 2024-01-11 ]

The following patch, which will prevent the reuse of undo pages by multiple small transactions (TRX_UNDO_CACHED), will allow the history list to shrink during my test workload:

diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
index add4db962d5..f593c49070b 100644
--- a/storage/innobase/trx/trx0purge.cc
+++ b/storage/innobase/trx/trx0purge.cc
@@ -275,6 +275,7 @@ trx_purge_add_undo_to_history(const trx_t* trx, trx_undo_t*& undo, mtr_t* mtr)
 
   uint16_t undo_state;
 
+#if 0
   if (undo->size == 1 &&
       TRX_UNDO_PAGE_REUSE_LIMIT >
       mach_read_from_2(TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_FREE +
@@ -284,6 +285,7 @@ trx_purge_add_undo_to_history(const trx_t* trx, trx_undo_t*& undo, mtr_t* mtr)
     UT_LIST_ADD_FIRST(rseg->undo_cached, undo);
   }
   else
+#endif
   {
     ut_ad(undo->size == flst_get_len(TRX_UNDO_SEG_HDR + TRX_UNDO_PAGE_LIST +
                                      undo_page->page.frame));
 

If this change turns out to improve performance in a broad selection of benchmarks, we might want to introduce a configuration parameter for this. Back when I was working on MDEV-25062, I observed that a change like this would degrade throughput. My test workloads were short, and I ignored the growth of the history list.

Note: In MDEV-29593 the purge of committed history was improved so that can free pages that were marked reusable (TRX_UNDO_CACHED) but have not yet been reused. The rseg.is_referenced() check is needed for the rare case that a starting transaction has chosen to use a rollback segment but has not modified any undo page yet.

Comment by Marko Mäkelä [ 2024-01-11 ]

I think that the only practical way to prevent rseg.needs_purge from blocking the cleanup action is to temporarily disable one rollback segment at a time, by invoking trx_rseg_t::skip_allocation(). Adding more rollback segments is not practical, because the current file format (especially the format of DB_ROLL_PTR) limits us to only 128 rollback segments, or 127 when innodb_undo_tablespaces>0 because the first one is hard-coded in the system tablespace. When transactions are started, they can choose any of the other rollback 127 or 126 segments while the purge subsystem is preparing to free the undo log pages that stored the already purged history in one rollback segment.

For innodb_undo_log_truncate=ON we already disable all rollback segments (trx_rseg_t::set_skip_allocation()) that belong to the purge_sys.truncate.current tablespace. The reason why this logic does not work during a heavy workload is that we are “too shy” to prepare for the truncation. If we allow purge_coordinator_state::do_purge() to invoke that part of trx_purge_truncate_history() directly, then undo tablespace truncation will work during the workload. I have implemented this, and it seems to work with my simple benchmark script, without too much disruption on the throughput.

If we also invoked trx_rseg_t::set_skip_allocation() for one rollback segment at a time in the innodb_undo_log_truncate=OFF case, the need for undo tablespace truncation should be reduced.

Comment by Axel Schwenke [ 2024-01-12 ]

The commit 4799e4efee2 in branch mariadb-10.6-MDEV-33213 looks indeed very good. I have run all 4 TPC-C benchmarks I had prepared on host g5. The performance is better (sometimes much better) and the problem with growing InnoDB History Length is gone.

Attachments: 12x5.pdf 24x5.pdf 24x5_high_threads.pdf 24x5_high_threads_pausing.pdf

Comment by Axel Schwenke [ 2024-01-15 ]

I repeated now the TPC-C regression tests. They run on cheetah01 and include tests with innodb_undo_log_truncate=OFF (default) as well as innodb_undo_log_truncate=ON. The results also look good.

Attached; 10.6-MDEV-33213.pdf

Comment by Matthias Leich [ 2024-01-15 ]

10.6-MDEV-33213 028dd57a0c07e94a3c1f205c4915efc06bae47d0 2024-01-12T09:41:15+02:00
performed well in RQG testing.

Comment by Matthias Leich [ 2024-01-15 ]

RQG testing etc.

Generated at Thu Feb 08 10:37:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.