Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30671

innodb_undo_log_truncate=ON fails to wait for purge of transaction history

Details

    Description

      mleich produced an rr replay trace of an execution where InnoDB crashes soon after being started up on a freshly initialized database, without involving any crash recovery, soon after an undo tablespace was truncated:

      10.5 5300c0fb7632e7e49a22997297bf731e691d3c24

      Thread 5 hit Breakpoint 7, sql_print_information (format=format@entry=0x56404df92008 "InnoDB: %s") at /data/Server/bb-10.5-MDEV-30657/sql/log.cc:9283
      9283	in /data/Server/bb-10.5-MDEV-30657/sql/log.cc
      1: x/i $pc
      => 0x56404d8a3e20 <sql_print_information(char const*, ...)>:	endbr64 
      (rr) 
      Continuing.
      2023-02-15 19:11:55 0 [Note] InnoDB: Truncated .//undo002
       
      Thread 14 received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 3065874.3070558]
      __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:497
      497	../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
      1: x/i $pc
      => 0x7f90c8b77955 <__memmove_avx_unaligned_erms+741>:	vmovdqu (%rsi),%ymm0
      (rr) bt
      #0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:497
      #1  0x000056404db25250 in memcpy (__len=<optimized out>, __src=0x7f90bb9123f7, __dest=0x7f8f8bf92078) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:34
      #2  mem_heap_dup (len=<optimized out>, data=0x7f90bb9123f7, heap=0x7f90a40643a0) at /data/Server/bb-10.5-MDEV-30657/storage/innobase/include/mem0mem.h:240
      #3  trx_undo_rec_copy (heap=0x7f90a40643a0, undo_rec=0x7f90bb9123f7 "") at /data/Server/bb-10.5-MDEV-30657/storage/innobase/include/trx0rec.inl:70
      #4  trx_undo_get_undo_rec_low (roll_ptr=<optimized out>, heap=0x7f90a40643a0) at /data/Server/bb-10.5-MDEV-30657/storage/innobase/trx/trx0rec.cc:1998
      #5  0x000056404db27513 in trx_undo_get_undo_rec (undo_rec=<synthetic pointer>, name=..., trx_id=<optimized out>, heap=<optimized out>, roll_ptr=19984723349668855)
          at /data/Server/bb-10.5-MDEV-30657/storage/innobase/trx/trx0rec.cc:2030
      #6  trx_undo_prev_version_build (index_rec=index_rec@entry=0x7f90bbd98088 "\200", index_mtr=index_mtr@entry=0x7f909c074370, rec=rec@entry=0x7f90bbd98088 "\200", index=index@entry=0x7f90a805a198, 
          offsets=0x7f909c073d30, heap=heap@entry=0x7f90a40643a0, old_vers=0x7f909c073b98, v_heap=0x0, vrow=0x0, v_status=0) at /data/Server/bb-10.5-MDEV-30657/storage/innobase/trx/trx0rec.cc:2116
      #7  0x000056404db0ce3c in row_vers_build_for_consistent_read (rec=0x7f90bbd98088 "\200", mtr=0x7f909c074370, index=0x7f90a805a198, offsets=0x7f909c073cf8, view=0x7f90c803e470, offset_heap=0x7f909c073cf0, 
          in_heap=0x7f90b407df70, old_vers=0x7f909c073f90, vrow=0x0) at /data/Server/bb-10.5-MDEV-30657/storage/innobase/row/row0vers.cc:1161
      #8  0x000056404db016dc in row_search_mvcc (buf=0x7f9094059680 "\377\377\377", mode=PAGE_CUR_G, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7f9094071b28, match_mode=<optimized out>, direction=1)
          at /data/Server/bb-10.5-MDEV-30657/storage/innobase/row/row0sel.cc:5220
      

      After extensive debugging, I concluded that the problem is that undo tablespace truncation only waits until there are no active transactions (which could be potentially rolled back) in any rollback segment that is stored in the undo tablespace that is being considered for truncation. There is a check like this in trx_purge_truncate_history():

          for (ulint i= 0; i < TRX_SYS_N_RSEGS; ++i)
          {
            trx_rseg_t *rseg= trx_sys.rseg_array[i];
            if (!rseg || rseg->space != &space)
              continue;
            mutex_enter(&rseg->mutex);
            ut_ad(rseg->skip_allocation);
            ut_ad(rseg->is_persistent());
            if (rseg->trx_ref_count)
            {
      not_free:
              mutex_exit(&rseg->mutex);
              return;
            }
      

      At this point of time, rseg->needs_purge, which was added in MDEV-13536, can still hold. The transaction reference count would be decremented in trx_t::commit_in_memory().

      At the time of the crash, the DB_TRX_ID=0x2aea of the clustered index record whose history we are attempting to fetch, is explicitly listed in the purge_sys.view, meaning that its history must not be purged yet:

      (rr) f 7
      #7  0x000056404db0ce3c in row_vers_build_for_consistent_read (rec=0x7f90bbd98088 "\200", mtr=0x7f909c074370, index=0x7f90a805a198, offsets=0x7f909c073cf8, view=0x7f90c803e470, offset_heap=0x7f909c073cf0, 
          in_heap=0x7f90b407df70, old_vers=0x7f909c073f90, vrow=0x0) at /data/Server/bb-10.5-MDEV-30657/storage/innobase/row/row0vers.cc:1161
      1161	/data/Server/bb-10.5-MDEV-30657/storage/innobase/row/row0vers.cc: No such file or directory.
      (rr) p/x *rec@17
      $38 = {0x80, 0x0, 0x5, 0x37, 0x0, 0x0, 0x0, 0x0, 0x2a, 0xea, 0x47, 0x0, 0x0, 0x0, 0x31, 0x3, 0xf7}
      (rr) p/x purge_sys.view
      $39 = {m_low_limit_id = 0x2b26, m_up_limit_id = 0x2ad5, m_ids = std::vector of length 8, capacity 41 = {0x2ad5, 0x2aea, 0x2b0a, 0x2b0f, 0x2b1b, 0x2b22, 0x2b24, 0x2b25}, m_low_limit_no = 0x2b26}
      

      I think that the following could fix the bug:

      diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
      index 4d84f295c0b..39095013457 100644
      --- a/storage/innobase/trx/trx0purge.cc
      +++ b/storage/innobase/trx/trx0purge.cc
      @@ -627,7 +627,7 @@ static void trx_purge_truncate_history()
             mutex_enter(&rseg->mutex);
             ut_ad(rseg->skip_allocation);
             ut_ad(rseg->is_persistent());
      -      if (rseg->trx_ref_count)
      +      if (rseg->needs_purge || rseg->trx_ref_count)
             {
       not_free:
               mutex_exit(&rseg->mutex);
      @@ -753,6 +753,8 @@ static void trx_purge_truncate_history()
       
             ut_ad(rseg->id == i);
             ut_ad(rseg->is_persistent());
      +      ut_ad(!rseg->trx_ref_count);
      +      ut_ad(!rseg->needs_purge);
             ut_d(const auto old_page= rseg->page_no);
       
             buf_block_t *rblock= trx_rseg_header_create(&space, i,
      

      Attachments

        Issue Links

          Activity

            It turns out that rseg->needs_purge will not be reset when everything belonging to a rollback segment has been purged:

            diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
            index 39095013457..0194bdc1380 100644
            --- a/storage/innobase/trx/trx0purge.cc
            +++ b/storage/innobase/trx/trx0purge.cc
            @@ -410,6 +410,7 @@ trx_purge_free_segment(trx_rseg_t* rseg, fil_addr_t hdr_addr)
             	ut_ad(rseg->curr_size >= seg_size);
             
             	rseg->curr_size -= seg_size;
            +	rseg->needs_purge = false;
             
             	mutex_exit(&(rseg->mutex));
             
            @@ -488,6 +489,7 @@ trx_purge_truncate_rseg_history(
             		trx_purge_remove_log_hdr(rseg_hdr, block, hdr_addr.boffset,
             					 &mtr);
             
            +		rseg.needs_purge = false;
             		mutex_exit(&rseg.mutex);
             		mtr.commit();
             	}
            

            Without this, the test innodb.undo_truncate_recover would fail, and I suppose the undo truncation would never be triggered, unless the server was restarted in between.

            marko Marko Mäkelä added a comment - It turns out that rseg->needs_purge will not be reset when everything belonging to a rollback segment has been purged: diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc index 39095013457..0194bdc1380 100644 --- a/storage/innobase/trx/trx0purge.cc +++ b/storage/innobase/trx/trx0purge.cc @@ -410,6 +410,7 @@ trx_purge_free_segment(trx_rseg_t* rseg, fil_addr_t hdr_addr) ut_ad(rseg->curr_size >= seg_size); rseg->curr_size -= seg_size; + rseg->needs_purge = false; mutex_exit(&(rseg->mutex)); @@ -488,6 +489,7 @@ trx_purge_truncate_rseg_history( trx_purge_remove_log_hdr(rseg_hdr, block, hdr_addr.boffset, &mtr); + rseg.needs_purge = false; mutex_exit(&rseg.mutex); mtr.commit(); } Without this, the test innodb.undo_truncate_recover would fail, and I suppose the undo truncation would never be triggered, unless the server was restarted in between.

            origin/bb-10.5-MDEV-30671 28dedf65cfbb1fa0029c5659695f511dd922a04b 2023-02-17T14:58:12+02:00
            performed well in RQG testing.

            mleich Matthias Leich added a comment - origin/bb-10.5- MDEV-30671 28dedf65cfbb1fa0029c5659695f511dd922a04b 2023-02-17T14:58:12+02:00 performed well in RQG testing.

            Unfortunately, the test innodb.dml_purge started to fail with this change, indicating that some history was not purged. I tried adding an explicit wait for a purge after the rollback that the test initiated. Even if I changed the test to perform a slow shutdown before examining the contents of the data file, it would occasionally fail.

            In other words, the fix will need to be revised and retested. First, I need to generate rr replay traces leading to a failure.

            marko Marko Mäkelä added a comment - Unfortunately, the test innodb.dml_purge started to fail with this change, indicating that some history was not purged. I tried adding an explicit wait for a purge after the rollback that the test initiated. Even if I changed the test to perform a slow shutdown before examining the contents of the data file, it would occasionally fail. In other words, the fix will need to be revised and retested. First, I need to generate rr replay traces leading to a failure.

            I can produce rr replay traces of failures with the current fix and the following:

            while ./mtr --rr --parallel=95 innodb.dml_purge{,,,,,,,,,,,,,,,,,,}; do :; done
            

            The trick is to force more frequent context switches by starting many more concurrent client/server process pairs than there are CPU cores on the system.

            marko Marko Mäkelä added a comment - I can produce rr replay traces of failures with the current fix and the following: while ./mtr --rr --parallel=95 innodb.dml_purge{,,,,,,,,,,,,,,,,,,}; do :; done The trick is to force more frequent context switches by starting many more concurrent client/server process pairs than there are CPU cores on the system.

            The following is the relevant part of the test:

            --connect (prevent_purge,localhost,root)
            START TRANSACTION WITH CONSISTENT SNAPSHOT;
             
            --connection default
            INSERT INTO t1 VALUES(1,2),(3,4);
            ALTER TABLE t1 ADD COLUMN c INT;
            UPDATE t1 SET b=-3 WHERE a=3;
             
            --connect (con1,localhost,root)
            BEGIN;
            # For purgeable records, we must record DB_TRX_ID=0 in the undo log!
            UPDATE t1 SET b=4 WHERE a=3;
            --disconnect prevent_purge
             
            --connection default
            # Initiate a full purge, which should reset the DB_TRX_ID except for a=3.
            --source include/wait_all_purged.inc
            # Initiate a ROLLBACK of the update, which should reset the DB_TRX_ID for a=3.
            --disconnect con1
             
            FLUSH TABLE t1 FOR EXPORT;
            

            In the trace that I am debugging, at the time the connection default completed the ALTER TABLE and the UPDATE, trx_sys.rseg_history_length was incremented to 15 and 16. Then, the purge coordinator task would decrement trx_sys.rseg_history_length one by one all the way to 3, without buffer-fixing the clustered index root page at all. The first buffer-fix will be for the UPDATE t1 SET b=4 WHERE a=3. This is still fine: we have 3 committed single-statement transactions (INSERT, ALTER, UPDATE) whose history cannot be purged, because the connection prevent_purge is hogging an older read view.

            After disconnect prevent_purge, all of the old history may be purged, but for some reason the history was only purged for the last record:
            (instant ADD metadata, DB_TRX_ID=0x21, DB_ROLL_PTR=(insert), …),
            (a=1, DB_TRX_ID=0x1d, DB_ROLL_PTR=(insert), b=2),
            (a=3, DB_TRX_ID=0, DB_ROLL_PTR=1<<55, b=-3).

            In fact, no purge activity took place on the page before the UPDATE of con1 accesses the last record, carrying DB_TRX_ID=0x23 at that time. It will update the record to
            (a=3, DB_TRX_ID=0x25, DB_ROLL_PTR=(update), b=4).
            While that connection thread is waiting for the next command (or disconnect con1), the purge_coordinator_callback will decrement trx_sys.rseg_history_len from 3 to 0, even though the clustered index root page is not being accessed at all. The next access to the index page is by the rollback that was triggered by disconnect con1. The rollback will discard the history of the record (a=3,b=-3) as it is supposed to. Upon completion of the rollback, trx_sys.rseg_history_len will be incremented from 0 to 1, and again to 0 by purge_coordinator_callback. The next user access to the page via the buffer pool is by SELECT * FROM t1 after the Perl code to read and validate the contents of the page had been executed.

            During the entire execution, the only call to trx_undo_rec_copy() is by the rollback. After the ALTER TABLE was completed and the trx_sys.rseg_history_len reached 3, there is this call, but it looks like the undo records have been copied off the undo page(s) already:

            #0  0x000055ef7e41f3d6 in trx_purge_fetch_next_rec (heap=<optimized out>, n_pages_handled=<optimized out>, roll_ptr=<optimized out>) at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:1104
            #1  trx_purge_attach_undo_recs (n_purge_threads=3) at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:1177
            #2  trx_purge (n_tasks=<optimized out>, truncate=<optimized out>) at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:1298
            #3  0x000055ef7e40b812 in srv_do_purge (n_total_purged=<synthetic pointer>) at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1838
            #4  purge_coordinator_callback_low () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1940
            #5  purge_coordinator_callback () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1969
            

            I will have to debug this deeper before I can come up with a sufficient wait condition.

            marko Marko Mäkelä added a comment - The following is the relevant part of the test: --connect (prevent_purge,localhost,root) START TRANSACTION WITH CONSISTENT SNAPSHOT;   --connection default INSERT INTO t1 VALUES (1,2),(3,4); ALTER TABLE t1 ADD COLUMN c INT ; UPDATE t1 SET b=-3 WHERE a=3;   --connect (con1,localhost,root) BEGIN ; # For purgeable records, we must record DB_TRX_ID=0 in the undo log! UPDATE t1 SET b=4 WHERE a=3; --disconnect prevent_purge   --connection default # Initiate a full purge, which should reset the DB_TRX_ID except for a=3. --source include/wait_all_purged.inc # Initiate a ROLLBACK of the update , which should reset the DB_TRX_ID for a=3. --disconnect con1   FLUSH TABLE t1 FOR EXPORT; In the trace that I am debugging, at the time the connection default completed the ALTER TABLE and the UPDATE , trx_sys.rseg_history_length was incremented to 15 and 16. Then, the purge coordinator task would decrement trx_sys.rseg_history_length one by one all the way to 3, without buffer-fixing the clustered index root page at all. The first buffer-fix will be for the UPDATE t1 SET b=4 WHERE a=3 . This is still fine: we have 3 committed single-statement transactions ( INSERT , ALTER , UPDATE ) whose history cannot be purged, because the connection prevent_purge is hogging an older read view. After disconnect prevent_purge , all of the old history may be purged, but for some reason the history was only purged for the last record: (instant ADD metadata, DB_TRX_ID=0x21, DB_ROLL_PTR=(insert), …), (a=1, DB_TRX_ID=0x1d, DB_ROLL_PTR=(insert), b=2), (a=3, DB_TRX_ID=0, DB_ROLL_PTR=1<<55, b=-3). In fact, no purge activity took place on the page before the UPDATE of con1 accesses the last record, carrying DB_TRX_ID=0x23 at that time. It will update the record to (a=3, DB_TRX_ID=0x25, DB_ROLL_PTR=(update), b=4). While that connection thread is waiting for the next command (or disconnect con1 ), the purge_coordinator_callback will decrement trx_sys.rseg_history_len from 3 to 0, even though the clustered index root page is not being accessed at all. The next access to the index page is by the rollback that was triggered by disconnect con1 . The rollback will discard the history of the record (a=3,b=-3) as it is supposed to. Upon completion of the rollback, trx_sys.rseg_history_len will be incremented from 0 to 1, and again to 0 by purge_coordinator_callback . The next user access to the page via the buffer pool is by SELECT * FROM t1 after the Perl code to read and validate the contents of the page had been executed. During the entire execution, the only call to trx_undo_rec_copy() is by the rollback. After the ALTER TABLE was completed and the trx_sys.rseg_history_len reached 3, there is this call, but it looks like the undo records have been copied off the undo page(s) already: #0 0x000055ef7e41f3d6 in trx_purge_fetch_next_rec (heap=<optimized out>, n_pages_handled=<optimized out>, roll_ptr=<optimized out>) at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:1104 #1 trx_purge_attach_undo_recs (n_purge_threads=3) at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:1177 #2 trx_purge (n_tasks=<optimized out>, truncate=<optimized out>) at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:1298 #3 0x000055ef7e40b812 in srv_do_purge (n_total_purged=<synthetic pointer>) at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1838 #4 purge_coordinator_callback_low () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1940 #5 purge_coordinator_callback () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1969 I will have to debug this deeper before I can come up with a sufficient wait condition.

            I can reproduce this even with innodb_purge_threads=1, that is, only the purge_coordinator_callback() would take care of purging history, not delegating any work to purge_worker_callback() threads. This will ease debugging a lot.

            The update of purge_sys.view gives the permission to remove some more history:

            #0  purge_sys_t::clone_oldest_view (this=this@entry=0x55912ae3dec0 <purge_sys>)
                at /mariadb/10.5/storage/innobase/include/trx0purge.h:261
            #1  0x000055912aba740b in trx_purge (n_tasks=1, truncate=<optimized out>)
                at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:1289
            #2  0x000055912ab8a82d in srv_do_purge (
                n_total_purged=n_total_purged@entry=0x7f8426df9bf8)
                at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1838
            #3  0x000055912ab8a99f in purge_coordinator_callback_low ()
                at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1940
            #4  0x000055912ab8cf39 in purge_coordinator_callback ()
                at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1969
            

            Later, the trx_sys.rseg_history_len is being decremented in the same thread:

            #2  0x000055912aba5c4b in trx_purge_truncate_history ()
                at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:573
            #3  0x000055912aba757c in trx_purge (n_tasks=<optimized out>, 
                truncate=<optimized out>)
                at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:1315
            #4  0x000055912ab8a82d in srv_do_purge (
                n_total_purged=n_total_purged@entry=0x7f8426df9bf8)
                at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1838
            #5  0x000055912ab8a99f in purge_coordinator_callback_low ()
                at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1940
            #6  0x000055912ab8cf39 in purge_coordinator_callback ()
                at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1969
            

            I must find out why there was no attempt to access to the index page in between.

            marko Marko Mäkelä added a comment - I can reproduce this even with innodb_purge_threads=1 , that is, only the purge_coordinator_callback() would take care of purging history, not delegating any work to purge_worker_callback() threads. This will ease debugging a lot. The update of purge_sys.view gives the permission to remove some more history: #0 purge_sys_t::clone_oldest_view (this=this@entry=0x55912ae3dec0 <purge_sys>) at /mariadb/10.5/storage/innobase/include/trx0purge.h:261 #1 0x000055912aba740b in trx_purge (n_tasks=1, truncate=<optimized out>) at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:1289 #2 0x000055912ab8a82d in srv_do_purge ( n_total_purged=n_total_purged@entry=0x7f8426df9bf8) at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1838 #3 0x000055912ab8a99f in purge_coordinator_callback_low () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1940 #4 0x000055912ab8cf39 in purge_coordinator_callback () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1969 Later, the trx_sys.rseg_history_len is being decremented in the same thread: #2 0x000055912aba5c4b in trx_purge_truncate_history () at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:573 #3 0x000055912aba757c in trx_purge (n_tasks=<optimized out>, truncate=<optimized out>) at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:1315 #4 0x000055912ab8a82d in srv_do_purge ( n_total_purged=n_total_purged@entry=0x7f8426df9bf8) at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1838 #5 0x000055912ab8a99f in purge_coordinator_callback_low () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1940 #6 0x000055912ab8cf39 in purge_coordinator_callback () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1969 I must find out why there was no attempt to access to the index page in between.

            With the following patch (reverting a hunk from the original attempted fix), the test innodb.dml_purge would no longer fail, but the test innodb.undo_truncate_recover would fail to trigger a truncation of the undo log:

            diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
            index 0194bdc1380..e2dc55a285f 100644
            --- a/storage/innobase/trx/trx0purge.cc
            +++ b/storage/innobase/trx/trx0purge.cc
            @@ -489,7 +489,6 @@ trx_purge_truncate_rseg_history(
             		trx_purge_remove_log_hdr(rseg_hdr, block, hdr_addr.boffset,
             					 &mtr);
             
            -		rseg.needs_purge = false;
             		mutex_exit(&rseg.mutex);
             		mtr.commit();
             	}
            

            I must come up with better logic for determining when all undo log for a rollback segment has been processed. A undo log page may contain records that belong to multiple transactions. This undo page was also being used for the committed UPDATE statement, and its history was definitely not purged yet.

            marko Marko Mäkelä added a comment - With the following patch (reverting a hunk from the original attempted fix), the test innodb.dml_purge would no longer fail, but the test innodb.undo_truncate_recover would fail to trigger a truncation of the undo log: diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc index 0194bdc1380..e2dc55a285f 100644 --- a/storage/innobase/trx/trx0purge.cc +++ b/storage/innobase/trx/trx0purge.cc @@ -489,7 +489,6 @@ trx_purge_truncate_rseg_history( trx_purge_remove_log_hdr(rseg_hdr, block, hdr_addr.boffset, &mtr); - rseg.needs_purge = false; mutex_exit(&rseg.mutex); mtr.commit(); } I must come up with better logic for determining when all undo log for a rollback segment has been processed. A undo log page may contain records that belong to multiple transactions. This undo page was also being used for the committed UPDATE statement, and its history was definitely not purged yet.

            I think that we must replace the Boolean flag trx_rseg_t::needs_purge with a trx_id_t of the last transaction in the rollback segment that needs purging. This field must also be reliably recovered from the undo header pages on InnoDB startup. The field TRX_UNDO_NEEDS_PURGE a.k.a. TRX_UNDO_DEL_MARKS would become redundant. To avoid changing the file format, we will only remove some reads of that field, not any writes.

            Tomorrow, I hope to fix a small number of hanging tests. Special handling will be needed to avoid a shutdown hang with innodb_fast_shutdown=0 if any transactions in XA PREPARE state exist.

            marko Marko Mäkelä added a comment - I think that we must replace the Boolean flag trx_rseg_t::needs_purge with a trx_id_t of the last transaction in the rollback segment that needs purging. This field must also be reliably recovered from the undo header pages on InnoDB startup. The field TRX_UNDO_NEEDS_PURGE a.k.a. TRX_UNDO_DEL_MARKS would become redundant. To avoid changing the file format, we will only remove some reads of that field, not any writes. Tomorrow, I hope to fix a small number of hanging tests. Special handling will be needed to avoid a shutdown hang with innodb_fast_shutdown=0 if any transactions in XA PREPARE state exist.

            The tests innodb.xa_recovery innodb.recovery_shutdown would hang on shutdown in the 10.5 based development branch, but not 10.6. I will try to find the reason tomorrow.

            marko Marko Mäkelä added a comment - The tests innodb.xa_recovery innodb.recovery_shutdown would hang on shutdown in the 10.5 based development branch, but not 10.6. I will try to find the reason tomorrow.

            The fix of the hangs turned out to be this simple:

            diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
            index b9c959c6c33..b8f8235c17e 100644
            --- a/storage/innobase/trx/trx0purge.cc
            +++ b/storage/innobase/trx/trx0purge.cc
            @@ -556,10 +556,8 @@ static void trx_purge_truncate_history()
                   ut_ad(rseg->is_persistent());
                   mutex_enter(&rseg->mutex);
                   if (!rseg->trx_ref_count && rseg->needs_purge <= head.trx_no)
            -      {
                     rseg->needs_purge= 0;
            -        trx_purge_truncate_rseg_history(*rseg, head);
            -      }
            +      trx_purge_truncate_rseg_history(*rseg, head);
                   mutex_exit(&rseg->mutex);
                 }
               }
            

            When the function trx_purge_truncate_rseg_history() finds that the entire rollback segment is empty, it will call trx_purge_free_segment(). In that function, a debug assertion checks that !rseg->needs_purge, that is, no records that should have been purged would be lost. In case the rollback segment contains any active transactions or transactions in the XA PREPARE transactions, then trx_purge_free_segment() will not be called, but instead trx_purge_remove_log_hdr() will be called in order to remove individual log headers that belong to committed transactions.

            The shutdown in the tests would hang only in case the XA PREPARE transactions happened to be stored in the same rollback segment where some history remains to be purged. That was never the case when I ran the tests with ./mtr --rr and therefore the hang could not be reproduced.

            marko Marko Mäkelä added a comment - The fix of the hangs turned out to be this simple: diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc index b9c959c6c33..b8f8235c17e 100644 --- a/storage/innobase/trx/trx0purge.cc +++ b/storage/innobase/trx/trx0purge.cc @@ -556,10 +556,8 @@ static void trx_purge_truncate_history() ut_ad(rseg->is_persistent()); mutex_enter(&rseg->mutex); if (!rseg->trx_ref_count && rseg->needs_purge <= head.trx_no) - { rseg->needs_purge= 0; - trx_purge_truncate_rseg_history(*rseg, head); - } + trx_purge_truncate_rseg_history(*rseg, head); mutex_exit(&rseg->mutex); } } When the function trx_purge_truncate_rseg_history() finds that the entire rollback segment is empty, it will call trx_purge_free_segment() . In that function, a debug assertion checks that !rseg->needs_purge , that is, no records that should have been purged would be lost. In case the rollback segment contains any active transactions or transactions in the XA PREPARE transactions, then trx_purge_free_segment() will not be called, but instead trx_purge_remove_log_hdr() will be called in order to remove individual log headers that belong to committed transactions. The shutdown in the tests would hang only in case the XA PREPARE transactions happened to be stored in the same rollback segment where some history remains to be purged. That was never the case when I ran the tests with ./mtr --rr and therefore the hang could not be reproduced.

            Actually, the conditions on rseg are better "pushed down" to trx_purge_truncate_rseg_history(), so that we will only invoke trx_purge_truncate_rseg_history() when the rollback segment is empty of history and nothing will be added to it by currently active transactions.

            marko Marko Mäkelä added a comment - Actually, the conditions on rseg are better "pushed down" to trx_purge_truncate_rseg_history() , so that we will only invoke trx_purge_truncate_rseg_history() when the rollback segment is empty of history and nothing will be added to it by currently active transactions.

            origin/bb-10.5-MDEV-30671 d7f55a4ff5f409cbb238ba04a7ff7345e581e233 2023-02-24T10:15:30+02:00 (CMAKE_BUILD_TYPE=RelWithDebInfo) and
            origin/bb-10.6-MDEV-30671 96133583cc54f0da7524e4f657daf412e9a50209 2023-02-24T10:22:57+02:00 (build with ASAN, debug and -Og)
            performed well in RQG testing.
            - I had replay tests for the following error patterns     
                     [ 'TBR-1793-MDEV-30671', '#.{1,50} in mem_heap_alloc.+#.{1,50} in mem_heap_dup.+#.{1,50} in trx_undo_rec_copy.+#.{1,50} in trx_undo_get_undo_rec_low.+#.{1,50} in trx_undo_get_undo_rec.+\[ERROR\] mysqld got signal 6.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
                    [ 'TBR-1793', '#.{1,50} mem_heap_dup.+#.{1,50} trx_undo_rec_copy.+#.{1,50} trx_undo_get_undo_rec_low.+#.{1,50} in trx_undo_get_undo_rec.+\[ERROR\] mysqld got 
            signal 11.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
                     First element: Some name used for error classification by RQG.
                     Second element: Some fragment of the stack backtrace + output of RQG.
                 Replaying one of these two errors required in average ~ 200 test runs against some 10.5 without the fix (CMAKE_BUILD_TYPE=RelWithDebInfo).
                 There were no replays in ~ 1500 test runs against bb-10.5-MDEV-30671 (CMAKE_BUILD_TYPE=RelWithDebInfo).
             2. A few of the replay tests in 1. and also some tests of the test battery for broad range functional coverage failed. 
                 But these problems are known and in the current 10.5/10.6 trees too.
            

            mleich Matthias Leich added a comment - origin/bb-10.5-MDEV-30671 d7f55a4ff5f409cbb238ba04a7ff7345e581e233 2023-02-24T10:15:30+02:00 (CMAKE_BUILD_TYPE=RelWithDebInfo) and origin/bb-10.6-MDEV-30671 96133583cc54f0da7524e4f657daf412e9a50209 2023-02-24T10:22:57+02:00 (build with ASAN, debug and -Og) performed well in RQG testing. - I had replay tests for the following error patterns [ 'TBR-1793-MDEV-30671', '#.{1,50} in mem_heap_alloc.+#.{1,50} in mem_heap_dup.+#.{1,50} in trx_undo_rec_copy.+#.{1,50} in trx_undo_get_undo_rec_low.+#.{1,50} in trx_undo_get_undo_rec.+\[ERROR\] mysqld got signal 6.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ], [ 'TBR-1793', '#.{1,50} mem_heap_dup.+#.{1,50} trx_undo_rec_copy.+#.{1,50} trx_undo_get_undo_rec_low.+#.{1,50} in trx_undo_get_undo_rec.+\[ERROR\] mysqld got signal 11.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ], First element: Some name used for error classification by RQG. Second element: Some fragment of the stack backtrace + output of RQG. Replaying one of these two errors required in average ~ 200 test runs against some 10.5 without the fix (CMAKE_BUILD_TYPE=RelWithDebInfo). There were no replays in ~ 1500 test runs against bb-10.5-MDEV-30671 (CMAKE_BUILD_TYPE=RelWithDebInfo). 2. A few of the replay tests in 1. and also some tests of the test battery for broad range functional coverage failed. But these problems are known and in the current 10.5/10.6 trees too.

            I have pushed this to 10.5 and merged to 10.6. I do not expect major conflicts for later releases.

            marko Marko Mäkelä added a comment - I have pushed this to 10.5 and merged to 10.6. I do not expect major conflicts for later releases.
            TheWitness Larry Adams added a comment -

            Marko, can you confirm this made it to ES 10.6.14-9?

            TheWitness Larry Adams added a comment - Marko, can you confirm this made it to ES 10.6.14-9?

            TheWitness, generally, the Enterprise Server is based on the Community Server, with some additional changes. Because this fix was already in 10.6.13, it was included in 10.6.14-9 as well. There was no 10.6.13 based Enterprise Server release.

            For 10.6 but not 10.5, this change introduced an I/O stall that was fixed in MDEV-33213 (10.6.17).

            marko Marko Mäkelä added a comment - TheWitness , generally, the Enterprise Server is based on the Community Server, with some additional changes. Because this fix was already in 10.6.13, it was included in 10.6.14-9 as well. There was no 10.6.13 based Enterprise Server release. For 10.6 but not 10.5, this change introduced an I/O stall that was fixed in MDEV-33213 (10.6.17).
            TheWitness Larry Adams added a comment -

            Yup, waiting for 10.6.17 ES. Thanks for your continued support. Larry

            TheWitness Larry Adams added a comment - Yup, waiting for 10.6.17 ES. Thanks for your continued support. Larry

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.