Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32820

Race condition between trx_purge_free_segment() and trx_undo_create()

Details

    Description

      The test encryption.create_or_replace failed on a IA-32 debug build like this:

      bb-11.2-release 23651e27c672aa57d0dfba1251a4fc0abc6c95d6

      encryption.create_or_replace 'ctr,innodb' w3 [ fail ]
              Test ended at 2023-11-03 14:56:55
       
      CURRENT_TEST: encryption.create_or_replace
      mysqltest: At line 68: query 'CREATE OR REPLACE TABLE `create_or_replace_t` AS SELECT * FROM `table10_int_autoinc`' failed: <Unknown> (2013): Lost connection to server during query
      ...
      Version: '11.2.2-MariaDB-debug-log'  socket: '/mnt/buildbot/build/mariadb-11.2.2/mysql-test/var/tmp/3/mysqld.1.sock'  port: 16040  Source distribution
      2023-11-03 14:56:46 4 [Note] InnoDB: Creating #1 encryption thread id 2890922816 total threads 4.
      2023-11-03 14:56:46 4 [Note] InnoDB: Creating #2 encryption thread id 2882530112 total threads 4.
      2023-11-03 14:56:46 4 [Note] InnoDB: Creating #3 encryption thread id 2874137408 total threads 4.
      2023-11-03 14:56:46 4 [Note] InnoDB: Creating #4 encryption thread id 2865744704 total threads 4.
      mariadbd: /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/include/buf0buf.h:789: uint32_t buf_page_t::fix(uint32_t): Assertion `f >= FREED' failed.
      ...
      #9  0xb7042d8b in __assert_fail () from /lib/i386-linux-gnu/libc.so.6
      #10 0x810c6b5b in buf_page_t::fix (this=0xb0c00068, count=1) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/include/buf0buf.h:789
      #11 0x8119a97d in buf_page_create_low (page_id=..., zip_size=0, mtr=0xaefa5724, free_block=0xb09ff298) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0buf.cc:2861
      #12 0x8119b23f in buf_page_create (space=0x83a87ed8, offset=46, zip_size=0, mtr=0xaefa5724, free_block=0xb09ff298) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0buf.cc:3010
      #13 0x8123c91e in fsp_page_create (space=0x83a87ed8, offset=46, mtr=0xaefa5724) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:1066
      #14 0x8123cdaa in fsp_alloc_free_page (space=0x83a87ed8, hint=0, mtr=0xaefa5724, init_mtr=0xaefa5724, err=0xaefa56f4) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:1178
      #15 0x81240017 in fseg_alloc_free_page_low (space=0x83a87ed8, seg_inode=0xb0eca072 "", iblock=0xb0dffa08, hint=0, direction=111 'o', has_done_reservation=true, mtr=0xaefa5724, init_mtr=0xaefa5724, err=0xaefa56f4) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:2143
      #16 0x8123e8e5 in fseg_create (space=0x83a87ed8, byte_offset=60, mtr=0xaefa5724, err=0xaefa56f4, has_done_reservation=true, block=0x0) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:1755
      #17 0x811349b0 in trx_undo_seg_create (space=0x83a87ed8, rseg_hdr=0xb0fff870, id=0xaefa5638, err=0xaefa56f4, mtr=0xaefa5724) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0undo.cc:482
      #18 0x81136d11 in trx_undo_create (trx=0xb1a00b80, rseg=0x81e6f280 <trx_sys+22336>, undo=0xb1a010f4, err=0xaefa56f4, mtr=0xaefa5724) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0undo.cc:1180
      #19 0x81138602 in trx_undo_assign_low<false> (trx=0xb1a00b80, rseg=0x81e6f280 <trx_sys+22336>, undo=0xb1a010f4, mtr=0xaefa5724, err=0xaefa56f4) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0undo.cc:1372
      #20 0x8111321e in trx_undo_report_row_operation (thr=0xa7b65fc8, index=0xac50a7f8, clust_entry=0x0, update=0xa7b69158, cmpl_info=0, rec=0xb1138152 "testcreate_or_replace_t", offsets=0xaefa60ac, roll_ptr=0xaefa5ac8) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0rec.cc:1898
      

      Concurrently, another thread is trying to free pages from this undo log tablespace:

      bb-11.2-release 23651e27c672aa57d0dfba1251a4fc0abc6c95d6

      #8  0x80fc5b41 in fil_space_t::x_lock (this=0x83a87ed8) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/include/fil0fil.h:983
      #9  0x80fc1175 in mtr_t::x_lock_space (this=0xa845bc44, space=0x83a87ed8) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/mtr/mtr0mtr.cc:790
      #10 0x80fc1062 in mtr_t::x_lock_space (this=0xa845bc44, space_id=2) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/mtr/mtr0mtr.cc:776
      #11 0x812420f9 in fseg_free_step_not_header (header=0xb0c5c03c "", mtr=0xa845bc44, ahi=false) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:2905
      #12 0x810f5896 in trx_purge_free_segment (block=0xb0bfebb0, mtr=...) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0purge.cc:354
      #13 0x810f5ed7 in trx_purge_truncate_rseg_history (rseg=..., limit=..., all=true) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0purge.cc:459
      #14 0x810f65fd in purge_sys_t::iterator::free_history (this=0xa845bf80) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0purge.cc:546
      #15 0x810e0851 in purge_truncation_callback () at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/srv/srv0srv.cc:1101
      

      This task was added in MDEV-32050. Previously this was part of the purge_coordinator_callback and executed less frequently.

      Yet another thread is trying to write out pages of this undo tablespace:

      bb-11.2-release 23651e27c672aa57d0dfba1251a4fc0abc6c95d6

      #5  0x811af4d0 in inline_mysql_mutex_lock (that=0x81e72080 <buf_pool>, src_file=0x81800010 "/home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0flu.cc", src_line=1677) at /home/buildbot/buildbot/build/mariadb-11.2.2/include/mysql/psi/mysql_thread.h:746
      #6  0x811b6bd9 in buf_flush_list_space (space=0x83a87ed8, n_flushed=0xabcfdc8c) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0flu.cc:1677
      #7  0x81236df5 in fil_crypt_flush_space (state=0xabcfe008) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fil/fil0crypt.cc:1909
      #8  0x812372a3 in fil_crypt_complete_rotate_space (state=0xabcfe008) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fil/fil0crypt.cc:1994
      #9  0x812375af in fil_crypt_thread () at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fil/fil0crypt.cc:2066
      

      So far, I failed to reproduce this locally by running 2,800 repetitions of the test on a local 32-bit build. I started another campaign of 56,000 repetitions, and it is over ¼ through, with no failures.

      Attachments

        Issue Links

          Activity

            Next, I will try to reproduce this with a lower concurrency:

            encryption.create_or_replace 'cbc,innodb' w33 [ 1000 pass ]   3754
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 256990.644 of 5111 seconds executing testcases
             
            Completed: All 56000 tests were successful.
            

            marko Marko Mäkelä added a comment - Next, I will try to reproduce this with a lower concurrency: encryption.create_or_replace 'cbc,innodb' w33 [ 1000 pass ] 3754 -------------------------------------------------------------------------- The servers were restarted 0 times Spent 256990.644 of 5111 seconds executing testcases   Completed: All 56000 tests were successful.

            A lower-concurrency test did not work out either:

            encryption.create_or_replace 'ctr,innodb' w6 [ 1000 pass ]   2089
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 15358.207 of 2119 seconds executing testcases
             
            Completed: All 8000 tests were successful.
            

            marko Marko Mäkelä added a comment - A lower-concurrency test did not work out either: encryption.create_or_replace 'ctr,innodb' w6 [ 1000 pass ] 2089 -------------------------------------------------------------------------- The servers were restarted 0 times Spent 15358.207 of 2119 seconds executing testcases   Completed: All 8000 tests were successful.

            Further low-concurrency attempts to reproduce this failed, as did attempts to reproduce this with a 64-bit executable.

            The function trx_undo_create() acquires an exclusive latch on block = rseg->get(mtr, err) that it should contiguously hold until its grand-caller trx_undo_report_row_operation() invokes mtr_t::commit().

            The critical section trx_purge_truncate_rseg_history() is intended to be protected by both an exclusive latch on rseg_hdr (which corresponds to the above block) and an exclusive rseg.latch that was acquired in purge_sys_t::iterator::free_history(). At the time of the crash, trx_purge_free_segment() is expected to be holding these latches as well as an exclusive latch on the segment header page that is known as b in trx_purge_truncate_rseg_history(). It is waiting for an exclusive latch on the undo tablespace. I think that it is plausible to assume that trx_purge_free_segment() has already successfully invoked fseg_free_step_not_header() at least once.

            It turns out that when trx_purge_free_segment() enters its first while loop body, it fails to re-latch rseg_hdr between mini-transactions. To fix this, we might recursively acquire another exclusive rseg_hdr latch:

            diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
            index 88d36fd1b12..2d1e01d858a 100644
            --- a/storage/innobase/trx/trx0purge.cc
            +++ b/storage/innobase/trx/trx0purge.cc
            @@ -456,7 +456,10 @@ trx_purge_truncate_rseg_history(trx_rseg_t &rseg,
                 free_segment:
                   ut_ad(rseg.curr_size >= seg_size);
                   rseg.curr_size-= seg_size;
            +      /* Prevent concurrent trx_undo_create() in case of mtr.commit() below */
            +      rseg_hdr->page.lock.x_lock_recursive();
                   trx_purge_free_segment(b, mtr);
            +      rseg_hdr->page.lock.x_unlock();
                   break;
                 case TRX_UNDO_CACHED:
                   /* rseg.undo_cached must point to this page */
            

            Before the fix of MDEV-30753, trx_purge_free_segment() used to reacquire the exclusive rseg_hdr page latch if the mini-transaction was restarted. I had thought that this latch only matters in trx_purge_truncate_rseg_history().

            A better alternative would be to make trx_purge_free_segment() acquire the exclusive rseg_hdr page latch between each call to fseg_free_step_not_header(). It corresponds to what we did before MDEV-30753, and I think that it is safe to invoke trx_undo_create() between each call to fseg_free_step_not_header(), as long as the calls themselves are protected by exclusive latch on the rseg_hdr page.

            It would seem that the one-time CI failure in the Description must have involved at least 2 invocations to fseg_free_step_not_header() in trx_purge_free_segment() before a further invocation was blocked. A previous invocation of fseg_free_step_not_header() (after the first one) would have corrupted things for the concurrently running trx_undo_create().

            Presumably, in the failed CI run, some undo log had accumulated from previous tests, causing the need for at least 3 calls to fseg_free_step_not_header() within trx_purge_free_segment(). When the test is being run by itself, the undo segment can always be freed in fewer steps.

            marko Marko Mäkelä added a comment - Further low-concurrency attempts to reproduce this failed, as did attempts to reproduce this with a 64-bit executable. The function trx_undo_create() acquires an exclusive latch on block = rseg->get(mtr, err) that it should contiguously hold until its grand-caller trx_undo_report_row_operation() invokes mtr_t::commit() . The critical section trx_purge_truncate_rseg_history() is intended to be protected by both an exclusive latch on rseg_hdr (which corresponds to the above block ) and an exclusive rseg.latch that was acquired in purge_sys_t::iterator::free_history() . At the time of the crash, trx_purge_free_segment() is expected to be holding these latches as well as an exclusive latch on the segment header page that is known as b in trx_purge_truncate_rseg_history() . It is waiting for an exclusive latch on the undo tablespace. I think that it is plausible to assume that trx_purge_free_segment() has already successfully invoked fseg_free_step_not_header() at least once. It turns out that when trx_purge_free_segment() enters its first while loop body, it fails to re-latch rseg_hdr between mini-transactions. To fix this, we might recursively acquire another exclusive rseg_hdr latch: diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc index 88d36fd1b12..2d1e01d858a 100644 --- a/storage/innobase/trx/trx0purge.cc +++ b/storage/innobase/trx/trx0purge.cc @@ -456,7 +456,10 @@ trx_purge_truncate_rseg_history(trx_rseg_t &rseg, free_segment: ut_ad(rseg.curr_size >= seg_size); rseg.curr_size-= seg_size; + /* Prevent concurrent trx_undo_create() in case of mtr.commit() below */ + rseg_hdr->page.lock.x_lock_recursive(); trx_purge_free_segment(b, mtr); + rseg_hdr->page.lock.x_unlock(); break; case TRX_UNDO_CACHED: /* rseg.undo_cached must point to this page */ Before the fix of MDEV-30753 , trx_purge_free_segment() used to reacquire the exclusive rseg_hdr page latch if the mini-transaction was restarted. I had thought that this latch only matters in trx_purge_truncate_rseg_history() . A better alternative would be to make trx_purge_free_segment() acquire the exclusive rseg_hdr page latch between each call to fseg_free_step_not_header() . It corresponds to what we did before MDEV-30753 , and I think that it is safe to invoke trx_undo_create() between each call to fseg_free_step_not_header() , as long as the calls themselves are protected by exclusive latch on the rseg_hdr page. It would seem that the one-time CI failure in the Description must have involved at least 2 invocations to fseg_free_step_not_header() in trx_purge_free_segment() before a further invocation was blocked. A previous invocation of fseg_free_step_not_header() (after the first one) would have corrupted things for the concurrently running trx_undo_create() . Presumably, in the failed CI run, some undo log had accumulated from previous tests, causing the need for at least 3 calls to fseg_free_step_not_header() within trx_purge_free_segment() . When the test is being run by itself, the undo segment can always be freed in fewer steps.

            The fix looks good to me.

            vlad.lesin Vladislav Lesin added a comment - The fix looks good to me.

            There was a very low-probability race condition between freeing and allocating undo log pages that can corrupt InnoDB transaction metadata. Data could be retrieved by producing a logical dump after starting the server with innodb_force_recovery=3. MDEV-32757 is another bug with similar consequences.

            marko Marko Mäkelä added a comment - There was a very low-probability race condition between freeing and allocating undo log pages that can corrupt InnoDB transaction metadata. Data could be retrieved by producing a logical dump after starting the server with innodb_force_recovery=3 . MDEV-32757 is another bug with similar consequences.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.