MariaDB Server / MDEV-29383

Assertion mysql_mutex_assert_owner(&log_sys.flush_order_mutex) failed in mtr_t::commit()

Details

    Description

      The assertion mysql_mutex_assert_owner(&log_sys.flush_order_mutex) failed while rolling back an INSERT statement:

      bb-10.6-MDEV-27700bis af77ff651602ad33e7142a409829f04489081793

      #3  0x0000000068236f36 in __GI___assert_fail (
          assertion=0x55dbee0ba8a0 "((&(&log_sys.flush_order_mutex)->m_mutex)->count > 0 && pthread_equal(pthread_self(), (&(&log_sys.flush_order_mutex)->m_mutex)->thread))", 
          file=0x55dbee0ba600 "/data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/buf/buf0flu.cc", line=175, function=0x55dbee0ba740 "void buf_pool_t::insert_into_flush_list(buf_block_t*, lsn_t)") at assert.c:101
      #4  0x000055dbecf8b273 in buf_pool_t::insert_into_flush_list (this=0x55dbeefb3d40 <buf_pool>, block=0x7f5902009b00, lsn=60915068) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/buf/buf0flu.cc:175
      #5  0x000055dbecb97e72 in buf_flush_note_modification (b=0x7f5902009b00, start=60915068, end=60915132) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/buf0flu.h:129
      #6  0x000055dbecbdcdb9 in ReleaseBlocks::operator() (this=0x3ea561b4ed90, slot=0x3ea561b4ef80) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/mtr/mtr0mtr.cc:379
      #7  0x000055dbecbe7067 in CIterate<ReleaseBlocks const>::operator() (this=0x3ea561b4ed90, block=0x3ea561b4ef20) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/mtr/mtr0mtr.cc:62
      #8  0x000055dbecbe127b in mtr_buf_t::for_each_block_in_reverse<CIterate<ReleaseBlocks const> > (this=0x3ea561b4eef8, functor=...) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/dyn0buf.h:379
      #9  0x000055dbecbd126e in mtr_t::commit (this=0x3ea561b4eee0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/mtr/mtr0mtr.cc:484
      #10 0x000055dbed16663b in btr_pcur_commit_specify_mtr (pcur=0x61a000091f78, mtr=0x3ea561b4eee0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/btr0pcur.inl:258
      #11 0x000055dbed168777 in row_undo_ins_remove_clust_rec (node=0x61a000091f08) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/row/row0uins.cc:240
      #12 0x000055dbed16af8d in row_undo_ins (node=0x61a000091f08, thr=0x6160047290c0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/row/row0uins.cc:608
      

      The cause of this appears to be that earlier in the mini-transaction, we failed to set the mtr_t::m_made_dirty flag when flagging a freed page as modified. The following change ought to fix that:

      diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
      index 062eb650871..1104a0b52b8 100644
      --- a/storage/innobase/mtr/mtr0mtr.cc
      +++ b/storage/innobase/mtr/mtr0mtr.cc
      @@ -1599,8 +1599,10 @@ void mtr_t::free(const fil_space_t &space, uint32_t offset)
       
         if (is_logged())
         {
      -    m_memo.for_each_block_in_reverse
      -      (CIterate<MarkFreed>((MarkFreed{{space.id, offset}})));
      +    CIterate<MarkFreed> mf{MarkFreed{{space.id, offset}}};
      +    m_memo.for_each_block_in_reverse(mf);
      +    if (mf.functor.freed && is_block_dirtied(mf.functor.freed))
      +      m_made_dirty= true;
           m_log.close(log_write<FREE_PAGE>({space.id, offset}, nullptr));
         }
       }
      

      If I understand it correctly, this scenario can only be triggered if the page had been written back to the data file before this mini-transaction modified something in the page (most likely, by deleting the only record that had been inserted into the page).
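
      To illustrate why the missing flag trips the assertion: mtr_t::commit() only acquires log_sys.flush_order_mutex when m_made_dirty is set. A simplified sketch of the relevant logic (finish_write(), release_resources() and the ReleaseBlocks constructor arguments are stand-ins here, not the exact source):

      void mtr_t::commit()
      {
        if (is_logged())
        {
          /* Append this mini-transaction's redo log records (simplified). */
          const lsn_t start_lsn= finish_write();

          /* Only taken when we expect to add a previously clean page to
          buf_pool.flush_list. If m_made_dirty is wrongly false, the mutex
          is skipped here... */
          if (m_made_dirty)
            mysql_mutex_lock(&log_sys.flush_order_mutex);

          /* ...and buf_pool_t::insert_into_flush_list(), reached via
          buf_flush_note_modification() for each modified block, fails its
          mysql_mutex_assert_owner(&log_sys.flush_order_mutex) check. */
          m_memo.for_each_block_in_reverse
            (CIterate<const ReleaseBlocks>(ReleaseBlocks{start_lsn, m_commit_lsn}));

          if (m_made_dirty)
            mysql_mutex_unlock(&log_sys.flush_order_mutex);
        }
        release_resources();
      }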

      Activity

            For the record, the tree where the error was encountered included a fix of MDEV-29384.

            If my understanding of the scenario is correct, then this error should occur more frequently when using the following setting:

            SET GLOBAL innodb_max_dirty_pages_pct=0;
            

            marko Marko Mäkelä added a comment

            # git clone https://github.com/mleich1/rqg --branch experimental RQG
            #
            # GIT_SHOW: HEAD -> experimental, origin/experimental 32e1b1a96b072f6a71c564d3e2d50d9cf2fb3561 2022-08-24T17:04:19+02:00
            # rqg.pl  : Version 4.0.6 (2022-05)
            #
            # $RQG_HOME/rqg.pl \
            # --duration=300 \
            # --queries=10000000 \
            # --no_mask \
            # --seed=random \
            # --gendata_sql=conf/mariadb/table_stress.sql \
            # --max_gd_duration=1200 \
            # --engine=InnoDB \
            # --rpl_mode=none \
            # --gendata=conf/mariadb/table_stress.zz \
            # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
            # --mysqld=--innodb-lock-wait-timeout=50 \
            # --mysqld=--log-bin \
            # --mysqld=--connect_timeout=60 \
            # --mysqld=--loose-idle_transaction_timeout=0 \
            # --mysqld=--innodb-buffer-pool-size=8M \
            # --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
            # --mysqld=--net_write_timeout=60 \
            # --mysqld=--log_bin_trust_function_creators=1 \
            # --mysqld=--sync-binlog=1 \
            # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            # --mysqld=--loose-innodb_read_only_compressed=OFF \
            # --mysqld=--loose-plugin-load-add=provider_lz4.so \
            # --mysqld=--wait_timeout=28800 \
            # --mysqld=--innodb_adaptive_hash_index=on \
            # --mysqld=--innodb_page_size=16K \
            # --mysqld=--file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
            # --mysqld=--net_read_timeout=30 \
            # --mysqld=--interactive_timeout=28800 \
            # --mysqld=--plugin-load-add=file_key_management.so \
            # --mysqld=--loose-max-statement-time=30 \
            # --mysqld=--loose-table_lock_wait_timeout=50 \
            # --mysqld=--log-output=none \
            # --mysqld=--lock-wait-timeout=86400 \
            # --mysqld=--loose-idle_write_transaction_timeout=0 \
            # --mysqld=--slave_net_timeout=60 \
            # --mysqld=--transaction-isolation=REPEATABLE-READ \
            # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            # --reporters=Backtrace,Deadlock1,ErrorLog \
            # --validators=None \
            # --threads=33 \
            # --grammar=MDEV-29383.yy \
            # --batch \
            # --basedir1=/data/Server_bin/bb-10.6-MDEV-27700-bis_asan \
            # --script_debug=_nix_ \
            # --rr=Extended \
            # --rr_options=--chaos --wait  \
            # <local settings>
             
            Errorpattern
            [ 'TBR-1499-MDEV-29383-ExMDEV-28731', 'mysqld: .{1,250}buf0flu.cc:.{1,10}: void buf_pool_t::insert_into_flush_list.{1,250}: Assertion \`\(\(&\(&log_sys.flush_order_mutex\)->m_mutex\)->count > 0 && pthread_equal\(pthread_self\(\), \(&\(&log_sys.flush_order_mutex\)->m_mutex\)->thread\)\)\' failed.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
            

            mleich Matthias Leich added a comment

            mleich, were you able to reproduce this bug more often than just the single occurrence?

            I got an idea for reproducing this, but unfortunately it failed to crash the server:

            diff --git a/mysql-test/suite/innodb/t/page_cleaner.test b/mysql-test/suite/innodb/t/page_cleaner.test
            index f597619aa08..79a973df25a 100644
            --- a/mysql-test/suite/innodb/t/page_cleaner.test
            +++ b/mysql-test/suite/innodb/t/page_cleaner.test
            @@ -15,7 +15,8 @@ WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
             
             SET GLOBAL innodb_max_dirty_pages_pct=90.0;
             
            -CREATE TABLE t ENGINE=InnoDB SELECT * FROM seq_1_to_10000;
            +CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
            +BEGIN; INSERT INTO t SELECT * FROM seq_1_to_10000;
             
             SELECT variable_value>0 FROM information_schema.global_status
             WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
            @@ -24,6 +25,8 @@ SET GLOBAL innodb_max_dirty_pages_pct=0.0;
             
             # Without the MDEV-24917 fix, we would time out here.
             --source include/wait_condition.inc
            +
            +ROLLBACK;
             DROP TABLE t;
             
             SET GLOBAL innodb_max_dirty_pages_pct = @save_pct;
            

            A more elaborate attempt failed to reproduce it as well:

            diff --git a/mysql-test/suite/innodb/t/page_cleaner.test b/mysql-test/suite/innodb/t/page_cleaner.test
            index f597619aa08..508518edf2d 100644
            --- a/mysql-test/suite/innodb/t/page_cleaner.test
            +++ b/mysql-test/suite/innodb/t/page_cleaner.test
            @@ -15,8 +15,11 @@ WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
             
             SET GLOBAL innodb_max_dirty_pages_pct=90.0;
             
            -CREATE TABLE t ENGINE=InnoDB SELECT * FROM seq_1_to_10000;
            +CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
            +connect (con1,localhost,root,,);
            +BEGIN; INSERT INTO t SELECT * FROM seq_1_to_10000;
             
            +connection default;
             SELECT variable_value>0 FROM information_schema.global_status
             WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
             
            @@ -24,6 +27,11 @@ SET GLOBAL innodb_max_dirty_pages_pct=0.0;
             
             # Without the MDEV-24917 fix, we would time out here.
             --source include/wait_condition.inc
            +SET GLOBAL innodb_max_dirty_pages_pct=90.0;
            +
            +disconnect con1;
            +SET GLOBAL innodb_max_dirty_pages_pct=0.0;
            +--source include/wait_condition.inc
             DROP TABLE t;
             
             SET GLOBAL innodb_max_dirty_pages_pct = @save_pct;
            

            Outside debug builds, I think that the impact of this bug could be broken crash recovery or backups. Because I am unable to reproduce this, I cannot assess the real impact.

            marko Marko Mäkelä added a comment

            With my test case modification and ./mtr --rr innodb.page_cleaner I found out that normally, mtr_t::m_made_dirty (indicating that the mini-transaction will have to add a previously clean page to buf_pool.flush_list) would be set earlier during the rollback:

            10.6 76bb671e422d958f7252f428b39e109369e2679d

            #0  0x00005558c5773bac in mtr_t::x_latch_at_savepoint (this=0x7fce3e3c8d60, 
                savepoint=16, block=0x7fce5c0ab510)
                at /mariadb/10.5/storage/innobase/include/mtr0mtr.inl:152
            #1  0x00005558c575c04b in btr_cur_search_to_nth_level_func (
                index=0x7fce3019f3e0, level=0, tuple=0x7fce380260d0, mode=PAGE_CUR_LE, 
                latch_mode=10, cursor=0x7fce3805df50, ahi_latch=0x0, mtr=0x7fce3e3c8d60, 
                autoinc=0) at /mariadb/10.5/storage/innobase/btr/btr0cur.cc:2202
            #2  0x00005558c5777e5e in btr_pcur_open_with_no_init_func (
                index=0x7fce3019f3e0, tuple=0x7fce380260d0, mode=PAGE_CUR_LE, 
                latch_mode=522, cursor=0x7fce3805df50, ahi_latch=0x0, mtr=0x7fce3e3c8d60)
                at /mariadb/10.5/storage/innobase/include/btr0pcur.inl:385
            #3  0x00005558c57798e7 in btr_pcur_t::restore_position (this=0x7fce3805df50, 
                restore_latch_mode=522, mtr=0x7fce3e3c8d60)
                at /mariadb/10.5/storage/innobase/btr/btr0pcur.cc:423
            #4  0x00005558c589b49c in row_undo_ins_remove_clust_rec (node=0x7fce3805dee0)
                at /mariadb/10.5/storage/innobase/row/row0uins.cc:210
            #5  0x00005558c589ca72 in row_undo_ins (node=0x7fce3805dee0, 
                thr=0x7fce3805dd00) at /mariadb/10.5/storage/innobase/row/row0uins.cc:608
            

            That is, most of the time, it would not matter that mtr_t::free() is not setting the flag, because it would already have been set.
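
            Roughly, the semantics of the flag and of mtr_t::is_block_dirtied() are as follows (a simplified sketch; the actual source also handles special oldest_modification() marker values):

            /* Sketch: a block latched by this mini-transaction requires
            m_made_dirty=true exactly when it is currently clean, that is,
            not yet in buf_pool.flush_list, so that mtr_t::commit() will
            insert it while holding log_sys.flush_order_mutex. */
            bool mtr_t::is_block_dirtied(const buf_page_t &b)
            {
              /* oldest_modification()==0 means "clean": no flush_list entry. */
              return b.oldest_modification() == 0;
            }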

            marko Marko Mäkelä added a comment

            I analyzed the trace once more to find out why exactly mtr_t::m_made_dirty had not been set in the first place. It turns out that the page write had been completed and buf_page_t::oldest_modification_ had been reset while the ROLLBACK was waiting here:

            #14 0x000055dbecdb6637 in sux_lock<ssux_lock_impl<true> >::x_lock (this=0x7f5902009b18, for_io=false) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/sux_lock.h:395
            #15 0x000055dbecef3229 in btr_cur_latch_leaves (block=0x7f5902009b00, latch_mode=10, cursor=0x61a000091f78, mtr=0x3ea561b4eee0, latch_leaves=0x3ea561b4d5d0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/btr/btr0cur.cc:284
            

            There is a race condition in btr_cur_latch_leaves():

            		mtr->memo_push(block, MTR_MEMO_PAGE_X_FIX);
             
            		block->page.fix();
            		block->page.lock.x_lock();
            

            The mtr_t::memo_push() call did invoke mtr_t::is_block_dirtied(), but at that point in time, the write of the dirty block had not yet completed. The page write is protected by a U latch (previously known as an SX latch), which conflicts with the exclusive latch that we were waiting for. Only once that exclusive latch has been acquired is it safe to execute mtr_t::memo_push(), so that mtr_t::m_made_dirty will be set correctly.
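
            In other words, the safe ordering is to acquire the page latch first and to register the block in the memo only afterwards. A sketch of the intended ordering, based on the analysis above (not necessarily the exact committed fix):

            block->page.fix();
            /* Blocks until a concurrent page write has released its U latch,
            that is, until buf_page_t::oldest_modification_ has reached its
            final state for that write. */
            block->page.lock.x_lock();
            /* Now is_block_dirtied() inside memo_push() observes whether the
            block really is clean, and m_made_dirty is set correctly. */
            mtr->memo_push(block, MTR_MEMO_PAGE_X_FIX);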

            This code was refactored by me in MDEV-13542. The following functions are affected by the same anti-pattern:

            • btr_cur_latch_leaves()
            • btr_store_big_rec_extern_fields()
            • btr_free_externally_stored_field()
            • trx_purge_free_segment()

            I checked all calls to mtr_t::memo_push() with MTR_MEMO_PAGE_X_FIX or MTR_MEMO_PAGE_SX_FIX, and it turns out that nothing was broken outside MDEV-13542.

            I no longer think that MDEV-18976 caused this regression. The additional fix to mtr_t::free() is a good idea in any case.

            Side note: Performance could be improved if we did not set mtr_t::m_made_dirty already when registering MTR_MEMO_PAGE_X_FIX or MTR_MEMO_PAGE_SX_FIX, but instead deferred it until the moment we set the MTR_MEMO_MODIFY flag on a block. In that way, even if a mini-transaction acquired a U or X latch on a page but never modified that page, mtr_t::commit() could avoid acquiring log_sys.flush_order_mutex. We only need that mutex when the mini-transaction actually needs to add a previously clean block to buf_pool.flush_list.
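
            A sketch of that idea (hypothetical; set_modified() stands here for whatever hook flags MTR_MEMO_MODIFY, and is not a claim about the current code):

            /* Hypothetical: instead of memo_push() guessing at latch time,
            defer the decision to the point where the block is actually
            marked modified. The page latch is held here, so no page write
            can race with the check. */
            void mtr_t::set_modified(const buf_block_t &block)
            {
              /* ... existing bookkeeping that flags MTR_MEMO_MODIFY ... */
              if (!block.page.oldest_modification())
                /* The block is clean: commit() must insert it into
                buf_pool.flush_list under log_sys.flush_order_mutex. */
                m_made_dirty= true;
            }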

            I am afraid that it is practically impossible to write a reproducible test case for this. The scheduling of page writes is largely invisible to higher-level code.

            As far as I can tell, this bug can break crash recovery and backups (explaining MDEV-29364 and MDEV-29374).

            marko Marko Mäkelä added a comment

            Up till now, I have had no luck reproducing the problem.

            mleich Matthias Leich added a comment

            I believe that in release builds, this failure could lead to a corrupted database after a seemingly successful backup or crash recovery, like in this failure:

            10.10 c0a6ce61d88104db14f7107cb16edf475b46bffd

            mariabackup.huge_lsn 'strict_full_crc32' w3 [ fail ]
                    Test ended at 2022-09-06 10:55:11
            …
            [01] 2022-09-06 10:55:09 Copying ./mysql/transaction_registry.ibd to /dev/shm/var/3/mysqld.1/data/mysql/transaction_registry.ibd
            [01] 2022-09-06 10:55:09         ...done
            [00] 2022-09-06 10:55:09 completed OK!
            mysqltest: At line 92: query 'SELECT * FROM t' failed: ER_TABLE_CORRUPT (1877): Table test/t is corrupted. Please drop the table and recreate.
            

            This was with a code revision that did not include the fixes of MDEV-29383 or MDEV-29374.

            marko Marko Mäkelä added a comment

            People

              Assignee: Marko Mäkelä
              Reporter: Marko Mäkelä