[MDEV-29383] Assertion mysql_mutex_assert_owner(&log_sys.flush_order_mutex) failed in mtr_t::commit() Created: 2022-08-25  Updated: 2022-09-15  Resolved: 2022-08-26

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2, 10.10.1
Fix Version/s: 10.6.10, 10.7.6, 10.8.5, 10.9.3, 10.10.2

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: recovery, regression

Attachments: File MDEV-29383.yy    
Issue Links:
Blocks
Problem/Incident
is caused by MDEV-13542 Crashing on a corrupted page is unhel... Closed
is caused by MDEV-18976 Implement a CHECKSUM redo log record ... Closed
Relates
relates to MDEV-29364 CRITICAL - MariaDB 10.8.4 creating co... Closed
relates to MDEV-29511 "Page read from tablespace is corrupt... Closed

 Description   

The assertion mysql_mutex_assert_owner(&log_sys.flush_order_mutex) failed while rolling back an INSERT statement:

bb-10.6-MDEV-27700bis af77ff651602ad33e7142a409829f04489081793

#3  0x0000000068236f36 in __GI___assert_fail (
    assertion=0x55dbee0ba8a0 "((&(&log_sys.flush_order_mutex)->m_mutex)->count > 0 && pthread_equal(pthread_self(), (&(&log_sys.flush_order_mutex)->m_mutex)->thread))", 
    file=0x55dbee0ba600 "/data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/buf/buf0flu.cc", line=175, function=0x55dbee0ba740 "void buf_pool_t::insert_into_flush_list(buf_block_t*, lsn_t)") at assert.c:101
#4  0x000055dbecf8b273 in buf_pool_t::insert_into_flush_list (this=0x55dbeefb3d40 <buf_pool>, block=0x7f5902009b00, lsn=60915068) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/buf/buf0flu.cc:175
#5  0x000055dbecb97e72 in buf_flush_note_modification (b=0x7f5902009b00, start=60915068, end=60915132) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/buf0flu.h:129
#6  0x000055dbecbdcdb9 in ReleaseBlocks::operator() (this=0x3ea561b4ed90, slot=0x3ea561b4ef80) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/mtr/mtr0mtr.cc:379
#7  0x000055dbecbe7067 in CIterate<ReleaseBlocks const>::operator() (this=0x3ea561b4ed90, block=0x3ea561b4ef20) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/mtr/mtr0mtr.cc:62
#8  0x000055dbecbe127b in mtr_buf_t::for_each_block_in_reverse<CIterate<ReleaseBlocks const> > (this=0x3ea561b4eef8, functor=...) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/dyn0buf.h:379
#9  0x000055dbecbd126e in mtr_t::commit (this=0x3ea561b4eee0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/mtr/mtr0mtr.cc:484
#10 0x000055dbed16663b in btr_pcur_commit_specify_mtr (pcur=0x61a000091f78, mtr=0x3ea561b4eee0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/btr0pcur.inl:258
#11 0x000055dbed168777 in row_undo_ins_remove_clust_rec (node=0x61a000091f08) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/row/row0uins.cc:240
#12 0x000055dbed16af8d in row_undo_ins (node=0x61a000091f08, thr=0x6160047290c0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/row/row0uins.cc:608

The cause of this appears to be that earlier in the mini-transaction, we failed to set the mtr_t::m_made_dirty flag when flagging a freed page as modified. The following change ought to fix that:

diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
index 062eb650871..1104a0b52b8 100644
--- a/storage/innobase/mtr/mtr0mtr.cc
+++ b/storage/innobase/mtr/mtr0mtr.cc
@@ -1599,8 +1599,10 @@ void mtr_t::free(const fil_space_t &space, uint32_t offset)
 
   if (is_logged())
   {
-    m_memo.for_each_block_in_reverse
-      (CIterate<MarkFreed>((MarkFreed{{space.id, offset}})));
+    CIterate<MarkFreed> mf{MarkFreed{{space.id, offset}}};
+    m_memo.for_each_block_in_reverse(mf);
+    if (mf.functor.freed && is_block_dirtied(mf.functor.freed))
+      m_made_dirty= true;
     m_log.close(log_write<FREE_PAGE>({space.id, offset}, nullptr));
   }
 }

If I understood it correctly, this scenario can only be triggered if the page had been written out to the data file before this mini-transaction modified something in the page (most likely, deleted the only record that had been inserted into the page).



 Comments   
Comment by Marko Mäkelä [ 2022-08-25 ]

For the record, the tree where the error was encountered was something that implemented a fix for MDEV-29384.

If my understanding of the scenario is correct, then this error should occur more frequently when using the following setting:

SET GLOBAL innodb_max_dirty_pages_pct=0;

Comment by Matthias Leich [ 2022-08-25 ]

# git clone https://github.com/mleich1/rqg --branch experimental RQG
#
# GIT_SHOW: HEAD -> experimental, origin/experimental 32e1b1a96b072f6a71c564d3e2d50d9cf2fb3561 2022-08-24T17:04:19+02:00
# rqg.pl  : Version 4.0.6 (2022-05)
#
# $RQG_HOME/rqg.pl \
# --duration=300 \
# --queries=10000000 \
# --no_mask \
# --seed=random \
# --gendata_sql=conf/mariadb/table_stress.sql \
# --max_gd_duration=1200 \
# --engine=InnoDB \
# --rpl_mode=none \
# --gendata=conf/mariadb/table_stress.zz \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --mysqld=--log-bin \
# --mysqld=--connect_timeout=60 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--innodb-buffer-pool-size=8M \
# --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose-plugin-load-add=provider_lz4.so \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--innodb_adaptive_hash_index=on \
# --mysqld=--innodb_page_size=16K \
# --mysqld=--file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-max-statement-time=30 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--log-output=none \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--transaction-isolation=REPEATABLE-READ \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --reporters=Backtrace,Deadlock1,ErrorLog \
# --validators=None \
# --threads=33 \
# --grammar= MDEV-29383.yy \
# --batch \
# --basedir1=/data/Server_bin/bb-10.6-MDEV-27700-bis_asan \
# --script_debug=_nix_ \
# --rr=Extended \
# --rr_options=--chaos --wait  \
# <local setings>
 
Errorpattern
[ 'TBR-1499-MDEV-29383-ExMDEV-28731', 'mysqld: .{1,250}buf0flu.cc:.{1,10}: void buf_pool_t::insert_into_flush_list.{1,250}: Assertion \`\(\(&\(&log_sys.flush_order_mutex\)->m_mutex\)->count > 0 && pthread_equal\(pthread_self\(\), \(&\(&log_sys.flush_order_mutex\)->m_mutex\)->thread\)\)\' failed.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],

Comment by Marko Mäkelä [ 2022-08-26 ]

mleich, were you able to reproduce this bug more often than just the single occurrence?

I got an idea for reproducing this, but unfortunately it failed to crash the server:

diff --git a/mysql-test/suite/innodb/t/page_cleaner.test b/mysql-test/suite/innodb/t/page_cleaner.test
index f597619aa08..79a973df25a 100644
--- a/mysql-test/suite/innodb/t/page_cleaner.test
+++ b/mysql-test/suite/innodb/t/page_cleaner.test
@@ -15,7 +15,8 @@ WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
 
 SET GLOBAL innodb_max_dirty_pages_pct=90.0;
 
-CREATE TABLE t ENGINE=InnoDB SELECT * FROM seq_1_to_10000;
+CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
+BEGIN; INSERT INTO t SELECT * FROM seq_1_to_10000;
 
 SELECT variable_value>0 FROM information_schema.global_status
 WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
@@ -24,6 +25,8 @@ SET GLOBAL innodb_max_dirty_pages_pct=0.0;
 
 # Without the MDEV-24917 fix, we would time out here.
 --source include/wait_condition.inc
+
+ROLLBACK;
 DROP TABLE t;
 
 SET GLOBAL innodb_max_dirty_pages_pct = @save_pct;

A more elaborate attempt failed to reproduce it as well:

diff --git a/mysql-test/suite/innodb/t/page_cleaner.test b/mysql-test/suite/innodb/t/page_cleaner.test
index f597619aa08..508518edf2d 100644
--- a/mysql-test/suite/innodb/t/page_cleaner.test
+++ b/mysql-test/suite/innodb/t/page_cleaner.test
@@ -15,8 +15,11 @@ WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
 
 SET GLOBAL innodb_max_dirty_pages_pct=90.0;
 
-CREATE TABLE t ENGINE=InnoDB SELECT * FROM seq_1_to_10000;
+CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
+connect (con1,localhost,root,,);
+BEGIN; INSERT INTO t SELECT * FROM seq_1_to_10000;
 
+connection default;
 SELECT variable_value>0 FROM information_schema.global_status
 WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
 
@@ -24,6 +27,11 @@ SET GLOBAL innodb_max_dirty_pages_pct=0.0;
 
 # Without the MDEV-24917 fix, we would time out here.
 --source include/wait_condition.inc
+SET GLOBAL innodb_max_dirty_pages_pct=90.0;
+
+disconnect con1;
+SET GLOBAL innodb_max_dirty_pages_pct=0.0;
+--source include/wait_condition.inc
 DROP TABLE t;
 
 SET GLOBAL innodb_max_dirty_pages_pct = @save_pct;

Outside debug builds, I think that the impact of this bug could be broken crash recovery or backups. Because I am unable to reproduce this, I can’t assess the real impact either.

Comment by Marko Mäkelä [ 2022-08-26 ]

With my test case modification and ./mtr --rr innodb.page_cleaner I found out that normally, mtr_t::m_made_dirty (indicating that the mini-transaction will have to add a previously clean page to buf_pool.flush_list) would be set earlier during the rollback:

10.6 76bb671e422d958f7252f428b39e109369e2679d

#0  0x00005558c5773bac in mtr_t::x_latch_at_savepoint (this=0x7fce3e3c8d60, 
    savepoint=16, block=0x7fce5c0ab510)
    at /mariadb/10.5/storage/innobase/include/mtr0mtr.inl:152
#1  0x00005558c575c04b in btr_cur_search_to_nth_level_func (
    index=0x7fce3019f3e0, level=0, tuple=0x7fce380260d0, mode=PAGE_CUR_LE, 
    latch_mode=10, cursor=0x7fce3805df50, ahi_latch=0x0, mtr=0x7fce3e3c8d60, 
    autoinc=0) at /mariadb/10.5/storage/innobase/btr/btr0cur.cc:2202
#2  0x00005558c5777e5e in btr_pcur_open_with_no_init_func (
    index=0x7fce3019f3e0, tuple=0x7fce380260d0, mode=PAGE_CUR_LE, 
    latch_mode=522, cursor=0x7fce3805df50, ahi_latch=0x0, mtr=0x7fce3e3c8d60)
    at /mariadb/10.5/storage/innobase/include/btr0pcur.inl:385
#3  0x00005558c57798e7 in btr_pcur_t::restore_position (this=0x7fce3805df50, 
    restore_latch_mode=522, mtr=0x7fce3e3c8d60)
    at /mariadb/10.5/storage/innobase/btr/btr0pcur.cc:423
#4  0x00005558c589b49c in row_undo_ins_remove_clust_rec (node=0x7fce3805dee0)
    at /mariadb/10.5/storage/innobase/row/row0uins.cc:210
#5  0x00005558c589ca72 in row_undo_ins (node=0x7fce3805dee0, 
    thr=0x7fce3805dd00) at /mariadb/10.5/storage/innobase/row/row0uins.cc:608

That is, most of the time, it would not matter that mtr_t::free() is not setting the flag, because it would already have been set.

Comment by Marko Mäkelä [ 2022-08-26 ]

I analyzed the trace once more to find out why exactly mtr_t::m_made_dirty had not been set in the first place. It turns out that the page write had been completed and buf_page_t::oldest_modification_ had been reset while the ROLLBACK was waiting here:

#14 0x000055dbecdb6637 in sux_lock<ssux_lock_impl<true> >::x_lock (this=0x7f5902009b18, for_io=false) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/sux_lock.h:395
#15 0x000055dbecef3229 in btr_cur_latch_leaves (block=0x7f5902009b00, latch_mode=10, cursor=0x61a000091f78, mtr=0x3ea561b4eee0, latch_leaves=0x3ea561b4d5d0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/btr/btr0cur.cc:284

There is a race condition in btr_cur_latch_leaves():

		mtr->memo_push(block, MTR_MEMO_PAGE_X_FIX);
 
		block->page.fix();
		block->page.lock.x_lock();

The mtr_t::memo_push() did invoke mtr_t::is_block_dirtied(), but at that point in time, the write of the dirty block had not yet been completed. The page write would be protected by a U-latch (previously known as SX-latch), which would conflict with the exclusive latch that we would be waiting for. Only once that exclusive latch had been acquired would it be safe to execute mtr_t::memo_push() so that mtr_t::m_made_dirty would be set correctly.

This code was refactored by me in MDEV-13542. The following functions are affected by the same anti-pattern:

  • btr_cur_latch_leaves()
  • btr_store_big_rec_extern_fields()
  • btr_free_externally_stored_field()
  • trx_purge_free_segment()

I checked all calls to mtr_t::memo_push() with MTR_MEMO_PAGE_X_FIX or MTR_MEMO_PAGE_SX_FIX, and it turns out that nothing was broken outside MDEV-13542.

I no longer think that MDEV-18976 caused this regression. The additional fix to mtr_t::free() is a good idea in any case.

Side note: Performance could be improved if we did not set mtr_t::m_made_dirty already when registering MTR_MEMO_PAGE_X_FIX or MTR_MEMO_PAGE_SX_FIX, but deferred it until the moment we set the MTR_MEMO_MODIFY flag on a block. That way, even if a mini-transaction acquired a U or X latch on a page but never modified that page, mtr_t::commit() could avoid acquiring log_sys.flush_order_mutex. We only need that mutex when the mini-transaction actually needs to add a previously clean block to buf_pool.flush_list.

I am afraid that it is practically impossible to write a reproducible test case for this. The scheduling of page writes is largely invisible to higher-level code.

As far as I can tell, this bug can break crash recovery and backups (explaining MDEV-29364 and MDEV-29374).

Comment by Matthias Leich [ 2022-08-26 ]

Up till now I had no luck in reproducing the problem.

Comment by Marko Mäkelä [ 2022-09-15 ]

I believe that in release builds, this failure could lead to a corrupted database after a seemingly successful backup or crash recovery, like in this failure:

10.10 c0a6ce61d88104db14f7107cb16edf475b46bffd

mariabackup.huge_lsn 'strict_full_crc32' w3 [ fail ]
        Test ended at 2022-09-06 10:55:11
[01] 2022-09-06 10:55:09 Copying ./mysql/transaction_registry.ibd to /dev/shm/var/3/mysqld.1/data/mysql/transaction_registry.ibd
[01] 2022-09-06 10:55:09         ...done
[00] 2022-09-06 10:55:09 completed OK!
mysqltest: At line 92: query 'SELECT * FROM t' failed: ER_TABLE_CORRUPT (1877): Table test/t is corrupted. Please drop the table and recreate.

This was with a code revision that did not include the fixes of MDEV-29383 or MDEV-29374.

Generated at Thu Feb 08 10:08:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.