MariaDB Server / MDEV-29383

Assertion mysql_mutex_assert_owner(&log_sys.flush_order_mutex) failed in mtr_t::commit()

Details

    Description

      The assertion mysql_mutex_assert_owner(&log_sys.flush_order_mutex) failed while rolling back an INSERT statement:

      bb-10.6-MDEV-27700bis af77ff651602ad33e7142a409829f04489081793

      #3  0x0000000068236f36 in __GI___assert_fail (
          assertion=0x55dbee0ba8a0 "((&(&log_sys.flush_order_mutex)->m_mutex)->count > 0 && pthread_equal(pthread_self(), (&(&log_sys.flush_order_mutex)->m_mutex)->thread))", 
          file=0x55dbee0ba600 "/data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/buf/buf0flu.cc", line=175, function=0x55dbee0ba740 "void buf_pool_t::insert_into_flush_list(buf_block_t*, lsn_t)") at assert.c:101
      #4  0x000055dbecf8b273 in buf_pool_t::insert_into_flush_list (this=0x55dbeefb3d40 <buf_pool>, block=0x7f5902009b00, lsn=60915068) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/buf/buf0flu.cc:175
      #5  0x000055dbecb97e72 in buf_flush_note_modification (b=0x7f5902009b00, start=60915068, end=60915132) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/buf0flu.h:129
      #6  0x000055dbecbdcdb9 in ReleaseBlocks::operator() (this=0x3ea561b4ed90, slot=0x3ea561b4ef80) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/mtr/mtr0mtr.cc:379
      #7  0x000055dbecbe7067 in CIterate<ReleaseBlocks const>::operator() (this=0x3ea561b4ed90, block=0x3ea561b4ef20) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/mtr/mtr0mtr.cc:62
      #8  0x000055dbecbe127b in mtr_buf_t::for_each_block_in_reverse<CIterate<ReleaseBlocks const> > (this=0x3ea561b4eef8, functor=...) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/dyn0buf.h:379
      #9  0x000055dbecbd126e in mtr_t::commit (this=0x3ea561b4eee0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/mtr/mtr0mtr.cc:484
      #10 0x000055dbed16663b in btr_pcur_commit_specify_mtr (pcur=0x61a000091f78, mtr=0x3ea561b4eee0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/btr0pcur.inl:258
      #11 0x000055dbed168777 in row_undo_ins_remove_clust_rec (node=0x61a000091f08) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/row/row0uins.cc:240
      #12 0x000055dbed16af8d in row_undo_ins (node=0x61a000091f08, thr=0x6160047290c0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/row/row0uins.cc:608
      

      The cause of this appears to be that earlier in the mini-transaction, we failed to set the mtr_t::m_made_dirty flag when flagging a freed page as modified. The following change ought to fix that:

      diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
      index 062eb650871..1104a0b52b8 100644
      --- a/storage/innobase/mtr/mtr0mtr.cc
      +++ b/storage/innobase/mtr/mtr0mtr.cc
      @@ -1599,8 +1599,10 @@ void mtr_t::free(const fil_space_t &space, uint32_t offset)
       
         if (is_logged())
         {
      -    m_memo.for_each_block_in_reverse
      -      (CIterate<MarkFreed>((MarkFreed{{space.id, offset}})));
      +    CIterate<MarkFreed> mf{MarkFreed{{space.id, offset}}};
      +    m_memo.for_each_block_in_reverse(mf);
      +    if (mf.functor.freed && is_block_dirtied(mf.functor.freed))
      +      m_made_dirty= true;
           m_log.close(log_write<FREE_PAGE>({space.id, offset}, nullptr));
         }
       }
      

      If I understand it correctly, this scenario can only be triggered if the page had been written back to the data file before this mini-transaction modified something in the page (most likely, by deleting the only record that had been inserted into the page).
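
      To illustrate why the missing flag trips the assertion: mtr_t::commit() only acquires log_sys.flush_order_mutex when m_made_dirty is set. A simplified sketch of the relevant logic (finish_write(), release_resources() and the ReleaseBlocks constructor arguments are stand-ins here, not the exact source):

      void mtr_t::commit()
      {
        if (is_logged())
        {
          /* Append this mini-transaction's redo log records (simplified). */
          const lsn_t start_lsn= finish_write();

          /* Only taken when we expect to add a previously clean page to
          buf_pool.flush_list. If m_made_dirty is wrongly false, the mutex
          is skipped here... */
          if (m_made_dirty)
            mysql_mutex_lock(&log_sys.flush_order_mutex);

          /* ...and buf_pool_t::insert_into_flush_list(), reached via
          buf_flush_note_modification() for each modified block, fails its
          mysql_mutex_assert_owner(&log_sys.flush_order_mutex) check. */
          m_memo.for_each_block_in_reverse
            (CIterate<const ReleaseBlocks>(ReleaseBlocks{start_lsn, m_commit_lsn}));

          if (m_made_dirty)
            mysql_mutex_unlock(&log_sys.flush_order_mutex);
        }
        release_resources();
      }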

      Activity

            For the record, the tree where the error was encountered included a fix of MDEV-29384.

            If my understanding of the scenario is correct, then this error should occur more frequently when using the following setting:

            SET GLOBAL innodb_max_dirty_pages_pct=0;
            

            marko Marko Mäkelä added a comment

            # git clone https://github.com/mleich1/rqg --branch experimental RQG
            #
            # GIT_SHOW: HEAD -> experimental, origin/experimental 32e1b1a96b072f6a71c564d3e2d50d9cf2fb3561 2022-08-24T17:04:19+02:00
            # rqg.pl  : Version 4.0.6 (2022-05)
            #
            # $RQG_HOME/rqg.pl \
            # --duration=300 \
            # --queries=10000000 \
            # --no_mask \
            # --seed=random \
            # --gendata_sql=conf/mariadb/table_stress.sql \
            # --max_gd_duration=1200 \
            # --engine=InnoDB \
            # --rpl_mode=none \
            # --gendata=conf/mariadb/table_stress.zz \
            # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
            # --mysqld=--innodb-lock-wait-timeout=50 \
            # --mysqld=--log-bin \
            # --mysqld=--connect_timeout=60 \
            # --mysqld=--loose-idle_transaction_timeout=0 \
            # --mysqld=--innodb-buffer-pool-size=8M \
            # --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
            # --mysqld=--net_write_timeout=60 \
            # --mysqld=--log_bin_trust_function_creators=1 \
            # --mysqld=--sync-binlog=1 \
            # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            # --mysqld=--loose-innodb_read_only_compressed=OFF \
            # --mysqld=--loose-plugin-load-add=provider_lz4.so \
            # --mysqld=--wait_timeout=28800 \
            # --mysqld=--innodb_adaptive_hash_index=on \
            # --mysqld=--innodb_page_size=16K \
            # --mysqld=--file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
            # --mysqld=--net_read_timeout=30 \
            # --mysqld=--interactive_timeout=28800 \
            # --mysqld=--plugin-load-add=file_key_management.so \
            # --mysqld=--loose-max-statement-time=30 \
            # --mysqld=--loose-table_lock_wait_timeout=50 \
            # --mysqld=--log-output=none \
            # --mysqld=--lock-wait-timeout=86400 \
            # --mysqld=--loose-idle_write_transaction_timeout=0 \
            # --mysqld=--slave_net_timeout=60 \
            # --mysqld=--transaction-isolation=REPEATABLE-READ \
            # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            # --reporters=Backtrace,Deadlock1,ErrorLog \
            # --validators=None \
            # --threads=33 \
            # --grammar=MDEV-29383.yy \
            # --batch \
            # --basedir1=/data/Server_bin/bb-10.6-MDEV-27700-bis_asan \
            # --script_debug=_nix_ \
            # --rr=Extended \
            # --rr_options=--chaos --wait  \
            # <local settings>
             
            Errorpattern
            [ 'TBR-1499-MDEV-29383-ExMDEV-28731', 'mysqld: .{1,250}buf0flu.cc:.{1,10}: void buf_pool_t::insert_into_flush_list.{1,250}: Assertion \`\(\(&\(&log_sys.flush_order_mutex\)->m_mutex\)->count > 0 && pthread_equal\(pthread_self\(\), \(&\(&log_sys.flush_order_mutex\)->m_mutex\)->thread\)\)\' failed.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
            

            mleich Matthias Leich added a comment

            mleich, were you able to reproduce this bug more often than just the single occurrence?

            I got an idea for reproducing this, but unfortunately it failed to crash the server:

            diff --git a/mysql-test/suite/innodb/t/page_cleaner.test b/mysql-test/suite/innodb/t/page_cleaner.test
            index f597619aa08..79a973df25a 100644
            --- a/mysql-test/suite/innodb/t/page_cleaner.test
            +++ b/mysql-test/suite/innodb/t/page_cleaner.test
            @@ -15,7 +15,8 @@ WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
             
             SET GLOBAL innodb_max_dirty_pages_pct=90.0;
             
            -CREATE TABLE t ENGINE=InnoDB SELECT * FROM seq_1_to_10000;
            +CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
            +BEGIN; INSERT INTO t SELECT * FROM seq_1_to_10000;
             
             SELECT variable_value>0 FROM information_schema.global_status
             WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
            @@ -24,6 +25,8 @@ SET GLOBAL innodb_max_dirty_pages_pct=0.0;
             
             # Without the MDEV-24917 fix, we would time out here.
             --source include/wait_condition.inc
            +
            +ROLLBACK;
             DROP TABLE t;
             
             SET GLOBAL innodb_max_dirty_pages_pct = @save_pct;
            

            A more elaborate attempt failed to reproduce it as well:

            diff --git a/mysql-test/suite/innodb/t/page_cleaner.test b/mysql-test/suite/innodb/t/page_cleaner.test
            index f597619aa08..508518edf2d 100644
            --- a/mysql-test/suite/innodb/t/page_cleaner.test
            +++ b/mysql-test/suite/innodb/t/page_cleaner.test
            @@ -15,8 +15,11 @@ WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
             
             SET GLOBAL innodb_max_dirty_pages_pct=90.0;
             
            -CREATE TABLE t ENGINE=InnoDB SELECT * FROM seq_1_to_10000;
            +CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
            +connect (con1,localhost,root,,);
            +BEGIN; INSERT INTO t SELECT * FROM seq_1_to_10000;
             
            +connection default;
             SELECT variable_value>0 FROM information_schema.global_status
             WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
             
            @@ -24,6 +27,11 @@ SET GLOBAL innodb_max_dirty_pages_pct=0.0;
             
             # Without the MDEV-24917 fix, we would time out here.
             --source include/wait_condition.inc
            +SET GLOBAL innodb_max_dirty_pages_pct=90.0;
            +
            +disconnect con1;
            +SET GLOBAL innodb_max_dirty_pages_pct=0.0;
            +--source include/wait_condition.inc
             DROP TABLE t;
             
             SET GLOBAL innodb_max_dirty_pages_pct = @save_pct;
            

            Outside debug builds, I think that the impact of this bug could be broken crash recovery or backups. Because I am unable to reproduce this, I cannot assess the real impact.

            marko Marko Mäkelä added a comment

            With my test case modification and ./mtr --rr innodb.page_cleaner I found out that normally, mtr_t::m_made_dirty (indicating that the mini-transaction will have to add a previously clean page to buf_pool.flush_list) would be set earlier during the rollback:

            10.6 76bb671e422d958f7252f428b39e109369e2679d

            #0  0x00005558c5773bac in mtr_t::x_latch_at_savepoint (this=0x7fce3e3c8d60, 
                savepoint=16, block=0x7fce5c0ab510)
                at /mariadb/10.5/storage/innobase/include/mtr0mtr.inl:152
            #1  0x00005558c575c04b in btr_cur_search_to_nth_level_func (
                index=0x7fce3019f3e0, level=0, tuple=0x7fce380260d0, mode=PAGE_CUR_LE, 
                latch_mode=10, cursor=0x7fce3805df50, ahi_latch=0x0, mtr=0x7fce3e3c8d60, 
                autoinc=0) at /mariadb/10.5/storage/innobase/btr/btr0cur.cc:2202
            #2  0x00005558c5777e5e in btr_pcur_open_with_no_init_func (
                index=0x7fce3019f3e0, tuple=0x7fce380260d0, mode=PAGE_CUR_LE, 
                latch_mode=522, cursor=0x7fce3805df50, ahi_latch=0x0, mtr=0x7fce3e3c8d60)
                at /mariadb/10.5/storage/innobase/include/btr0pcur.inl:385
            #3  0x00005558c57798e7 in btr_pcur_t::restore_position (this=0x7fce3805df50, 
                restore_latch_mode=522, mtr=0x7fce3e3c8d60)
                at /mariadb/10.5/storage/innobase/btr/btr0pcur.cc:423
            #4  0x00005558c589b49c in row_undo_ins_remove_clust_rec (node=0x7fce3805dee0)
                at /mariadb/10.5/storage/innobase/row/row0uins.cc:210
            #5  0x00005558c589ca72 in row_undo_ins (node=0x7fce3805dee0, 
                thr=0x7fce3805dd00) at /mariadb/10.5/storage/innobase/row/row0uins.cc:608
            

            That is, most of the time, it would not matter that mtr_t::free() is not setting the flag, because it would already have been set.
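
            Roughly, the semantics of the flag and of mtr_t::is_block_dirtied() are as follows (a simplified sketch; the actual source also handles special oldest_modification() marker values):

            /* Sketch: a block latched by this mini-transaction requires
            m_made_dirty=true exactly when it is currently clean, that is,
            not yet in buf_pool.flush_list, so that mtr_t::commit() will
            insert it while holding log_sys.flush_order_mutex. */
            bool mtr_t::is_block_dirtied(const buf_page_t &b)
            {
              /* oldest_modification()==0 means "clean": no flush_list entry. */
              return b.oldest_modification() == 0;
            }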

            marko Marko Mäkelä added a comment

            I analyzed the trace once more to find out why exactly mtr_t::m_made_dirty had not been set in the first place. It turns out that the page write had been completed and buf_page_t::oldest_modification_ had been reset while the ROLLBACK was waiting here:

            #14 0x000055dbecdb6637 in sux_lock<ssux_lock_impl<true> >::x_lock (this=0x7f5902009b18, for_io=false) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/include/sux_lock.h:395
            #15 0x000055dbecef3229 in btr_cur_latch_leaves (block=0x7f5902009b00, latch_mode=10, cursor=0x61a000091f78, mtr=0x3ea561b4eee0, latch_leaves=0x3ea561b4d5d0) at /data/Server/bb-10.6-MDEV-27700-bis/storage/innobase/btr/btr0cur.cc:284
            

            There is a race condition in btr_cur_latch_leaves():

            		mtr->memo_push(block, MTR_MEMO_PAGE_X_FIX);
             
            		block->page.fix();
            		block->page.lock.x_lock();
            

            The mtr_t::memo_push() call did invoke mtr_t::is_block_dirtied(), but at that point in time, the write of the dirty block had not yet completed. The page write is protected by a U latch (previously known as an SX latch), which conflicts with the exclusive latch that we were waiting for. Only once that exclusive latch has been acquired is it safe to execute mtr_t::memo_push(), so that mtr_t::m_made_dirty will be set correctly.
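
            In other words, the safe ordering is to acquire the page latch first and to register the block in the memo only afterwards. A sketch of the intended ordering, based on the analysis above (not necessarily the exact committed fix):

            block->page.fix();
            /* Blocks until a concurrent page write has released its U latch,
            that is, until buf_page_t::oldest_modification_ has reached its
            final state for that write. */
            block->page.lock.x_lock();
            /* Now is_block_dirtied() inside memo_push() observes whether the
            block really is clean, and m_made_dirty is set correctly. */
            mtr->memo_push(block, MTR_MEMO_PAGE_X_FIX);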

            This code was refactored by me in MDEV-13542. The following functions are affected by the same anti-pattern:

            • btr_cur_latch_leaves()
            • btr_store_big_rec_extern_fields()
            • btr_free_externally_stored_field()
            • trx_purge_free_segment()

            I checked all calls to mtr_t::memo_push() with MTR_MEMO_PAGE_X_FIX or MTR_MEMO_PAGE_SX_FIX, and it turns out that nothing was broken outside MDEV-13542.

            I no longer think that MDEV-18976 caused this regression. The additional fix to mtr_t::free() is a good idea in any case.

            Side note: Performance could be improved if we did not set mtr_t::m_made_dirty already when registering MTR_MEMO_PAGE_X_FIX or MTR_MEMO_PAGE_SX_FIX, but instead deferred it until the moment we set the MTR_MEMO_MODIFY flag on a block. In that way, even if a mini-transaction acquired a U or X latch on a page but never modified that page, mtr_t::commit() could avoid acquiring log_sys.flush_order_mutex. We only need that mutex when the mini-transaction actually needs to add a previously clean block to buf_pool.flush_list.
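
            A sketch of that idea (hypothetical; set_modified() stands here for whatever hook flags MTR_MEMO_MODIFY, and is not a claim about the current code):

            /* Hypothetical: instead of memo_push() guessing at latch time,
            defer the decision to the point where the block is actually
            marked modified. The page latch is held here, so no page write
            can race with the check. */
            void mtr_t::set_modified(const buf_block_t &block)
            {
              /* ... existing bookkeeping that flags MTR_MEMO_MODIFY ... */
              if (!block.page.oldest_modification())
                /* The block is clean: commit() must insert it into
                buf_pool.flush_list under log_sys.flush_order_mutex. */
                m_made_dirty= true;
            }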

            I am afraid that it is practically impossible to write a reproducible test case for this. The scheduling of page writes is largely invisible to higher-level code.

            As far as I can tell, this bug can break crash recovery and backups (explaining MDEV-29364 and MDEV-29374).

            marko Marko Mäkelä added a comment

            Up till now, I have had no luck reproducing the problem.

            mleich Matthias Leich added a comment

            I believe that in release builds, this failure could lead to a corrupted database after a seemingly successful backup or crash recovery, like in this failure:

            10.10 c0a6ce61d88104db14f7107cb16edf475b46bffd

            mariabackup.huge_lsn 'strict_full_crc32' w3 [ fail ]
                    Test ended at 2022-09-06 10:55:11
            …
            [01] 2022-09-06 10:55:09 Copying ./mysql/transaction_registry.ibd to /dev/shm/var/3/mysqld.1/data/mysql/transaction_registry.ibd
            [01] 2022-09-06 10:55:09         ...done
            [00] 2022-09-06 10:55:09 completed OK!
            mysqltest: At line 92: query 'SELECT * FROM t' failed: ER_TABLE_CORRUPT (1877): Table test/t is corrupted. Please drop the table and recreate.
            

            This was with a code revision that did not include the fixes of MDEV-29383 or MDEV-29374.

            marko Marko Mäkelä added a comment

            People

              Assignee: Marko Mäkelä
              Reporter: Marko Mäkelä