[MDEV-32820] Race condition between trx_purge_free_segment() and trx_undo_create() Created: 2023-11-16  Updated: 2024-01-10  Resolved: 2023-11-21

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 11.1.1, 10.11.3, 11.0.2, 10.5.20, 10.6.13, 10.8.8, 10.9.6, 10.10.4, 11.2, 11.2.1
Fix Version/s: 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, race, regression

Issue Links:
Blocks
Problem/Incident
causes MDEV-33137 Assertion `end_lsn == page_lsn' faile... Closed
is caused by MDEV-30753 Possible corruption due to trx_purge_... Closed
Relates
relates to MDEV-32050 UNDO logs still growing for write-int... Closed

 Description   

The test encryption.create_or_replace failed on a IA-32 debug build like this:

bb-11.2-release 23651e27c672aa57d0dfba1251a4fc0abc6c95d6

encryption.create_or_replace 'ctr,innodb' w3 [ fail ]
        Test ended at 2023-11-03 14:56:55
 
CURRENT_TEST: encryption.create_or_replace
mysqltest: At line 68: query 'CREATE OR REPLACE TABLE `create_or_replace_t` AS SELECT * FROM `table10_int_autoinc`' failed: <Unknown> (2013): Lost connection to server during query
...
Version: '11.2.2-MariaDB-debug-log'  socket: '/mnt/buildbot/build/mariadb-11.2.2/mysql-test/var/tmp/3/mysqld.1.sock'  port: 16040  Source distribution
2023-11-03 14:56:46 4 [Note] InnoDB: Creating #1 encryption thread id 2890922816 total threads 4.
2023-11-03 14:56:46 4 [Note] InnoDB: Creating #2 encryption thread id 2882530112 total threads 4.
2023-11-03 14:56:46 4 [Note] InnoDB: Creating #3 encryption thread id 2874137408 total threads 4.
2023-11-03 14:56:46 4 [Note] InnoDB: Creating #4 encryption thread id 2865744704 total threads 4.
mariadbd: /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/include/buf0buf.h:789: uint32_t buf_page_t::fix(uint32_t): Assertion `f >= FREED' failed.
...
#9  0xb7042d8b in __assert_fail () from /lib/i386-linux-gnu/libc.so.6
#10 0x810c6b5b in buf_page_t::fix (this=0xb0c00068, count=1) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/include/buf0buf.h:789
#11 0x8119a97d in buf_page_create_low (page_id=..., zip_size=0, mtr=0xaefa5724, free_block=0xb09ff298) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0buf.cc:2861
#12 0x8119b23f in buf_page_create (space=0x83a87ed8, offset=46, zip_size=0, mtr=0xaefa5724, free_block=0xb09ff298) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0buf.cc:3010
#13 0x8123c91e in fsp_page_create (space=0x83a87ed8, offset=46, mtr=0xaefa5724) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:1066
#14 0x8123cdaa in fsp_alloc_free_page (space=0x83a87ed8, hint=0, mtr=0xaefa5724, init_mtr=0xaefa5724, err=0xaefa56f4) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:1178
#15 0x81240017 in fseg_alloc_free_page_low (space=0x83a87ed8, seg_inode=0xb0eca072 "", iblock=0xb0dffa08, hint=0, direction=111 'o', has_done_reservation=true, mtr=0xaefa5724, init_mtr=0xaefa5724, err=0xaefa56f4) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:2143
#16 0x8123e8e5 in fseg_create (space=0x83a87ed8, byte_offset=60, mtr=0xaefa5724, err=0xaefa56f4, has_done_reservation=true, block=0x0) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:1755
#17 0x811349b0 in trx_undo_seg_create (space=0x83a87ed8, rseg_hdr=0xb0fff870, id=0xaefa5638, err=0xaefa56f4, mtr=0xaefa5724) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0undo.cc:482
#18 0x81136d11 in trx_undo_create (trx=0xb1a00b80, rseg=0x81e6f280 <trx_sys+22336>, undo=0xb1a010f4, err=0xaefa56f4, mtr=0xaefa5724) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0undo.cc:1180
#19 0x81138602 in trx_undo_assign_low<false> (trx=0xb1a00b80, rseg=0x81e6f280 <trx_sys+22336>, undo=0xb1a010f4, mtr=0xaefa5724, err=0xaefa56f4) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0undo.cc:1372
#20 0x8111321e in trx_undo_report_row_operation (thr=0xa7b65fc8, index=0xac50a7f8, clust_entry=0x0, update=0xa7b69158, cmpl_info=0, rec=0xb1138152 "testcreate_or_replace_t", offsets=0xaefa60ac, roll_ptr=0xaefa5ac8) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0rec.cc:1898

Concurrently, another thread is trying to free pages from this undo log tablespace:

bb-11.2-release 23651e27c672aa57d0dfba1251a4fc0abc6c95d6

#8  0x80fc5b41 in fil_space_t::x_lock (this=0x83a87ed8) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/include/fil0fil.h:983
#9  0x80fc1175 in mtr_t::x_lock_space (this=0xa845bc44, space=0x83a87ed8) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/mtr/mtr0mtr.cc:790
#10 0x80fc1062 in mtr_t::x_lock_space (this=0xa845bc44, space_id=2) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/mtr/mtr0mtr.cc:776
#11 0x812420f9 in fseg_free_step_not_header (header=0xb0c5c03c "", mtr=0xa845bc44, ahi=false) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:2905
#12 0x810f5896 in trx_purge_free_segment (block=0xb0bfebb0, mtr=...) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0purge.cc:354
#13 0x810f5ed7 in trx_purge_truncate_rseg_history (rseg=..., limit=..., all=true) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0purge.cc:459
#14 0x810f65fd in purge_sys_t::iterator::free_history (this=0xa845bf80) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0purge.cc:546
#15 0x810e0851 in purge_truncation_callback () at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/srv/srv0srv.cc:1101

This task was added in MDEV-32050. Previously this was part of the purge_coordinator_callback and executed less frequently.

Yet another thread is trying to write out pages of this undo tablespace:

bb-11.2-release 23651e27c672aa57d0dfba1251a4fc0abc6c95d6

#5  0x811af4d0 in inline_mysql_mutex_lock (that=0x81e72080 <buf_pool>, src_file=0x81800010 "/home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0flu.cc", src_line=1677) at /home/buildbot/buildbot/build/mariadb-11.2.2/include/mysql/psi/mysql_thread.h:746
#6  0x811b6bd9 in buf_flush_list_space (space=0x83a87ed8, n_flushed=0xabcfdc8c) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0flu.cc:1677
#7  0x81236df5 in fil_crypt_flush_space (state=0xabcfe008) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fil/fil0crypt.cc:1909
#8  0x812372a3 in fil_crypt_complete_rotate_space (state=0xabcfe008) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fil/fil0crypt.cc:1994
#9  0x812375af in fil_crypt_thread () at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fil/fil0crypt.cc:2066

So far, I failed to reproduce this locally by running 2,800 repetitions of the test on a local 32-bit build. I started another campaign of 56,000 repetitions, and it is over ¼ through, with no failures.



 Comments   
Comment by Marko Mäkelä [ 2023-11-16 ]

Next, I will try to reproduce this with a lower concurrency:

encryption.create_or_replace 'cbc,innodb' w33 [ 1000 pass ]   3754
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 256990.644 of 5111 seconds executing testcases
 
Completed: All 56000 tests were successful.

Comment by Marko Mäkelä [ 2023-11-16 ]

A lower-concurrency test did not work out either:

encryption.create_or_replace 'ctr,innodb' w6 [ 1000 pass ]   2089
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 15358.207 of 2119 seconds executing testcases
 
Completed: All 8000 tests were successful.

Comment by Marko Mäkelä [ 2023-11-16 ]

Further low-concurrency attempts to reproduce this failed, as did attempts to reproduce this with a 64-bit executable.

The function trx_undo_create() acquires an exclusive latch on block = rseg->get(mtr, err) that it should contiguously hold until its grand-caller trx_undo_report_row_operation() invokes mtr_t::commit().

The critical section trx_purge_truncate_rseg_history() is intended to be protected by both an exclusive latch on rseg_hdr (which corresponds to the above block) and an exclusive rseg.latch that was acquired in purge_sys_t::iterator::free_history(). At the time of the crash, trx_purge_free_segment() is expected to be holding these latches as well as an exclusive latch on the segment header page that is known as b in trx_purge_truncate_rseg_history(). It is waiting for an exclusive latch on the undo tablespace. I think that it is plausible to assume that trx_purge_free_segment() has already successfully invoked fseg_free_step_not_header() at least once.

It turns out that when trx_purge_free_segment() enters its first while loop body, it fails to re-latch rseg_hdr between mini-transactions. To fix this, we might recursively acquire another exclusive rseg_hdr latch:

diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
index 88d36fd1b12..2d1e01d858a 100644
--- a/storage/innobase/trx/trx0purge.cc
+++ b/storage/innobase/trx/trx0purge.cc
@@ -456,7 +456,10 @@ trx_purge_truncate_rseg_history(trx_rseg_t &rseg,
     free_segment:
       ut_ad(rseg.curr_size >= seg_size);
       rseg.curr_size-= seg_size;
+      /* Prevent concurrent trx_undo_create() in case of mtr.commit() below */
+      rseg_hdr->page.lock.x_lock_recursive();
       trx_purge_free_segment(b, mtr);
+      rseg_hdr->page.lock.x_unlock();
       break;
     case TRX_UNDO_CACHED:
       /* rseg.undo_cached must point to this page */

Before the fix of MDEV-30753, trx_purge_free_segment() used to reacquire the exclusive rseg_hdr page latch if the mini-transaction was restarted. I had thought that this latch only matters in trx_purge_truncate_rseg_history().

A better alternative would be to make trx_purge_free_segment() acquire the exclusive rseg_hdr page latch between each call to fseg_free_step_not_header(). It corresponds to what we did before MDEV-30753, and I think that it is safe to invoke trx_undo_create() between each call to fseg_free_step_not_header(), as long as the calls themselves are protected by exclusive latch on the rseg_hdr page.

It would seem that the one-time CI failure in the Description must have involved at least 2 invocations to fseg_free_step_not_header() in trx_purge_free_segment() before a further invocation was blocked. A previous invocation of fseg_free_step_not_header() (after the first one) would have corrupted things for the concurrently running trx_undo_create().

Presumably, in the failed CI run, some undo log had accumulated from previous tests, causing the need for at least 3 calls to fseg_free_step_not_header() within trx_purge_free_segment(). When the test is being run by itself, the undo segment can always be freed in fewer steps.

Comment by Vladislav Lesin [ 2023-11-21 ]

The fix looks good to me.

Comment by Marko Mäkelä [ 2023-12-13 ]

There was a very low-probability race condition between freeing and allocating undo log pages that can corrupt InnoDB transaction metadata. Data could be retrieved by producing a logical dump after starting the server with innodb_force_recovery=3. MDEV-32757 is another bug with similar consequences.

Generated at Thu Feb 08 10:34:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.