[MDEV-26200] buf_pool.flush_list corruption in buffer pool resizing or with ROW_FORMAT=COMPRESSED Created: 2021-07-21  Updated: 2021-07-23  Resolved: 2021-07-22

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.3
Fix Version/s: 10.5.12, 10.6.4

Type: Bug Priority: Blocker
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, race, regression

Issue Links:
Duplicate
duplicates MDEV-26082 Assertion failure in buf_pool.watch_i... Closed
Problem/Incident
is caused by MDEV-25113 Reduce effect of parallel background ... Closed
Relates
relates to MDEV-26010 Assertion `lsn > 2' failed in buf_poo... Closed

Description

MDEV-25113 introduced a race condition that we already attempted to fix in MDEV-26010. It turns out that that fix was not sufficient. The test innodb_zip.wl5522_debug_zip, as well as the buffer pool resizing tests, would still occasionally fail in debug builds due to corruption of buf_pool.flush_list. The symptom is that buf_pool.flush_list.count disagrees with the length of the linked list chain.

The race condition might be unobservable on single-socket IA-32 and AMD64 setups; I observed it on a dual-socket Intel® Xeon® E5-2630 system. Adding more calls to buf_flush_validate_low() appears to reduce the probability of failure.

The safe procedure for relocating a block in buf_pool.flush_list seems to be the following:

  1. Acquire buf_pool.mutex.
  2. Acquire the exclusive buf_pool.page_hash.latch.
  3. Acquire buf_pool.flush_list_mutex.
  4. Copy the block descriptor.
  5. Invoke buf_flush_relocate_on_flush_list().
  6. Release buf_pool.flush_list_mutex.

In this way, the relocated block descriptor should be guaranteed to be in a consistent state. At least the test innodb_zip.wl5522_debug_zip,16k no longer triggered the debug assertion on my system.

For the record, the debug assertion looks like this:

10.6 61fcbed920c0ed1373725c4122af5a483ae7ffb2

innodb_zip.wl5522_debug_zip '16k,innodb' w26 [ fail ]
        Test ended at 2021-07-21 14:08:52
CURRENT_TEST: innodb_zip.wl5522_debug_zip
mysqltest: At line 320: query 'UPDATE t1 SET c2 = c2 + c1' failed: <Unknown> (2013): Lost connection to server during query
#5  0x000055a7aed68614 in ut_dbg_assertion_failed (expr=0x55a7af241007 "count == list.count", file=<optimized out>, line=<optimized out>, line@entry=467) at /mariadb/10.6/storage/innobase/ut/ut0dbg.cc:60
#6  0x000055a7aedeba16 in ut_list_map<ut_list_base<buf_page_t, ut_list_node<buf_page_t> buf_page_t::*>, Check> (list=<optimized out>, functor=@0x7fc07cfef930: {<No data fields>}) at /mariadb/10.6/storage/innobase/include/ut0lst.h:467
#7  ut_list_validate<ut_list_base<buf_page_t, ut_list_node<buf_page_t> buf_page_t::*>, Check> (list=<optimized out>, functor=@0x7fc07cfef930: {<No data fields>}) at /mariadb/10.6/storage/innobase/include/ut0lst.h:496
#8  buf_flush_validate_low () at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:2475
#9  0x000055a7aedeab03 in buf_flush_validate_skip () at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:124
#10 buf_pool_t::insert_into_flush_list (this=0x55a7af831280 <buf_pool>, block=0x7fc07d36f578, lsn=158319367) at /mariadb/10.6/storage/innobase/buf/buf0flu.cc:204
#11 0x000055a7aec166ad in buf_flush_note_modification (block=0x7fc07d36f578, start_lsn=158319367, end_lsn=158323907) at /mariadb/10.6/storage/innobase/include/buf0flu.ic:62
#12 ReleaseBlocks::operator() (this=<optimized out>, this@entry=0x7fc07cfefa30, slot=slot@entry=0x7fc07cff01a8) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:348
#13 0x000055a7aec13481 in CIterate<ReleaseBlocks const>::operator() (this=0x7fc07cfefa30, block=<optimized out>) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:61
#14 mtr_buf_t::for_each_block_in_reverse<CIterate<ReleaseBlocks const> > (this=<optimized out>, this@entry=0x7fc07cff0160, functor=@0x7fc07cfefa30: {functor = {start = 158319367, end = 158323907, memo = @0x7fc07cff0160}}) at /mariadb/10.6/storage/innobase/include/dyn0buf.h:379
#15 0x000055a7aec0ff3c in mtr_t::commit (this=<optimized out>) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:444


Generated at Thu Feb 08 09:43:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.