[MDEV-32811] Potentially broken crash recovery if a mini-transaction frees a page, not modifying previously clean pages Created: 2023-11-15  Updated: 2023-11-16  Resolved: 2023-11-16

Status: Closed
Project: MariaDB Server
Component/s: Backup, Storage Engine - InnoDB
Affects Version/s: 10.11.2, 11.0.1, 10.8.7, 10.9.5, 10.10.3, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2, 11.3
Fix Version/s: 10.11.7, 11.0.5, 11.1.4, 11.2.3

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: corruption, recovery, rr-profile-analyzed

Issue Links:
Relates
relates to MDEV-32817 在最近将版本升级到10.11.5后,针对表进行频繁的读写操作不久后,出现i... Needs Feedback

 Description   

The MDEV-14795 test innodb.sys_truncate_debug in 11.2 would occasionally fail like this in the setup phase that is executing INSERT and using a debug setting to limit the number of records per page:

bb-11.2-release 0427c4739e7de5cd52d5e83479acf11ddb033ce6

innodb.sys_truncate_debug 'innodb'       w1 [ fail ]
        Test ended at 2023-11-14 20:04:26
 
CURRENT_TEST: innodb.sys_truncate_debug
mysqltest: At line 20: query 'INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384' failed: <Unknown> (2013): Lost connection to server during query
buf/buf0flu.cc:766(buf_page_t::flush(bool, fil_space_t*))[0x1406b5d80]
buf/buf0flu.cc:1103(buf_flush_try_neighbors(fil_space_t*, page_id_t, buf_page_t*, bool, bool, unsigned long, unsigned long))[0x1406b7cd4]
buf/buf0flu.cc:1350(buf_flush_LRU_list_batch(unsigned long, bool, flush_counters_t*))[0x1406b9120]
buf/buf0flu.cc:1390(buf_do_LRU_batch(unsigned long, bool, flush_counters_t*))[0x1406b93d8]
buf/buf0flu.cc:1734(buf_flush_LRU(unsigned long, bool))[0x1406bae34]
buf/buf0lru.cc:506(buf_LRU_get_free_block(buf_LRU_get))[0x1406c657c]
fsp/fsp0fsp.cc:1063(fsp_page_create(fil_space_t*, unsigned int, mtr_t*))[0x140769ae4]
fsp/fsp0fsp.cc:2230(fseg_alloc_free_page_low(fil_space_t*, unsigned char*, buf_block_t*, unsigned int, unsigned char, bool, mtr_t*, mtr_t*, dberr_t*))[0x14076efe8]
fsp/fsp0fsp.cc:2284(fseg_alloc_free_page_general(unsigned char*, unsigned int, unsigned char, bool, mtr_t*, mtr_t*, dberr_t*))[0x14076f1a8]
btr/btr0btr.cc:567(btr_page_alloc(dict_index_t*, unsigned int, unsigned char, unsigned long, mtr_t*, mtr_t*, dberr_t*))[0x140624dd8]
btr/btr0btr.cc:2833(btr_page_split_and_insert(unsigned long, btr_cur_t*, unsigned short**, mem_block_info_t**, dtuple_t const*, unsigned long, mtr_t*, dberr_t*))[0x1406310b0]
btr/btr0cur.cc:2534(btr_cur_pessimistic_insert(unsigned long, btr_cur_t*, unsigned short**, mem_block_info_t**, dtuple_t*, unsigned char**, big_rec_t**, unsigned long, que_thr_t*, mtr_t*))[0x140660a5c]
row/row0ins.cc:3147(row_ins_sec_index_entry_low(unsigned long, btr_latch_mode, dict_index_t*, mem_block_info_t*, mem_block_info_t*, dtuple_t*, unsigned long, que_thr_t*))[0x1404ee318]
row/row0ins.cc:3332(row_ins_sec_index_entry(dict_index_t*, dtuple_t*, que_thr_t*, bool))[0x1404eebdc]
row/row0ins.cc:3377(row_ins_index_entry(dict_index_t*, dtuple_t*, que_thr_t*))[0x1404eef04]
row/row0ins.cc:3543(row_ins_index_entry_step(ins_node_t*, que_thr_t*))[0x1404efa68]
row/row0ins.cc:3660(row_ins(ins_node_t*, que_thr_t*))[0x1404eff9c]
row/row0ins.cc:3789(row_ins_step(que_thr_t*))[0x1404f0898]
row/row0mysql.cc:1314(row_insert_for_mysql(unsigned char const*, row_prebuilt_t*, ins_mode_t))[0x14051c328]
handler/ha_innodb.cc:7835(ha_innobase::write_row(unsigned char const*))[0x1402daa68]
sql/handler.cc:7852(handler::ha_write_row(unsigned char const*))[0x13fdb86e0]

thiru was able to reproduce this under rr on his system.

The reason for this failure is that the same mini-transaction is freeing, allocating, and freeing the same index page. The page initialization would clear the FIL_PAGE_LSN field on the page.

Before MDEV-27774 in 10.8 refactored some code, there was the function buf_flush_note_modification() that would set the FIL_PAGE_LSN on all blocks that a mini-transaction modified. Starting with 10.8, this logic is within mtr_t::commit() itself, in two copies. The second copy of the logic, in the !m_made_dirty code path, wrongly skips the setting of FIL_PAGE_LSN on blocks that are marked as freed:

            if (s >= buf_page_t::UNFIXED)
            {
              mach_write_to_8(bpage->frame + FIL_PAGE_LSN, m_commit_lsn);
              if (UNIV_LIKELY_NULL(bpage->zip.data))
                memcpy_aligned<8>(FIL_PAGE_LSN + bpage->zip.data,
                                  FIL_PAGE_LSN + bpage->frame, 8);
            }

The if condition must be removed here, so that the write-ahead logging protocol will be followed.

As far as I can tell, the user impact of this on non-debug builds is that the log checkpoint may be wrongly advanced too early, similar to MDEV-32757. This could cause crash recovery and backup to produce corrupted data.

The bug was introduced in this merge of MDEV-29383 and many other changes.



 Comments   
Comment by Marko Mäkelä [ 2023-11-15 ]

thiru, please push the if removal to 10.11. In the commit message, mention the 11.2 test case that caught it. Because the failure was sporadic (depending on when the buf_flush_page_cleaner() thread is being scheduled, I do not think that it makes sense to try to write a regression test for this.

Comment by Marko Mäkelä [ 2023-11-15 ]

The debug assertion failure that this test triggers may occur if the same mini-transaction frees, allocates, and frees a page. But, I think that the problem of invalid log checkpoint advances is possible whenever a page is freed just once.

Generated at Thu Feb 08 10:34:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.