Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.11.2, 11.0.1, 10.8.7, 10.9.5, 10.10.3, 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2, 11.3(EOL)
Description
The MDEV-14795 test innodb.sys_truncate_debug in 11.2 would occasionally fail like this in the setup phase that is executing INSERT and using a debug setting to limit the number of records per page:
bb-11.2-release 0427c4739e7de5cd52d5e83479acf11ddb033ce6 |
innodb.sys_truncate_debug 'innodb' w1 [ fail ]
|
Test ended at 2023-11-14 20:04:26
|
|
CURRENT_TEST: innodb.sys_truncate_debug
|
mysqltest: At line 20: query 'INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384' failed: <Unknown> (2013): Lost connection to server during query
|
…
|
buf/buf0flu.cc:766(buf_page_t::flush(bool, fil_space_t*))[0x1406b5d80]
|
buf/buf0flu.cc:1103(buf_flush_try_neighbors(fil_space_t*, page_id_t, buf_page_t*, bool, bool, unsigned long, unsigned long))[0x1406b7cd4]
|
buf/buf0flu.cc:1350(buf_flush_LRU_list_batch(unsigned long, bool, flush_counters_t*))[0x1406b9120]
|
buf/buf0flu.cc:1390(buf_do_LRU_batch(unsigned long, bool, flush_counters_t*))[0x1406b93d8]
|
buf/buf0flu.cc:1734(buf_flush_LRU(unsigned long, bool))[0x1406bae34]
|
buf/buf0lru.cc:506(buf_LRU_get_free_block(buf_LRU_get))[0x1406c657c]
|
fsp/fsp0fsp.cc:1063(fsp_page_create(fil_space_t*, unsigned int, mtr_t*))[0x140769ae4]
|
fsp/fsp0fsp.cc:2230(fseg_alloc_free_page_low(fil_space_t*, unsigned char*, buf_block_t*, unsigned int, unsigned char, bool, mtr_t*, mtr_t*, dberr_t*))[0x14076efe8]
|
fsp/fsp0fsp.cc:2284(fseg_alloc_free_page_general(unsigned char*, unsigned int, unsigned char, bool, mtr_t*, mtr_t*, dberr_t*))[0x14076f1a8]
|
btr/btr0btr.cc:567(btr_page_alloc(dict_index_t*, unsigned int, unsigned char, unsigned long, mtr_t*, mtr_t*, dberr_t*))[0x140624dd8]
|
btr/btr0btr.cc:2833(btr_page_split_and_insert(unsigned long, btr_cur_t*, unsigned short**, mem_block_info_t**, dtuple_t const*, unsigned long, mtr_t*, dberr_t*))[0x1406310b0]
|
btr/btr0cur.cc:2534(btr_cur_pessimistic_insert(unsigned long, btr_cur_t*, unsigned short**, mem_block_info_t**, dtuple_t*, unsigned char**, big_rec_t**, unsigned long, que_thr_t*, mtr_t*))[0x140660a5c]
|
row/row0ins.cc:3147(row_ins_sec_index_entry_low(unsigned long, btr_latch_mode, dict_index_t*, mem_block_info_t*, mem_block_info_t*, dtuple_t*, unsigned long, que_thr_t*))[0x1404ee318]
|
row/row0ins.cc:3332(row_ins_sec_index_entry(dict_index_t*, dtuple_t*, que_thr_t*, bool))[0x1404eebdc]
|
row/row0ins.cc:3377(row_ins_index_entry(dict_index_t*, dtuple_t*, que_thr_t*))[0x1404eef04]
|
row/row0ins.cc:3543(row_ins_index_entry_step(ins_node_t*, que_thr_t*))[0x1404efa68]
|
row/row0ins.cc:3660(row_ins(ins_node_t*, que_thr_t*))[0x1404eff9c]
|
row/row0ins.cc:3789(row_ins_step(que_thr_t*))[0x1404f0898]
|
row/row0mysql.cc:1314(row_insert_for_mysql(unsigned char const*, row_prebuilt_t*, ins_mode_t))[0x14051c328]
|
handler/ha_innodb.cc:7835(ha_innobase::write_row(unsigned char const*))[0x1402daa68]
|
sql/handler.cc:7852(handler::ha_write_row(unsigned char const*))[0x13fdb86e0]
|
thiru was able to reproduce this under rr on his system.
The reason for this failure is that the same mini-transaction is freeing, allocating, and freeing the same index page. The page initialization would clear the FIL_PAGE_LSN field on the page.
Before MDEV-27774 in 10.8 refactored some code, there was the function buf_flush_note_modification() that would set the FIL_PAGE_LSN on all blocks that a mini-transaction modified. Starting with 10.8, this logic is within mtr_t::commit() itself, in two copies. The second copy of the logic, in the !m_made_dirty code path, wrongly skips the setting of FIL_PAGE_LSN on blocks that are marked as freed:
if (s >= buf_page_t::UNFIXED) |
{
|
mach_write_to_8(bpage->frame + FIL_PAGE_LSN, m_commit_lsn);
|
if (UNIV_LIKELY_NULL(bpage->zip.data)) |
memcpy_aligned<8>(FIL_PAGE_LSN + bpage->zip.data,
|
FIL_PAGE_LSN + bpage->frame, 8);
|
}
|
The if condition must be removed here, so that the write-ahead logging protocol will be followed.
As far as I can tell, the user impact of this on non-debug builds is that the log checkpoint may be wrongly advanced too early, similar to MDEV-32757. This could cause crash recovery and backup to produce corrupted data.
The bug was introduced in this merge of MDEV-29383 and many other changes.
Attachments
Issue Links
- relates to
-
MDEV-32817 在最近将版本升级到10.11.5后,针对表进行频繁的读写操作不久后,出现index for table xxxx is corrupt,随后此表tablespace xxxxxx corrupted,最后Tablespace is missing for a table,此表已完全不可用
- Closed