Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32811

Potentially broken crash recovery if a mini-transaction frees a page, not modifying previously clean pages

    XMLWordPrintable

Details

    Description

      The MDEV-14795 test innodb.sys_truncate_debug in 11.2 would occasionally fail like this in the setup phase that is executing INSERT and using a debug setting to limit the number of records per page:

      bb-11.2-release 0427c4739e7de5cd52d5e83479acf11ddb033ce6

      innodb.sys_truncate_debug 'innodb'       w1 [ fail ]
              Test ended at 2023-11-14 20:04:26
       
      CURRENT_TEST: innodb.sys_truncate_debug
      mysqltest: At line 20: query 'INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384' failed: <Unknown> (2013): Lost connection to server during query
      buf/buf0flu.cc:766(buf_page_t::flush(bool, fil_space_t*))[0x1406b5d80]
      buf/buf0flu.cc:1103(buf_flush_try_neighbors(fil_space_t*, page_id_t, buf_page_t*, bool, bool, unsigned long, unsigned long))[0x1406b7cd4]
      buf/buf0flu.cc:1350(buf_flush_LRU_list_batch(unsigned long, bool, flush_counters_t*))[0x1406b9120]
      buf/buf0flu.cc:1390(buf_do_LRU_batch(unsigned long, bool, flush_counters_t*))[0x1406b93d8]
      buf/buf0flu.cc:1734(buf_flush_LRU(unsigned long, bool))[0x1406bae34]
      buf/buf0lru.cc:506(buf_LRU_get_free_block(buf_LRU_get))[0x1406c657c]
      fsp/fsp0fsp.cc:1063(fsp_page_create(fil_space_t*, unsigned int, mtr_t*))[0x140769ae4]
      fsp/fsp0fsp.cc:2230(fseg_alloc_free_page_low(fil_space_t*, unsigned char*, buf_block_t*, unsigned int, unsigned char, bool, mtr_t*, mtr_t*, dberr_t*))[0x14076efe8]
      fsp/fsp0fsp.cc:2284(fseg_alloc_free_page_general(unsigned char*, unsigned int, unsigned char, bool, mtr_t*, mtr_t*, dberr_t*))[0x14076f1a8]
      btr/btr0btr.cc:567(btr_page_alloc(dict_index_t*, unsigned int, unsigned char, unsigned long, mtr_t*, mtr_t*, dberr_t*))[0x140624dd8]
      btr/btr0btr.cc:2833(btr_page_split_and_insert(unsigned long, btr_cur_t*, unsigned short**, mem_block_info_t**, dtuple_t const*, unsigned long, mtr_t*, dberr_t*))[0x1406310b0]
      btr/btr0cur.cc:2534(btr_cur_pessimistic_insert(unsigned long, btr_cur_t*, unsigned short**, mem_block_info_t**, dtuple_t*, unsigned char**, big_rec_t**, unsigned long, que_thr_t*, mtr_t*))[0x140660a5c]
      row/row0ins.cc:3147(row_ins_sec_index_entry_low(unsigned long, btr_latch_mode, dict_index_t*, mem_block_info_t*, mem_block_info_t*, dtuple_t*, unsigned long, que_thr_t*))[0x1404ee318]
      row/row0ins.cc:3332(row_ins_sec_index_entry(dict_index_t*, dtuple_t*, que_thr_t*, bool))[0x1404eebdc]
      row/row0ins.cc:3377(row_ins_index_entry(dict_index_t*, dtuple_t*, que_thr_t*))[0x1404eef04]
      row/row0ins.cc:3543(row_ins_index_entry_step(ins_node_t*, que_thr_t*))[0x1404efa68]
      row/row0ins.cc:3660(row_ins(ins_node_t*, que_thr_t*))[0x1404eff9c]
      row/row0ins.cc:3789(row_ins_step(que_thr_t*))[0x1404f0898]
      row/row0mysql.cc:1314(row_insert_for_mysql(unsigned char const*, row_prebuilt_t*, ins_mode_t))[0x14051c328]
      handler/ha_innodb.cc:7835(ha_innobase::write_row(unsigned char const*))[0x1402daa68]
      sql/handler.cc:7852(handler::ha_write_row(unsigned char const*))[0x13fdb86e0]
      

      thiru was able to reproduce this under rr on his system.

      The reason for this failure is that the same mini-transaction is freeing, allocating, and freeing the same index page. The page initialization would clear the FIL_PAGE_LSN field on the page.

      Before MDEV-27774 in 10.8 refactored some code, there was the function buf_flush_note_modification() that would set the FIL_PAGE_LSN on all blocks that a mini-transaction modified. Starting with 10.8, this logic is within mtr_t::commit() itself, in two copies. The second copy of the logic, in the !m_made_dirty code path, wrongly skips the setting of FIL_PAGE_LSN on blocks that are marked as freed:

                  if (s >= buf_page_t::UNFIXED)
                  {
                    mach_write_to_8(bpage->frame + FIL_PAGE_LSN, m_commit_lsn);
                    if (UNIV_LIKELY_NULL(bpage->zip.data))
                      memcpy_aligned<8>(FIL_PAGE_LSN + bpage->zip.data,
                                        FIL_PAGE_LSN + bpage->frame, 8);
                  }
      

      The if condition must be removed here, so that the write-ahead logging protocol will be followed.

      As far as I can tell, the user impact of this on non-debug builds is that the log checkpoint may be wrongly advanced too early, similar to MDEV-32757. This could cause crash recovery and backup to produce corrupted data.

      The bug was introduced in this merge of MDEV-29383 and many other changes.

      Attachments

        Issue Links

          Activity

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.