Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35577

Broken recovery after SET GLOBAL innodb_log_file_size

    XMLWordPrintable

Details

    Description

      While executing the innodb.log_file_size_online test case, it fails with

      CURRENT_TEST: innodb.log_file_size_online
      mysqltest: At line 59: query 'SELECT * FROM t WHERE a<10' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'

      2024-12-05 16:19:56 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=5, page number=345]
      2024-12-05 16:19:56 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
      2024-12-05 16:19:56 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages. 
      

      Debugged the test case failure, while startup fails with

      Thread 2 hit Breakpoint 1, log_phys_t::page_checksum (block=..., 
          l=0x78eeebcf335f "\267H\251N")
          at /home/thiru/source_file/server/10.11/storage/innobase/log/log0recv.cc:197
      197	    ib::error() << "OPT_PAGE_CHECKSUM mismatch on " << block.page.id();
      (rr) f 2
      p#2  0x000063c80564908d in recv_recover_page (block=0x78eee99011e0, 
          mtr=..., recs=..., space=0x63c807c73c68, init=0x0)
          at /home/thiru/source_file/server/10.11/storage/innobase/log/log0recv.cc:3248
      3248			log_phys_t::apply_status a= l->apply(*block, recs.last_offset);
      (rr) p *l
      $2 = {<log_rec_t> = {next = 0x78eeebcf3368, lsn = 7415422}, 
        start_lsn = 7415393}
      

      Checksum fails to match with page id is the following

      (rr) p mach_read_from_4(l)
      $1 = 3074992462
      

      During recovery, these are the things to be noticed.

      (rr) p checkpoint_lsn
      $5 = 7294363
      (rr) p end_lsn
      $6 = 7415362
      (rr) p first_lsn
      $7 = 7292882
      

      Let's visit before restart:
      =====================
      page_checksum calculated for page_id 21474836825 is 3074992462 during m_commit_lsn: 7323774

      mtr::page_checksum calculated 7323774 for the page_id
      (rr) p bpage.id_.m_id
      $1 = 21474836825
      (rr) p checksum
      $2 = 3074992462
      Thread 2 hit Hardware watchpoint 3: -location m_commit_lsn
      Old value = 0
      New value = 7323774
      mtr_t::finish_writer<false, false> (mtr=0x7673079ff4e0, len=29)
          at /home/thiru/source_file/server/10.11/storage/innobase/mtr/mtr0mtr.cc:1391
      1391	  return {start.first, log_close(mtr->m_commit_lsn)};
       
      (rr) f 0
      #0  log_t::append (
          d=@0x7673079ff0b8: 0x76730c80348f "G\005\200َ\022\006",
          s=0x7673079ff658, size=24)
          at /home/thiru/source_file/server/10.11/storage/innobase/mtr/mtr0m
      tr.cc:1308
      (rr) x/24b d
      0x76730c80348f: 0x47    0x05    0x80    0xd9    0x8e    0x12    0x06  0x00
      0x76730c803497: 0xb2    0x00    0x80    0xc3    0x00    0x06    0x00  0x78
      0x76730c80349f: 0x05    0x80    0xd9    0x00    0xb7    0x48    0xa9  0x4e
      

      Above is the redo log written for checksum for the page

      After that, InnoDB does flushes the log thrice. There are 3 log_t::write_buf() calls

      At the end of the first log_t::write_buf()

      (rr) p write_lsn
      $2 = 7353588
      

      checksum redo log was flushed to disk during first write_buf() call itself.
      Note: Every write_buf() swaps resize_buf with resize_flush_buf

      At the end of the 2nd log_t::write_buf():

      (rr) p write_lsn
      $4 = 7401982
      

      let's have detailed look at the last log_t::write_buf():

      At the start
       
      (rr) p write_lsn
      $16 = 7401982
      (rr) p lsn
      $17 = 7415393
      

      Initially, length is 13455. We copy the last non-aligned
      block to resize_flush_buf. Round off the length to 13824 (multiple of 512 bytes)
      InnoDB writes the old resize_buffer garbage value to the log file for the remaining
      349 bytes. Since InnoDb fails to mark the end of the log in resize_buf. During
      recovery, InnoDB reads the old mtr record and fails.

       
      (rr) x/24b resize_buf + 13455
      0x76730420348f: 0x47    0x05    0x80    0xd9    0x8e    0x12    0x06  0x00
      0x767304203497: 0xb2    0x00    0x80    0xc3    0x00    0x06    0x00  0x78
      0x76730420349f: 0x05    0x80    0xd9    0x00    0xb7    0x48    0xa9  0x4e
      
      

      Above bytes matches with mtr checksum value which was written during 7323774

      Attachments

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              thiru Thirunarayanan Balathandayuthapani
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.