Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35577

Broken recovery after SET GLOBAL innodb_log_file_size

    XMLWordPrintable

Details

    Description

      While executing the innodb.log_file_size_online test case, it fails with

      CURRENT_TEST: innodb.log_file_size_online
      mysqltest: At line 59: query 'SELECT * FROM t WHERE a<10' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'

      2024-12-05 16:19:56 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=5, page number=345]
      2024-12-05 16:19:56 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
      2024-12-05 16:19:56 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages. 
      

      Debugged the test case failure, while startup fails with

      Thread 2 hit Breakpoint 1, log_phys_t::page_checksum (block=..., 
          l=0x78eeebcf335f "\267H\251N")
          at /home/thiru/source_file/server/10.11/storage/innobase/log/log0recv.cc:197
      197	    ib::error() << "OPT_PAGE_CHECKSUM mismatch on " << block.page.id();
      (rr) f 2
      p#2  0x000063c80564908d in recv_recover_page (block=0x78eee99011e0, 
          mtr=..., recs=..., space=0x63c807c73c68, init=0x0)
          at /home/thiru/source_file/server/10.11/storage/innobase/log/log0recv.cc:3248
      3248			log_phys_t::apply_status a= l->apply(*block, recs.last_offset);
      (rr) p *l
      $2 = {<log_rec_t> = {next = 0x78eeebcf3368, lsn = 7415422}, 
        start_lsn = 7415393}
      

      Checksum fails to match with page id is the following

      (rr) p mach_read_from_4(l)
      $1 = 3074992462
      

      During recovery, these are the things to be noticed.

      (rr) p checkpoint_lsn
      $5 = 7294363
      (rr) p end_lsn
      $6 = 7415362
      (rr) p first_lsn
      $7 = 7292882
      

      Let's visit before restart:
      =====================
      page_checksum calculated for page_id 21474836825 is 3074992462 during m_commit_lsn: 7323774

      mtr::page_checksum calculated 7323774 for the page_id
      (rr) p bpage.id_.m_id
      $1 = 21474836825
      (rr) p checksum
      $2 = 3074992462
      Thread 2 hit Hardware watchpoint 3: -location m_commit_lsn
      Old value = 0
      New value = 7323774
      mtr_t::finish_writer<false, false> (mtr=0x7673079ff4e0, len=29)
          at /home/thiru/source_file/server/10.11/storage/innobase/mtr/mtr0mtr.cc:1391
      1391	  return {start.first, log_close(mtr->m_commit_lsn)};
       
      (rr) f 0
      #0  log_t::append (
          d=@0x7673079ff0b8: 0x76730c80348f "G\005\200َ\022\006",
          s=0x7673079ff658, size=24)
          at /home/thiru/source_file/server/10.11/storage/innobase/mtr/mtr0m
      tr.cc:1308
      (rr) x/24b d
      0x76730c80348f: 0x47    0x05    0x80    0xd9    0x8e    0x12    0x06  0x00
      0x76730c803497: 0xb2    0x00    0x80    0xc3    0x00    0x06    0x00  0x78
      0x76730c80349f: 0x05    0x80    0xd9    0x00    0xb7    0x48    0xa9  0x4e
      

      Above is the redo log written for checksum for the page

      After that, InnoDB does flushes the log thrice. There are 3 log_t::write_buf() calls

      At the end of the first log_t::write_buf()

      (rr) p write_lsn
      $2 = 7353588
      

      checksum redo log was flushed to disk during first write_buf() call itself.
      Note: Every write_buf() swaps resize_buf with resize_flush_buf

      At the end of the 2nd log_t::write_buf():

      (rr) p write_lsn
      $4 = 7401982
      

      let's have detailed look at the last log_t::write_buf():

      At the start
       
      (rr) p write_lsn
      $16 = 7401982
      (rr) p lsn
      $17 = 7415393
      

      Initially, length is 13455. We copy the last non-aligned
      block to resize_flush_buf. Round off the length to 13824 (multiple of 512 bytes)
      InnoDB writes the old resize_buffer garbage value to the log file for the remaining
      349 bytes. Since InnoDb fails to mark the end of the log in resize_buf. During
      recovery, InnoDB reads the old mtr record and fails.

       
      (rr) x/24b resize_buf + 13455
      0x76730420348f: 0x47    0x05    0x80    0xd9    0x8e    0x12    0x06  0x00
      0x767304203497: 0xb2    0x00    0x80    0xc3    0x00    0x06    0x00  0x78
      0x76730420349f: 0x05    0x80    0xd9    0x00    0xb7    0x48    0xa9  0x4e
      
      

      Above bytes matches with mtr checksum value which was written during 7323774

      Attachments

        1. data.tar.gz
          1.06 MB
          Thirunarayanan Balathandayuthapani
        2. ib_logfile0.tar.gz
          64 kB
          Thirunarayanan Balathandayuthapani

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              thiru Thirunarayanan Balathandayuthapani
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.