Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22139

fseg_free_page_low() fails to write FREE_PAGE record, breaking recovery

Details

    Description

      Workflow of the RQG test
       
      1. Create DB server and start it
      2. One session starts to run some SQL stream against this server
      3. At some point of time and not related to the state (waiting for result, sending statement, ...) of  the ongoing 2. the server process gets killed (KILL and not TERM).
      4. Make some copy of the data content+logs etc. of the killed server.
      5. Try to restart the killed server.
       
      5. fails with
      ...
      2020-04-03  8:58:49 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=117659470
      2020-04-03  8:58:49 0 [Note] InnoDB: Transaction 2264 was in the XA prepared state.
      2020-04-03  8:58:49 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
      2020-04-03  8:58:49 0 [Note] InnoDB: Trx id counter is 2265
      2020-04-03  8:58:49 0 [Note] InnoDB: Starting final batch to recover 133 pages from redo log.
      2020-04-03  8:58:50 0 [ERROR] InnoDB: Not applying INSERT_HEAP_DYNAMIC due to corruption on [page id: space=5, page number=68]
      2020-04-03  8:58:50 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
      2020-04-03  8:58:50 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1574] with error Data structure corruption
      ...
      2020-04-03  8:58:50 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
      2020-04-03  8:58:50 0 [Note] Plugin 'FEEDBACK' is disabled.
       
      RQG
      git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich1
      origin/experimental d417497b476e263428422b4640cf63b8e9d16afe 2020-03-30T17:17:54+02:00
       
      MariaDB
      origin/10.5 0870b75af7d01f2e8132278aeb0873dd6c65d1be 2020-04-03T08:50:46+03:00
      build with debug
       
      cd <top directory of tree with RQG>
      perl rqg.pl \
      --duration=120 \
      --queries=10000000 \
      --threads=1 \
      --no_mask \
      --seed=random \
      --engine=InnoDB \
      --gendata=conf/engines/many_indexes.zz \
      --rpl_mode=none \
      --mysqld=--innodb_page_size=32K \
      --mysqld=--net_read_timeout=30 \
      --mysqld=--lock-wait-timeout=86400 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--log_bin_trust_function_creators=1 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--log-bin \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--loose_innodb_use_native_aio=1 \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--innodb_stats_persistent=off \
      --mysqld=--log-output=none \
      --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
      --reporters=CrashRecovery1,Deadlock1,ErrorLog,None,ServerDead \
      --validators=None \
      --grammar=rqg.yy \
      --workdir=<local setting> \
      --vardir=<local setting> \
      --mtr-build-thread=<local setting> \
      --basedir1=<path to binaries>
       
      The uploaded RQG protocol "rqg.log" shows exact that RQG run.
      

      Attachments

        1. rqg.log
          82 kB
          Matthias Leich
        2. rqg.yy
          0.5 kB
          Matthias Leich

        Issue Links

          Activity

            In the case that I analyzed, we have an INSERT_REUSE_REDUNDANT redo log record whose preceding record points to unused space (after the PAGE_HEAP_TOP) in a page.

            In a debugging session, I observed the following:

            1. The latest checkpoint LSN was 114599300.
            2. The problematic page 5:86 carries FIL_PAGE_LSN=0x6be1508=113120520.
            3. The problematic INSERT_REUSE_REDUNDANT record is the very first record for the page since the checkpoint, at LSN 114599768.
            4. In the rr replay of the server that was killed, I found the time when the FIL_PAGE_LSN corresponding to the page image in the file was written. The page was assigned to $1.
            5. With dump binary memory I extracted and compared the page images, and they matched (except for the last 8 bytes, which is normal).
            6. With a watchpoint on PAGE_HEAP_TOP in the page frame, I found some INSERT operations that would grow the page. Log was written for them. That was before the checkpoint LSN, so recovery would not see them.
            7. break log_write_checkpoint_info caught the write of the checkpoint header, with end_lsn=117659470.
            8. But, there was no call to write out the page from the buffer pool.
            9. I restarted after watch -l $1.page.in_flush_list. That was hit in buf_flush_freed_page().
            10. I restarted after setting a watchpoint on $1.page.status. (This had to be set on a numeric (unsigned*) address, because the used gdb version had truoble with enum.)
            11. At LSN 115022079 (note: this is between the LSN of the recovered log record 114599768 and the checkpoint end_lsn=117659470) we would set $1.page.status=buf_page_t::FREED without first writing a FREE_PAGE record in fseg_free_page_low().
            12. If the FREE_PAGE record would have been written, recovery would have seen that (because the log had been completely written until at least the checkpoint end_lsn=117659470) and would have avoided reading the freed page altogether.

            The following patch fixes the problem:

            diff --git a/storage/innobase/fsp/fsp0fsp.cc b/storage/innobase/fsp/fsp0fsp.cc
            index 86b199c5267..7d970c12814 100644
            --- a/storage/innobase/fsp/fsp0fsp.cc
            +++ b/storage/innobase/fsp/fsp0fsp.cc
            @@ -2605,6 +2605,8 @@ fseg_free_page_low(
             			    xdes, xoffset, mtr);
             		fsp_free_extent(space, offset, mtr);
             	}
            +
            +	mtr->free(page_id_t(space->id, offset));
             }
             
             #ifndef BTR_CUR_HASH_ADAPT
            

            marko Marko Mäkelä added a comment - In the case that I analyzed, we have an INSERT_REUSE_REDUNDANT redo log record whose preceding record points to unused space (after the PAGE_HEAP_TOP ) in a page. In a debugging session, I observed the following: The latest checkpoint LSN was 114599300. The problematic page 5:86 carries FIL_PAGE_LSN =0x6be1508=113120520. The problematic INSERT_REUSE_REDUNDANT record is the very first record for the page since the checkpoint, at LSN 114599768. In the rr replay of the server that was killed, I found the time when the FIL_PAGE_LSN corresponding to the page image in the file was written. The page was assigned to $1 . With dump binary memory I extracted and compared the page images, and they matched (except for the last 8 bytes, which is normal). With a watchpoint on PAGE_HEAP_TOP in the page frame, I found some INSERT operations that would grow the page. Log was written for them. That was before the checkpoint LSN, so recovery would not see them. break log_write_checkpoint_info caught the write of the checkpoint header, with end_lsn =117659470. But, there was no call to write out the page from the buffer pool. I restarted after watch -l $1.page.in_flush_list . That was hit in buf_flush_freed_page() . I restarted after setting a watchpoint on $1.page.status . (This had to be set on a numeric (unsigned*) address, because the used gdb version had truoble with enum .) At LSN 115022079 (note: this is between the LSN of the recovered log record 114599768 and the checkpoint end_lsn =117659470) we would set $1.page.status=buf_page_t::FREED without first writing a FREE_PAGE record in fseg_free_page_low() . If the FREE_PAGE record would have been written, recovery would have seen that (because the log had been completely written until at least the checkpoint end_lsn =117659470) and would have avoided reading the freed page altogether. The following patch fixes the problem: diff --git a/storage/innobase/fsp/fsp0fsp.cc b/storage/innobase/fsp/fsp0fsp.cc index 86b199c5267..7d970c12814 100644 --- a/storage/innobase/fsp/fsp0fsp.cc +++ b/storage/innobase/fsp/fsp0fsp.cc @@ -2605,6 +2605,8 @@ fseg_free_page_low( xdes, xoffset, mtr); fsp_free_extent(space, offset, mtr); } + + mtr->free(page_id_t(space->id, offset)); } #ifndef BTR_CUR_HASH_ADAPT

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.