Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32163

Crash recovery fails after DROP TABLE in system tablespace

Details

    Description

      Fix for MDEV-32028 cherry-picked into preview-11.3-preview breaks mariabackup.sys_truncate test sporadically.

      There are 5 InnoDB commits in 11.3 preview, after the first one the test already starts to fail.

      Attachments

        Issue Links

          Activity

            Using the mtr test case that thiru provided, I was able to produce an rr replay trace of this. I got an OPT_PAGE_CHECKSUM failure (MDEV-18976); these records are something that is being written in debug builds. The failure occurred for the following OPT_PAGE_CHECKSUM record:

            10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf

            (rr) p *this
            $2 = {<log_rec_t> = {next = 0x7f1871570dc8, lsn = 8448794}, start_lsn = 8448769}
            

            In that page, we had FIL_PAGE_LSN of 6281382, which is too old. The recovery had started like this:

            10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf

            2023-09-14 11:08:03 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=8415203,9398815
            2023-09-14 11:08:03 0 [Note] InnoDB: To recover: 260 pages
            2023-09-14 11:08:04 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=0, page number=494]
            

            I found the corresponding mini-transaction in the rr replay trace of the server that was intentionally killed by the test. The page was being marked as freed by the following:

            10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf

            #3  0x00005598517b44a2 in buf_page_free (space=space@entry=0x5598545749f8, page=page@entry=494, mtr=mtr@entry=0x7e813d744a70) at /mariadb/10.6m/storage/innobase/buf/buf0buf.cc:2157
            #4  0x00005598517eb685 in fseg_free_extent (seg_inode=seg_inode@entry=0x19806e8cf872 "", iblock=0x19806e4535f0, space=space@entry=0x5598545749f8, page=448, mtr=mtr@entry=0x7e813d744a70, ahi=ahi@entry=false)
                at /mariadb/10.6m/storage/innobase/fsp/fsp0fsp.cc:2767
            #5  0x00005598517ed72c in fseg_free_step (header=0x19806e90c04a "", mtr=mtr@entry=0x7e813d744a70, ahi=ahi@entry=false) at /mariadb/10.6m/storage/innobase/fsp/fsp0fsp.cc:2831
            #6  0x00005598517f450c in btr_free_but_not_root (block=block@entry=0x19806e4542f0, log_mode=MTR_LOG_ALL, ahi=ahi@entry=false) at /mariadb/10.6m/storage/innobase/btr/btr0btr.cc:1142
            #7  0x00005598517f6fb2 in btr_free_if_exists (space=space@entry=0x5598545749f8, page=page@entry=313, index_id=<optimized out>, mtr=mtr@entry=0x7e813d745150)
                at /mariadb/10.6m/storage/innobase/btr/btr0btr.cc:1229
            #8  0x0000559851870d44 in dict_drop_index_tree (pcur=pcur@entry=0x5598545d15e0, trx=trx@entry=0x0, mtr=mtr@entry=0x7e813d745150) at /mariadb/10.6m/storage/innobase/dict/dict0crea.cc:905
            #9  0x000055985198a455 in row_purge_remove_clust_if_poss_low (node=node@entry=0x5598545d1540, mode=mode@entry=BTR_MODIFY_LEAF) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:166
            #10 0x000055985198aea9 in row_purge_remove_clust_if_poss (node=node@entry=0x5598545d1540) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:259
            #11 0x000055985198b043 in row_purge_del_mark (node=node@entry=0x5598545d1540) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:635
            #12 0x000055985198b799 in row_purge_record_func (node=node@entry=0x5598545d1540, undo_rec=undo_rec@entry=0x5c8478084070 "", thr=thr@entry=0x5598545d14a0, updated_extern=false)
                at /mariadb/10.6m/storage/innobase/row/row0purge.cc:1215
            #13 0x000055985198b90e in row_purge (node=node@entry=0x5598545d1540, undo_rec=undo_rec@entry=0x5c8478084070 "", thr=thr@entry=0x5598545d14a0) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:1276
            

            This is the purge of delete-marking a SYS_INDEXES record corresponding to DROP TABLE t1 in the system tablespace.

            This mini-transaction (at LSN=9388592) failed to write any FREE_PAGE record. At mtr_t::commit(), m_memo would comprise 69 latches, and 80 bytes of log would have been written: WRITE, MEMSET and MEMMOVE records and two OPT_PAGE_CHECKSUM for other pages than this freed page.

            The oldest_modification of the page was 8407829 (which is before the recovery checkpoint LSN) at the time when it was cleared by the page cleaner.

            I think that the problem is that there was no FREE_PAGE record written for this page 494 (0x1ee). We do have it for some neighbours, in mlog_init at the time the recovery catches the error:

            [{m_id = 446}] = {lsn = 9388941, created = false},
            [{m_id = 447}] = {lsn = 9388900, created = false},
            [{m_id = 512}] = {lsn = 9390540, created = false},
            [{m_id = 513}] = {lsn = 9392084, created = false},

            The test case that I used is as follows:

            --source include/have_innodb.inc
            --source include/have_sequence.inc
            SET GLOBAL INNODB_FILE_PER_TABLE= 0;
            CREATE TABLE t1(f1 INT NOT NULL, f2 INT NOT NULL,
                            f3 INT NOT NULL, INDEX(f1),
                            INDEX(f2), INDEX(f3))ENGINE=InnoDB;
            INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384;
            INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384;
            INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384;
            DROP TABLE t1;
            SET GLOBAL innodb_max_purge_lag_wait=0;
            let $shutdown_timeout=0;
            --source include/restart_mysqld.inc
            

            This would seem to fail reliably when not using rr. I produced my rr trace as follows:

            while ./mtr --rr=-h --parallel=auto name_of_test{,,,,,,,,,,,,,,,,}; do :; done
            

            marko Marko Mäkelä added a comment - Using the mtr test case that thiru provided, I was able to produce an rr replay trace of this. I got an OPT_PAGE_CHECKSUM failure ( MDEV-18976 ); these records are something that is being written in debug builds. The failure occurred for the following OPT_PAGE_CHECKSUM record: 10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf (rr) p *this $2 = {<log_rec_t> = {next = 0x7f1871570dc8, lsn = 8448794}, start_lsn = 8448769} In that page, we had FIL_PAGE_LSN of 6281382, which is too old. The recovery had started like this: 10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf 2023-09-14 11:08:03 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=8415203,9398815 2023-09-14 11:08:03 0 [Note] InnoDB: To recover: 260 pages 2023-09-14 11:08:04 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=0, page number=494] I found the corresponding mini-transaction in the rr replay trace of the server that was intentionally killed by the test. The page was being marked as freed by the following: 10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf #3 0x00005598517b44a2 in buf_page_free (space=space@entry=0x5598545749f8, page=page@entry=494, mtr=mtr@entry=0x7e813d744a70) at /mariadb/10.6m/storage/innobase/buf/buf0buf.cc:2157 #4 0x00005598517eb685 in fseg_free_extent (seg_inode=seg_inode@entry=0x19806e8cf872 "", iblock=0x19806e4535f0, space=space@entry=0x5598545749f8, page=448, mtr=mtr@entry=0x7e813d744a70, ahi=ahi@entry=false) at /mariadb/10.6m/storage/innobase/fsp/fsp0fsp.cc:2767 #5 0x00005598517ed72c in fseg_free_step (header=0x19806e90c04a "", mtr=mtr@entry=0x7e813d744a70, ahi=ahi@entry=false) at /mariadb/10.6m/storage/innobase/fsp/fsp0fsp.cc:2831 #6 0x00005598517f450c in btr_free_but_not_root (block=block@entry=0x19806e4542f0, log_mode=MTR_LOG_ALL, ahi=ahi@entry=false) at /mariadb/10.6m/storage/innobase/btr/btr0btr.cc:1142 #7 0x00005598517f6fb2 in btr_free_if_exists (space=space@entry=0x5598545749f8, page=page@entry=313, index_id=<optimized out>, mtr=mtr@entry=0x7e813d745150) at /mariadb/10.6m/storage/innobase/btr/btr0btr.cc:1229 #8 0x0000559851870d44 in dict_drop_index_tree (pcur=pcur@entry=0x5598545d15e0, trx=trx@entry=0x0, mtr=mtr@entry=0x7e813d745150) at /mariadb/10.6m/storage/innobase/dict/dict0crea.cc:905 #9 0x000055985198a455 in row_purge_remove_clust_if_poss_low (node=node@entry=0x5598545d1540, mode=mode@entry=BTR_MODIFY_LEAF) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:166 #10 0x000055985198aea9 in row_purge_remove_clust_if_poss (node=node@entry=0x5598545d1540) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:259 #11 0x000055985198b043 in row_purge_del_mark (node=node@entry=0x5598545d1540) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:635 #12 0x000055985198b799 in row_purge_record_func (node=node@entry=0x5598545d1540, undo_rec=undo_rec@entry=0x5c8478084070 "", thr=thr@entry=0x5598545d14a0, updated_extern=false) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:1215 #13 0x000055985198b90e in row_purge (node=node@entry=0x5598545d1540, undo_rec=undo_rec@entry=0x5c8478084070 "", thr=thr@entry=0x5598545d14a0) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:1276 This is the purge of delete-marking a SYS_INDEXES record corresponding to DROP TABLE t1 in the system tablespace. This mini-transaction (at LSN=9388592) failed to write any FREE_PAGE record. At mtr_t::commit() , m_memo would comprise 69 latches, and 80 bytes of log would have been written: WRITE , MEMSET and MEMMOVE records and two OPT_PAGE_CHECKSUM for other pages than this freed page. The oldest_modification of the page was 8407829 (which is before the recovery checkpoint LSN) at the time when it was cleared by the page cleaner. I think that the problem is that there was no FREE_PAGE record written for this page 494 (0x1ee). We do have it for some neighbours, in mlog_init at the time the recovery catches the error: … [{m_id = 446}] = {lsn = 9388941, created = false}, [{m_id = 447}] = {lsn = 9388900, created = false}, [{m_id = 512}] = {lsn = 9390540, created = false}, [{m_id = 513}] = {lsn = 9392084, created = false}, … The test case that I used is as follows: --source include/have_innodb.inc --source include/have_sequence.inc SET GLOBAL INNODB_FILE_PER_TABLE= 0; CREATE TABLE t1(f1 INT NOT NULL , f2 INT NOT NULL , f3 INT NOT NULL , INDEX (f1), INDEX (f2), INDEX (f3))ENGINE=InnoDB; INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384; INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384; INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384; DROP TABLE t1; SET GLOBAL innodb_max_purge_lag_wait=0; let $shutdown_timeout=0; --source include/restart_mysqld.inc This would seem to fail reliably when not using rr . I produced my rr trace as follows: while . /mtr --rr=-h --parallel=auto name_of_test{,,,,,,,,,,,,,,,,}; do :; done

            I was able to reproduce this on 10.5 as well:

            10.5 cb1965bd9d840665b2ea52216985d88c39a8a9f9

            mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:273: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.
            mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:273: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.
            

            This was with the test case but not the code change of of my 10.5 fix.

            marko Marko Mäkelä added a comment - I was able to reproduce this on 10.5 as well: 10.5 cb1965bd9d840665b2ea52216985d88c39a8a9f9 mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:273: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed. mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:273: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed. This was with the test case but not the code change of of my 10.5 fix .

            People

              marko Marko Mäkelä
              serg Sergei Golubchik
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.