Using the mtr test case that thiru provided, I was able to produce an rr replay trace of this. I got an OPT_PAGE_CHECKSUM failure (MDEV-18976); these records are something that is being written in debug builds. The failure occurred for the following OPT_PAGE_CHECKSUM record:
I found the corresponding mini-transaction in the rr replay trace of the server that was intentionally killed by the test. The page was being marked as freed by the following:
10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf
#3 0x00005598517b44a2 in buf_page_free (space=space@entry=0x5598545749f8, page=page@entry=494, mtr=mtr@entry=0x7e813d744a70) at /mariadb/10.6m/storage/innobase/buf/buf0buf.cc:2157
at /mariadb/10.6m/storage/innobase/fsp/fsp0fsp.cc:2767
#5 0x00005598517ed72c in fseg_free_step (header=0x19806e90c04a "", mtr=mtr@entry=0x7e813d744a70, ahi=ahi@entry=false) at /mariadb/10.6m/storage/innobase/fsp/fsp0fsp.cc:2831
#6 0x00005598517f450c in btr_free_but_not_root (block=block@entry=0x19806e4542f0, log_mode=MTR_LOG_ALL, ahi=ahi@entry=false) at /mariadb/10.6m/storage/innobase/btr/btr0btr.cc:1142
#7 0x00005598517f6fb2 in btr_free_if_exists (space=space@entry=0x5598545749f8, page=page@entry=313, index_id=<optimized out>, mtr=mtr@entry=0x7e813d745150)
at /mariadb/10.6m/storage/innobase/btr/btr0btr.cc:1229
#8 0x0000559851870d44 in dict_drop_index_tree (pcur=pcur@entry=0x5598545d15e0, trx=trx@entry=0x0, mtr=mtr@entry=0x7e813d745150) at /mariadb/10.6m/storage/innobase/dict/dict0crea.cc:905
#9 0x000055985198a455 in row_purge_remove_clust_if_poss_low (node=node@entry=0x5598545d1540, mode=mode@entry=BTR_MODIFY_LEAF) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:166
#10 0x000055985198aea9 in row_purge_remove_clust_if_poss (node=node@entry=0x5598545d1540) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:259
#11 0x000055985198b043 in row_purge_del_mark (node=node@entry=0x5598545d1540) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:635
#12 0x000055985198b799 in row_purge_record_func (node=node@entry=0x5598545d1540, undo_rec=undo_rec@entry=0x5c8478084070 "", thr=thr@entry=0x5598545d14a0, updated_extern=false)
at /mariadb/10.6m/storage/innobase/row/row0purge.cc:1215
#13 0x000055985198b90e in row_purge (node=node@entry=0x5598545d1540, undo_rec=undo_rec@entry=0x5c8478084070 "", thr=thr@entry=0x5598545d14a0) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:1276
This is the purge of delete-marking a SYS_INDEXES record corresponding to DROP TABLE t1 in the system tablespace.
This mini-transaction (at LSN=9388592) failed to write any FREE_PAGE record. At mtr_t::commit(), m_memo would comprise 69 latches, and 80 bytes of log would have been written: WRITE, MEMSET and MEMMOVE records and two OPT_PAGE_CHECKSUM for other pages than this freed page.
The oldest_modification of the page was 8407829 (which is before the recovery checkpoint LSN) at the time when it was cleared by the page cleaner.
I think that the problem is that there was no FREE_PAGE record written for this page 494 (0x1ee). We do have it for some neighbours, in mlog_init at the time the recovery catches the error:
…
[{m_id = 446}] = {lsn = 9388941, created = false},
[{m_id = 447}] = {lsn = 9388900, created = false},
[{m_id = 512}] = {lsn = 9390540, created = false},
[{m_id = 513}] = {lsn = 9392084, created = false},
…
The test case that I used is as follows:
--source include/have_innodb.inc
--source include/have_sequence.inc
SETGLOBAL INNODB_FILE_PER_TABLE= 0;
CREATETABLE t1(f1 INTNOTNULL, f2 INTNOTNULL,
f3 INTNOTNULL, INDEX(f1),
INDEX(f2), INDEX(f3))ENGINE=InnoDB;
INSERTINTO t1 SELECT seq, seq, seq FROM seq_1_to_16384;
INSERTINTO t1 SELECT seq, seq, seq FROM seq_1_to_16384;
INSERTINTO t1 SELECT seq, seq, seq FROM seq_1_to_16384;
DROPTABLE t1;
SETGLOBAL innodb_max_purge_lag_wait=0;
let $shutdown_timeout=0;
--source include/restart_mysqld.inc
This would seem to fail reliably when not using rr. I produced my rr trace as follows:
while ./mtr --rr=-h --parallel=auto name_of_test{,,,,,,,,,,,,,,,,}; do :; done
Marko Mäkelä
added a comment - Using the mtr test case that thiru provided, I was able to produce an rr replay trace of this. I got an OPT_PAGE_CHECKSUM failure ( MDEV-18976 ); these records are something that is being written in debug builds. The failure occurred for the following OPT_PAGE_CHECKSUM record:
10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf
(rr) p *this
$2 = {<log_rec_t> = {next = 0x7f1871570dc8, lsn = 8448794}, start_lsn = 8448769}
In that page, we had FIL_PAGE_LSN of 6281382, which is too old. The recovery had started like this:
10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf
2023-09-14 11:08:03 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=8415203,9398815
2023-09-14 11:08:03 0 [Note] InnoDB: To recover: 260 pages
2023-09-14 11:08:04 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=0, page number=494]
I found the corresponding mini-transaction in the rr replay trace of the server that was intentionally killed by the test. The page was being marked as freed by the following:
10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf
#3 0x00005598517b44a2 in buf_page_free (space=space@entry=0x5598545749f8, page=page@entry=494, mtr=mtr@entry=0x7e813d744a70) at /mariadb/10.6m/storage/innobase/buf/buf0buf.cc:2157
#4 0x00005598517eb685 in fseg_free_extent (seg_inode=seg_inode@entry=0x19806e8cf872 "", iblock=0x19806e4535f0, space=space@entry=0x5598545749f8, page=448, mtr=mtr@entry=0x7e813d744a70, ahi=ahi@entry=false)
at /mariadb/10.6m/storage/innobase/fsp/fsp0fsp.cc:2767
#5 0x00005598517ed72c in fseg_free_step (header=0x19806e90c04a "", mtr=mtr@entry=0x7e813d744a70, ahi=ahi@entry=false) at /mariadb/10.6m/storage/innobase/fsp/fsp0fsp.cc:2831
#6 0x00005598517f450c in btr_free_but_not_root (block=block@entry=0x19806e4542f0, log_mode=MTR_LOG_ALL, ahi=ahi@entry=false) at /mariadb/10.6m/storage/innobase/btr/btr0btr.cc:1142
#7 0x00005598517f6fb2 in btr_free_if_exists (space=space@entry=0x5598545749f8, page=page@entry=313, index_id=<optimized out>, mtr=mtr@entry=0x7e813d745150)
at /mariadb/10.6m/storage/innobase/btr/btr0btr.cc:1229
#8 0x0000559851870d44 in dict_drop_index_tree (pcur=pcur@entry=0x5598545d15e0, trx=trx@entry=0x0, mtr=mtr@entry=0x7e813d745150) at /mariadb/10.6m/storage/innobase/dict/dict0crea.cc:905
#9 0x000055985198a455 in row_purge_remove_clust_if_poss_low (node=node@entry=0x5598545d1540, mode=mode@entry=BTR_MODIFY_LEAF) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:166
#10 0x000055985198aea9 in row_purge_remove_clust_if_poss (node=node@entry=0x5598545d1540) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:259
#11 0x000055985198b043 in row_purge_del_mark (node=node@entry=0x5598545d1540) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:635
#12 0x000055985198b799 in row_purge_record_func (node=node@entry=0x5598545d1540, undo_rec=undo_rec@entry=0x5c8478084070 "", thr=thr@entry=0x5598545d14a0, updated_extern=false)
at /mariadb/10.6m/storage/innobase/row/row0purge.cc:1215
#13 0x000055985198b90e in row_purge (node=node@entry=0x5598545d1540, undo_rec=undo_rec@entry=0x5c8478084070 "", thr=thr@entry=0x5598545d14a0) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:1276
This is the purge of delete-marking a SYS_INDEXES record corresponding to DROP TABLE t1 in the system tablespace.
This mini-transaction (at LSN=9388592) failed to write any FREE_PAGE record. At mtr_t::commit() , m_memo would comprise 69 latches, and 80 bytes of log would have been written: WRITE , MEMSET and MEMMOVE records and two OPT_PAGE_CHECKSUM for other pages than this freed page.
The oldest_modification of the page was 8407829 (which is before the recovery checkpoint LSN) at the time when it was cleared by the page cleaner.
I think that the problem is that there was no FREE_PAGE record written for this page 494 (0x1ee). We do have it for some neighbours, in mlog_init at the time the recovery catches the error:
…
[{m_id = 446}] = {lsn = 9388941, created = false},
[{m_id = 447}] = {lsn = 9388900, created = false},
[{m_id = 512}] = {lsn = 9390540, created = false},
[{m_id = 513}] = {lsn = 9392084, created = false},
…
The test case that I used is as follows:
--source include/have_innodb.inc
--source include/have_sequence.inc
SET GLOBAL INNODB_FILE_PER_TABLE= 0;
CREATE TABLE t1(f1 INT NOT NULL , f2 INT NOT NULL ,
f3 INT NOT NULL , INDEX (f1),
INDEX (f2), INDEX (f3))ENGINE=InnoDB;
INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384;
INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384;
INSERT INTO t1 SELECT seq, seq, seq FROM seq_1_to_16384;
DROP TABLE t1;
SET GLOBAL innodb_max_purge_lag_wait=0;
let $shutdown_timeout=0;
--source include/restart_mysqld.inc
This would seem to fail reliably when not using rr . I produced my rr trace as follows:
while . /mtr --rr=-h --parallel=auto name_of_test{,,,,,,,,,,,,,,,,}; do :; done
This was with the test case but not the code change of of my 10.5 fix.
Marko Mäkelä
added a comment - I was able to reproduce this on 10.5 as well:
10.5 cb1965bd9d840665b2ea52216985d88c39a8a9f9
mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:273: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.
mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:273: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.
This was with the test case but not the code change of of my 10.5 fix .
People
Marko Mäkelä
Sergei Golubchik
Votes:
0Vote for this issue
Watchers:
4Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
Using the mtr test case that thiru provided, I was able to produce an rr replay trace of this. I got an OPT_PAGE_CHECKSUM failure (
MDEV-18976); these records are something that is being written in debug builds. The failure occurred for the following OPT_PAGE_CHECKSUM record:10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf
(rr) p *this
$2 = {<log_rec_t> = {next = 0x7f1871570dc8, lsn = 8448794}, start_lsn = 8448769}
In that page, we had FIL_PAGE_LSN of 6281382, which is too old. The recovery had started like this:
10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf
2023-09-14 11:08:03 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=8415203,9398815
2023-09-14 11:08:03 0 [Note] InnoDB: To recover: 260 pages
2023-09-14 11:08:04 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=0, page number=494]
I found the corresponding mini-transaction in the rr replay trace of the server that was intentionally killed by the test. The page was being marked as freed by the following:
10.6 0f9acce3f2fd8d8f6334e10b08f9b5a031d40fdf
#3 0x00005598517b44a2 in buf_page_free (space=space@entry=0x5598545749f8, page=page@entry=494, mtr=mtr@entry=0x7e813d744a70) at /mariadb/10.6m/storage/innobase/buf/buf0buf.cc:2157
#4 0x00005598517eb685 in fseg_free_extent (seg_inode=seg_inode@entry=0x19806e8cf872 "", iblock=0x19806e4535f0, space=space@entry=0x5598545749f8, page=448, mtr=mtr@entry=0x7e813d744a70, ahi=ahi@entry=false)
at /mariadb/10.6m/storage/innobase/fsp/fsp0fsp.cc:2767
#5 0x00005598517ed72c in fseg_free_step (header=0x19806e90c04a "", mtr=mtr@entry=0x7e813d744a70, ahi=ahi@entry=false) at /mariadb/10.6m/storage/innobase/fsp/fsp0fsp.cc:2831
#6 0x00005598517f450c in btr_free_but_not_root (block=block@entry=0x19806e4542f0, log_mode=MTR_LOG_ALL, ahi=ahi@entry=false) at /mariadb/10.6m/storage/innobase/btr/btr0btr.cc:1142
#7 0x00005598517f6fb2 in btr_free_if_exists (space=space@entry=0x5598545749f8, page=page@entry=313, index_id=<optimized out>, mtr=mtr@entry=0x7e813d745150)
at /mariadb/10.6m/storage/innobase/btr/btr0btr.cc:1229
#8 0x0000559851870d44 in dict_drop_index_tree (pcur=pcur@entry=0x5598545d15e0, trx=trx@entry=0x0, mtr=mtr@entry=0x7e813d745150) at /mariadb/10.6m/storage/innobase/dict/dict0crea.cc:905
#9 0x000055985198a455 in row_purge_remove_clust_if_poss_low (node=node@entry=0x5598545d1540, mode=mode@entry=BTR_MODIFY_LEAF) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:166
#10 0x000055985198aea9 in row_purge_remove_clust_if_poss (node=node@entry=0x5598545d1540) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:259
#11 0x000055985198b043 in row_purge_del_mark (node=node@entry=0x5598545d1540) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:635
#12 0x000055985198b799 in row_purge_record_func (node=node@entry=0x5598545d1540, undo_rec=undo_rec@entry=0x5c8478084070 "", thr=thr@entry=0x5598545d14a0, updated_extern=false)
at /mariadb/10.6m/storage/innobase/row/row0purge.cc:1215
#13 0x000055985198b90e in row_purge (node=node@entry=0x5598545d1540, undo_rec=undo_rec@entry=0x5c8478084070 "", thr=thr@entry=0x5598545d14a0) at /mariadb/10.6m/storage/innobase/row/row0purge.cc:1276
This is the purge of delete-marking a SYS_INDEXES record corresponding to DROP TABLE t1 in the system tablespace.
This mini-transaction (at LSN=9388592) failed to write any FREE_PAGE record. At mtr_t::commit(), m_memo would comprise 69 latches, and 80 bytes of log would have been written: WRITE, MEMSET and MEMMOVE records and two OPT_PAGE_CHECKSUM for other pages than this freed page.
The oldest_modification of the page was 8407829 (which is before the recovery checkpoint LSN) at the time when it was cleared by the page cleaner.
I think that the problem is that there was no FREE_PAGE record written for this page 494 (0x1ee). We do have it for some neighbours, in mlog_init at the time the recovery catches the error:
…
[{m_id = 446}] = {lsn = 9388941, created = false},
[{m_id = 447}] = {lsn = 9388900, created = false},
[{m_id = 512}] = {lsn = 9390540, created = false},
[{m_id = 513}] = {lsn = 9392084, created = false},
…
The test case that I used is as follows:
--source include/have_innodb.inc
--source include/have_sequence.inc
let $shutdown_timeout=0;
This would seem to fail reliably when not using rr. I produced my rr trace as follows: