[MDEV-22139] fseg_free_page_low() fails to write FREE_PAGE record, breaking recovery - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.5.2
Fix Version/s: 10.5.3
Component/s: Storage Engine - InnoDB
Labels:
None

Description

Workflow of the RQG test

1. Create DB server and start it

2. One session starts to run some SQL stream against this server

3. At some point of time and not related to the state (waiting for result, sending statement, ...) of  the ongoing 2. the server process gets killed (KILL and not TERM).

4. Make some copy of the data content+logs etc. of the killed server.

5. Try to restart the killed server.

5. fails with

...

2020-04-03  8:58:49 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=117659470

2020-04-03  8:58:49 0 [Note] InnoDB: Transaction 2264 was in the XA prepared state.

2020-04-03  8:58:49 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo

2020-04-03  8:58:49 0 [Note] InnoDB: Trx id counter is 2265

2020-04-03  8:58:49 0 [Note] InnoDB: Starting final batch to recover 133 pages from redo log.

2020-04-03  8:58:50 0 [ERROR] InnoDB: Not applying INSERT_HEAP_DYNAMIC due to corruption on [page id: space=5, page number=68]

2020-04-03  8:58:50 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.

2020-04-03  8:58:50 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1574] with error Data structure corruption

...

2020-04-03  8:58:50 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

2020-04-03  8:58:50 0 [Note] Plugin 'FEEDBACK' is disabled.

RQG

git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich1

origin/experimental d417497b476e263428422b4640cf63b8e9d16afe 2020-03-30T17:17:54+02:00

MariaDB

origin/10.5 0870b75af7d01f2e8132278aeb0873dd6c65d1be 2020-04-03T08:50:46+03:00

build with debug

cd <top directory of tree with RQG>

perl rqg.pl \

--duration=120 \

--queries=10000000 \

--threads=1 \

--no_mask \

--seed=random \

--engine=InnoDB \

--gendata=conf/engines/many_indexes.zz \

--rpl_mode=none \

--mysqld=--innodb_page_size=32K \

--mysqld=--net_read_timeout=30 \

--mysqld=--lock-wait-timeout=86400 \

--mysqld=--loose-table_lock_wait_timeout=50 \

--mysqld=--loose-idle_readonly_transaction_timeout=0 \

--mysqld=--innodb-lock-wait-timeout=50 \

--mysqld=--loose-debug_assert_on_not_freed_memory=0 \

--mysqld=--net_write_timeout=60 \

--mysqld=--wait_timeout=28800 \

--mysqld=--log_bin_trust_function_creators=1 \

--mysqld=--connect_timeout=60 \

--mysqld=--log-bin \

--mysqld=--loose-idle_write_transaction_timeout=0 \

--mysqld=--loose-idle_transaction_timeout=0 \

--mysqld=--interactive_timeout=28800 \

--mysqld=--loose_innodb_use_native_aio=1 \

--mysqld=--slave_net_timeout=60 \

--mysqld=--innodb_stats_persistent=off \

--mysqld=--log-output=none \

--mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \

--reporters=CrashRecovery1,Deadlock1,ErrorLog,None,ServerDead \

--validators=None \

--grammar=rqg.yy \

--workdir=<local setting> \

--vardir=<local setting> \

--mtr-build-thread=<local setting> \

--basedir1=<path to binaries>

The uploaded RQG protocol "rqg.log" shows exact that RQG run.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

rqg.log
2020-04-03 12:08
82 kB
Matthias Leich
rqg.yy
2020-04-03 12:09
0.5 kB
Matthias Leich

Issue Links

is caused by

MDEV-15528 Avoid writing freed InnoDB pages

Closed

is duplicated by

MDEV-22096 Mariabackup copied too old page or too new checkpoint

Closed

MDEV-22097 Not applying DELETE_ROW_FORMAT_REDUNDANT due to corruption

Closed

Activity

Marko Mäkelä added a comment - 2020-04-03 14:36

In the case that I analyzed, we have an INSERT_REUSE_REDUNDANT redo log record whose preceding record points to unused space (after the PAGE_HEAP_TOP) in a page.

In a debugging session, I observed the following:

The latest checkpoint LSN was 114599300.
The problematic page 5:86 carries FIL_PAGE_LSN=0x6be1508=113120520.
The problematic INSERT_REUSE_REDUNDANT record is the very first record for the page since the checkpoint, at LSN 114599768.
In the rr replay of the server that was killed, I found the time when the FIL_PAGE_LSN corresponding to the page image in the file was written. The page was assigned to $1.
With dump binary memory I extracted and compared the page images, and they matched (except for the last 8 bytes, which is normal).
With a watchpoint on PAGE_HEAP_TOP in the page frame, I found some INSERT operations that would grow the page. Log was written for them. That was before the checkpoint LSN, so recovery would not see them.
break log_write_checkpoint_info caught the write of the checkpoint header, with end_lsn=117659470.
But, there was no call to write out the page from the buffer pool.
I restarted after watch -l $1.page.in_flush_list. That was hit in buf_flush_freed_page().
I restarted after setting a watchpoint on $1.page.status. (This had to be set on a numeric (unsigned*) address, because the used gdb version had truoble with enum.)
At LSN 115022079 (note: this is between the LSN of the recovered log record 114599768 and the checkpoint end_lsn=117659470) we would set $1.page.status=buf_page_t::FREED without first writing a FREE_PAGE record in fseg_free_page_low().
If the FREE_PAGE record would have been written, recovery would have seen that (because the log had been completely written until at least the checkpoint end_lsn=117659470) and would have avoided reading the freed page altogether.

The following patch fixes the problem:

diff --git a/storage/innobase/fsp/fsp0fsp.cc b/storage/innobase/fsp/fsp0fsp.cc

index 86b199c5267..7d970c12814 100644

--- a/storage/innobase/fsp/fsp0fsp.cc

+++ b/storage/innobase/fsp/fsp0fsp.cc

@@ -2605,6 +2605,8 @@ fseg_free_page_low(

 			    xdes, xoffset, mtr);

 		fsp_free_extent(space, offset, mtr);

+	mtr->free(page_id_t(space->id, offset));

 #ifndef BTR_CUR_HASH_ADAPT

Marko Mäkelä added a comment - 2020-04-03 14:36 In the case that I analyzed, we have an INSERT_REUSE_REDUNDANT redo log record whose preceding record points to unused space (after the PAGE_HEAP_TOP ) in a page. In a debugging session, I observed the following: The latest checkpoint LSN was 114599300. The problematic page 5:86 carries FIL_PAGE_LSN =0x6be1508=113120520. The problematic INSERT_REUSE_REDUNDANT record is the very first record for the page since the checkpoint, at LSN 114599768. In the rr replay of the server that was killed, I found the time when the FIL_PAGE_LSN corresponding to the page image in the file was written. The page was assigned to $1 . With dump binary memory I extracted and compared the page images, and they matched (except for the last 8 bytes, which is normal). With a watchpoint on PAGE_HEAP_TOP in the page frame, I found some INSERT operations that would grow the page. Log was written for them. That was before the checkpoint LSN, so recovery would not see them. break log_write_checkpoint_info caught the write of the checkpoint header, with end_lsn =117659470. But, there was no call to write out the page from the buffer pool. I restarted after watch -l $1.page.in_flush_list . That was hit in buf_flush_freed_page() . I restarted after setting a watchpoint on $1.page.status . (This had to be set on a numeric (unsigned*) address, because the used gdb version had truoble with enum .) At LSN 115022079 (note: this is between the LSN of the recovered log record 114599768 and the checkpoint end_lsn =117659470) we would set $1.page.status=buf_page_t::FREED without first writing a FREE_PAGE record in fseg_free_page_low() . If the FREE_PAGE record would have been written, recovery would have seen that (because the log had been completely written until at least the checkpoint end_lsn =117659470) and would have avoided reading the freed page altogether. The following patch fixes the problem: diff --git a/storage/innobase/fsp/fsp0fsp.cc b/storage/innobase/fsp/fsp0fsp.cc index 86b199c5267..7d970c12814 100644 --- a/storage/innobase/fsp/fsp0fsp.cc +++ b/storage/innobase/fsp/fsp0fsp.cc @@ -2605,6 +2605,8 @@ fseg_free_page_low( xdes, xoffset, mtr); fsp_free_extent(space, offset, mtr); } + + mtr->free(page_id_t(space->id, offset)); } #ifndef BTR_CUR_HASH_ADAPT

MariaDB Server

fseg_free_page_low() fails to write FREE_PAGE record, breaking recovery

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration