[MDEV-22139] fseg_free_page_low() fails to write FREE_PAGE record, breaking recovery Created: 2020-04-03  Updated: 2020-04-08  Resolved: 2020-04-03

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.2
Fix Version/s: 10.5.3

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Attachments: File rqg.log     File rqg.yy    
Issue Links:
Duplicate
is duplicated by MDEV-22096 Mariabackup copied too old page or to... Closed
is duplicated by MDEV-22097 Not applying DELETE_ROW_FORMAT_REDUND... Closed
Problem/Incident
is caused by MDEV-15528 Avoid writing freed InnoDB pages Closed

 Description   

Workflow of the RQG test
 
1. Create DB server and start it
2. One session starts to run some SQL stream against this server
3. At some point of time and not related to the state (waiting for result, sending statement, ...) of  the ongoing 2. the server process gets killed (KILL and not TERM).
4. Make some copy of the data content+logs etc. of the killed server.
5. Try to restart the killed server.
 
5. fails with
...
2020-04-03  8:58:49 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=117659470
2020-04-03  8:58:49 0 [Note] InnoDB: Transaction 2264 was in the XA prepared state.
2020-04-03  8:58:49 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
2020-04-03  8:58:49 0 [Note] InnoDB: Trx id counter is 2265
2020-04-03  8:58:49 0 [Note] InnoDB: Starting final batch to recover 133 pages from redo log.
2020-04-03  8:58:50 0 [ERROR] InnoDB: Not applying INSERT_HEAP_DYNAMIC due to corruption on [page id: space=5, page number=68]
2020-04-03  8:58:50 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
2020-04-03  8:58:50 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1574] with error Data structure corruption
...
2020-04-03  8:58:50 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2020-04-03  8:58:50 0 [Note] Plugin 'FEEDBACK' is disabled.
 
RQG
git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich1
origin/experimental d417497b476e263428422b4640cf63b8e9d16afe 2020-03-30T17:17:54+02:00
 
MariaDB
origin/10.5 0870b75af7d01f2e8132278aeb0873dd6c65d1be 2020-04-03T08:50:46+03:00
build with debug
 
cd <top directory of tree with RQG>
perl rqg.pl \
--duration=120 \
--queries=10000000 \
--threads=1 \
--no_mask \
--seed=random \
--engine=InnoDB \
--gendata=conf/engines/many_indexes.zz \
--rpl_mode=none \
--mysqld=--innodb_page_size=32K \
--mysqld=--net_read_timeout=30 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--innodb-lock-wait-timeout=50 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--mysqld=--net_write_timeout=60 \
--mysqld=--wait_timeout=28800 \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--connect_timeout=60 \
--mysqld=--log-bin \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--loose_innodb_use_native_aio=1 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--innodb_stats_persistent=off \
--mysqld=--log-output=none \
--mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
--reporters=CrashRecovery1,Deadlock1,ErrorLog,None,ServerDead \
--validators=None \
--grammar=rqg.yy \
--workdir=<local setting> \
--vardir=<local setting> \
--mtr-build-thread=<local setting> \
--basedir1=<path to binaries>
 
The uploaded RQG protocol "rqg.log" shows exact that RQG run.



 Comments   
Comment by Marko Mäkelä [ 2020-04-03 ]

In the case that I analyzed, we have an INSERT_REUSE_REDUNDANT redo log record whose preceding record points to unused space (after the PAGE_HEAP_TOP) in a page.

In a debugging session, I observed the following:

  1. The latest checkpoint LSN was 114599300.
  2. The problematic page 5:86 carries FIL_PAGE_LSN=0x6be1508=113120520.
  3. The problematic INSERT_REUSE_REDUNDANT record is the very first record for the page since the checkpoint, at LSN 114599768.
  4. In the rr replay of the server that was killed, I found the time when the FIL_PAGE_LSN corresponding to the page image in the file was written. The page was assigned to $1.
  5. With dump binary memory I extracted and compared the page images, and they matched (except for the last 8 bytes, which is normal).
  6. With a watchpoint on PAGE_HEAP_TOP in the page frame, I found some INSERT operations that would grow the page. Log was written for them. That was before the checkpoint LSN, so recovery would not see them.
  7. break log_write_checkpoint_info caught the write of the checkpoint header, with end_lsn=117659470.
  8. But, there was no call to write out the page from the buffer pool.
  9. I restarted after watch -l $1.page.in_flush_list. That was hit in buf_flush_freed_page().
  10. I restarted after setting a watchpoint on $1.page.status. (This had to be set on a numeric (unsigned*) address, because the used gdb version had truoble with enum.)
  11. At LSN 115022079 (note: this is between the LSN of the recovered log record 114599768 and the checkpoint end_lsn=117659470) we would set $1.page.status=buf_page_t::FREED without first writing a FREE_PAGE record in fseg_free_page_low().
  12. If the FREE_PAGE record would have been written, recovery would have seen that (because the log had been completely written until at least the checkpoint end_lsn=117659470) and would have avoided reading the freed page altogether.

The following patch fixes the problem:

diff --git a/storage/innobase/fsp/fsp0fsp.cc b/storage/innobase/fsp/fsp0fsp.cc
index 86b199c5267..7d970c12814 100644
--- a/storage/innobase/fsp/fsp0fsp.cc
+++ b/storage/innobase/fsp/fsp0fsp.cc
@@ -2605,6 +2605,8 @@ fseg_free_page_low(
 			    xdes, xoffset, mtr);
 		fsp_free_extent(space, offset, mtr);
 	}
+
+	mtr->free(page_id_t(space->id, offset));
 }
 
 #ifndef BTR_CUR_HASH_ADAPT

Generated at Thu Feb 08 09:12:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.