[MDEV-30479] OPT_PAGE_CHECKSUM mismatch during recovery after innodb_undo_log_truncate=ON Created: 2023-01-26  Updated: 2023-07-10  Resolved: 2023-02-06

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0
Fix Version/s: 10.11.3, 11.0.1, 10.5.20, 10.6.13, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Critical
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, recovery, rr-profile-analyzed

Attachments: File TBR-1646.cc     File data.tar.xz    
Issue Links:
Blocks
Problem/Incident
causes MDEV-30657 [ERROR] InnoDB: Not applying UNDO_APP... Closed
causes MDEV-31487 Recovery or backup failure after inno... Closed
Relates
relates to MDEV-29986 Set innodb_undo_tablespaces=3 by default Closed

 Description   

origin/10.7 d1a4f6e627b3ad2c86946da08b8e82909707da7c 2023-01-17T11:41:29+02:00
rr
origin/bb-10.6-MDEV-30400 de4030e4d49805a7ded5c0bfee01cc3fd7623522 2023-01-24T14:09:21+02:00
Scenario:
1. Start the DB server and generate some initial data
2. Several sessions run a DML mix
3. During 2. is ongoing SIGKILL the DB server and try to restart
The restart fails with
# 2023-01-26T12:43:46 [195747] | [rr 260971 8174]2023-01-26 12:42:58 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=2, page number=577]
# 2023-01-26T12:43:46 [195747] | [rr 260971 8178]2023-01-26 12:42:58 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
# 2023-01-26T12:43:46 [195747] | [rr 260971 8269]2023-01-26 12:42:58 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=2, page number=577]; set innodb_force_recovery to ignore
# 2023-01-26T12:43:46 [195747] | [rr 260971 8273]2023-01-26 12:42:58 0 [ERROR] InnoDB: Failed to read page 577 from file './/undo002': Page read from tablespace is corrupted.
# 2023-01-26T12:43:46 [195747] | [rr 260971 17093]2023-01-26 12:43:00 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1496] with error Data structure corruption
# 2023-01-26T12:43:46 [195747] | [rr 260971 17097]2023-01-26 12:43:00 0 [Note] InnoDB: Starting shutdown...
# 2023-01-26T12:43:46 [195747] | [rr 260971 17665]2023-01-26 12:43:01 0 [ERROR] Plugin 'InnoDB' init function returned error.
 
pluto:/data/results/1674736672/TBR-1646$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio
 
RQG
===
# git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
#
# GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 36f1d5c64870abcdbe123ec98869f7045533d458 2023-01-23T16:20:15+01:00
# rqg.pl  : Version 4.2.1 (2022-12)
#
# $RQG_HOME/rqg.pl \
# --grammar=conf/engines/many_indexes.yy \
# --gendata=conf/engines/many_indexes.zz \
# --reporters=CrashRecovery \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --no-mask \
# --queries=10000000 \
# --seed=random \
# --reporters=Backtrace \
# --reporters=ErrorLog \
# --reporters=Deadlock \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--plugin-load-add=provider_lzo.so \
# --mysqld=--plugin-load-add=provider_bzip2.so \
# --mysqld=--plugin-load-add=provider_lzma.so \
# --mysqld=--plugin-load-add=provider_snappy.so \
# --mysqld=--plugin-load-add=provider_lz4.so \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--innodb_file_per_table=1 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--innodb_stats_persistent=off \
# --mysqld=--innodb_adaptive_hash_index=off \
# --mysqld=--innodb_sort_buffer_size=65536 \
# --redefine=conf/mariadb/redefine_checks_off.yy \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
# --mysqld=--loose-max-statement-time=30 \
# --threads=2 \
# --mysqld=--innodb-use-native-aio=0 \
# --mysqld=--loose-gdb \
# --mysqld=--loose-debug-gdb \
# --rr=Extended \
# --rr_options=--wait \
# --mysqld=--innodb_undo_tablespaces=3 \
# --mysqld=--innodb_undo_log_truncate=ON \
# --mysqld=--loose_innodb_change_buffering=all \
# --mysqld=--innodb_rollback_on_timeout=OFF \
# --mysqld=--innodb_page_size=4K \
# --mysqld=--innodb-buffer-pool-size=256M \
# <local settings>
I had 3 replays of the problem per ~ 280 runs of the test.



 Comments   
Comment by Marko Mäkelä [ 2023-01-27 ]

The page in question was written here:

ssh pluto
rr replay /data/results/1674736672/TBR-1646/1/rr/mysqld-1

run 1928000
watch -l log_sys.lsn._M_i
reverse-continue
backtrace

Thread 1 hit Hardware watchpoint 1: -location log_sys.lsn._M_i
 
Old value = 93515647
New value = 93515562
0x0000563239a5a349 in std::__atomic_base<unsigned long>::store (
    __m=std::memory_order_release, __i=93515647, this=<optimized out>)
    at /usr/include/c++/9/bits/atomic_base.h:390
390	      store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
#5  0x0000563239a4cb77 in mtr_t::commit (this=this@entry=0x7fbb382d6cb0)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/mtr/mtr0mtr.cc:148
#6  0x0000563239d04be4 in trx_purge_free_segment (
    rseg=rseg@entry=0x56323b7c12c0 <trx_sys+28544>, hdr_addr=...)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:469
#7  0x0000563239d062a7 in trx_purge_truncate_rseg_history (rseg=..., limit=...)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:549
#8  0x0000563239d10371 in trx_purge_truncate_history ()
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:643
#9  0x0000563239d16812 in trx_purge (n_tasks=<optimized out>, 

At this point of execution, we can access the block here:

frame 4
set $block=(buf_block_t*)m_memo.small[5].object

Between this and the recovery, the page contents differs quite a bit:

--- /dev/fd/63	2023-01-27 10:22:59.045904382 +0200
+++ /dev/fd/62	2023-01-27 10:22:59.025903573 +0200
@@ -1,6 +1,6 @@
 000000 00 00 00 00 00 00 02 41 ff ff ff ff ff ff ff ff
-000010 00 00 00 00 05 92 ef 2a 00 03 00 00 00 00 00 00
-000020 00 00 00 00 00 02 ff ff ff ff 00 00 00 00 01 8e
+000010 00 00 00 00 03 82 bf 89 00 03 00 00 00 00 00 00
+000020 00 00 00 00 00 02 00 00 02 16 00 26 00 00 01 af
 000030 00 26 00 00 00 00 00 00 00 d6 00 00 00 00 00 00
 000040 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
 *
@@ -11,7 +11,14 @@
 000270 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 *
-0006f0 00 00 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
+0004b0 00 00 00 00 00 00 00 00 00 a4 00 00 00 00 00 00
+0004c0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
+*
+0004e0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
+0004f0 69 d2 00 00 02 47 ff ff ff ff ff ff ff ff ff ff
+000500 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
+*
+0006f0 ff ff 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
 000700 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
 *
 000720 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
@@ -21,5 +28,12 @@
 000930 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 000940 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 *
-000ff0 00 00 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
+000db0 00 00 00 00 00 00 00 00 00 d8 00 00 00 00 00 00
+000dc0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
+*
+000de0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
+000df0 69 d2 00 00 00 f9 ff ff ff ff ff ff ff ff ff ff
+000e00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
+*
+000ff0 ff ff 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
 001000

The mini-transaction generated the following 85 bytes of log:

(rr) p/x *m_log.m_first_block.m_data@m_log.m_size
$6 = {0x39, 0x2, 0x25, 0x32, 0xff, 0xff, 0xff, 0xff, 0x0, 0x0, 0xb7, 0x0, 
  0xff, 0xff, 0xff, 0xff, 0x0, 0x0, 0x34, 0x2, 0x25, 0x31, 0x0, 0x34, 0x2, 
  0x25, 0x2d, 0x0, 0x47, 0x2, 0x81, 0xc1, 0x82, 0x32, 0x4, 0xff, 0x34, 0x2, 
  0x0, 0x3d, 0x5b, 0x2, 0x2, 0x37, 0x35, 0x2, 0x0, 0x80, 0x3b, 0xea, 0x48, 
  0x2, 0x81, 0xc1, 0x81, 0xf2, 0x81, 0xc0, 0x0, 0x77, 0x2, 0x25, 0x0, 0x67, 
  0x57, 0x32, 0xe1, 0x78, 0x2, 0x81, 0xc1, 0x0, 0x19, 0xee, 0x8b, 0x3c, 0x77, 
  0x2, 0x0, 0x0, 0x67, 0x32, 0x82, 0x1f, 0x0}

Let us filter out the records for this page 2:0x241:
0x47, 0x2, 0x81, 0xc1, 0x82, 0x32, 0x4, 0xff, MEMSET, writing 4×0xff bytes at offset 690
0x48, 0x2, 0x81, 0xc1, 0x81, 0xf2, 0x81, 0xc0, 0x0, MEMSET, writing 576×0x0 bytes at offset 626
0x78, 0x2, 0x81, 0xc1, 0x0, 0x19, 0xee, 0x8b, 0x3c, PAGE_CHECKSUM 0x19ee8b3c (this trips the recovery failure)
At the start of the mini-transaction (and before applying any redo log to it on recovery), the page contents differs quite a bit as well:

--- /dev/fd/63	2023-01-27 10:48:35.250176986 +0200
+++ /dev/fd/62	2023-01-27 10:48:35.250176986 +0200
@@ -1,6 +1,6 @@
 000000 00 00 00 00 00 00 02 41 ff ff ff ff ff ff ff ff
-000010 00 00 00 00 05 92 ee fa 00 03 00 00 00 00 00 00
-000020 00 00 00 00 00 02 ff ff ff ff 00 00 00 00 01 8e
+000010 00 00 00 00 03 82 bf 89 00 03 00 00 00 00 00 00
+000020 00 00 00 00 00 02 00 00 02 16 00 26 00 00 01 af
 000030 00 26 00 00 00 00 00 00 00 d6 00 00 00 00 00 00
 000040 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
 *
@@ -8,17 +8,17 @@
 000070 69 d2 00 00 00 f7 ff ff ff ff ff ff ff ff ff ff
 000080 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 *
-000270 ff ff 00 00 00 00 00 00 00 e7 00 00 00 00 00 00
-000280 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
+000270 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 *
-0002a0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
-0002b0 69 d2 00 00 00 37 00 00 00 38 ff ff ff ff ff ff
-0002c0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
+0004b0 00 00 00 00 00 00 00 00 00 a4 00 00 00 00 00 00
+0004c0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
 *
-0004b0 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-0004c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+0004e0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
+0004f0 69 d2 00 00 02 47 ff ff ff ff ff ff ff ff ff ff
+000500 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 *
-0006f0 00 00 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
+0006f0 ff ff 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
 000700 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
 *
 000720 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
@@ -28,5 +28,12 @@
 000930 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 000940 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 *
-000ff0 00 00 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
+000db0 00 00 00 00 00 00 00 00 00 d8 00 00 00 00 00 00
+000dc0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
+*
+000de0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
+000df0 69 d2 00 00 00 f9 ff ff ff ff ff ff ff ff ff ff
+000e00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
+*
+000ff0 ff ff 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
 001000

So, we will have to track earlier changes to the page, to find where some change was not logged.

Comment by Marko Mäkelä [ 2023-01-27 ]

The FIL_PAGE_LSN of the page that was read in recovery was 0x382bf89 (58900361). That change was written to the buffer pool here:

(rr) when
Current event: 1276639
(rr) bt
#0  buf_flush_note_modification (end=58900361, start=58900260, 
    b=0x7fbb4551cb40)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/include/buf0flu.h:118
#1  mtr_t::commit (this=this@entry=0x7fbb396e0cb0)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/mtr/mtr0mtr.cc:196
#2  0x0000563239d04be4 in trx_purge_free_segment (
    rseg=rseg@entry=0x56323b7c0cc0 <trx_sys+27008>, hdr_addr=...)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:469
#3  0x0000563239d062a7 in trx_purge_truncate_rseg_history (rseg=..., limit=...)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:549
#4  0x0000563239d10371 in trx_purge_truncate_history ()
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:643
#5  0x0000563239d16812 in trx_purge (n_tasks=<optimized out>, 

A little later, this same thread would invoke mtr_t::commit_shrink() on a different tablespace (3, undo003). The last write of our problematic page happened after that:

Thread 5 hit Hardware watchpoint 8: -location $block.page.oldest_modification_
 
Old value = 
      {m = {<std::__atomic_base<unsigned long>> = {_M_i = 22754130}, <No data fields>}}
New value = 
      {m = {<std::__atomic_base<unsigned long>> = {_M_i = 1}, <No data fields>}}
0x0000563239ee7eca in Atomic_relaxed<unsigned long>::store (
    o=std::memory_order_release, i=1, this=0x7fbb4551cb50)
    at /usr/include/c++/9/bits/atomic_base.h:390
390	      store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
1: log_sys.lsn._M_i = 71354177
(rr) when
Current event: 1543240

Probably soon after this, the checkpoint had been advanced to what the recovery reported:

2023-01-26 12:42:54 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=71646474,71825142

Some next interesting events for the page occur at:

LSN when remarks
75680224 1677754 update of FIL_PAGE_LSN
75700082 1677868 update of FIL_PAGE_LSN
89776724 1890477 mtr_t::page_checksum() calculates checksum 0xd732b6f5
89777019 1890477 update of FIL_PAGE_LSN
91986805 1903303 mtr_t::page_checksum() calculates checksum 0x2beb6bbd
91986862 1903303 update of FIL_PAGE_LSN
91986862 1903308 mtr_t::page_checksum() calculates checksum 0xeb825f4
91986912 1903315 update of FIL_PAGE_LSN
93515647 1928020 trx_purge_truncate_history() invokes buf_page_t::reset_oldest_modification(), discarding changes to this page!
93515647 1928220 mtr_t::commit_shrink() is invoked on this tablespace
116667020 2494112 the server process receives SIGKILL

The results of mtr_t::page_checksum() calls will be appended to the end of the log at mtr_t::commit(), right before the FIL_PAGE_LSN is updated, so they should be consistent.

At LSN=93491365 and when 1926356 we also got a call to mtr_t::commit_shrink() to shrink undo001, a.k.a. tablespace 1.

Something appears to be wrong on the recovery side. Recovery somehow misses the fact that the tablespace was shrunk, and tries to apply a log record that had been written at LSN=93515562, which was before LSN=93515647, which is when shrinking the tablespace was in progress. The recv_sys.recovered_lsn reached 116666679, which only misses 1 or 2 last 512-byte log blocks that would have been written after the SIGKILL. There was a message about completed truncation of undo002 long before that.

Comment by Marko Mäkelä [ 2023-01-27 ]

On recovery, we have the following:

(rr) p recv_sys.truncated_undo_spaces
$13 = {{lsn = 93491365, pages = 640}, {lsn = 93515647, pages = 640}, {lsn = 0, pages = 0} <repeats 125 times>}

So, we are missing a check for the LSN somewhere. The page read was initiated here:

#5  0x000055580c887b3f in buf_page_init_for_read (mode=mode@entry=132, 
    page_id=..., zip_size=zip_size@entry=0, unzip=unzip@entry=true)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:121
#6  0x000055580c88cf67 in buf_read_page_low (err=err@entry=0x7ffce40ae060, 
    space=space@entry=0x612000023440, sync=sync@entry=false, 
    mode=mode@entry=132, page_id=..., zip_size=zip_size@entry=0, 
    unzip=<optimized out>)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:300
#7  0x000055580c894c95 in buf_read_recv_pages (space_id=<optimized out>, 
    page_nos=page_nos@entry=0x7ffce40ae720, n=n@entry=1)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:711
#8  0x000055580c352be5 in recv_read_in_area (page_id=..., i=
      {first = {m_id = 12884901888}, second = {state = page_recv_t::RECV_NOT_PROCESSED, last_offset = 1, log = {head = 0x7ff54a6d92c8, tail = 0x7ff54dd7e600}}}) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/include/buf0types.h:131
#9  0x000055580c36ea76 in recv_sys_t::apply (this=0x55580e0966a0 <recv_sys>, 
    last_batch=last_batch@entry=true)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:3344

Already at this point, the latest data about undo tablespace truncation is available, and hence we should not have initiated the page read. To be on the safe side, we might both want to avoid initiating the page read, and check in recv_recover_page() if an unnecessary read took place.

Comment by Marko Mäkelä [ 2023-01-27 ]

At the time the page was added to the read queue, we did not know about the truncation yet:

Thread 1 hit Breakpoint 9, recv_sys_t::parse (this=this@entry=0x55580e0966a0 <recv_sys>, checkpoint_lsn=<optimized out>, store=store@entry=0x7ffce40ae0f0, apply=apply@entry=true)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:2559
2559	            cached_pages_it= pages.emplace(id, page_recv_t()).first;
3: x/i $pc
=> 0x55580c37e457 <recv_sys_t::parse(unsigned long, store_t*, bool)+24577>:	lea    -0x1840(%r12),%rax
4: /x id.m_id = 0x200000241
(rr) p recv_sys.truncated_undo_spaces 
$18 = {{lsn = 0, pages = 0} <repeats 127 times>}
(rr) when
Current event: 4941

That became known later:

#0  0x000055580c37c927 in recv_sys_t::parse (this=this@entry=0x55580e0966a0 <recv_sys>, checkpoint_lsn=<optimized out>, store=store@entry=0x7ffce40ae0f0, apply=apply@entry=true)
    at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:2427
#1  0x000055580c381d92 in recv_scan_log_recs (store=store@entry=0x7ffce40ae0f0, log_block=<optimized out>, checkpoint_lsn=<optimized out>, checkpoint_lsn@entry=71646474, start_lsn=start_lsn@entry=93518848, 
    end_lsn=<optimized out>, contiguous_lsn=contiguous_lsn@entry=0x7ffce40ae220, group_scanned_lsn=<optimized out>) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:3696

I think that at this point, we must remove the garbage entries from recv_sys.pages.

Comment by Marko Mäkelä [ 2023-01-27 ]

data.tar.xz is a copy of the data directory that failed to recover. This turns out to be due to an off-by-one error:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 2becd00cec6..30181728035 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -2086,7 +2086,7 @@ inline bool page_recv_t::trim(lsn_t start_lsn)
 {
   while (log.head)
   {
-    if (log.head->lsn >= start_lsn) return false;
+    if (log.head->lsn > start_lsn) return false;
     last_offset= 1; /* the next record must not be same_page */
     log_rec_t *next= log.head->next;
     recv_sys.free(log.head);

For this data directory, a mini-transaction that happened to end exactly at the LSN where the undo tablespace truncation starts should have been removed, and the page should not have been read at all:

Thread 1 hit Hardware access (read/write) watchpoint 9: *$33
 
Value = 93515647
page_recv_t::trim (this=0x55de97c8dd38, start_lsn=93515647) at /mariadb/10.5/storage/innobase/log/log0recv.cc:2089
2089	    if (log.head->lsn >= start_lsn) return false;
(rr) p *this
$34 = {state = page_recv_t::RECV_NOT_PROCESSED, last_offset = 1, log = {head = 0x7fd79c6dfec8, tail = 0x7fd79c6dff18}}
(rr) p *(log_phys_t*)log.head
$35 = {<log_rec_t> = {next = 0x7fd79c6dfef0, lsn = 93515647}, start_lsn = 93515562}

With this simple fix (applied to 10.6 or 10.7, which implements MDEV-18976), the data directory will recover fine:

2023-01-27 15:44:26 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=71646474,71825142
2023-01-27 15:44:27 0 [Note] InnoDB: Opened 3 undo tablespaces
2023-01-27 15:44:27 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
2023-01-27 15:44:27 0 [Note] InnoDB: Starting final batch to recover 4916 pages from redo log.
2023-01-27 15:44:28 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
2023-01-27 15:44:28 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2023-01-27 15:44:28 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2023-01-27 15:44:28 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2023-01-27 15:44:28 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2023-01-27 15:44:28 0 [Note] InnoDB: 10.6.12 started; log sequence number 116666679; transaction id 6439

Comment by Matthias Leich [ 2023-02-02 ]

The source tree
origin/bb-10.6-MDEV-30479 09435a493eab0e4a7c12e05284c61aa0d12f8128 2023-02-01T14:12:08+02:00
performed well in RQG testing.

Comment by Marko Mäkelä [ 2023-06-26 ]

Unfortunately, this one-character fix caused the bug MDEV-31487. I will reanalyze the trace and try to come up with a better fix.

Generated at Thu Feb 08 10:16:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.