Operating System: Amazon Linux 2 AMI
Hardware: Amazon AWS t3a.medium
3-Node Galera Cluster
Description
We have several production databases that we migrated in early 2021 from a single-node setup to a 3-node Galera Cluster for high availability. During the migration project we implemented a tool for managing the cluster; one of its features is a "cycle-db-cluster" function that replaces the oldest node of the cluster with a completely new one. Since the project we have kept the cluster up to date by periodically replacing nodes with this automated process.
On 2021-05-11 the replace process failed because a new 10.5.10 node could not join our cluster of 10.5.9 nodes. In mysqld.log the new node reported [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=14408]. We repeated the process several times, also trying MariaDB 10.5.9 and a new virtual server each time, but the issue persisted. Our mysqldumps worked just fine, so we scheduled a service break for 2021-05-12 and started a new cluster using MariaDB 10.5.10 and data from the mysqldumps. An important detail is that we imported the mysqldumps into a 1-node cluster and added 2 nodes after the imports had finished – so joining definitely worked shortly after the import. We also replaced the first node of the cluster with the "cycle-db-cluster" functionality ~1 hour after the service break.
On 2021-05-20 we tried to replace a node in the cluster for the first time since the day we set it up. Again, joining a new node to the cluster failed, and mysqld.log on the new joining node contains [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=2366]. So now we have an almost new 3-node 10.5.10 cluster that new nodes cannot join. The cluster itself continues to perform just fine and mysqldumps succeed. To me it looks as if the system detects corruption for some reason while the data inside the databases is actually fine.
Attachments:
20210520-cycle-db-cluster.log:
Log of the automated cycle-db-cluster process, with details of how a new DB server was set up from scratch, all the way to attempting to start the mysqld process.
Conf.zip:
Database configuration files that we use for our nodes. These have a few variables that are replaced automatically during the node setup process.
20210520-clusterjoin-failure-mysqld.log:
mysqld.log from the node join failure at 2021-05-20.
Marko Mäkelä
added a comment - MDEV-25031 was an error that affected the recovery from a crash or a physical backup in 10.5.
In 20210520-clusterjoin-failure-mysqld.log it indeed seems that a 10.5.10 server is failing to start up:
mariadb-10.5.10
2021-05-20 13:24:02 0 [Note] InnoDB: Starting final batch to recover 28591 pages from redo log.
2021-05-20 13:24:02 0 [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=2366]
Can you provide a copy of the data directory so that I can diagnose this? I would need at least the ib_logfile0 and the first 2367 pages of the system tablespace file(s).
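For illustration only (assuming the default 16 KiB innodb_page_size, which matches the bs=16384 used later in this issue; the output file name is made up), the requested files could be captured roughly like this:
# copy the redo log and the first 2367 pages (0..2366) of the system tablespace
cp ib_logfile0 ib_logfile0.copy
dd if=ibdata1 bs=16384 count=2367 of=ibdata1-first-2367-pages.bin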
LuborJ
added a comment (edited) - Same issue here after mariabackup SST:
May 22 04:19:46 server2 -wsrep-sst-joiner: Move successful, removing /data/mysql//.sst
May 22 04:19:46 server2 -wsrep-sst-joiner: Galera co-ords from recovery: 168f7b8e-dcca-11e3-a9d4-5e95c6d961f7:106462667529
May 22 04:19:46 server2 -wsrep-sst-joiner: Total time on joiner: 0 seconds
May 22 04:19:46 server2 -wsrep-sst-joiner: Removing the sst_in_progress file
May 22 04:19:46 server2 mariadbd[1685]: 2021-05-22 4:19:46 3 [Note] WSREP: SST received
May 22 04:19:46 server2 mariadbd[1685]: 2021-05-22 4:19:46 3 [Note] WSREP: Server status change joiner -> initializing
May 22 04:19:46 server2 mariadbd[1685]: 2021-05-22 4:19:46 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] mariadbd: Aria engine: starting recovery
May 22 04:19:47 server2 mariadbd[1685]: tables to flush: 1 0
May 22 04:19:47 server2 mariadbd[1685]: (0.0 seconds);
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] mariadbd: Aria engine: recovery done
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Warning] The parameter innodb_buffer_pool_instances is deprecated and has no effect.
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Uses event mutexes
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Number of pools: 1
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] mariadbd: O_TMPFILE is not supported on /data/tmp (disabling future attempts)
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Using Linux native AIO
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Initializing buffer pool, total size = 32212254720, chunk size = 134217728
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Completed initialization of buffer pool
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=23948611322110,23948611322110
May 22 04:19:51 server2 mariadbd[1685]: 2021-05-22 4:19:51 0 [Note] InnoDB: Starting final batch to recover 32275 pages from redo log.
May 22 04:19:52 server2 mariadbd[1685]: 2021-05-22 4:19:52 0 [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=33844463]
May 22 04:19:52 server2 mariadbd[1685]: 2021-05-22 4:19:52 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
May 22 04:19:53 server2 mariadbd[1685]: 2021-05-22 4:19:53 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption
May 22 04:19:53 server2 mariadbd[1685]: 2021-05-22 4:19:53 0 [Note] InnoDB: Starting shutdown...
May 22 04:19:54 server2 mariadbd[1685]: 2021-05-22 4:19:54 0 [ERROR] Plugin 'InnoDB' init function returned error.
May 22 04:19:54 server2 mariadbd[1685]: 2021-05-22 4:19:54 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
May 22 04:19:54 server2 mariadbd[1685]: 2021-05-22 4:19:54 0 [Note] Plugin 'FEEDBACK' is disabled.
May 22 04:19:54 server2 mariadbd[1685]: 2021-05-22 4:19:54 0 [ERROR] Unknown/unsupported storage engine: InnoDB
May 22 04:19:54 server2 mariadbd[1685]: 2021-05-22 4:19:54 0 [ERROR] Aborting
May 22 04:19:54 server2 mariadbd[1685]: terminate called after throwing an instance of 'wsrep::runtime_error'
May 22 04:19:54 server2 mariadbd[1685]: what(): State wait was interrupted
May 22 04:19:54 server2 mariadbd[1685]: 210522 4:19:54 [ERROR] mysqld got signal 6 ;
May 22 04:19:54 server2 mariadbd[1685]: This could be because you hit a bug. It is also possible that this binary
May 22 04:19:54 server2 mariadbd[1685]: or one of the libraries it was linked against is corrupt, improperly built,
May 22 04:19:54 server2 mariadbd[1685]: or misconfigured. This error can also be caused by malfunctioning hardware.
May 22 04:19:54 server2 mariadbd[1685]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
May 22 04:19:54 server2 mariadbd[1685]: We will try our best to scrape up some info that will hopefully help
May 22 04:19:54 server2 mariadbd[1685]: diagnose the problem, but since we have already crashed,
May 22 04:19:54 server2 mariadbd[1685]: something is definitely wrong and this may fail.
May 22 04:19:54 server2 mariadbd[1685]: Server version: 10.5.10-MariaDB-1:10.5.10+maria~buster-log
May 22 04:19:54 server2 mariadbd[1685]: key_buffer_size=134217728
May 22 04:19:54 server2 mariadbd[1685]: read_buffer_size=1048576
May 22 04:19:54 server2 mariadbd[1685]: max_used_connections=0
May 22 04:19:54 server2 mariadbd[1685]: max_threads=10002
May 22 04:19:54 server2 mariadbd[1685]: thread_count=4
May 22 04:19:54 server2 mariadbd[1685]: It is possible that mysqld could use up to
May 22 04:19:54 server2 mariadbd[1685]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 31111928 K bytes of memory
May 22 04:19:54 server2 mariadbd[1685]: Hope that's ok; if not, decrease some variables in the equation.
May 22 04:19:54 server2 mariadbd[1685]: Thread pointer: 0x7f777001b918
May 22 04:19:54 server2 mariadbd[1685]: Attempting backtrace. You can use the following information to find out
May 22 04:19:54 server2 mariadbd[1685]: where mysqld died. If you see no messages after this, something went
May 22 04:19:54 server2 mariadbd[1685]: terribly wrong...
May 22 04:19:54 server2 mariadbd[1685]: stack_bottom = 0x7f77711fc300 thread_stack 0x49000
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x56105e3234ee]
May 22 04:19:54 server2 mariadbd[1685]: Printing to addr2line failed
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x56105ddacca5]
May 22 04:19:54 server2 mariadbd[1685]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f87793a8730]
May 22 04:19:54 server2 mariadbd[1685]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7f8778efe7bb]
May 22 04:19:54 server2 mariadbd[1685]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7f8778ee9535]
May 22 04:19:54 server2 mariadbd[1685]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c983)[0x7f8779299983]
May 22 04:19:54 server2 mariadbd[1685]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x928c6)[0x7f877929f8c6]
May 22 04:19:54 server2 mariadbd[1685]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92901)[0x7f877929f901]
May 22 04:19:54 server2 mariadbd[1685]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92b34)[0x7f877929fb34]
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(+0x6531f6)[0x56105dab01f6]
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(_ZN5wsrep12server_state12sst_receivedERNS_14client_serviceEi+0xca9)[0x56105e3afd99]
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(+0xbecd63)[0x56105e049d63]
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(+0xbeda4e)[0x56105e04aa4e]
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(+0xb7a7cb)[0x56105dfd77cb]
May 22 04:19:54 server2 mariadbd[1685]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f877939dfa3]
May 22 04:19:54 server2 mariadbd[1685]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f8778fc04cf]
May 22 04:19:54 server2 mariadbd[1685]: Trying to get some variables.
May 22 04:19:54 server2 mariadbd[1685]: Some pointers may be invalid and cause the dump to abort.
May 22 04:19:54 server2 mariadbd[1685]: Query (0x0): (null)
May 22 04:19:54 server2 mariadbd[1685]: Connection ID (thread ID): 3
May 22 04:19:54 server2 mariadbd[1685]: Status: NOT_KILLED
May 22 04:19:54 server2 mariadbd[1685]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
May 22 04:19:54 server2 mariadbd[1685]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
May 22 04:19:54 server2 mariadbd[1685]: information that should help you find out what is causing the crash.
May 22 04:19:54 server2 mariadbd[1685]: We think the query pointer is invalid, but we will try to print it anyway.
May 22 04:19:54 server2 mariadbd[1685]: Query:
May 22 04:19:54 server2 mariadbd[1685]: Writing a core file...
May 22 04:19:54 server2 mariadbd[1685]: Working directory at /data/mysql
May 22 04:19:54 server2 mariadbd[1685]: Resource Limits:
May 22 04:19:54 server2 mariadbd[1685]: Limit Soft Limit Hard Limit Units
May 22 04:19:54 server2 mariadbd[1685]: Max cpu time unlimited unlimited seconds
May 22 04:19:54 server2 mariadbd[1685]: Max file size unlimited unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max data size unlimited unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max stack size 8388608 unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max core file size 0 unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max resident set unlimited unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max processes 500000 500000 processes
May 22 04:19:54 server2 mariadbd[1685]: Max open files 500000 500000 files
May 22 04:19:54 server2 mariadbd[1685]: Max locked memory 65536 65536 bytes
May 22 04:19:54 server2 mariadbd[1685]: Max address space unlimited unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max file locks unlimited unlimited locks
May 22 04:19:54 server2 mariadbd[1685]: Max pending signals 386408 386408 signals
May 22 04:19:54 server2 mariadbd[1685]: Max msgqueue size 819200 819200 bytes
May 22 04:19:54 server2 mariadbd[1685]: Max nice priority 0 0
May 22 04:19:54 server2 mariadbd[1685]: Max realtime priority 0 0
May 22 04:19:54 server2 mariadbd[1685]: Max realtime timeout unlimited unlimited us
May 22 04:19:54 server2 mariadbd[1685]: Core pattern: core
May 22 04:19:54 server2 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
Marko Mäkelä
added a comment - tomitukiainen, thank you, I downloaded the partial copy of the data directory that you provided. Before starting any recovery, I copied the corrupted page:
dd if=ibdata1 bs=16384 skip=2366 count=1 of=page2366.bin
I see that FIL_PAGE_LSN (the 8 bytes at byte offset 16) is 0x4697b0dad (18949541293).
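As a small aside, the same field can be inspected directly from the saved page image with od (a sketch; the file name refers to the dd output above):
# dump the 8 bytes at offset 16 (FIL_PAGE_LSN); this should print 00 00 00 04 69 7b 0d ad
od -An -t x1 -j 16 -N 8 page2366.bin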
Then, I started up a debug build of a recent 10.5 branch to find out which log records refer to this page:
sql/mariadbd --datadir=… --debug=d,ib_log 2>&1| grep -w 2366 |tee 2366.txt
Finally, after the above run failed (without modifying any files), I started up the server without that debug option, but with innodb_force_recovery=1 so that missing *.ibd files will be ignored. There indeed were messages issued about page 0:2366:
2021-05-24 14:04:44 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=18950648175,18950648175
…
2021-05-24 14:04:46 0 [Note] InnoDB: Starting final batch to recover 1172 pages from redo log.
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: n recs wrong 156 165
2021-05-24 14:04:46 0 [ERROR] InnoDB: n recs wrong 156 165
mariadbd: /mariadb/10.5m/storage/innobase/page/page0cur.cc:2530: bool page_apply_insert_redundant(const buf_block_t&, bool, ulint, ulint, size_t, size_t, const void*, size_t): Assertion `page_simple_validate_old(block.frame)' failed.
Now, let us look at the log records for this page:
?func: ib_log: scan 18951635875: rec 20 len 79 page 0:2366
?func: ib_log: scan 18951635875: rec b3 len 4 page 0:2366
…
?func: ib_log: scan 19067167151: rec 3a len 11 page 0:2366
?func: ib_log: scan 19067471753: rec 3a len 11 page 0:2366
So far, everything looks consistent. The page was apparently last written before the checkpoint, and the subsequent modification is after the checkpoint (18949541293<18950648175<18951635875).
The page belongs to the change buffer. To work around this bug (and MDEV-22373, which might be related to this), you can set innodb_change_buffering=none and ensure that you have no ROW_FORMAT=REDUNDANT tables. Disabling change buffering may cause some performance regression (see MDEV-14094, MDEV-11634, MDEV-19514).
Because the change buffer will be corrupted, some secondary index leaf pages may be corrupted as well. You should try to execute CHECK TABLE on each InnoDB table to force a change buffer merge and to check for corruption. Corrupted secondary indexes can be dropped and re-created.
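A minimal SQL sketch of that workaround (schema and table names are placeholders, not from this report; innodb_change_buffering=none should also be added to the configuration so that it survives restarts):
-- stop buffering further changes
SET GLOBAL innodb_change_buffering = 'none';
-- find any remaining ROW_FORMAT=REDUNDANT InnoDB tables
SELECT TABLE_SCHEMA, TABLE_NAME
FROM information_schema.TABLES
WHERE ENGINE = 'InnoDB' AND ROW_FORMAT = 'Redundant';
-- force a change buffer merge and check for corruption, one table at a time
CHECK TABLE db1.t1;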
Using the data, I will try to figure out which exact write is failing, but this will take significant time and effort, because I do not have a complete execution trace of the server that was running before the data was copied. If we had an https://rr-project.org/ trace of the killed server in addition to having the data directory, this would be trivial to diagnose.
mleich, can you try to produce such a trace? Run 10.5 or 10.6 with a tiny buffer pool and a heavily indexed table (say, index(a,b,c,d),index(a,b,d,c),…,index(d,c,b,a) and UPDATE … SET a=random_data). Avoid any WHERE condition on any of those indexed columns, so that change buffer merges will be avoided for as long as possible.
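For reference, such a trace could be captured along these lines (a sketch only; paths, the buffer pool size, and other options are illustrative):
# record the server under rr while the UPDATE workload runs against it
rr record /usr/sbin/mariadbd --datadir=/data/mysql --innodb-buffer-pool-size=16M
# after the server has been killed, replay the latest recorded execution
rr replay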
Marko Mäkelä
added a comment - The following record is the first one found to be inconsistent with the page contents:
#0 0x000056264c48b4d4 in log_phys_t::apply (this=this@entry=0x7f155569b3b0,
block=
@0x7f15441952d0: {page = {id_ = {m_id = 2366},
…
at /mariadb/10.5m/storage/innobase/log/log0recv.cc:397
397 if (page_apply_insert_redundant(block, subtype & 1, prev_rec,
(rr) p *this
$1 = {<log_rec_t> = {next = 0x7f155569f320, lsn = 18962387413}, start_lsn = 18962387336}
The same mini-transaction is modifying this page twice (I guess, inserting the buffered record into the change buffer and maybe updating PAGE_MAX_TRX_ID):
?func: ib_log: scan 18962387336: rec 20 len 72 page 0:2366
?func: ib_log: scan 18962387336: rec b3 len 4 page 0:2366
The reason for the inconsistency is:
if (UNIV_UNLIKELY(free_rec + data_size > heap_top))
goto corrupted;
That is, we are supposed to reuse the space that was occupied by a previously deleted record (the top of the PAGE_FREE stack), but the allocation is extending past the end of PAGE_HEAP_TOP. This looks serious indeed.
mleich, I think that we’d better use innodb_page_size=4k and indexed VARCHAR columns for reproducing this. If we use a large page size or fixed-size records, we’d never hit this. The check would only fail if the last-allocated record in the page was freed and subsequently reallocated. I hope that this corruption is also reproducible with ROW_FORMAT=REDUNDANT tables (and without change buffering).
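A hypothetical sketch of such a test table (names and sizes are made up; the server would be started with --innodb-page-size=4k and a tiny buffer pool):
-- ROW_FORMAT=REDUNDANT table with several permuted secondary indexes on VARCHAR columns
CREATE TABLE t1 (
  pk INT AUTO_INCREMENT PRIMARY KEY,
  a VARCHAR(36), b VARCHAR(36), c VARCHAR(36), d VARCHAR(36),
  INDEX(a,b,c,d), INDEX(a,b,d,c), INDEX(b,a,c,d), INDEX(d,c,b,a)
) ENGINE=InnoDB ROW_FORMAT=REDUNDANT;
-- after populating t1, update an indexed column without referencing any indexed
-- column in the WHERE clause, so that the changes stay in the change buffer
UPDATE t1 SET a = UUID() WHERE pk < 10000;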
I do not think I can do more on this until we have an rr replay trace.
Marko Mäkelä
added a comment - mleich was able to produce an rr replay trace for this:
ssh pluto
rr replay /data/Results/1622138407/TBR-1093/dev/shm/vardir/1622138407/123/1/rr/latest-trace
It reports Not applying INSERT_REUSE_REDUNDANT for a change buffer page, due to exactly the same reason as in the data directory that was provided by tomitukiainen.
Next week, I will get the details from the failed recovery, and then debug the trace of the server run that ended in SIGKILL, to figure out what caused the problem.
If we are lucky, this could explain MDEV-25783 as well. (For that to be possible, I think that something related to the change buffer merge must skip writing redo log.)
Marko Mäkelä
added a comment - Below, I am documenting my workflow for finding the root cause of this failure. Hopefully it will serve an educational purpose. I believe that with rr replay traces of the killed server and of the misbehaving recovered server, as well as a copy of the data directory from before the recovery was started, any recovery bug can be diagnosed.
When we fail to apply the log record, it is for page 1160 in the system tablespace. We can also determine the LSN:
#2 0x000055de458007c8 in page_apply_insert_redundant (block=..., reuse=true, prev=2652, enc_hdr=44, hdr_c=11, data_c=10, data=0x25592084fe0d, data_len=88)
at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:2321
2321 ib::error() << (reuse
(rr) p/x block.frame[16]@8
$1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0xa2, 0xd7, 0x1}
(rr) up
#3 0x000055de457b21e1 in log_phys_t::apply (this=0x25592084fde8, block=..., last_offset=@0x606001178b6c: 0) at /data/Server/bb-10.5-marko/storage/innobase/log/log0recv.cc:397
397 if (page_apply_insert_redundant(block, subtype & 1, prev_rec,
(rr) p/x *this
$2 = {<log_rec_t> = {next = 0x25592084fe68, lsn = 0xa6a6b2}, start_lsn = 0xa6a64a}
Based on this recovery failure, for the rr replay execution trace of the server that ended in SIGKILL we must check every modification to the page between LSN 0xa2d701 and 0xa6a6b2. The log record that we failed to apply had been written by the following (and I do not yet conclude that there was anything wrong with those writes):
#7 0x000055c213bccd96 in mtr_t::commit (this=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/mtr/mtr0mtr.cc:442
#8 0x000055c213b3f878 in ibuf_mtr_commit (mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/ibuf0ibuf.ic:64
#9 0x000055c213b4ae1b in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0)
at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3466
#10 0x000055c213b4b2f7 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61a000844508, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0)
at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3603
#11 0x000055c213e02647 in btr_cur_search_to_nth_level_func (index=0x6160000f03f0, level=0, tuple=0x61a000844508, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x208943901ab0, ahi_latch=0x0,
file=0x55c214795e40 "/data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc", line=2974, mtr=0x208943901e70, autoinc=0) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:1650
#12 0x000055c213c8b3e2 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x6160000f03f0, offsets_heap=0x619000cb9d80, heap=0x619000cb8e80, entry=0x61a000844508, trx_id=0, thr=0x621004e35de0)
at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:2971
#13 0x000055c213c8c59f in row_ins_sec_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0, check_foreign=true) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3294
#14 0x000055c213c8c784 in row_ins_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3342
#15 0x000055c213c8d14d in row_ins_index_entry_step (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3509
#16 0x000055c213c8d6b6 in row_ins (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3668
#17 0x000055c213c8de86 in row_ins_step (thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3818
#18 0x000055c213cb83ea in row_insert_for_mysql (mysql_rec=0x61f0001f80b8 "\377f", prebuilt=0x6220002b2170, ins_mode=ROW_INS_NORMAL) at /data/Server/bb-10.5-marko/storage/innobase/row/row0mysql.cc:1419
#19 0x000055c213a3b359 in ha_innobase::write_row (this=0x61d0008638b8, record=0x61f0001f80b8 "\377f") at /data/Server/bb-10.5-marko/storage/innobase/handler/ha_innodb.cc:7669
…
(rr) frame 21
#21 0x000055c2129b21fd in write_record (thd=thd@entry=0x62b0000af218, table=table@entry=0x619000bad198, info=info@entry=0x208943903d40, sink=sink@entry=0x0) at /data/Server/bb-10.5-marko/sql/sql_insert.cc:2107
2107 else if (unlikely((error=table->file->ha_write_row(table->record[0]))))
(rr) p thd.query_string
$5 = {string = {str = 0x62b0000b6238 "INSERT IGNORE INTO `t1` ( `pk`, `col_char` ) VALUES ( NULL, 'f' ) /* E_R Thread1 QNO 1506 CON_ID 16 */", length = 102}, cs = 0x55c21527dc60 <my_charset_latin1>}
(rr) display
2: /x log_sys.lsn._M_i = 0xa6a6b2
(rr) when
Current event: 623570
(rr) reverse-continue
Continuing.
Thread 18 hit Hardware access (read/write) watchpoint 1: *(ulong*)0x64000007d1c0
Value = 9683160
0x000055c213b3a125 in std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419
419 return __atomic_load_n(&_M_i, int(__m));
2: /x log_sys.lsn._M_i = 0xa6a64a
(rr) bt
#0 0x000055c213b3a125 in std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419
#1 Atomic_counter<unsigned long>::operator unsigned long (this=0x64000007d1c0) at /data/Server/bb-10.5-marko/include/my_counter.h:45
#2 0x000055c213b39ac2 in buf_page_t::oldest_modification (this=0x64000007d178) at /data/Server/bb-10.5-marko/storage/innobase/include/buf0buf.h:936
#3 0x000055c213b537da in mtr_t::is_block_dirtied (block=0x64000007d178) at /data/Server/bb-10.5-marko/storage/innobase/include/mtr0mtr.ic:35
#4 0x000055c213b5385c in mtr_t::memo_push (this=0x2089438ffd00, object=0x64000007d178, type=MTR_MEMO_PAGE_X_FIX) at /data/Server/bb-10.5-marko/storage/innobase/include/mtr0mtr.ic:57
#5 0x000055c213e45123 in buf_page_mtr_lock (block=0x64000007d178, rw_latch=2, mtr=0x2089438ffd00, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370)
at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:2866
#6 0x000055c213e461de in buf_page_get_low (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=10, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370,
mtr=0x2089438ffd00, err=0x2089438fe430, allow_ibuf_merge=false) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:3413
#7 0x000055c213e4671a in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=10, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370,
mtr=0x2089438ffd00, err=0x2089438fe430, allow_ibuf_merge=false) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:3482
#8 0x000055c213ad6899 in btr_block_get_func (index=..., page=1160, mode=2, merge=true, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370, mtr=0x2089438ffd00)
at /data/Server/bb-10.5-marko/storage/innobase/include/btr0btr.h:237
#9 0x000055c213dfd9ff in btr_cur_latch_leaves (block=0x64000007d178, latch_mode=36, cursor=0x2089438ffb50, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:370
#10 0x000055c213e0333a in btr_cur_search_to_nth_level_func (index=0x616000006ff0, level=0, tuple=0x619000cb93f0, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x2089438ffb50, ahi_latch=0x0,
file=0x55c21472bfa0 "/data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, mtr=0x2089438ffd00, autoinc=0) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:1848
#11 0x000055c213b407c8 in btr_pcur_open_low (index=0x616000006ff0, level=0, tuple=0x619000cb93f0, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x2089438ffb50,
file=0x55c21472bfa0 "/data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, autoinc=0, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/btr0pcur.ic:441
#12 0x000055c213b4a525 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0)
at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3276
…
(rr) when
Current event: 623569
It seems that the same mini-transaction is modifying the change buffer page two times. The recovery failure occurred for one of those 2 log records.
Let us follow the trace in the forward direction from an earlier point of time:
(rr) display
2: /x log_sys.lsn._M_i = 0x890396
(rr) cond 1 log_sys.lsn._M_i>=0xa2d701
(rr) continue
Thread 45 hit Hardware access (read/write) watchpoint 1: *(ulong*)0x64000007d1c0
Value = 9683160
std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419
419 return __atomic_load_n(&_M_i, int(__m));
2: /x log_sys.lsn._M_i = 0xa2d701
(rr) when
Current event: 575927
(rr) frame 26
#26 0x000055c2129b21fd in write_record (thd=thd@entry=0x62b000150218, table=table@entry=0x6190012de598, info=info@entry=0x44a50e105d40, sink=sink@entry=0x0) at /data/Server/bb-10.5-marko/sql/sql_insert.cc:2107
2107 else if (unlikely((error=table->file->ha_write_row(table->record[0]))))
(rr) p thd.query_string
$6 = {string = {str = 0x62b000157238 "INSERT IGNORE INTO `t1` ( `pk` ) VALUES ( 10090 ) /* E_R Thread6 QNO 1528 CON_ID 21 */", length = 86}, cs = 0x55c21527dc60 <my_charset_latin1>}
This was the last change for which the change buffer page had been written out. From this point onwards, the actions of the recovery must have diverged from the modifications to the page.
Because the LSN range is very wide here, we’d better ignore accesses of block->page.oldest_modification and instead check each change of PAGE_HEAP_TOP of the page.
(rr) break buf_page_read_complete thread 1
Breakpoint 3 at 0x55de45a541b3: file /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc, line 4100.
(rr) disable 2
(rr) reverse-continue
Continuing.
Thread 1 hit Breakpoint 3, buf_page_read_complete (bpage=0x55de4575e8c8 <mtr_t::memo_push(void*, mtr_memo_type_t)+230>, node=...) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:4100
4100 {
(rr) n
4101 const page_id_t id(bpage->id());
(rr)
4117 const byte *frame= bpage->zip.data
(rr)
4118 ? bpage->zip.data
(rr)
4117 const byte *frame= bpage->zip.data
(rr)
4123 if (!buf_page_decrypt_after_read(bpage, node))
(rr) p/x frame[40]@2
$6 = {0x8, 0xf3}
(rr) set $f=frame
(rr) display/x $f[40]@2
1: /x $f[40]@2 = {0x8, 0xf3}
(rr) dump binary memory recovered-page.bin $f $f+srv_page_size
(rr) enable
(rr) continue
At the time of the failed recovery, PAGE_HEAP_TOP was 0xf4a (3914). At the time of the page flush (as well as at the start of the recovery), the field was 0x8f3 (2291).
The contents of the page differ only slightly between the start of the recovery and what ought to be the corresponding point of logical time before the SIGKILL: only the last 8 bytes of the page differ. The least significant 32 bits of the LSN and the page checksum are not updated until the page is about to be written out from the buffer pool. I saved the srv_page_size=4096 bytes with dump binary memory and compared the files:
diff -u <(od -Ax -t x1 recover_start.bin) <(od -Ax -t x1 before_kill.bin)
--- /dev/fd/63 2021-05-31 06:10:33.779551414 +0000
+++ /dev/fd/62 2021-05-31 06:10:33.779551414 +0000
@@ -109,5 +109,5 @@
0008f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000fe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 74
-000ff0 05 b3 03 ff 02 4b 00 65 00 a2 d7 01 8e 48 0a 5d
+000ff0 05 b3 03 ff 02 4b 00 65 00 00 00 00 00 00 00 00
001000
So, we should have a sound starting point for examining the divergence.
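For quick spot checks outside the debugger, the same field can be read straight from the saved dumps. A minimal sketch, assuming only what is visible above: a 4096-byte page image (for example recovered-page.bin from the dump binary memory command) with PAGE_HEAP_TOP stored as a big-endian 16-bit value at byte offset 40 (frame[40]@2):
// Minimal sketch: read a page image saved with "dump binary memory" and print
// PAGE_HEAP_TOP, the big-endian 16-bit value at byte offset 40 (frame[40]@2).
// The default file name and the 4096-byte page size are taken from this trace.
#include <cstdio>
#include <vector>

int main(int argc, char **argv)
{
  const char *name = argc > 1 ? argv[1] : "recovered-page.bin";
  FILE *f = fopen(name, "rb");
  if (!f) { perror(name); return 1; }
  std::vector<unsigned char> page(4096);
  if (fread(page.data(), 1, page.size(), f) != page.size())
  {
    fprintf(stderr, "%s: short read\n", name);
    fclose(f);
    return 1;
  }
  fclose(f);
  const unsigned heap_top = page[40] << 8 | page[41];
  printf("PAGE_HEAP_TOP = 0x%x (%u)\n", heap_top, heap_top);
  return 0;
}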
Interestingly, near the start of the recovery, we have a page_create_low() that will reset the PAGE_HEAP_TOP to 0x7d (125). The PAGE_HEAP_TOP will grow to 0x8ba (2059) until it is reset in another page_create_low(). It will monotonically grow until we seemingly get a glitch: 3878, 3932, 4055, 3932, 4043, 2305, 2414, …, 3803, 3914, 4025, 3914, (failure).
In the rr replay trace that was terminated by SIGKILL, we can observe the changes of PAGE_HEAP_TOP as follows:
(rr) set $f=block->frame
(rr) display/x $f[40]@2
3: /x $f[40]@2 = {0x8, 0xf3}
(rr) watch -l $f[40]@2
Hardware watchpoint 3: -location $f[40]@2
(rr) c
Continuing.
[Switching to Thread 314150.314741]
Thread 18 hit Hardware watchpoint 3: -location $f[40]@2
Old value = <incomplete sequence \363>
New value = "\000"
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:261
261 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
2: /x log_sys.lsn._M_i = 0xa33520
3: /x $f[40]@2 = {0x0, 0x0}
(rr) command 3
Type commands for breakpoint(s) 3, one per line.
End with a line saying just "end".
>when
>c
>end
(rr) c
Continuing.
(rr) c
Everything seems to be in sync, and also the first monotonicity glitch turns out to be benign. First, the PAGE_HEAP_TOP increases to 4025 because the change buffer tree shrinks due to a change buffer merge during purge (and the records from an adjacent change buffer page are merged into our page). The decrease to 3914 occurs because the very last record of the page is being merged into an index page that is being accessed by dict_stats_analyze_index() (updating persistent statistics).
The real divergence occurs later. I saved some rr replay output of monitoring changes to PAGE_HEAP_TOP and compared them:
diff -u <(sed -ne 's/.*n=\([1-9][0-9]*\)) at.*/\1/p' heap-top-recovered|uniq) <(sed -ne 's/.*n=\([1-9][0-9]*\)) at.*/\1/p' heap-top-write|uniq)
@@ -57,6 +57,25 @@
4055
3932
4043
+234
+…
+2196
2305
2414
2523
@@ -75,3 +94,4 @@
3914
4025
3914
+4023
For some reason, the recovery trace does not include the updates of PAGE_HEAP_TOP from 234 to 2196. The last record must be what recovery was supposed to apply, but did not due to the flagged corruption. Let us check whether redo logging was properly enabled during this page_create_low():
Thread 45 hit Hardware watchpoint 3: -location $f[40]@2
Old value = "\017\\"
New value = <incomplete sequence \313>
mach_write_to_2 (b=0x64000047b028 <incomplete sequence \313>, n=4043) at /data/Server/bb-10.5-marko/storage/innobase/include/mach0data.ic:62
62 }
2: /x log_sys.lsn._M_i = 0xa4f621
3: /x $f[40]@2 = {0xf, 0xcb}
Current event: 598722
[Switching to Thread 314150.314325]
Thread 8 hit Hardware watchpoint 3: -location $f[40]@2
Old value = <incomplete sequence \313>
New value = "\000"
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:261
261 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
2: /x log_sys.lsn._M_i = 0xa6230e
3: /x $f[40]@2 = {0x0, 0x0}
Current event: 614654
Thread 8 hit Hardware watchpoint 3: -location $f[40]@2
Old value = "\000"
New value = "\000}"
page_create_low (block=0x64000007d178, comp=false) at /data/Server/bb-10.5-marko/storage/innobase/page/page0page.cc:311
311 memcpy(page + PAGE_DATA, infimum_supremum_redundant,
2: /x log_sys.lsn._M_i = 0xa6230e
3: /x $f[40]@2 = {0x0, 0x7d}
Current event: 614654
That call is within a change buffer page reorganize:
#3 0x000055c213bff4a4 in page_create (block=0x64000007d178, mtr=0x1ff90f650b70, comp=false) at /data/Server/bb-10.5-marko/storage/innobase/page/page0page.cc:331
#4 0x000055c213dd847e in btr_page_reorganize_low (cursor=0x1ff90f64f740, index=0x616000006ff0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:1371
#5 0x000055c213dda7c2 in btr_page_reorganize_block (z_level=6, block=0x64000007d178, index=0x616000006ff0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:1606
#6 0x000055c213dea411 in btr_can_merge_with_page (cursor=0x1ff90f650800, page_no=1160, merge_block=0x1ff90f64f940, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:5167
#7 0x000055c213de1e48 in btr_compress (cursor=0x1ff90f650800, adjust=0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:3483
#8 0x000055c213e100d7 in btr_cur_compress_if_useful (cursor=0x1ff90f650800, adjust=0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:5468
#9 0x000055c213e11b57 in btr_cur_pessimistic_delete (err=0x1ff90f650690, has_reserved_extents=1, cursor=0x1ff90f650800, flags=0, rollback=false, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:5908
#10 0x000055c213b4e8e9 in ibuf_delete_rec (page_id=..., pcur=0x1ff90f650800, search_tuple=0x616000874bf0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:4125
#11 0x000055c213b50024 in ibuf_merge_or_delete_for_page (block=0x640000043928, page_id=..., zip_size=0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:4471
Starting with MDEV-21725, the page reorganize is being logged in an optimized way, instead of being logged as individual inserts. Before MDEV-12353, the operation was covered by a logical log record.
Immediately after the page reorganize operation, the page contents between recovery and the original write differ as follows:
diff -u <(od -Ax -t x1 /dev/shm/r-after-reorg.bin) <(od -Ax -t x1 /dev/shm/w-after-reorg.bin)
--- /dev/fd/63 2021-05-31 08:20:19.561244871 +0000
+++ /dev/fd/62 2021-05-31 08:20:19.561244871 +0000
@@ -1,5 +1,5 @@
000000 00 00 00 00 00 00 04 88 00 00 04 89 00 00 04 87
-000010 00 00 00 00 00 a2 d7 01 45 bf 00 00 00 00 00 00
+000010 00 00 00 00 00 a5 07 15 45 bf 00 00 00 00 00 00
000020 00 00 00 00 00 00 00 05 08 94 00 15 00 00 00 00
000030 08 33 00 02 00 12 00 13 00 00 00 00 00 00 21 5b
000040 00 00 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00
@@ -112,5 +112,5 @@
0008a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000fe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 74
-000ff0 05 38 03 84 01 d0 00 65 00 a2 d7 01 8e 48 0a 5d
+000ff0 05 38 03 84 01 d0 00 65 00 00 00 00 00 00 00 00
001000
On recovery, the FIL_PAGE_LSN is 0xa2d701, which was the initial FIL_PAGE_LSN of the page. The field might only be updated by recovery after all log has been applied to the page. We have the LSN and checksum difference at the end of the page as well. It is nothing to worry about.
Similarly, the page contents immediately before the failed log apply are equivalent:
@@ -1,5 +1,5 @@
000000 00 00 00 00 00 00 04 88 00 00 04 89 ff ff ff ff
-000010 00 00 00 00 00 a2 d7 01 45 bf 00 00 00 00 00 00
+000010 00 00 00 00 00 a6 49 aa 45 bf 00 00 00 00 00 00
000020 00 00 00 00 00 00 00 0a 0f 4a 00 25 0e ea 00 6f
000030 00 00 00 02 00 23 00 22 00 00 00 00 00 00 22 d5
000040 00 00 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00
@@ -195,5 +195,5 @@
000ef0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000fe0 00 00 00 00 00 74 0d bc 0c 08 0a 54 08 a0 06 ec
-000ff0 05 38 03 84 01 d0 00 65 00 a2 d7 01 8e 48 0a 5d
+000ff0 05 38 03 84 01 d0 00 65 00 00 00 00 00 00 00 00
001000
So, the problem is only with the last record. We must look closer at this:
Current event: 630234
(rr) bt
#0 mach_write_to_2 (b=0x64000047b028 "\017\267", n=4023) at /data/Server/bb-10.5-marko/storage/innobase/include/mach0data.ic:62
#1 0x000055c213bf9f52 in page_mem_alloc_heap<>(buf_block_t *, ulint, ulint *) (block=0x64000007d178, need=109, heap_no=0x1f9a2c2ca1d0) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1071
#2 0x000055c213bf1062 in page_cur_insert_rec_low (cur=0x1f9a2c2cab58, index=0x616000006ff0, rec=0x6110003ae9fc "", offsets=0x61600066e1f0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1388
#3 0x000055c213dfb53d in page_cur_tuple_insert (cursor=0x1f9a2c2cab58, tuple=0x61900228cbf0, index=0x616000006ff0, offsets=0x1f9a2c2caab0, heap=0x1f9a2c2caa90, n_ext=0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/include/page0cur.ic:285
#4 0x000055c213e09bfd in btr_cur_optimistic_insert (flags=3, cursor=0x1f9a2c2cab50, offsets=0x1f9a2c2caab0, heap=0x1f9a2c2caa90, entry=0x61900228cbf0, rec=0x1f9a2c2caaf0, big_rec=0x1f9a2c2caa70, n_ext=0, thr=0x621004e35de0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:3562
#5 0x000055c213b4aad5 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3403
#6 0x000055c213b4b2f7 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61a000844508, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3603
A glaring difference is that reuse=false becomes reuse=true on recovery:
…
(rr) step
1595 mtr->page_insert(*block, reuse,
(rr) step
mtr_t::page_insert (this=0x64000047bf52, block=..., reuse=false, prev_rec=1, info_bits=0 '\000', n_fields_s=94292043288653, hdr_c=5, data_c=8, hdr=0x64000047bf4a "aY\031\t\005\004", hdr_l=1, data=0x6110003aea04 "\200", data_l=89) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1103
The actual mystery is why the log record that was written differs from what was buffered by recovery:
{0x20, 0x55, 0x0, 0x84, 0x8, 0x4, 0x8c, 0xd7, 0x2c, 0x5, 0x8, 0x61, 0x80, 0x0, 0x0, 0x0, 0x1, 0x83, 0xfe, 0x0, 0x40, 0x80, 0x3f, 0x86, 0x8, 0x0, 0x8, 0x80, 0x0, 0x62, 0x0 <repeats 69 times>, 0xf8, 0xfd}
{0x20, 0x53, 0x0, 0x84, 0x8, 0x5, 0x89, 0xdc, 0x2c, 0x5, 0xa, 0x61, 0x2, 0x0, 0x1, 0x83, 0xfe, 0x0, 0x40, 0x80, 0x3f, 0x86, 0x8, 0x0, 0x8, 0x80, 0x0, 0x66, 0x0 <repeats 69 times>, 0xf8, 0xfa}
The least significant bit of the first differing byte encodes the reuse flag. Let us check which string was present in ib_logfile0 before recovery was started. (Note: this will miss records that span a block boundary.)
od -Ax -t x1 -w512 ib_logfile0 |grep '20 55 00 84 08 04 8c d7 2c'|cut -d\ -f1
od -Ax -t x1 -w512 ib_logfile0 |grep '20 53 00 84 08 05 89 dc 2c'|cut -d\ -f1
Surprisingly, we find exactly one match for each of them. The first match occurs at the end of the 512-byte block that starts at file offset 0xa70800. The last 4 bytes are a log block checksum:
20 55 00 84 08 04 8c d7 2c 05 08 61 80 00 00 15 84 96 bc
The record continues after the 4-byte checksum and the 12-byte header of the following block:
00 01 83 fe 00 40 80 3f 86 08 00 08 80 00 62 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f8 fd b2 27 e2 00
This looks like an exact match of the record that we wrote. Before the NUL byte that marks the end of the mini-transaction, there is another record: a WRITE to the same page.
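The grep over the od output above cannot find a pattern whose first bytes straddle a block boundary. A sketch of a reassembly pass that strips the block framing first, assuming only the layout just described (512-byte blocks, a 12-byte header, a 4-byte checksum trailer); the start offset and the search pattern are illustrative, not a parser for the full redo log format:
// Sketch only: strip the framing of ib_logfile0 log blocks so that a byte
// pattern spanning a 512-byte block boundary can still be found by a linear
// search. Assumes 512-byte blocks with a 12-byte header and a 4-byte checksum
// trailer, as described above.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

int main(int argc, char **argv)
{
  if (argc < 3)
  {
    fprintf(stderr, "usage: %s ib_logfile0 start_offset\n", argv[0]);
    return 1;
  }
  FILE *f = fopen(argv[1], "rb");
  if (!f) { perror(argv[1]); return 1; }
  const size_t start = strtoul(argv[2], nullptr, 0);
  fseek(f, long(start), SEEK_SET);

  std::vector<unsigned char> payload;
  unsigned char block[512];
  while (fread(block, 1, sizeof block, f) == sizeof block)
    payload.insert(payload.end(), block + 12, block + sizeof block - 4);
  fclose(f);

  // Example pattern: the prefix of the record that recovery buffered (above).
  const unsigned char pattern[] = {0x20, 0x53, 0x00, 0x84, 0x08, 0x05, 0x89, 0xdc, 0x2c};
  for (size_t i = 0; i + sizeof pattern <= payload.size(); i++)
    if (!memcmp(&payload[i], pattern, sizeof pattern))
      // Map the payload offset back to a file offset by re-adding the
      // 16 framing bytes of each preceding 496-byte payload chunk.
      printf("match at payload offset 0x%zx (file offset 0x%zx)\n",
             i, start + (i / 496) * 512 + 12 + i % 496);
  return 0;
}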
Now, let us look at the other match, which looks like what recovery was reading. It is located in the 512-byte block starting at byte offset 0xa68c00 of ib_logfile0:
od -Ax -t x1 -w512 -N 99 -j 0xa68c4a ib_logfile0
a68c4a 20 53 00 84 08 05 89 dc 2c 05 0a 61 02 00 01 83 fe 00 40 80 3f 86 08 00 08 80 00 66 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f8 fa
It is an exact match of the record that was read by recovery! The problem appears to be in the write side after all. I must debug further to find out when those bytes were written, and for which mini-transaction.
Marko Mäkelä
added a comment - Because no wrap-around of the redo log has taken place (the payload of the ib_logfile0 file ends at 0xa99000), it should be rather simple to map the LSN to byte offsets in the file. Right before the SIGKILL, we got the following writes to the log file:
Thread 23 hit Breakpoint 2, log_write_buf (buf=0x6d6302ac9000 "\200", len=512, start_lsn=11115520, new_data_offset=376) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
562 ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
Current event: 670464
[Switching to Thread 314150.340446]
Thread 1 hit Breakpoint 2, log_write_buf (buf=0x4b191636c000 "\200", len=2048, start_lsn=11115520, new_data_offset=430) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
562 ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
Current event: 670870
[Switching to Thread 314150.340452]
Thread 31 hit Breakpoint 2, log_write_buf (buf=0x6d6302ac9000 "\200", len=512, start_lsn=11117056, new_data_offset=116) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
562 ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
Current event: 671278
[Switching to Thread 314150.340446]
Thread 1 hit Breakpoint 2, log_write_buf (buf=0x4b191636c000 "\200", len=1024, start_lsn=11117056, new_data_offset=312) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
562 ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
Current event: 671593
Thread 18 received signal SIGKILL, Killed.
At the time of the SIGKILL, the log_sys.lsn was 0xa9a526. The last start_lsn above is 0xa9a200, and the log_sys.write_lsn is 0xa9a526. It looks like the LSN is the file byte offset plus 0x1a00 (6656). Hence, the record for LSN 0xa723ed should be in the file at offset 0xa709ed. The unexpected record at 0xa68c4a must correspond to the LSN 0xa6a64a. Here we have it:
1: /x log_sys.lsn._M_i = 0xa6a64a
Current event: 623570
(rr) bt
#0 mtr_t::finish_write (this=0x55c213ae4877 <ilist<mtr_buf_t::block_t, void>::back() const+217>, len=35773916118888) at /data/Server/bb-10.5-marko/storage/innobase/mtr/mtr0mtr.cc:863
#1 0x000055c213bcc798 in mtr_t::commit (this=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/mtr/mtr0mtr.cc:408
#2 0x000055c213b3f878 in ibuf_mtr_commit (mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/ibuf0ibuf.ic:64
#3 0x000055c213b4ae1b in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3466
#4 0x000055c213b4b2f7 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61a000844508, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3603
#5 0x000055c213e02647 in btr_cur_search_to_nth_level_func (index=0x6160000f03f0, level=0, tuple=0x61a000844508, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x208943901ab0, ahi_latch=0x0, file=0x55c214795e40 "/data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc", line=2974, mtr=0x208943901e70, autoinc=0)
at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:1650
#6 0x000055c213c8b3e2 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x6160000f03f0, offsets_heap=0x619000cb9d80, heap=0x619000cb8e80, entry=0x61a000844508, trx_id=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:2971
#7 0x000055c213c8c59f in row_ins_sec_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0, check_foreign=true) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3294
#8 0x000055c213c8c784 in row_ins_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3342
#9 0x000055c213c8d14d in row_ins_index_entry_step (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3509
#10 0x000055c213c8d6b6 in row_ins (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3668
#11 0x000055c213c8de86 in row_ins_step (thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3818
#12 0x000055c213cb83ea in row_insert_for_mysql (mysql_rec=0x61f0001f80b8 "\377f", prebuilt=0x6220002b2170, ins_mode=ROW_INS_NORMAL) at /data/Server/bb-10.5-marko/storage/innobase/row/row0mysql.cc:1419
#13 0x000055c213a3b359 in ha_innobase::write_row (this=0x61d0008638b8, record=0x61f0001f80b8 "\377f") at /data/Server/bb-10.5-marko/storage/innobase/handler/ha_innodb.cc:7669
This was not found during my initial search, because I only watched the changes of the header field PAGE_HEAP_TOP. A little before the mtr_t::commit(), we actually did write the record that recovery failed to apply (note: reuse=true):
#0 mtr_t::page_insert (this=0x2089438ffd00, block=..., reuse=true, prev_rec=2652, info_bits=0 '\000', n_fields_s=13, hdr_c=5, data_c=10, hdr=0x64000047bedb "aY\031\t\005\004", hdr_l=1, data=0x611000383e46 "\002", data_l=87) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1118
#1 0x000055c213bf2303 in page_cur_insert_rec_low (cur=0x2089438ffb58, index=0x616000006ff0, rec=0x611000383e3c "", offsets=0x61600006a2f0, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1595
Now, let us compare the ‘before’ images of the page, before this mini-transaction was started, and before we failed to apply the log record:
diff -u <(od -Ax -t x1 recovery-before-last.bin) <(od -Ax -t x1 write-before.bin)
--- /dev/fd/63 2021-05-31 11:43:47.758752427 +0000
+++ /dev/fd/62 2021-05-31 11:43:47.758752427 +0000
@@ -1,5 +1,5 @@
000000 00 00 00 00 00 00 04 88 00 00 04 89 ff ff ff ff
-000010 00 00 00 00 00 a2 d7 01 45 bf 00 00 00 00 00 00
+000010 00 00 00 00 00 a6 4e aa 45 bf 00 00 00 00 00 00
000020 00 00 00 00 00 00 00 0a 0f 4a 00 25 0e ea 00 6f
000030 00 00 00 02 00 23 00 22 00 00 00 00 00 00 22 d5
000040 00 00 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00
@@ -195,5 +195,5 @@
000ef0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000fe0 00 00 00 00 00 74 0d bc 0c 08 0a 54 08 a0 06 ec
-000ff0 05 38 03 84 01 d0 00 65 00 a2 d7 01 8e 48 0a 5d
+000ff0 05 38 03 84 01 d0 00 65 00 00 00 00 00 00 00 00
001000
Again, only the LSN and checksum fields differ. Recovery failed, because the reuse of the record at the top of the PAGE_FREE stack exceeded the PAGE_HEAP_TOP 0xf4a. Let us check if that actually was the case. The PAGE_FREE points to 0xeea, which is the only record in the PAGE_FREE stack. PAGE_GARBAGE is 0x6f (111) bytes. At the time of writing a log record for the write, we notably have a negative free_offset:
0x000055c213bf22fe in page_cur_insert_rec_low (cur=0x2089438ffb58, index=0x616000006ff0, rec=0x611000383e3c "", offsets=0x61600006a2f0, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1595
1595 mtr->page_insert(*block, reuse,
1: /x log_sys.lsn._M_i = 0xa6a64a
(rr) i lo
r = 0x611000383e46 "\002"
c = 0x64000047bacb "\001"
c_end = 0x64000047bb22 "aY\031\t\005\004"
data_common = 10
block = 0x64000007d178
rec_size = 109
reuse = true
free_offset = -3
heap_no = 36
insert_buf = 0x64000047bedb "aY\031\t\005\004"
comp = false
extra_size = 12
next_rec = 0x64000047bc08 ""
page_last_insert = 0x64000047b030 <incomplete sequence \347>
last_insert = 0
page_n_recs = 0x64000047b036 ""
insert_rec = 0x64000047bee7 ""
data_size = 97
hdr_common = 5
n_owned = 4
info_status = 0 '\000'
We are attempting to insert a record of 12+97 bytes, while the record in the PAGE_FREE stack was 15+96 bytes. The origin of the record will move by 3 bytes from 0xeea to 0xee7 accordingly (see insert_rec above). All seems well and good on the write side. It is the recovery side that appears to be broken. Note: this LSN 0xa6a64a matches the start_lsn in the recovery.
Marko Mäkelä
added a comment - After long debugging, I think that I found the bug. It is obvious when you look at the preceding and following lines. This affects the 10.5 and 10.6 series only.
diff --git a/storage/innobase/page/page0cur.cc b/storage/innobase/page/page0cur.cc
index 025d4b9b967..f75d1a8be77 100644
--- a/storage/innobase/page/page0cur.cc
+++ b/storage/innobase/page/page0cur.cc
@@ -2443,7 +2443,7 @@ bool page_apply_insert_redundant(const buf_block_t &block, bool reuse,
if (UNIV_UNLIKELY(free_rec - fextra_size < heap_bot))
goto corrupted;
const ulint fdata_size= rec_get_data_size_old(free_rec);
- if (UNIV_UNLIKELY(free_rec + data_size > heap_top))
+ if (UNIV_UNLIKELY(free_rec + fdata_size > heap_top))
goto corrupted;
if (UNIV_UNLIKELY(extra_size + data_size > fextra_size + fdata_size))
goto corrupted;
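To see why this particular page tripped the check, plug in the numbers from the trace above: PAGE_FREE = 0xeea, PAGE_HEAP_TOP = 0xf4a, an incoming record of 12+97 bytes reusing a freed record of 15+96 bytes. This is a stand-alone illustration of the comparison, not the actual InnoDB code path:
// Worked check with the values from this trace; an illustration of the
// comparison fixed in page_apply_insert_redundant(), not the function itself.
#include <cassert>
#include <cstdio>

int main()
{
  const unsigned free_rec   = 0xeea; // origin of the record in the PAGE_FREE stack
  const unsigned heap_top   = 0xf4a; // PAGE_HEAP_TOP; the freed record's data ends exactly here
  const unsigned data_size  = 97;    // data size of the record being inserted (12+97 bytes)
  const unsigned fdata_size = 96;    // data size of the freed record (15+96 bytes)

  // Buggy bound check: uses the incoming record's data size, so 0xf4b > 0xf4a
  // and the valid INSERT_REUSE_REDUNDANT record is flagged as corruption.
  const bool buggy_corrupted = free_rec + data_size > heap_top;
  // Fixed bound check: validates the freed record itself; 0xf4a > 0xf4a is false.
  const bool fixed_corrupted = free_rec + fdata_size > heap_top;

  printf("buggy check: %s, fixed check: %s\n",
         buggy_corrupted ? "corrupted" : "ok",
         fixed_corrupted ? "corrupted" : "ok");
  assert(buggy_corrupted && !fixed_corrupted);
  return 0;
}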