Operating System: Amazon Linux 2 AMI
Hardware: Amazon AWS t3a.medium
3-Node Galera Cluster
Description
We have several production databases that we migrated in early 2021 from a single-node setup to a 3-node Galera Cluster for high availability. During the migration project we implemented a tool for managing the cluster; one of its features is a "cycle-db-cluster" function that replaces the oldest node of the cluster with a completely new one. Since the project we have kept the cluster up to date by periodically replacing nodes with this automated process.
On 2021-05-11 the replace process failed because a new 10.5.10 node could not join our cluster of 10.5.9 nodes. In mysqld.log the new node reported [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=14408]. We repeated the process several times, also trying MariaDB 10.5.9 and a new virtual server each time, but the issue persisted. Our mysqldumps worked just fine, so we scheduled a service break for 2021-05-12 and started a new cluster using MariaDB 10.5.10 and data from the mysqldumps. An important detail is that we imported the mysqldumps into a 1-node cluster and added 2 nodes after the imports had finished – so joining definitely worked shortly after the import. We also replaced the first node of the cluster with the "cycle-db-cluster" functionality ~1 hour after the service break.
On 2021-05-20 we tried to replace a node in the cluster for the first time since the day we set it up. Again, joining a new node to the cluster failed, and mysqld.log on the new joining node contains [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=2366]. So now we have an almost new 3-node 10.5.10 cluster that new nodes cannot join. The cluster itself continues to perform just fine and mysqldumps succeed. To me it looks as if the system detects corruption for some reason while the data inside the databases is actually fine.
Attachments:
20210520-cycle-db-cluster.log:
Log of the automated cycle-db-cluster process, with details of how a new DB server was set up from scratch, all the way to attempting to start the mysqld process.
Conf.zip:
Database configuration files that we use for our nodes. These have a few variables that are replaced automatically during the node setup process.
20210520-clusterjoin-failure-mysqld.log:
mysqld.log from the node join failure at 2021-05-20.
Marko Mäkelä
added a comment - MDEV-25031 was an error that affected the recovery from a crash or a physical backup in 10.5.
In 20210520-clusterjoin-failure-mysqld.log it indeed seems that a 10.5.10 server is failing to start up:
mariadb-10.5.10
2021-05-20 13:24:02 0 [Note] InnoDB: Starting final batch to recover 28591 pages from redo log.
2021-05-20 13:24:02 0 [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=2366]
Can you provide a copy of the data directory so that I can diagnose this? I would need at least the ib_logfile0 and the first 2367 pages of the system tablespace file(s).
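For illustration only (assuming the default 16 KiB innodb_page_size, which matches the bs=16384 used later in this issue; the output file name is made up), the requested files could be captured roughly like this:
# copy the redo log and the first 2367 pages (0..2366) of the system tablespace
cp ib_logfile0 ib_logfile0.copy
dd if=ibdata1 bs=16384 count=2367 of=ibdata1-first-2367-pages.bin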
LuborJ
added a comment (edited) - Same issue here after mariabackup SST:
May 22 04:19:46 server2 -wsrep-sst-joiner: Move successful, removing /data/mysql//.sst
May 22 04:19:46 server2 -wsrep-sst-joiner: Galera co-ords from recovery: 168f7b8e-dcca-11e3-a9d4-5e95c6d961f7:106462667529
May 22 04:19:46 server2 -wsrep-sst-joiner: Total time on joiner: 0 seconds
May 22 04:19:46 server2 -wsrep-sst-joiner: Removing the sst_in_progress file
May 22 04:19:46 server2 mariadbd[1685]: 2021-05-22 4:19:46 3 [Note] WSREP: SST received
May 22 04:19:46 server2 mariadbd[1685]: 2021-05-22 4:19:46 3 [Note] WSREP: Server status change joiner -> initializing
May 22 04:19:46 server2 mariadbd[1685]: 2021-05-22 4:19:46 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] mariadbd: Aria engine: starting recovery
May 22 04:19:47 server2 mariadbd[1685]: tables to flush: 1 0
May 22 04:19:47 server2 mariadbd[1685]: (0.0 seconds);
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] mariadbd: Aria engine: recovery done
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Warning] The parameter innodb_buffer_pool_instances is deprecated and has no effect.
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Uses event mutexes
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Number of pools: 1
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] mariadbd: O_TMPFILE is not supported on /data/tmp (disabling future attempts)
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Using Linux native AIO
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Initializing buffer pool, total size = 32212254720, chunk size = 134217728
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Completed initialization of buffer pool
May 22 04:19:47 server2 mariadbd[1685]: 2021-05-22 4:19:47 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=23948611322110,23948611322110
May 22 04:19:51 server2 mariadbd[1685]: 2021-05-22 4:19:51 0 [Note] InnoDB: Starting final batch to recover 32275 pages from redo log.
May 22 04:19:52 server2 mariadbd[1685]: 2021-05-22 4:19:52 0 [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=33844463]
May 22 04:19:52 server2 mariadbd[1685]: 2021-05-22 4:19:52 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
May 22 04:19:53 server2 mariadbd[1685]: 2021-05-22 4:19:53 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption
May 22 04:19:53 server2 mariadbd[1685]: 2021-05-22 4:19:53 0 [Note] InnoDB: Starting shutdown...
May 22 04:19:54 server2 mariadbd[1685]: 2021-05-22 4:19:54 0 [ERROR] Plugin 'InnoDB' init function returned error.
May 22 04:19:54 server2 mariadbd[1685]: 2021-05-22 4:19:54 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
May 22 04:19:54 server2 mariadbd[1685]: 2021-05-22 4:19:54 0 [Note] Plugin 'FEEDBACK' is disabled.
May 22 04:19:54 server2 mariadbd[1685]: 2021-05-22 4:19:54 0 [ERROR] Unknown/unsupported storage engine: InnoDB
May 22 04:19:54 server2 mariadbd[1685]: 2021-05-22 4:19:54 0 [ERROR] Aborting
May 22 04:19:54 server2 mariadbd[1685]: terminate called after throwing an instance of 'wsrep::runtime_error'
May 22 04:19:54 server2 mariadbd[1685]: what(): State wait was interrupted
May 22 04:19:54 server2 mariadbd[1685]: 210522 4:19:54 [ERROR] mysqld got signal 6 ;
May 22 04:19:54 server2 mariadbd[1685]: This could be because you hit a bug. It is also possible that this binary
May 22 04:19:54 server2 mariadbd[1685]: or one of the libraries it was linked against is corrupt, improperly built,
May 22 04:19:54 server2 mariadbd[1685]: or misconfigured. This error can also be caused by malfunctioning hardware.
May 22 04:19:54 server2 mariadbd[1685]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
May 22 04:19:54 server2 mariadbd[1685]: We will try our best to scrape up some info that will hopefully help
May 22 04:19:54 server2 mariadbd[1685]: diagnose the problem, but since we have already crashed,
May 22 04:19:54 server2 mariadbd[1685]: something is definitely wrong and this may fail.
May 22 04:19:54 server2 mariadbd[1685]: Server version: 10.5.10-MariaDB-1:10.5.10+maria~buster-log
May 22 04:19:54 server2 mariadbd[1685]: key_buffer_size=134217728
May 22 04:19:54 server2 mariadbd[1685]: read_buffer_size=1048576
May 22 04:19:54 server2 mariadbd[1685]: max_used_connections=0
May 22 04:19:54 server2 mariadbd[1685]: max_threads=10002
May 22 04:19:54 server2 mariadbd[1685]: thread_count=4
May 22 04:19:54 server2 mariadbd[1685]: It is possible that mysqld could use up to
May 22 04:19:54 server2 mariadbd[1685]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 31111928 K bytes of memory
May 22 04:19:54 server2 mariadbd[1685]: Hope that's ok; if not, decrease some variables in the equation.
May 22 04:19:54 server2 mariadbd[1685]: Thread pointer: 0x7f777001b918
May 22 04:19:54 server2 mariadbd[1685]: Attempting backtrace. You can use the following information to find out
May 22 04:19:54 server2 mariadbd[1685]: where mysqld died. If you see no messages after this, something went
May 22 04:19:54 server2 mariadbd[1685]: terribly wrong...
May 22 04:19:54 server2 mariadbd[1685]: stack_bottom = 0x7f77711fc300 thread_stack 0x49000
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x56105e3234ee]
May 22 04:19:54 server2 mariadbd[1685]: Printing to addr2line failed
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x56105ddacca5]
May 22 04:19:54 server2 mariadbd[1685]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f87793a8730]
May 22 04:19:54 server2 mariadbd[1685]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7f8778efe7bb]
May 22 04:19:54 server2 mariadbd[1685]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7f8778ee9535]
May 22 04:19:54 server2 mariadbd[1685]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c983)[0x7f8779299983]
May 22 04:19:54 server2 mariadbd[1685]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x928c6)[0x7f877929f8c6]
May 22 04:19:54 server2 mariadbd[1685]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92901)[0x7f877929f901]
May 22 04:19:54 server2 mariadbd[1685]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92b34)[0x7f877929fb34]
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(+0x6531f6)[0x56105dab01f6]
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(_ZN5wsrep12server_state12sst_receivedERNS_14client_serviceEi+0xca9)[0x56105e3afd99]
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(+0xbecd63)[0x56105e049d63]
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(+0xbeda4e)[0x56105e04aa4e]
May 22 04:19:54 server2 mariadbd[1685]: /usr/sbin/mariadbd(+0xb7a7cb)[0x56105dfd77cb]
May 22 04:19:54 server2 mariadbd[1685]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f877939dfa3]
May 22 04:19:54 server2 mariadbd[1685]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f8778fc04cf]
May 22 04:19:54 server2 mariadbd[1685]: Trying to get some variables.
May 22 04:19:54 server2 mariadbd[1685]: Some pointers may be invalid and cause the dump to abort.
May 22 04:19:54 server2 mariadbd[1685]: Query (0x0): (null)
May 22 04:19:54 server2 mariadbd[1685]: Connection ID (thread ID): 3
May 22 04:19:54 server2 mariadbd[1685]: Status: NOT_KILLED
May 22 04:19:54 server2 mariadbd[1685]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
May 22 04:19:54 server2 mariadbd[1685]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
May 22 04:19:54 server2 mariadbd[1685]: information that should help you find out what is causing the crash.
May 22 04:19:54 server2 mariadbd[1685]: We think the query pointer is invalid, but we will try to print it anyway.
May 22 04:19:54 server2 mariadbd[1685]: Query:
May 22 04:19:54 server2 mariadbd[1685]: Writing a core file...
May 22 04:19:54 server2 mariadbd[1685]: Working directory at /data/mysql
May 22 04:19:54 server2 mariadbd[1685]: Resource Limits:
May 22 04:19:54 server2 mariadbd[1685]: Limit Soft Limit Hard Limit Units
May 22 04:19:54 server2 mariadbd[1685]: Max cpu time unlimited unlimited seconds
May 22 04:19:54 server2 mariadbd[1685]: Max file size unlimited unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max data size unlimited unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max stack size 8388608 unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max core file size 0 unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max resident set unlimited unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max processes 500000 500000 processes
May 22 04:19:54 server2 mariadbd[1685]: Max open files 500000 500000 files
May 22 04:19:54 server2 mariadbd[1685]: Max locked memory 65536 65536 bytes
May 22 04:19:54 server2 mariadbd[1685]: Max address space unlimited unlimited bytes
May 22 04:19:54 server2 mariadbd[1685]: Max file locks unlimited unlimited locks
May 22 04:19:54 server2 mariadbd[1685]: Max pending signals 386408 386408 signals
May 22 04:19:54 server2 mariadbd[1685]: Max msgqueue size 819200 819200 bytes
May 22 04:19:54 server2 mariadbd[1685]: Max nice priority 0 0
May 22 04:19:54 server2 mariadbd[1685]: Max realtime priority 0 0
May 22 04:19:54 server2 mariadbd[1685]: Max realtime timeout unlimited unlimited us
May 22 04:19:54 server2 mariadbd[1685]: Core pattern: core
May 22 04:19:54 server2 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
Marko Mäkelä
added a comment - tomitukiainen, thank you, I downloaded the partial copy of the data directory that you provided. Before starting any recovery, I copied the corrupted page:
dd if=ibdata1 bs=16384 skip=2366 count=1 of=page2366.bin
I see that FIL_PAGE_LSN (the 8 bytes at byte offset 16) is 0x4697b0dad (18949541293).
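As a small aside, the same field can be inspected directly from the saved page image with od (a sketch; the file name refers to the dd output above):
# dump the 8 bytes at offset 16 (FIL_PAGE_LSN); this should print 00 00 00 04 69 7b 0d ad
od -An -t x1 -j 16 -N 8 page2366.bin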
Then, I started up a debug build of a recent 10.5 branch to find out which log records refer to this page:
sql/mariadbd --datadir=… --debug=d,ib_log 2>&1| grep -w 2366 |tee 2366.txt
Finally, after the above run failed (without modifying any files), I started up the server without that debug option, but with innodb_force_recovery=1 so that missing *.ibd files will be ignored. There indeed were messages issued about page 0:2366:
2021-05-24 14:04:44 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=18950648175,18950648175
…
2021-05-24 14:04:46 0 [Note] InnoDB: Starting final batch to recover 1172 pages from redo log.
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=2366]
2021-05-24 14:04:46 0 [ERROR] InnoDB: n recs wrong 156 165
2021-05-24 14:04:46 0 [ERROR] InnoDB: n recs wrong 156 165
mariadbd: /mariadb/10.5m/storage/innobase/page/page0cur.cc:2530: bool page_apply_insert_redundant(const buf_block_t&, bool, ulint, ulint, size_t, size_t, const void*, size_t): Assertion `page_simple_validate_old(block.frame)' failed.
Now, let us look at the log records for this page:
?func: ib_log: scan 18951635875: rec 20 len 79 page 0:2366
?func: ib_log: scan 18951635875: rec b3 len 4 page 0:2366
…
?func: ib_log: scan 19067167151: rec 3a len 11 page 0:2366
?func: ib_log: scan 19067471753: rec 3a len 11 page 0:2366
So far, everything looks consistent. The page was apparently last written before the checkpoint, and the subsequent modification is after the checkpoint (18949541293<18950648175<18951635875).
The page belongs to the change buffer. To work around this bug (and MDEV-22373, which might be related to this), you can set innodb_change_buffering=none and ensure that you have no ROW_FORMAT=REDUNDANT tables. Disabling change buffering may cause some performance regression (see MDEV-14094, MDEV-11634, MDEV-19514).
Because the change buffer will be corrupted, some secondary index leaf pages may be corrupted as well. You should try to execute CHECK TABLE on each InnoDB table to force a change buffer merge and to check for corruption. Corrupted secondary indexes can be dropped and re-created.
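A minimal SQL sketch of that workaround (schema and table names are placeholders, not from this report; innodb_change_buffering=none should also be added to the configuration so that it survives restarts):
-- stop buffering further changes
SET GLOBAL innodb_change_buffering = 'none';
-- find any remaining ROW_FORMAT=REDUNDANT InnoDB tables
SELECT TABLE_SCHEMA, TABLE_NAME
FROM information_schema.TABLES
WHERE ENGINE = 'InnoDB' AND ROW_FORMAT = 'Redundant';
-- force a change buffer merge and check for corruption, one table at a time
CHECK TABLE db1.t1;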
Using the data, I will try to figure out which exact write is failing, but this will take significant time and effort, because I do not have a complete execution trace of the server that was running before the data was copied. If we had an https://rr-project.org/ trace of the killed server in addition to having the data directory, this would be trivial to diagnose.
mleich, can you try to produce such a trace? Run 10.5 or 10.6 with a tiny buffer pool and a heavily indexed table (say, index(a,b,c,d),index(a,b,d,c),…,index(d,c,b,a) and UPDATE … SET a=random_data). Avoid any WHERE condition on any of those indexed columns, so that change buffer merges will be avoided for as long as possible.
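For reference, such a trace could be captured along these lines (a sketch only; paths, the buffer pool size, and other options are illustrative):
# record the server under rr while the UPDATE workload runs against it
rr record /usr/sbin/mariadbd --datadir=/data/mysql --innodb-buffer-pool-size=16M
# after the server has been killed, replay the latest recorded execution
rr replay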
Marko Mäkelä
added a comment - The following record is the first one found to be inconsistent with the page contents:
#0 0x000056264c48b4d4 in log_phys_t::apply (this=this@entry=0x7f155569b3b0,
block=
@0x7f15441952d0: {page = {id_ = {m_id = 2366},
…
at /mariadb/10.5m/storage/innobase/log/log0recv.cc:397
397 if (page_apply_insert_redundant(block, subtype & 1, prev_rec,
(rr) p *this
$1 = {<log_rec_t> = {next = 0x7f155569f320, lsn = 18962387413}, start_lsn = 18962387336}
The same mini-transaction is modifying this page twice (I guess, inserting the buffered record into the change buffer and maybe updating PAGE_MAX_TRX_ID):
?func: ib_log: scan 18962387336: rec 20 len 72 page 0:2366
?func: ib_log: scan 18962387336: rec b3 len 4 page 0:2366
The reason for the inconsistency is:
if (UNIV_UNLIKELY(free_rec + data_size > heap_top))
goto corrupted;
That is, we are supposed to reuse the space that was occupied by a previously deleted record (the top of the PAGE_FREE stack), but the allocation is extending past the end of PAGE_HEAP_TOP. This looks serious indeed.
mleich, I think that we’d better use innodb_page_size=4k and indexed VARCHAR columns for reproducing this. If we use a large page size or fixed-size records, we’d never hit this. The check would only fail if the last-allocated record in the page was freed and subsequently reallocated. I hope that this corruption is also reproducible with ROW_FORMAT=REDUNDANT tables (and without change buffering).
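A hypothetical sketch of such a test table (names and sizes are made up; the server would be started with --innodb-page-size=4k and a tiny buffer pool):
-- ROW_FORMAT=REDUNDANT table with several permuted secondary indexes on VARCHAR columns
CREATE TABLE t1 (
  pk INT AUTO_INCREMENT PRIMARY KEY,
  a VARCHAR(36), b VARCHAR(36), c VARCHAR(36), d VARCHAR(36),
  INDEX(a,b,c,d), INDEX(a,b,d,c), INDEX(b,a,c,d), INDEX(d,c,b,a)
) ENGINE=InnoDB ROW_FORMAT=REDUNDANT;
-- after populating t1, update an indexed column without referencing any indexed
-- column in the WHERE clause, so that the changes stay in the change buffer
UPDATE t1 SET a = UUID() WHERE pk < 10000;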
I do not think I can do more on this until we have an rr replay trace.
Marko Mäkelä
added a comment - mleich was able to produce an rr replay trace for this:
ssh pluto
rr replay /data/Results/1622138407/TBR-1093/dev/shm/vardir/1622138407/123/1/rr/latest-trace
It reports Not applying INSERT_REUSE_REDUNDANT for a change buffer page, due to exactly the same reason as in the data directory that was provided by tomitukiainen.
Next week, I will get the details from the failed recovery, and then debug the trace of the server run that ended in SIGKILL, to figure out what caused the problem.
If we are lucky, this could explain MDEV-25783 as well. (For that to be possible, I think that something related to the change buffer merge must skip writing redo log.)
Marko Mäkelä
added a comment - Below, I am documenting my workflow for finding the root cause of this failure. Hopefully it will serve an educational purpose. I believe that with rr replay traces of the killed server and of the misbehaving recovered server, as well as a copy of the data directory from before the recovery was started, any recovery bug can be diagnosed.
When we fail to apply the log record, it is for page 1160 in the system tablespace. We can also determine the LSN:
#2 0x000055de458007c8 in page_apply_insert_redundant (block=..., reuse=true, prev=2652, enc_hdr=44, hdr_c=11, data_c=10, data=0x25592084fe0d, data_len=88)
at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:2321
2321 ib::error() << (reuse
(rr) p/x block.frame[16]@8
$1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0xa2, 0xd7, 0x1}
(rr) up
#3 0x000055de457b21e1 in log_phys_t::apply (this=0x25592084fde8, block=..., last_offset=@0x606001178b6c: 0) at /data/Server/bb-10.5-marko/storage/innobase/log/log0recv.cc:397
397 if (page_apply_insert_redundant(block, subtype & 1, prev_rec,
(rr) p/x *this
$2 = {<log_rec_t> = {next = 0x25592084fe68, lsn = 0xa6a6b2}, start_lsn = 0xa6a64a}
Based on this recovery failure, for the rr replay execution trace of the server that ended in SIGKILL we must check every modification to the page between LSN 0xa2d701 and 0xa6a6b2. The log record that we failed to apply had been written by the following (and I do not yet conclude that there was anything wrong with those writes):
#7 0x000055c213bccd96 in mtr_t::commit (this=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/mtr/mtr0mtr.cc:442
#8 0x000055c213b3f878 in ibuf_mtr_commit (mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/ibuf0ibuf.ic:64
#9 0x000055c213b4ae1b in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0)
at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3466
#10 0x000055c213b4b2f7 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61a000844508, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0)
at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3603
#11 0x000055c213e02647 in btr_cur_search_to_nth_level_func (index=0x6160000f03f0, level=0, tuple=0x61a000844508, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x208943901ab0, ahi_latch=0x0,
file=0x55c214795e40 "/data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc", line=2974, mtr=0x208943901e70, autoinc=0) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:1650
#12 0x000055c213c8b3e2 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x6160000f03f0, offsets_heap=0x619000cb9d80, heap=0x619000cb8e80, entry=0x61a000844508, trx_id=0, thr=0x621004e35de0)
at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:2971
#13 0x000055c213c8c59f in row_ins_sec_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0, check_foreign=true) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3294
#14 0x000055c213c8c784 in row_ins_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3342
#15 0x000055c213c8d14d in row_ins_index_entry_step (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3509
#16 0x000055c213c8d6b6 in row_ins (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3668
#17 0x000055c213c8de86 in row_ins_step (thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3818
#18 0x000055c213cb83ea in row_insert_for_mysql (mysql_rec=0x61f0001f80b8 "\377f", prebuilt=0x6220002b2170, ins_mode=ROW_INS_NORMAL) at /data/Server/bb-10.5-marko/storage/innobase/row/row0mysql.cc:1419
#19 0x000055c213a3b359 in ha_innobase::write_row (this=0x61d0008638b8, record=0x61f0001f80b8 "\377f") at /data/Server/bb-10.5-marko/storage/innobase/handler/ha_innodb.cc:7669
…
(rr) frame 21
#21 0x000055c2129b21fd in write_record (thd=thd@entry=0x62b0000af218, table=table@entry=0x619000bad198, info=info@entry=0x208943903d40, sink=sink@entry=0x0) at /data/Server/bb-10.5-marko/sql/sql_insert.cc:2107
2107 else if (unlikely((error=table->file->ha_write_row(table->record[0]))))
(rr) p thd.query_string
$5 = {string = {str = 0x62b0000b6238 "INSERT IGNORE INTO `t1` ( `pk`, `col_char` ) VALUES ( NULL, 'f' ) /* E_R Thread1 QNO 1506 CON_ID 16 */", length = 102}, cs = 0x55c21527dc60 <my_charset_latin1>}
(rr) display
2: /x log_sys.lsn._M_i = 0xa6a6b2
(rr) when
Current event: 623570
(rr) reverse-continue
Continuing.
Thread 18 hit Hardware access (read/write) watchpoint 1: *(ulong*)0x64000007d1c0
Value = 9683160
0x000055c213b3a125 in std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419
419 return __atomic_load_n(&_M_i, int(__m));
2: /x log_sys.lsn._M_i = 0xa6a64a
(rr) bt
#0 0x000055c213b3a125 in std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419
#1 Atomic_counter<unsigned long>::operator unsigned long (this=0x64000007d1c0) at /data/Server/bb-10.5-marko/include/my_counter.h:45
#2 0x000055c213b39ac2 in buf_page_t::oldest_modification (this=0x64000007d178) at /data/Server/bb-10.5-marko/storage/innobase/include/buf0buf.h:936
#3 0x000055c213b537da in mtr_t::is_block_dirtied (block=0x64000007d178) at /data/Server/bb-10.5-marko/storage/innobase/include/mtr0mtr.ic:35
#4 0x000055c213b5385c in mtr_t::memo_push (this=0x2089438ffd00, object=0x64000007d178, type=MTR_MEMO_PAGE_X_FIX) at /data/Server/bb-10.5-marko/storage/innobase/include/mtr0mtr.ic:57
#5 0x000055c213e45123 in buf_page_mtr_lock (block=0x64000007d178, rw_latch=2, mtr=0x2089438ffd00, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370)
at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:2866
#6 0x000055c213e461de in buf_page_get_low (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=10, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370,
mtr=0x2089438ffd00, err=0x2089438fe430, allow_ibuf_merge=false) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:3413
#7 0x000055c213e4671a in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=10, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370,
mtr=0x2089438ffd00, err=0x2089438fe430, allow_ibuf_merge=false) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:3482
#8 0x000055c213ad6899 in btr_block_get_func (index=..., page=1160, mode=2, merge=true, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370, mtr=0x2089438ffd00)
at /data/Server/bb-10.5-marko/storage/innobase/include/btr0btr.h:237
#9 0x000055c213dfd9ff in btr_cur_latch_leaves (block=0x64000007d178, latch_mode=36, cursor=0x2089438ffb50, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:370
#10 0x000055c213e0333a in btr_cur_search_to_nth_level_func (index=0x616000006ff0, level=0, tuple=0x619000cb93f0, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x2089438ffb50, ahi_latch=0x0,
file=0x55c21472bfa0 "/data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, mtr=0x2089438ffd00, autoinc=0) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:1848
#11 0x000055c213b407c8 in btr_pcur_open_low (index=0x616000006ff0, level=0, tuple=0x619000cb93f0, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x2089438ffb50,
file=0x55c21472bfa0 "/data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, autoinc=0, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/btr0pcur.ic:441
#12 0x000055c213b4a525 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0)
at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3276
…
(rr) when
Current event: 623569
It seems that the same mini-transaction is modifying the change buffer page two times. The recovery failure occurred for one of those 2 log records.
Let us follow the trace in the forward direction from an earlier point of time:
(rr) display
2: /x log_sys.lsn._M_i = 0x890396
(rr) cond 1 log_sys.lsn._M_i>=0xa2d701
(rr) continue
Thread 45 hit Hardware access (read/write) watchpoint 1: *(ulong*)0x64000007d1c0
Value = 9683160
std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419
419 return __atomic_load_n(&_M_i, int(__m));
2: /x log_sys.lsn._M_i = 0xa2d701
(rr) when
Current event: 575927
(rr) frame 26
#26 0x000055c2129b21fd in write_record (thd=thd@entry=0x62b000150218, table=table@entry=0x6190012de598, info=info@entry=0x44a50e105d40, sink=sink@entry=0x0) at /data/Server/bb-10.5-marko/sql/sql_insert.cc:2107
2107 else if (unlikely((error=table->file->ha_write_row(table->record[0]))))
(rr) p thd.query_string
$6 = {string = {str = 0x62b000157238 "INSERT IGNORE INTO `t1` ( `pk` ) VALUES ( 10090 ) /* E_R Thread6 QNO 1528 CON_ID 21 */", length = 86}, cs = 0x55c21527dc60 <my_charset_latin1>}
This was the last change for which the change buffer page had been written out. From this point onwards, the actions of the recovery must have diverged from the modifications to the page.
Because the LSN range is very wide here, we’d better ignore accesses of block->page.oldest_modification and instead check each change of PAGE_HEAP_TOP of the page.
(rr) break buf_page_read_complete thread 1
Breakpoint 3 at 0x55de45a541b3: file /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc, line 4100.
(rr) disable 2
(rr) reverse-continue
Continuing.
Thread 1 hit Breakpoint 3, buf_page_read_complete (bpage=0x55de4575e8c8 <mtr_t::memo_push(void*, mtr_memo_type_t)+230>, node=...) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:4100
4100 {
(rr) n
4101 const page_id_t id(bpage->id());
(rr)
4117 const byte *frame= bpage->zip.data
(rr)
4118 ? bpage->zip.data
(rr)
4117 const byte *frame= bpage->zip.data
(rr)
4123 if (!buf_page_decrypt_after_read(bpage, node))
(rr) p/x frame[40]@2
$6 = {0x8, 0xf3}
(rr) set $f=frame
(rr) display/x $f[40]@2
1: /x $f[40]@2 = {0x8, 0xf3}
(rr) dump binary memory recovered-page.bin $f $f+srv_page_size
(rr) enable
(rr) continue
At the time of the failed recovery, PAGE_HEAP_TOP was 0xf4a (3914). At the time of the page flush (as well as at the start of the recovery), the field was 0x8f3 (2291).
The contents of the page differ only slightly between the start of the recovery and what ought to be the corresponding point of logical time before the SIGKILL: only the last 8 bytes of the page differ. The least significant 32 bits of the LSN and the page checksum are not updated until the page is about to be written out from the buffer pool. I saved the srv_page_size=4096 bytes with dump binary memory and compared the files:
diff -u <(od -Ax -t x1 recover_start.bin) <(od -Ax -t x1 before_kill.bin)
--- /dev/fd/63 2021-05-31 06:10:33.779551414 +0000
+++ /dev/fd/62 2021-05-31 06:10:33.779551414 +0000
@@ -109,5 +109,5 @@
0008f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000fe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 74
-000ff0 05 b3 03 ff 02 4b 00 65 00 a2 d7 01 8e 48 0a 5d
+000ff0 05 b3 03 ff 02 4b 00 65 00 00 00 00 00 00 00 00
001000
So, we should have a sound starting point for examining the divergence.
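For quick spot checks outside the debugger, the same field can be read straight from the saved dumps. A minimal sketch, assuming only what is visible above: a 4096-byte page image (for example recovered-page.bin from the dump binary memory command) with PAGE_HEAP_TOP stored as a big-endian 16-bit value at byte offset 40 (frame[40]@2):
// Minimal sketch: read a page image saved with "dump binary memory" and print
// PAGE_HEAP_TOP, the big-endian 16-bit value at byte offset 40 (frame[40]@2).
// The default file name and the 4096-byte page size are taken from this trace.
#include <cstdio>
#include <vector>

int main(int argc, char **argv)
{
  const char *name = argc > 1 ? argv[1] : "recovered-page.bin";
  FILE *f = fopen(name, "rb");
  if (!f) { perror(name); return 1; }
  std::vector<unsigned char> page(4096);
  if (fread(page.data(), 1, page.size(), f) != page.size())
  {
    fprintf(stderr, "%s: short read\n", name);
    fclose(f);
    return 1;
  }
  fclose(f);
  const unsigned heap_top = page[40] << 8 | page[41];
  printf("PAGE_HEAP_TOP = 0x%x (%u)\n", heap_top, heap_top);
  return 0;
}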
Interestingly, near the start of the recovery, we have a page_create_low() that will reset the PAGE_HEAP_TOP to 0x7d (125). The PAGE_HEAP_TOP will grow to 0x8ba (2059) until it is reset in another page_create_low(). It will monotonically grow until we seemingly get a glitch: 3878, 3932, 4055, 3932, 4043, 2305, 2414, …, 3803, 3914, 4025, 3914, (failure).
In the rr replay trace that was terminated by SIGKILL, we can observe the changes of PAGE_HEAP_TOP as follows:
(rr) set $f=block->frame
(rr) display/x $f[40]@2
3: /x $f[40]@2 = {0x8, 0xf3}
(rr) watch -l $f[40]@2
Hardware watchpoint 3: -location $f[40]@2
(rr) c
Continuing.
[Switching to Thread 314150.314741]
Thread 18 hit Hardware watchpoint 3: -location $f[40]@2
Old value = <incomplete sequence \363>
New value = "\000"
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:261
261 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
2: /x log_sys.lsn._M_i = 0xa33520
3: /x $f[40]@2 = {0x0, 0x0}
(rr) command 3
Type commands for breakpoint(s) 3, one per line.
End with a line saying just "end".
>when
>c
>end
(rr) c
Continuing.
(rr) c
Everything seems to be in sync, and also the first monotonicity glitch turns out to be benign. First, the PAGE_HEAP_TOP increases to 4025 because the change buffer tree shrinks due to a change buffer merge during purge (and the records from an adjacent change buffer page are merged into our page). The decrease to 3914 occurs because the very last record of the page is being merged into an index page that is being accessed by dict_stats_analyze_index() (updating persistent statistics).
The real divergence occurs later. I saved some rr replay output of monitoring changes to PAGE_HEAP_TOP and compared them:
diff -u <(sed -ne 's/.*n=\([1-9][0-9]*\)) at.*/\1/p' heap-top-recovered|uniq) <(sed -ne 's/.*n=\([1-9][0-9]*\)) at.*/\1/p' heap-top-write|uniq)
@@ -57,6 +57,25 @@
4055
3932
4043
+234
+…
+2196
2305
2414
2523
@@ -75,3 +94,4 @@
3914
4025
3914
+4023
For some reason, the recovery trace does not include the updates of PAGE_HEAP_TOP from 234 to 2196. The last record must be what recovery was supposed to apply, but did not due to the flagged corruption. Let us check whether redo logging was properly enabled during this page_create_low():
Thread 45 hit Hardware watchpoint 3: -location $f[40]@2
Old value = "\017\\"
New value = <incomplete sequence \313>
mach_write_to_2 (b=0x64000047b028 <incomplete sequence \313>, n=4043) at /data/Server/bb-10.5-marko/storage/innobase/include/mach0data.ic:62
62 }
2: /x log_sys.lsn._M_i = 0xa4f621
3: /x $f[40]@2 = {0xf, 0xcb}
Current event: 598722
[Switching to Thread 314150.314325]
Thread 8 hit Hardware watchpoint 3: -location $f[40]@2
Old value = <incomplete sequence \313>
New value = "\000"
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:261
261 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
2: /x log_sys.lsn._M_i = 0xa6230e
3: /x $f[40]@2 = {0x0, 0x0}
Current event: 614654
Thread 8 hit Hardware watchpoint 3: -location $f[40]@2
Old value = "\000"
New value = "\000}"
page_create_low (block=0x64000007d178, comp=false) at /data/Server/bb-10.5-marko/storage/innobase/page/page0page.cc:311
311 memcpy(page + PAGE_DATA, infimum_supremum_redundant,
2: /x log_sys.lsn._M_i = 0xa6230e
3: /x $f[40]@2 = {0x0, 0x7d}
Current event: 614654
That call is within a change buffer page reorganize:
#3 0x000055c213bff4a4 in page_create (block=0x64000007d178, mtr=0x1ff90f650b70, comp=false) at /data/Server/bb-10.5-marko/storage/innobase/page/page0page.cc:331
#4 0x000055c213dd847e in btr_page_reorganize_low (cursor=0x1ff90f64f740, index=0x616000006ff0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:1371
#5 0x000055c213dda7c2 in btr_page_reorganize_block (z_level=6, block=0x64000007d178, index=0x616000006ff0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:1606
#6 0x000055c213dea411 in btr_can_merge_with_page (cursor=0x1ff90f650800, page_no=1160, merge_block=0x1ff90f64f940, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:5167
#7 0x000055c213de1e48 in btr_compress (cursor=0x1ff90f650800, adjust=0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:3483
#8 0x000055c213e100d7 in btr_cur_compress_if_useful (cursor=0x1ff90f650800, adjust=0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:5468
#9 0x000055c213e11b57 in btr_cur_pessimistic_delete (err=0x1ff90f650690, has_reserved_extents=1, cursor=0x1ff90f650800, flags=0, rollback=false, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:5908
#10 0x000055c213b4e8e9 in ibuf_delete_rec (page_id=..., pcur=0x1ff90f650800, search_tuple=0x616000874bf0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:4125
#11 0x000055c213b50024 in ibuf_merge_or_delete_for_page (block=0x640000043928, page_id=..., zip_size=0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:4471
Starting with MDEV-21725, the page reorganize is being logged in an optimized way, instead of being logged as individual inserts. Before MDEV-12353, the operation was covered by a logical log record.
Immediately after the page reorganize operation, the page contents between recovery and the original write differ as follows:
diff -u <(od -Ax -t x1 /dev/shm/r-after-reorg.bin) <(od -Ax -t x1 /dev/shm/w-after-reorg.bin)
--- /dev/fd/63 2021-05-31 08:20:19.561244871 +0000
+++ /dev/fd/62 2021-05-31 08:20:19.561244871 +0000
@@ -1,5 +1,5 @@
000000 00 00 00 00 00 00 04 88 00 00 04 89 00 00 04 87
-000010 00 00 00 00 00 a2 d7 01 45 bf 00 00 00 00 00 00
+000010 00 00 00 00 00 a5 07 15 45 bf 00 00 00 00 00 00
000020 00 00 00 00 00 00 00 05 08 94 00 15 00 00 00 00
000030 08 33 00 02 00 12 00 13 00 00 00 00 00 00 21 5b
000040 00 00 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00
@@ -112,5 +112,5 @@
0008a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000fe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 74
-000ff0 05 38 03 84 01 d0 00 65 00 a2 d7 01 8e 48 0a 5d
+000ff0 05 38 03 84 01 d0 00 65 00 00 00 00 00 00 00 00
001000
On recovery, the FIL_PAGE_LSN is 0xa2d701, which was the initial FIL_PAGE_LSN of the page. The field might only be updated by recovery after all log has been applied to the page. We have the LSN and checksum difference at the end of the page as well. It is nothing to worry about.
Similarly, the page contents immediately before the failed log apply are equivalent:
@@ -1,5 +1,5 @@
000000 00 00 00 00 00 00 04 88 00 00 04 89 ff ff ff ff
-000010 00 00 00 00 00 a2 d7 01 45 bf 00 00 00 00 00 00
+000010 00 00 00 00 00 a6 49 aa 45 bf 00 00 00 00 00 00
000020 00 00 00 00 00 00 00 0a 0f 4a 00 25 0e ea 00 6f
000030 00 00 00 02 00 23 00 22 00 00 00 00 00 00 22 d5
000040 00 00 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00
@@ -195,5 +195,5 @@
000ef0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000fe0 00 00 00 00 00 74 0d bc 0c 08 0a 54 08 a0 06 ec
-000ff0 05 38 03 84 01 d0 00 65 00 a2 d7 01 8e 48 0a 5d
+000ff0 05 38 03 84 01 d0 00 65 00 00 00 00 00 00 00 00
001000
So, the problem is only with the last record. We must look closer at this:
Current event: 630234
(rr) bt
#0 mach_write_to_2 (b=0x64000047b028 "\017\267", n=4023) at /data/Server/bb-10.5-marko/storage/innobase/include/mach0data.ic:62
#1 0x000055c213bf9f52 in page_mem_alloc_heap<>(buf_block_t *, ulint, ulint *) (block=0x64000007d178, need=109, heap_no=0x1f9a2c2ca1d0) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1071
#2 0x000055c213bf1062 in page_cur_insert_rec_low (cur=0x1f9a2c2cab58, index=0x616000006ff0, rec=0x6110003ae9fc "", offsets=0x61600066e1f0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1388
#3 0x000055c213dfb53d in page_cur_tuple_insert (cursor=0x1f9a2c2cab58, tuple=0x61900228cbf0, index=0x616000006ff0, offsets=0x1f9a2c2caab0, heap=0x1f9a2c2caa90, n_ext=0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/include/page0cur.ic:285
#4 0x000055c213e09bfd in btr_cur_optimistic_insert (flags=3, cursor=0x1f9a2c2cab50, offsets=0x1f9a2c2caab0, heap=0x1f9a2c2caa90, entry=0x61900228cbf0, rec=0x1f9a2c2caaf0, big_rec=0x1f9a2c2caa70, n_ext=0, thr=0x621004e35de0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:3562
#5 0x000055c213b4aad5 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3403
#6 0x000055c213b4b2f7 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61a000844508, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3603
A glaring difference is that reuse=false becomes reuse=true on recovery:
…
(rr) step
1595 mtr->page_insert(*block, reuse,
(rr) step
mtr_t::page_insert (this=0x64000047bf52, block=..., reuse=false, prev_rec=1, info_bits=0 '\000', n_fields_s=94292043288653, hdr_c=5, data_c=8, hdr=0x64000047bf4a "aY\031\t\005\004", hdr_l=1, data=0x6110003aea04 "\200", data_l=89) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1103
The actual mystery is why the log record that was written differs from what was buffered by recovery:
{0x20, 0x55, 0x0, 0x84, 0x8, 0x4, 0x8c, 0xd7, 0x2c, 0x5, 0x8, 0x61, 0x80, 0x0, 0x0, 0x0, 0x1, 0x83, 0xfe, 0x0, 0x40, 0x80, 0x3f, 0x86, 0x8, 0x0, 0x8, 0x80, 0x0, 0x62, 0x0 <repeats 69 times>, 0xf8, 0xfd}
{0x20, 0x53, 0x0, 0x84, 0x8, 0x5, 0x89, 0xdc, 0x2c, 0x5, 0xa, 0x61, 0x2, 0x0, 0x1, 0x83, 0xfe, 0x0, 0x40, 0x80, 0x3f, 0x86, 0x8, 0x0, 0x8, 0x80, 0x0, 0x66, 0x0 <repeats 69 times>, 0xf8, 0xfa}
The least significant bit of the first differing byte encodes the reuse flag. Let us check which string was present in ib_logfile0 before recovery was started. (Note: this will miss records that span a block boundary.)
od -Ax -t x1 -w512 ib_logfile0 |grep '20 55 00 84 08 04 8c d7 2c'|cut -d\ -f1
od -Ax -t x1 -w512 ib_logfile0 |grep '20 53 00 84 08 05 89 dc 2c'|cut -d\ -f1
Surprisingly, we find exactly one match for each of them. The first match occurs at the end of the 512-byte block that starts at file offset 0xa70800. The last 4 bytes are a log block checksum:
20 55 00 84 08 04 8c d7 2c 05 08 61 80 00 00 15 84 96 bc
The record continues after the 4-byte checksum and the 12-byte header of the following block:
00 01 83 fe 00 40 80 3f 86 08 00 08 80 00 62 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f8 fd b2 27 e2 00
This looks like an exact match of the record that we wrote. Before the NUL byte that marks the end of the mini-transaction, there is another record: a WRITE to the same page.
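The grep over the od output above cannot find a pattern whose first bytes straddle a block boundary. A sketch of a reassembly pass that strips the block framing first, assuming only the layout just described (512-byte blocks, a 12-byte header, a 4-byte checksum trailer); the start offset and the search pattern are illustrative, not a parser for the full redo log format:
// Sketch only: strip the framing of ib_logfile0 log blocks so that a byte
// pattern spanning a 512-byte block boundary can still be found by a linear
// search. Assumes 512-byte blocks with a 12-byte header and a 4-byte checksum
// trailer, as described above.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

int main(int argc, char **argv)
{
  if (argc < 3)
  {
    fprintf(stderr, "usage: %s ib_logfile0 start_offset\n", argv[0]);
    return 1;
  }
  FILE *f = fopen(argv[1], "rb");
  if (!f) { perror(argv[1]); return 1; }
  const size_t start = strtoul(argv[2], nullptr, 0);
  fseek(f, long(start), SEEK_SET);

  std::vector<unsigned char> payload;
  unsigned char block[512];
  while (fread(block, 1, sizeof block, f) == sizeof block)
    payload.insert(payload.end(), block + 12, block + sizeof block - 4);
  fclose(f);

  // Example pattern: the prefix of the record that recovery buffered (above).
  const unsigned char pattern[] = {0x20, 0x53, 0x00, 0x84, 0x08, 0x05, 0x89, 0xdc, 0x2c};
  for (size_t i = 0; i + sizeof pattern <= payload.size(); i++)
    if (!memcmp(&payload[i], pattern, sizeof pattern))
      // Map the payload offset back to a file offset by re-adding the
      // 16 framing bytes of each preceding 496-byte payload chunk.
      printf("match at payload offset 0x%zx (file offset 0x%zx)\n",
             i, start + (i / 496) * 512 + 12 + i % 496);
  return 0;
}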
Now, let us look at the other match, which looks like what recovery was reading. It is located in the 512-byte block starting at byte offset 0xa68c00 of ib_logfile0:
od -Ax -t x1 -w512 -N 99 -j 0xa68c4a ib_logfile0
a68c4a 20 53 00 84 08 05 89 dc 2c 05 0a 61 02 00 01 83 fe 00 40 80 3f 86 08 00 08 80 00 66 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f8 fa
It is an exact match of the record that was read by recovery! The problem appears to be in the write side after all. I must debug further to find out when those bytes were written, and for which mini-transaction.
Marko Mäkelä
added a comment - Because no wrap-around of the redo log has taken place (the payload of the ib_logfile0 file ends at 0xa99000), it should be rather simple to map the LSN to byte offsets in the file. Right before the SIGKILL, we got the following writes to the log file:
Thread 23 hit Breakpoint 2, log_write_buf (buf=0x6d6302ac9000 "\200", len=512, start_lsn=11115520, new_data_offset=376) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
562 ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
Current event: 670464
[Switching to Thread 314150.340446]
Thread 1 hit Breakpoint 2, log_write_buf (buf=0x4b191636c000 "\200", len=2048, start_lsn=11115520, new_data_offset=430) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
562 ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
Current event: 670870
[Switching to Thread 314150.340452]
Thread 31 hit Breakpoint 2, log_write_buf (buf=0x6d6302ac9000 "\200", len=512, start_lsn=11117056, new_data_offset=116) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
562 ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
Current event: 671278
[Switching to Thread 314150.340446]
Thread 1 hit Breakpoint 2, log_write_buf (buf=0x4b191636c000 "\200", len=1024, start_lsn=11117056, new_data_offset=312) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
562 ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
Current event: 671593
Thread 18 received signal SIGKILL, Killed.
At the time of the SIGKILL, the log_sys.lsn was 0xa9a526. The last start_lsn above is 0xa9a200, and the log_sys.write_lsn is 0xa9a526. It looks like the LSN is the file byte offset plus 0x1a00 (6656). Hence, the record for LSN 0xa723ed should be in the file at offset 0xa709ed. The unexpected record at 0xa68c4a must correspond to the LSN 0xa6a64a. Here we have it:
1: /x log_sys.lsn._M_i = 0xa6a64a
Current event: 623570
(rr) bt
#0 mtr_t::finish_write (this=0x55c213ae4877 <ilist<mtr_buf_t::block_t, void>::back() const+217>, len=35773916118888) at /data/Server/bb-10.5-marko/storage/innobase/mtr/mtr0mtr.cc:863
#1 0x000055c213bcc798 in mtr_t::commit (this=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/mtr/mtr0mtr.cc:408
#2 0x000055c213b3f878 in ibuf_mtr_commit (mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/ibuf0ibuf.ic:64
#3 0x000055c213b4ae1b in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3466
#4 0x000055c213b4b2f7 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61a000844508, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3603
#5 0x000055c213e02647 in btr_cur_search_to_nth_level_func (index=0x6160000f03f0, level=0, tuple=0x61a000844508, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x208943901ab0, ahi_latch=0x0, file=0x55c214795e40 "/data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc", line=2974, mtr=0x208943901e70, autoinc=0)
at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:1650
#6 0x000055c213c8b3e2 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x6160000f03f0, offsets_heap=0x619000cb9d80, heap=0x619000cb8e80, entry=0x61a000844508, trx_id=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:2971
#7 0x000055c213c8c59f in row_ins_sec_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0, check_foreign=true) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3294
#8 0x000055c213c8c784 in row_ins_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3342
#9 0x000055c213c8d14d in row_ins_index_entry_step (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3509
#10 0x000055c213c8d6b6 in row_ins (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3668
#11 0x000055c213c8de86 in row_ins_step (thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3818
#12 0x000055c213cb83ea in row_insert_for_mysql (mysql_rec=0x61f0001f80b8 "\377f", prebuilt=0x6220002b2170, ins_mode=ROW_INS_NORMAL) at /data/Server/bb-10.5-marko/storage/innobase/row/row0mysql.cc:1419
#13 0x000055c213a3b359 in ha_innobase::write_row (this=0x61d0008638b8, record=0x61f0001f80b8 "\377f") at /data/Server/bb-10.5-marko/storage/innobase/handler/ha_innodb.cc:7669
This was not found during my initial search, because I only watched the changes of the header field PAGE_HEAP_TOP. A little before the mtr_t::commit(), we actually did write the record that recovery failed to apply (note: reuse=true):
#0 mtr_t::page_insert (this=0x2089438ffd00, block=..., reuse=true, prev_rec=2652, info_bits=0 '\000', n_fields_s=13, hdr_c=5, data_c=10, hdr=0x64000047bedb "aY\031\t\005\004", hdr_l=1, data=0x611000383e46 "\002", data_l=87) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1118
#1 0x000055c213bf2303 in page_cur_insert_rec_low (cur=0x2089438ffb58, index=0x616000006ff0, rec=0x611000383e3c "", offsets=0x61600006a2f0, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1595
Now, let us compare the ‘before’ images of the page, before this mini-transaction was started, and before we failed to apply the log record:
diff -u <(od -Ax -t x1 recovery-before-last.bin) <(od -Ax -t x1 write-before.bin)
--- /dev/fd/63 2021-05-31 11:43:47.758752427 +0000
+++ /dev/fd/62 2021-05-31 11:43:47.758752427 +0000
@@ -1,5 +1,5 @@
000000 00 00 00 00 00 00 04 88 00 00 04 89 ff ff ff ff
-000010 00 00 00 00 00 a2 d7 01 45 bf 00 00 00 00 00 00
+000010 00 00 00 00 00 a6 4e aa 45 bf 00 00 00 00 00 00
000020 00 00 00 00 00 00 00 0a 0f 4a 00 25 0e ea 00 6f
000030 00 00 00 02 00 23 00 22 00 00 00 00 00 00 22 d5
000040 00 00 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00
@@ -195,5 +195,5 @@
000ef0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000fe0 00 00 00 00 00 74 0d bc 0c 08 0a 54 08 a0 06 ec
-000ff0 05 38 03 84 01 d0 00 65 00 a2 d7 01 8e 48 0a 5d
+000ff0 05 38 03 84 01 d0 00 65 00 00 00 00 00 00 00 00
001000
Again, only the LSN and checksum fields differ. Recovery failed, because the reuse of the record at the top of the PAGE_FREE stack exceeded the PAGE_HEAP_TOP 0xf4a. Let us check if that actually was the case. The PAGE_FREE points to 0xeea, which is the only record in the PAGE_FREE stack. PAGE_GARBAGE is 0x6f (111) bytes. At the time of writing a log record for the write, we notably have a negative free_offset:
0x000055c213bf22fe in page_cur_insert_rec_low (cur=0x2089438ffb58, index=0x616000006ff0, rec=0x611000383e3c "", offsets=0x61600006a2f0, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1595
1595 mtr->page_insert(*block, reuse,
1: /x log_sys.lsn._M_i = 0xa6a64a
(rr) i lo
r = 0x611000383e46 "\002"
c = 0x64000047bacb "\001"
c_end = 0x64000047bb22 "aY\031\t\005\004"
data_common = 10
block = 0x64000007d178
rec_size = 109
reuse = true
free_offset = -3
heap_no = 36
insert_buf = 0x64000047bedb "aY\031\t\005\004"
comp = false
extra_size = 12
next_rec = 0x64000047bc08 ""
page_last_insert = 0x64000047b030 <incomplete sequence \347>
last_insert = 0
page_n_recs = 0x64000047b036 ""
insert_rec = 0x64000047bee7 ""
data_size = 97
hdr_common = 5
n_owned = 4
info_status = 0 '\000'
We are attempting to insert a record of 12+97 bytes, while the record in the PAGE_FREE stack was 15+96 bytes. The origin of the record will move by 3 bytes from 0xeea to 0xee7 accordingly (see insert_rec above). All seems well and good on the write side. It is the recovery side that appears to be broken. Note: this LSN 0xa6a64a matches the start_lsn in the recovery.
Marko Mäkelä
added a comment - After long debugging, I think that I found the bug. It is obvious when you look at the preceding and following lines. This affects the 10.5 and 10.6 series only.
diff --git a/storage/innobase/page/page0cur.cc b/storage/innobase/page/page0cur.cc
index 025d4b9b967..f75d1a8be77 100644
--- a/storage/innobase/page/page0cur.cc
+++ b/storage/innobase/page/page0cur.cc
@@ -2443,7 +2443,7 @@ bool page_apply_insert_redundant(const buf_block_t &block, bool reuse,
if (UNIV_UNLIKELY(free_rec - fextra_size < heap_bot))
goto corrupted;
const ulint fdata_size= rec_get_data_size_old(free_rec);
- if (UNIV_UNLIKELY(free_rec + data_size > heap_top))
+ if (UNIV_UNLIKELY(free_rec + fdata_size > heap_top))
goto corrupted;
if (UNIV_UNLIKELY(extra_size + data_size > fextra_size + fdata_size))
goto corrupted;
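To see why this particular page tripped the check, plug in the numbers from the trace above: PAGE_FREE = 0xeea, PAGE_HEAP_TOP = 0xf4a, an incoming record of 12+97 bytes reusing a freed record of 15+96 bytes. This is a stand-alone illustration of the comparison, not the actual InnoDB code path:
// Worked check with the values from this trace; an illustration of the
// comparison fixed in page_apply_insert_redundant(), not the function itself.
#include <cassert>
#include <cstdio>

int main()
{
  const unsigned free_rec   = 0xeea; // origin of the record in the PAGE_FREE stack
  const unsigned heap_top   = 0xf4a; // PAGE_HEAP_TOP; the freed record's data ends exactly here
  const unsigned data_size  = 97;    // data size of the record being inserted (12+97 bytes)
  const unsigned fdata_size = 96;    // data size of the freed record (15+96 bytes)

  // Buggy bound check: uses the incoming record's data size, so 0xf4b > 0xf4a
  // and the valid INSERT_REUSE_REDUNDANT record is flagged as corruption.
  const bool buggy_corrupted = free_rec + data_size > heap_top;
  // Fixed bound check: validates the freed record itself; 0xf4a > 0xf4a is false.
  const bool fixed_corrupted = free_rec + fdata_size > heap_top;

  printf("buggy check: %s, fixed check: %s\n",
         buggy_corrupted ? "corrupted" : "ok",
         fixed_corrupted ? "corrupted" : "ok");
  assert(buggy_corrupted && !fixed_corrupted);
  return 0;
}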