Details
-
Bug
-
Status: Closed (View Workflow)
-
Blocker
-
Resolution: Fixed
-
N/A
Description
origin/10.5 690fcfbd2954f9470047ce754c2a5c2fdfd15cf3 2023-02-14T16:27:21+00:00
|
Build with RelWithDebInfo+ASAN
|
|
Scenario:
|
1. Start the server, one session generate some initial data
|
2. Several sessions run concurrent DML only
|
3. During 2. is ongoing SIGKILL the server intentionally
|
4. Restart attempt
|
The last ends with
|
# 2023-02-15T12:37:23 [3163608] | [rr 3292492 3889]2023-02-15 12:35:46 0 [Note] InnoDB: Opened 3 undo tablespaces
|
# 2023-02-15T12:37:23 [3163608] | [rr 3292492 3893]2023-02-15 12:35:46 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=151914371,159538155
|
# 2023-02-15T12:37:23 [3163608] | [rr 3292492 4003]2023-02-15 12:35:47 0 [Note] InnoDB: Starting a batch to recover 396 pages from redo log.
|
# 2023-02-15T12:37:23 [3163608] | [rr 3292492 5466]2023-02-15 12:35:47 0 [ERROR] InnoDB: Not applying UNDO_APPEND due to corruption on [page id: space=2, page number=390]
|
# 2023-02-15T12:37:23 [3163608] | [rr 3292492 5470]2023-02-15 12:35:47 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
|
# 2023-02-15T12:37:23 [3163608] | [rr 3292492 5657]2023-02-15 12:35:47 0 [ERROR] [FATAL] InnoDB: Trying to write 8192 bytes at 5242880 outside the bounds of the file: .//undo002
|
# 2023-02-15T12:37:23 [3163608] | [rr 3292492 5678]230215 12:35:47 [rr 3292492 5680][ERROR] mysqld got signal 6 ;
|
|
pluto:/data/results/1676463901/TBR-1795$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio mysqld-1
|
# Fate of the server till SIGKILL
|
pluto:/data/results/1676463901/TBR-1795$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio mysqld-2
|
# Fate of the server during the restart failing around recovery
|
pluto:/data/results/1676463901/TBR-1795/1/fbackup
|
# Backup of the data directory after SIGKILL and before the restart
|
|
RQG
|
# git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
|
#
|
# GIT_SHOW: HEAD -> master, origin/master, origin/HEAD b82f03cf948740e0040c4d6cb4ac52e46b66c3b2 2023-02-14T12:49:41+01:00
|
# rqg.pl : Version 4.2.1 (2022-12)
|
#
|
# $RQG_HOME/rqg.pl \
|
# --grammar=conf/engines/many_indexes.yy \
|
# --gendata=conf/engines/many_indexes.zz \
|
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
|
# --mysqld=--loose-idle_write_transaction_timeout=0 \
|
# --mysqld=--loose-idle_transaction_timeout=0 \
|
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
|
# --mysqld=--connect_timeout=60 \
|
# --mysqld=--interactive_timeout=28800 \
|
# --mysqld=--slave_net_timeout=60 \
|
# --mysqld=--net_read_timeout=30 \
|
# --mysqld=--net_write_timeout=60 \
|
# --mysqld=--loose-table_lock_wait_timeout=50 \
|
# --mysqld=--wait_timeout=28800 \
|
# --mysqld=--lock-wait-timeout=86400 \
|
# --mysqld=--innodb-lock-wait-timeout=50 \
|
# --no-mask \
|
# --queries=10000000 \
|
# --seed=random \
|
# --reporters=Backtrace \
|
# --reporters=ErrorLog \
|
# --reporters=Deadlock \
|
# --validators=None \
|
# --mysqld=--log_output=none \
|
# --mysqld=--log_bin_trust_function_creators=1 \
|
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
|
# --engine=InnoDB \
|
# --restart_timeout=360 \
|
# --mysqld=--plugin-load-add=file_key_management.so \
|
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
|
# --mysqld=--plugin-load-add=provider_lzo.so \
|
# --mysqld=--plugin-load-add=provider_bzip2.so \
|
# --mysqld=--plugin-load-add=provider_lzma.so \
|
# --mysqld=--plugin-load-add=provider_snappy.so \
|
# --mysqld=--plugin-load-add=provider_lz4.so \
|
# --duration=300 \
|
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
|
# --mysqld=--loose-innodb_read_only_compressed=OFF \
|
# --reporters=CrashRecovery \
|
# --duration=300 \
|
# --mysqld=--innodb_stats_persistent=on \
|
# --mysqld=--innodb_adaptive_hash_index=on \
|
# --mysqld=--innodb_sort_buffer_size=65536 \
|
# --redefine=conf/mariadb/redefine_checks_off.yy \
|
# --mysqld=--log-bin \
|
# --mysqld=--sync-binlog=1 \
|
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
|
# --mysqld=--loose-max-statement-time=30 \
|
# --threads=9 \
|
# --mysqld=--innodb-use-native-aio=0 \
|
# --mysqld=--loose-gdb \
|
# --mysqld=--loose-debug-gdb \
|
# --rr=Extended \
|
# --rr_options=--chaos --wait \
|
# --mysqld=--innodb_undo_log_truncate=ON \
|
# --mysqld=--innodb_undo_tablespaces=3 \
|
# --mysqld=--innodb_rollback_on_timeout=ON \
|
# --mysqld=--innodb_page_size=8K \
|
# --mysqld=--innodb-buffer-pool-size=8M \
|
# <local settings>
|
|
Attachments
Issue Links
- is caused by
-
MDEV-30479 OPT_PAGE_CHECKSUM mismatch during recovery after innodb_undo_log_truncate=ON
-
- Closed
-
During recovery, the frame of the undo log page 2:0x186 is completely filled with NUL bytes. The LSN is as follows:
#3 0x0000558068012852 in log_phys_t::apply (this=0x3087344da178, block=...,
last_offset=@0x60600005746c: 0)
at /data/Server/10.5/storage/innobase/log/log0recv.cc:310
310 if (undo_append(block, ++l, --rlen) && !srv_force_recovery)
(rr) p *this
$3 = {<log_rec_t> = {next = 0x3087344da938, lsn = 151916320}, start_lsn = 151914684}
Applying the log fails on the very first record, whose length is 1587 bytes. The LSN difference is 49 bytes more, but that could be explained by the terminating NUL byte and 3*(12+4) bytes of framing for three 512-byte log blocks. So, it looks like there is only 1 record in that mini-transaction.
In the rr replay trace of the killed server, we should be able to find when this log record was written and what happened to the undo log page between that time and the server kill. It is somewhere between these times:
1: log_sys.lsn._M_i = 143624406
(rr) when
Current event: 886248
(rr) c
Continuing.
…
Thread 22 stopped.
[Switching to Thread 3166493.3176833]
0x0000773201a12288 in do_breakpoint (value=182)
at /home/roc/rr/rr/src/preload/syscallbuf.c:1085
1085 /home/roc/rr/rr/src/preload/syscallbuf.c: No such file or directory.
1: log_sys.lsn._M_i = 154782176
(rr) when
Current event: 940406