[MDEV-31559] btr_search_hash_table_validate() does not check if CHECK TABLE is killed Created: 2023-06-27  Updated: 2023-07-03  Resolved: 2023-07-03

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.14
Fix Version/s: 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: rr-profile


 Description   

Partial server freeze observed on
origin/bb-10.6-MDEV-31343 f410444a76b5703b803e6e43e3bbf9509b6fddb6 2023-05-25T17:22:33+03:00
 
Scenario:
1. Start the server + generate some initial data
2. Several connection run a mix of DML and CHECK TABLE.
After some time the processlist shows for all these connections the query
CHECK TABLE <some table> EXTENDED with the state killed.
The reason for the "killed" is most probably max_statement_time = 30.
None of these connections got some response of the server,
 
sdp:/data1/results/1685024678/Freeze$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio
 
RQG
====
# git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
#
# GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 9b710460ff8c54260b0e0649f029efa932aee22c 2023-05-23T14:33:37+02:00
# rqg.pl  : Version 4.2.1 (2022-12)
#
# $RQG_HOME/rqg.pl \
# --grammar=conf/mariadb/table_stress_innodb_dml.yy \
# --gendata=conf/mariadb/table_stress.zz \
# --gendata_sql=conf/mariadb/table_stress.sql \
# --reporters=RestartConsistency \
# --mysqld=--innodb_max_undo_log_size=10M \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --no_mask \
# --queries=10000000 \
# --seed=random \
# --reporters=None \
# --reporters=ErrorLog \
# --reporters=Deadlock \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--plugin-load-add=provider_lzo.so \
# --mysqld=--plugin-load-add=provider_bzip2.so \
# --mysqld=--plugin-load-add=provider_lzma.so \
# --mysqld=--plugin-load-add=provider_snappy.so \
# --mysqld=--plugin-load-add=provider_lz4.so \
# --mysqld=--loose-innodb_compression_level=1 \
# --duration=900 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose-innodb-sync-debug \
# --mysqld=--innodb_stats_persistent=on \
# --mysqld=--innodb_adaptive_hash_index=on \
# --mysqld=--innodb_random_read_ahead=OFF \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=on \
# --mysqld=--loose-max-statement-time=30 \
# --threads=129 \
# --mysqld=--innodb-use-native-aio=0 \
# --mysqld=--loose-gdb \
# --mysqld=--loose-debug-gdb \
# --rr=Extended \
# --rr_options=--wait \
# --mysqld=--innodb_undo_log_truncate=ON \
# --mysqld=--innodb_undo_tablespaces=3 \
# --mysqld=--innodb_rollback_on_timeout=OFF \
# --mysqld=--innodb_page_size=32K \
# --mysqld=--innodb-buffer-pool-size=32G \
# <local settings>



 Comments   
Comment by Matthias Leich [ 2023-07-03 ]

origin/bb-10.6-MDEV-31559 3d9014385926096264e36bb3a0960660d61d0d1d 2023-06-30T17:07:21+03:00 performed well in RQG testing.
There were some partial server freezes (maybe cause by MDEV-31393 and other reasons). But the processlists showed never
a session with some "killed" CHECK TABLE and a runtime exceeding the wait threshold in RQG. So the current problem is fixed.

Generated at Thu Feb 08 10:24:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.