[MDEV-27052] InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED || (relaxed && thd_get_error_number(trx->mysql_thd)) Created: 2021-11-15  Updated: 2024-01-12  Resolved: 2024-01-08

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Partitioning, Storage Engine - InnoDB
Affects Version/s: 10.6.6
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Cannot Reproduce Votes: 0
Labels: need_rr


 Description   

When assigning
   --innodb_rollback_on_timeout=ON  (the default is since some time "off")
   to the server startup
a significant fraction of RQG runs harvest after
1. Bootstrap + Start Server
2. Generate some initial data
when running some concurrent DDL/DML mix on
    origin/10.6 e58a312e42ad4df55fd9d25d8793f8cf1d2d45cf 2021-11-12T00:33:48+01:00
an assert like
[rr 3988585 246964]2021-11-15 07:33:15 0x7f5b82d0a700[rr 3988585 246967]  InnoDB: Assertion failure in file /data/Server/10.6K/storage/innobase/include/trx0trx.ic line 65
[rr 3988585 246969]InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED || (relaxed && thd_get_error_number(trx->mysql_thd))
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f5ba23ce859 in __GI_abort () at abort.c:79
#2  0x000055b3b93dfedf in ut_dbg_assertion_failed (expr=0x55b3ba603a80 "state == TRX_STATE_NOT_STARTED || (relaxed && thd_get_error_number(trx->mysql_thd))", file=0x55b3ba603900 "/data/Server/10.6K/storage/innobase/include/trx0trx.ic", line=65)
    at /data/Server/10.6K/storage/innobase/ut/ut0dbg.cc:60
#3  0x000055b3b9052acc in trx_state_eq (trx=0x7f5b966879e8, state=TRX_STATE_ACTIVE, relaxed=false) at /data/Server/10.6K/storage/innobase/include/trx0trx.ic:65
#4  0x000055b3b9085bb6 in rw_trx_hash_t::validate_element (trx=0x7f5b966879e8) at /data/Server/10.6K/storage/innobase/include/trx0sys.h:510
#5  0x000055b3b93c4a22 in rw_trx_hash_t::insert (this=0x55b3bb70e540 <trx_sys+33152>, trx=0x7f5b966879e8) at /data/Server/10.6K/storage/innobase/include/trx0sys.h:671
#6  0x000055b3b93c4c1d in trx_sys_t::register_rw (this=0x55b3bb7063c0 <trx_sys>, trx=0x7f5b966879e8) at /data/Server/10.6K/storage/innobase/include/trx0sys.h:1102
#7  0x000055b3b93c32ba in trx_set_rw_mode (trx=0x7f5b966879e8) at /data/Server/10.6K/storage/innobase/trx/trx0trx.cc:2154
#8  0x000055b3b906e187 in lock_table (table=0x618000000d08, mode=LOCK_X, thr=0x61a00032dd08) at /data/Server/10.6K/storage/innobase/lock/lock0lock.cc:3506
#9  0x000055b3b906f494 in lock_table_for_trx (table=0x618000000d08, trx=0x7f5b966879e8, mode=LOCK_X) at /data/Server/10.6K/storage/innobase/lock/lock0lock.cc:3667
#10 0x000055b3b8f0a9f3 in ha_innobase::rename_table (this=0x6290009dd238, from=0x7f5b82d03530 "./test/#sql-alter-3cdc69-f#P#p0", to=0x7f5b82d037c0 "./test/r#P#p0") at /data/Server/10.6K/storage/innobase/handler/ha_innodb.cc:14057
#11 0x000055b3b8602eeb in handler::ha_rename_table (this=0x6290009dd238, from=0x7f5b82d03530 "./test/#sql-alter-3cdc69-f#P#p0", to=0x7f5b82d037c0 "./test/r#P#p0") at /data/Server/10.6K/sql/handler.cc:5301
#12 0x000055b3b8dd6434 in ha_partition::del_ren_table (this=0x62b00012b2a8, from=0x7f5b82d03c80 "./test/#sql-alter-3cdc69-f", to=0x7f5b82d03ec0 "./test/r") at /data/Server/10.6K/sql/ha_partition.cc:2484
#13 0x000055b3b8dcc807 in ha_partition::rename_table (this=0x62b00012b2a8, from=0x7f5b82d03c80 "./test/#sql-alter-3cdc69-f", to=0x7f5b82d03ec0 "./test/r") at /data/Server/10.6K/sql/ha_partition.cc:662
#14 0x000055b3b8602eeb in handler::ha_rename_table (this=0x62b00012b2a8, from=0x7f5b82d03c80 "./test/#sql-alter-3cdc69-f", to=0x7f5b82d03ec0 "./test/r") at /data/Server/10.6K/sql/handler.cc:5301
#15 0x000055b3b807988f in mysql_rename_table (base=0x615000008f18, old_db=0x7f5b82d06170, old_name=0x7f5b82d061a0, new_db=0x7f5b82d06170, new_name=0x7f5b82d06190, id=0x7f5b82d061c0, flags=1) at /data/Server/10.6K/sql/sql_table.cc:4990
#16 0x000055b3b809ef81 in mysql_alter_table (thd=0x62b00011f218, new_db=0x62b000123c18, new_name=0x62b000124030, create_info=0x7f5b82d075d0, table_list=0x62b0001263c8, alter_info=0x7f5b82d074a0, order_num=0, order=0x0, ignore=false, if_exists=false)
    at /data/Server/10.6K/sql/sql_table.cc:10599
#17 0x000055b3b822a02b in Sql_cmd_alter_table::execute (this=0x62b000126d78, thd=0x62b00011f218) at /data/Server/10.6K/sql/sql_alter.cc:538
#18 0x000055b3b7e2eb1e in mysql_execute_command (thd=0x62b00011f218, is_called_from_prepared_stmt=false) at /data/Server/10.6K/sql/sql_parse.cc:5997
#19 0x000055b3b7e3afde in mysql_parse (thd=0x62b00011f218, rawbuf=0x62b000126238 "ALTER TABLE r PARTITION BY KEY(`col_int_key`) PARTITIONS 2 /* E_R Thread2 QNO 262 CON_ID 15 */", length=94, parser_state=0x7f5b82d08b20) at /data/Server/10.6K/sql/sql_parse.cc:8030
#20 0x000055b3b7e1325b in dispatch_command (command=COM_QUERY, thd=0x62b00011f218, packet=0x629000c4e219 "ALTER TABLE r PARTITION BY KEY(`col_int_key`) PARTITIONS 2 /* E_R Thread2 QNO 262 CON_ID 15 */ ", packet_length=95, blocking=true)
    at /data/Server/10.6K/sql/sql_parse.cc:1896
#21 0x000055b3b7e10633 in do_command (thd=0x62b00011f218, blocking=true) at /data/Server/10.6K/sql/sql_parse.cc:1404
#22 0x000055b3b8210477 in do_handle_one_connection (connect=0x6080000031b8, put_in_cache=true) at /data/Server/10.6K/sql/sql_connect.cc:1418
#23 0x000055b3b820fd08 in handle_one_connection (arg=0x6080000031b8) at /data/Server/10.6K/sql/sql_connect.cc:1312
#24 0x00007f5ba28f8609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#25 0x00007f5ba24cb293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) quit
 
sdp:/data/Results/1636989998/TBR-1240/dev/shm/vardir/1636989998/59/1/rr
 
RQG
====
# git clone https://github.com/mleich1/rqg --branch experimental RQG
#
# GIT_SHOW: HEAD -> experimental, origin/experimental abe8101e536bea9274b60ea3f49f66cdf4d71a23 2021-11-15T16:24:25+01:00
# rqg.pl  : Version 3.4 (2021-09)
#
# $RQG_HOME/rqg.pl \
# --grammar=conf/mariadb/partitions_innodb.yy \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --no-mask \
# --queries=10000000 \
# --seed=random \
# --reporters=Backtrace \
# --reporters=ErrorLog \
# --reporters=Deadlock1 \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose-innodb-sync-debug \
# --mysqld=--innodb_stats_persistent=off \
# --mysqld=--innodb_adaptive_hash_index=off \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
# --mysqld=--loose-max-statement-time=30 \
# --threads=9 \
# --mysqld=--innodb-use-native-aio=0 \
# --mysqld=--loose-gdb \
# --mysqld=--loose-debug-gdb \
# --mysqld=--loose-skip-stack-trace \
# --rr=Extended \
# --rr_options=--wait \
# --mysqld=--innodb_rollback_on_timeout=ON \
# --mysqld=--innodb_page_size=16K \
# --mysqld=--innodb-buffer-pool-size=8M \
# --no_mask \
# <some local settings>



 Comments   
Comment by Marko Mäkelä [ 2022-08-02 ]

Unfortunately, I had neglected this too long, and the trace is no longer available. The stack trace suggests some ALTER TABLE on a partitioned table.

Comment by Matthias Leich [ 2024-01-08 ]

The history of tests starts on
- pluto 2020-12-02      Since than including that date the current problem was never hit at all
- sdp    2023-04-14      Since than including that date the current problem was never hit at all.
                                          Anything before that including /data/Results/1636989998/TBR-1240  from 2021-11-15
                                          was lost when some OS upgrade was performed.
Therefore I assume that
- hitting the current problem is extreme rare in general
- either the problem was fixed by the patch for some other MDEV or the assert was modified or
  its just too rare.

Generated at Thu Feb 08 09:49:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.