Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.6, 10.11, 11.4, 11.7
-
None
Description
The failure happens both with MTR and RQG, but both tests are very non-deterministic (even though there is no explicit concurrency in either test, obviously it happens due to background activities), and unfortunately the failure seems completely non-rr-able, at least I haven't gotten a single failure under rr in a large number of attempts, either with MTR or with RQG, on different machines, builds, and versions. So, these two tests is the best I can provide for now.
I suspect that the problem may be related to MDEV-33575 and similar issues.
10.6 f5aed7457348022fb96295b8ca9ecae91f54797d |
2024-12-03 22:52:33 10 [Note] InnoDB: WSREP: BF lock wait long for trx:0x21057 error: Success query: OPTIMIZE TABLE mysql.gtid_slave_pos NOWAIT
|
------- TRX HAS BEEN WAITING 75 us FOR THIS LOCK TO BE GRANTED:
|
mariadbd: /data/bld/10.6-debug/storage/innobase/include/lock0lock.h:812: void lock_sys_t::assert_locked() const: Assertion `latch.have_wr()' failed.
|
241203 22:52:33 [ERROR] mysqld got signal 6 ;
|
|
#9 0x00007f6c6cc53e32 in __GI___assert_fail (assertion=0x55dc910d5898 "latch.have_wr()", file=0x55dc910d58a8 "/data/bld/10.6-debug/storage/innobase/include/lock0lock.h", line=812, function=0x55dc910d58e8 "void lock_sys_t::assert_locked() const") at ./assert/assert.c:101
|
#10 0x000055dc907bd9fd in lock_sys_t::assert_locked (this=0x55dc922e2480 <lock_sys>) at /data/bld/10.6-debug/storage/innobase/include/lock0lock.h:812
|
#11 0x000055dc907af6c5 in lock_table_print (file=0x7f6c6cdf2680 <_IO_2_1_stderr_>, lock=0x7f6c58675380) at /data/bld/10.6-debug/storage/innobase/lock/lock0lock.cc:4997
|
#12 0x000055dc9079d977 in wsrep_is_BF_lock_timeout (trx=...) at /data/bld/10.6-debug/storage/innobase/lock/lock0lock.cc:685
|
#13 0x000055dc907a341c in lock_wait (thr=0x7f6c1c036698) at /data/bld/10.6-debug/storage/innobase/lock/lock0lock.cc:2215
|
#14 0x000055dc908f7ef9 in row_mysql_handle_errors (new_err=0x7f6c5837c474, trx=0x7f6c58674e80, thr=0x7f6c1c036698, savept=0x0) at /data/bld/10.6-debug/storage/innobase/row/row0mysql.cc:688
|
#15 0x000055dc907ab28c in lock_table_for_trx (table=0x7f6c3000d408, trx=0x7f6c58674e80, mode=LOCK_X, no_wait=false) at /data/bld/10.6-debug/storage/innobase/lock/lock0lock.cc:4148
|
#16 0x000055dc90ad0efd in dict_stats_save (table=0x7f6c1c9c06c8, only_for_index=0x0) at /data/bld/10.6-debug/storage/innobase/dict/dict0stats.cc:2957
|
#17 0x000055dc90ad3273 in dict_stats_update (table=0x7f6c1c9c06c8, stats_upd_option=DICT_STATS_RECALC_PERSISTENT) at /data/bld/10.6-debug/storage/innobase/dict/dict0stats.cc:3743
|
#18 0x000055dc906e61b4 in ha_innobase::info_low (this=0x7f6c1c043860, flag=28, is_analyze=true) at /data/bld/10.6-debug/storage/innobase/handler/ha_innodb.cc:14807
|
#19 0x000055dc906e6d38 in ha_innobase::analyze (this=0x7f6c1c043860) at /data/bld/10.6-debug/storage/innobase/handler/ha_innodb.cc:15120
|
#20 0x000055dc902723b9 in handler::ha_analyze (this=0x7f6c1c043860, thd=0x7f6c1c000dc8, check_opt=0x7f6c1c0064d0) at /data/bld/10.6-debug/sql/handler.cc:5272
|
#21 0x000055dc90087646 in mysql_admin_table (thd=0x7f6c1c000dc8, tables=0x7f6c1c018288, check_opt=0x7f6c1c0064d0, operator_name=0x55dc918583f0 <msg_optimize>, lock_type=TL_WRITE, org_open_for_modify=true, repair_table_use_frm=false, extra_open_options=0, prepare_func=0x0, operator_func=(int (handler::*)(handler * const, THD *, HA_CHECK_OPT *)) 0x55dc902721cc <handler::ha_optimize(THD*, st_ha_check_opt*)>, view_operator_func=0x0, is_cmd_replicated=true) at /data/bld/10.6-debug/sql/sql_admin.cc:1259
|
#22 0x000055dc90088aa6 in Sql_cmd_optimize_table::execute (this=0x7f6c1c018b20, thd=0x7f6c1c000dc8) at /data/bld/10.6-debug/sql/sql_admin.cc:1618
|
#23 0x000055dc8fe9fa0f in mysql_execute_command (thd=0x7f6c1c000dc8, is_called_from_prepared_stmt=false) at /data/bld/10.6-debug/sql/sql_parse.cc:6152
|
#24 0x000055dc8fea5d64 in mysql_parse (thd=0x7f6c1c000dc8, rawbuf=0x7f6c1c018180 "OPTIMIZE TABLE mysql.gtid_slave_pos NOWAIT", length=42, parser_state=0x7f6c5837f380) at /data/bld/10.6-debug/sql/sql_parse.cc:8194
|
#25 0x000055dc8fea5408 in wsrep_mysql_parse (thd=0x7f6c1c000dc8, rawbuf=0x7f6c1c018180 "OPTIMIZE TABLE mysql.gtid_slave_pos NOWAIT", length=42, parser_state=0x7f6c5837f380) at /data/bld/10.6-debug/sql/sql_parse.cc:8005
|
#26 0x000055dc8fe91140 in dispatch_command (command=COM_QUERY, thd=0x7f6c1c000dc8, packet=0x7f6c1c00bc29 "", packet_length=42, blocking=true) at /data/bld/10.6-debug/sql/sql_parse.cc:1895
|
#27 0x000055dc8fe8fb91 in do_command (thd=0x7f6c1c000dc8, blocking=true) at /data/bld/10.6-debug/sql/sql_parse.cc:1421
|
#28 0x000055dc9006b508 in do_handle_one_connection (connect=0x55dc948a19a8, put_in_cache=true) at /data/bld/10.6-debug/sql/sql_connect.cc:1407
|
#29 0x000055dc9006b289 in handle_one_connection (arg=0x55dc9488abe8) at /data/bld/10.6-debug/sql/sql_connect.cc:1319
|
#30 0x000055dc905d0aea in pfs_spawn_thread (arg=0x55dc94746378) at /data/bld/10.6-debug/storage/perfschema/pfs.cc:2201
|
#31 0x00007f6c6cca8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
|
#32 0x00007f6c6cd2861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
|
The tests need wsrep and thus a galera library. Apparently, any galera library would do, I tried with whatever is installed from Debian repos as well as with our latest galera releases, the result is the same, but it is needed just so that wsrep initializes.
MTR test
For reproducing purposes only! Don't put it into the regression suite. Note that it already has a loop inside, but it still needs to be run with a large --repeat=N. I usually run it with --repeat=100, and it fails for me eventually, but it's difficult to predict when – sometimes in the first couple iterations (1-2 minutes), and sometimes it takes almost an hour. As said above, never got it fail with rr.
perl ./mtr bug.latch1 --mem --mysqld=--wsrep-provider=/tmp/galera-26.4.20.so --mysqld=--wsrep-cluster-address=gcomm:// --mysqld=--binlog-format=row --mysqld=--wsrep-on --nocheck-testcases --repeat=100
|
Remember to set the correct path to the galera library!
--source include/wait_until_ready.inc
|
--source include/have_innodb.inc
|
--source include/have_sequence.inc
|
--source include/have_binlog_format_row.inc
|
|
ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB; |
FLUSH TABLES;
|
|
CREATE TABLE t (pk INT AUTO_INCREMENT PRIMARY KEY, f INT) ENGINE=InnoDB; |
INSERT INTO t SELECT seq, seq FROM seq_1_to_100; |
|
--let $run=10000
|
|
while ($run)
|
{
|
OPTIMIZE TABLE mysql.gtid_slave_pos NOWAIT; |
UPDATE t SET f = DEFAULT; |
--dec $run |
}
|
|
# Cleanup
|
DROP TABLE t; |
RQG test
It also needs to be run with --trials=N as shown in the command line below, usually fails for me in 3-5 trials.
Remember to set the correct basedir and path to the galera library.
git clone https://github.com/MariaDB/randgen --branch mdev35527 rqg-mdev35527
|
cd rqg-mdev35527
|
perl ./run.pl --duration=100 --threads=1 --engine=InnoDB --scenario=Standard --gendata=simple --queries=1000000 --grammar=mdev35527.yy --mysqld=--wsrep-cluster-address=gcomm:// --mysqld=--binlog-format=row --mysqld=--wsrep-on --nometadata-reload --user=root --trials=10 --output="latch.have_wr" --vardir=/dev/shm/var-mdev35527 --mysqld=--wsrep-provider=/tmp/galera-26.4.20.so --basedir=/data/bld/10.6-debug
|