[MDEV-24642] Assertion `r->emplace(pthread_self()).second' failed in sux_lock<srw>::s_lock_register() Created: 2021-01-21  Updated: 2021-01-25  Resolved: 2021-01-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: N/A
Fix Version/s: 10.6.0

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: debug, performance, rr-profile

Issue Links:
Problem/Incident
is caused by MDEV-24142 rw_lock_t has unnecessarily complex w... Closed

 Description   

origin/bb-10.6-MDEV-24637 22a15eb42589f98fdc305e04ab297a880f1c39ca 2021-01-21T17:02:13+05:30
 
mysqld: /Server/bb-10.6-MDEV-24637/storage/innobase/include/sux_lock.h:140: void sux_lock<srw>::s_lock_register() [with srw = ssux_lock_low]: Assertion `r->emplace(pthread_self()).second' failed.
...
Query (0x62b000015238): SELECT `col_varchar_255_ucs2_key` FROM `table100_innodb_int_autoinc`
Status: KILL_TIMEOUT
 
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000000070027859 in __GI_abort () at abort.c:79
#2  0x0000000070027729 in __assert_fail_base (fmt=0x701bd588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55f9b6d150a0 "r->emplace(pthread_self()).second", file=0x55f9b6d0e360 "/Server/bb-10.6-MDEV-24637/storage/innobase/include/sux_lock.h", line=140, 
    function=<optimized out>) at assert.c:92
#3  0x0000000070038f36 in __GI___assert_fail (assertion=0x55f9b6d150a0 "r->emplace(pthread_self()).second", file=0x55f9b6d0e360 "/Server/bb-10.6-MDEV-24637/storage/innobase/include/sux_lock.h", line=140, 
    function=0x55f9b6d15100 "void sux_lock<srw>::s_lock_register() [with srw = ssux_lock_low]") at assert.c:101
#4  0x000055f9b553f698 in sux_lock<ssux_lock_low>::s_lock_register (this=0x3e5a5ed31260) at /Server/bb-10.6-MDEV-24637/storage/innobase/include/sux_lock.h:140
#5  0x000055f9b58dcdc1 in sux_lock<ssux_lock_low>::s_lock_try (this=0x3e5a5ed31260) at /Server/bb-10.6-MDEV-24637/storage/innobase/include/sux_lock.h:204
#6  0x000055f9b58d239a in btr_search_guess_on_hash (index=0x616000885a20, info=0x61a0000675b8, tuple=0x623000043240, mode=4, latch_mode=1, cursor=0x623000042c80, ahi_latch=0x0, mtr=0x48836a6125c0) at /Server/bb-10.6-MDEV-24637/storage/innobase/btr/btr0sea.cc:1105
#7  0x000055f9b588e5c6 in btr_cur_search_to_nth_level_func (index=0x616000885a20, level=0, tuple=0x623000043240, mode=PAGE_CUR_LE, latch_mode=1, cursor=0x623000042c80, ahi_latch=0x0, mtr=0x48836a6125c0, autoinc=0)
    at /Server/bb-10.6-MDEV-24637/storage/innobase/btr/btr0cur.cc:1417
#8  0x000055f9b5715f31 in btr_pcur_open_with_no_init_func (index=0x616000885a20, tuple=0x623000043240, mode=PAGE_CUR_LE, latch_mode=1, cursor=0x623000042c80, ahi_latch=0x0, mtr=0x48836a6125c0) at /Server/bb-10.6-MDEV-24637/storage/innobase/include/btr0pcur.ic:497
#9  0x000055f9b57277a8 in Row_sel_get_clust_rec_for_mysql::operator() (this=0x48836a611ba0, prebuilt=0x6230000429a0, sec_index=0x6160008bf920, rec=0x3e5a5f7183aa "\200", thr=0x6250007ca3a0, out_rec=0x48836a611a40, offsets=0x48836a611ac0, offset_heap=0x48836a611aa0, 
    vrow=0x0, mtr=0x48836a6125c0) at /Server/bb-10.6-MDEV-24637/storage/innobase/row/row0sel.cc:3264
#10 0x000055f9b5734aae in row_search_mvcc (buf=0x6260001ec138 ")(\002", mode=PAGE_CUR_G, prebuilt=0x6230000429a0, match_mode=0, direction=1) at /Server/bb-10.6-MDEV-24637/storage/innobase/row/row0sel.cc:5294
#11 0x000055f9b532912d in ha_innobase::general_fetch (this=0x61d0001c0cb8, buf=0x6260001ec138 ")(\002", direction=1, match_mode=0) at /Server/bb-10.6-MDEV-24637/storage/innobase/handler/ha_innodb.cc:8807
#12 0x000055f9b5329978 in ha_innobase::index_next (this=0x61d0001c0cb8, buf=0x6260001ec138 ")(\002") at /Server/bb-10.6-MDEV-24637/storage/innobase/handler/ha_innodb.cc:8869
#13 0x000055f9b48026dc in handler::ha_index_next (this=0x61d0001c0cb8, buf=0x6260001ec138 ")(\002") at /Server/bb-10.6-MDEV-24637/sql/handler.cc:3173
#14 0x000055f9b413dcf4 in join_read_next (info=0x62b000017e38) at /Server/bb-10.6-MDEV-24637/sql/sql_select.cc:21626
#15 0x000055f9b3e2c392 in READ_RECORD::read_record (this=0x62b000017e38) at /Server/bb-10.6-MDEV-24637/sql/records.h:81
#16 0x000055f9b413632d in sub_select (join=0x62b000016af8, join_tab=0x62b000017d70, end_of_records=false) at /Server/bb-10.6-MDEV-24637/sql/sql_select.cc:20613
#17 0x000055f9b41342b6 in do_select (join=0x62b000016af8, procedure=0x0) at /Server/bb-10.6-MDEV-24637/sql/sql_select.cc:20141
#18 0x000055f9b40c1123 in JOIN::exec_inner (this=0x62b000016af8) at /Server/bb-10.6-MDEV-24637/sql/sql_select.cc:4472
#19 0x000055f9b40be70e in JOIN::exec (this=0x62b000016af8) at /Server/bb-10.6-MDEV-24637/sql/sql_select.cc:4252
#20 0x000055f9b40c2688 in mysql_select (thd=0x62b00005b218, tables=0x62b000015ab0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x62b000016ac8, unit=0x62b00005f3e0, select_lex=0x62b000015378)
    at /Server/bb-10.6-MDEV-24637/sql/sql_select.cc:4668
#21 0x000055f9b409426c in handle_select (thd=0x62b00005b218, lex=0x62b00005f318, result=0x62b000016ac8, setup_tables_done_option=0) at /Server/bb-10.6-MDEV-24637/sql/sql_select.cc:417
#22 0x000055f9b3ffd78c in execute_sqlcom_select (thd=0x62b00005b218, all_tables=0x62b000015ab0) at /Server/bb-10.6-MDEV-24637/sql/sql_parse.cc:6133
#23 0x000055f9b3fec68d in mysql_execute_command (thd=0x62b00005b218) at /Server/bb-10.6-MDEV-24637/sql/sql_parse.cc:3829
#24 0x000055f9b40089d7 in mysql_parse (thd=0x62b00005b218, rawbuf=0x62b000015238 "SELECT `col_varchar_255_ucs2_key` FROM `table100_innodb_int_autoinc` /* E_R Thread7 QNO 120 CON_ID 20 */", length=104, parser_state=0x48836a6147c0)
    at /Server/bb-10.6-MDEV-24637/sql/sql_parse.cc:7901
#25 0x000055f9b3fdf6b5 in dispatch_command (command=COM_QUERY, thd=0x62b00005b218, packet=0x62900078f219 "", packet_length=105) at /Server/bb-10.6-MDEV-24637/sql/sql_parse.cc:1833
#26 0x000055f9b3fdc57b in do_command (thd=0x62b00005b218) at /Server/bb-10.6-MDEV-24637/sql/sql_parse.cc:1365
#27 0x000055f9b441dbc3 in do_handle_one_connection (connect=0x608000003238, put_in_cache=true) at /Server/bb-10.6-MDEV-24637/sql/sql_connect.cc:1410
#28 0x000055f9b441d52c in handle_one_connection (arg=0x608000003238) at /Server/bb-10.6-MDEV-24637/sql/sql_connect.cc:1312
#29 0x000055f9b5120daf in pfs_spawn_thread (arg=0x61500000b718) at /Server/bb-10.6-MDEV-24637/storage/perfschema/pfs.cc:2201
#30 0x00004a8f6034c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#31 0x0000000070124293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr)
pluto:/data/Results/1611230368/tmp1/dev/shm/vardir/1611230368/13/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio mysqld-0
 
RQG
====
perl rqg.pl \                   
--grammar=conf/engines/many_indexes.yy \
--gendata=conf/engines/many_indexes.zz \
--reporters=Mariabackup_linux \
--mysqld=--innodb_use_native_aio=1 \
--mysqld=--innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--net_read_timeout=30 \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--wait_timeout=28800 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--innodb-lock-wait-timeout=50 \
--no-mask \
--queries=10000000 \
--seed=random \
--reporters=Backtrace \
--reporters=ErrorLog \
--reporters=Deadlock1 \
--validators=None \
--mysqld=--log_output=none \
--mysqld=--log-bin \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--engine=InnoDB \
--restart_timeout=120 \
--mysqld=--plugin-load-add=file_key_management.so \
--mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
--duration=300 \
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
--mysqld=--loose-innodb-sync-debug \
--mysqld=--innodb_stats_persistent=on \
--mysqld=--innodb_adaptive_hash_index=on \
--mysqld=--loose-max-statement-time=30 \
--threads=9 \
--mysqld=--innodb_page_size=32K \
--mysqld=--innodb-buffer-pool-size=24M \
--duration=300 \
--no_mask \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--script_debug=_nix_ \
--rr=Extended \
--rr_options=--chaos



 Comments   
Comment by Marko Mäkelä [ 2021-01-22 ]

I believe that this is an over-zealous debug check. Before MDEV-24142, debug code kept track of threads holding S-latches using a doubly linked list. MDEV-24142 replaced the linked list with std::set, and duplicates are no longer tolerated.

The reason why recursive shared latch requests are not allowed in blocking mode is that if some other thread has enqueued a waiting exclusive lock, then further S-latch requests will block until the exclusive lock has been granted and released. If a thread were already holding one S-latch while waiting for the X-latch to be granted and released by another thread, the two threads would deadlock.

However, a try-lock operation is fine if it succeeds; in that case there cannot have existed any pending exclusive lock request.

I think that the correct fix is to replace the std::set in the debugging code with std::unordered_map that would keep a count of S-latch requests.

As part of fixing this, buf_page_try_get() should revert back to s_lock_try(). It had changed it to the more intrusive u_lock_try() in response to this debug check failing.

Comment by Marko Mäkelä [ 2021-01-22 ]

mleich, please test the bb-10.6-MDEV-24642 branch. I was happy to find that C++11 introduced std::unordered_multiset, which is exactly what we needed here.

Comment by Matthias Leich [ 2021-01-25 ]

origin/bb-10.6-MDEV-24642 18d3aaf3c0f268102cd4763abb2b9001d107665f 2021-01-22T08:55:16+02:00
behaved well during RQG testing.

Generated at Thu Feb 08 09:31:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.