[MDEV-26809] Assertion `lock_rec_has_expl(LOCK_X | 1024U, g.cell(), id, heap_no, &trx)' failed Created: 2021-10-12  Updated: 2022-11-07  Resolved: 2022-11-07

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.6.10, 10.7.6, 10.8.5, 10.9.3, 10.10.2, 10.11.0

Type: Bug Priority: Major
Reporter: Ramesh Sivaraman Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: need_rr

Issue Links:
Duplicate
is duplicated by MDEV-29081 trx_t::lock.was_chosen_as_deadlock_vi... Closed

 Description   

Testcase

Start 3 node cluster and run following RQG test to reproduce the crash
# RQG data load
perl gendata.pl   --dsn=dbi:mysql:host=127.0.0.1:port=35130:user=root:database=test   --spec=conf/galera/galera_stress.zz
 
# Initiate stress test on Node1
perl gentest.pl   --dsn=dbi:mysql:host=127.0.0.1:port=35130:user=root:database=test   --grammar=conf/galera/galera_stress.yy --duration=10300 --queries=100000000 --threads=32 &
 
# Initiate stress test on Node2
perl gentest.pl   --dsn=dbi:mysql:host=127.0.0.1:port=17115:user=root:database=test   --grammar=conf/galera/galera_stress.yy --duration=10300 --queries=100000000 --threads=32 & 

10.7.0 76149650764ea9660ae05d4987ea6c91534851ab (Debug)

#0  0x0000000070000002 in ?? ()
#1  0x0000071129e6b766 in _raw_syscall () at /home/roc/rr/rr/src/preload/raw_syscall.S:120
#2  0x0000071129e6704e in traced_raw_syscall (call=<optimized out>) at /home/roc/rr/rr/src/preload/syscallbuf.c:272
#3  0x0000071129e6a4d1 in syscall_hook_internal (call=0x1475199fcfa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3295
#4  syscall_hook (call=0x1475199fcfa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3329
#5  0x0000071129e66e50 in _syscall_hook_trampoline () at /home/roc/rr/rr/src/preload/syscall_hook.S:313
#6  0x0000071129e66eaf in __morestack () at /home/roc/rr/rr/src/preload/syscall_hook.S:458
#7  0x0000071129e66f08 in _syscall_hook_trampoline_89_c2_f7_da () at /home/roc/rr/rr/src/preload/syscall_hook.S:504
#8  0x000072123158af0c in __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#9  0x000055903e0995f6 in my_write_core (sig=sig@entry=6) at /test/10.7_dbg/mysys/stacktrace.c:424
#10 0x000055903d8912d9 in handle_fatal_signal (sig=6) at /test/10.7_dbg/sql/signal_handler.cc:345
#11 <signal handler called>
#12 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#13 0x00003dce6cc0a859 in __GI_abort () at abort.c:79
#14 0x00003dce6cc0a729 in __assert_fail_base (fmt=0x3dce6cda0588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x55903e493160 "lock_rec_has_expl(LOCK_X | 1024U, g.cell(), id, heap_no, &trx)",
    file=0x55903e4921b8 "/test/10.7_dbg/storage/innobase/lock/lock0lock.cc", line=5773, function=<optimized out>) at assert.c:92
#15 0x00003dce6cc1bf36 in __GI___assert_fail (
    assertion=assertion@entry=0x55903e493160 "lock_rec_has_expl(LOCK_X | 1024U, g.cell(), id, heap_no, &trx)",
    file=file@entry=0x55903e4921b8 "/test/10.7_dbg/storage/innobase/lock/lock0lock.cc", line=line@entry=5773,
    function=function@entry=0x55903e4930e0 "bool lock_trx_has_expl_x_lock(const trx_t&, const dict_table_t&, page_id_t, ulint)")
    at assert.c:101
#16 0x000055903dcc954a in lock_trx_has_expl_x_lock (trx=..., table=..., id=..., heap_no=heap_no@entry=42)
    at /test/10.7_dbg/storage/innobase/include/lock0lock.h:949
#17 0x000055903de1a401 in row_upd_clust_step (node=node@entry=0x104c2003c320, thr=thr@entry=0x104c2003c6a8)
    at /test/10.7_dbg/storage/innobase/include/buf0buf.h:821
#18 0x000055903de1d258 in row_upd (thr=0x104c2003c6a8, node=0x104c2003c320) at /test/10.7_dbg/storage/innobase/row/row0upd.cc:2907
#19 row_upd_step (thr=thr@entry=0x104c2003c6a8) at /test/10.7_dbg/storage/innobase/row/row0upd.cc:3051
#20 0x000055903ddc231b in row_update_for_mysql (prebuilt=0x104c2003b7e8) at /test/10.7_dbg/storage/innobase/row/row0mysql.cc:1717
#21 0x000055903dc46a66 in ha_innobase::update_row (this=0x104c2003a670, old_row=0x104c2003a210 <incomplete sequence \351>,
    new_row=0x104c2003a1e8 <incomplete sequence \351>) at /test/10.7_dbg/storage/innobase/handler/ha_innodb.cc:8588
#22 0x000055903d8a6357 in handler::ha_update_row (this=0x104c2003a670, old_data=0x104c2003a210 <incomplete sequence \351>,
    new_data=0x104c2003a1e8 <incomplete sequence \351>) at /test/10.7_dbg/sql/handler.cc:7569
#23 0x000055903d6b1b50 in mysql_update (thd=thd@entry=0x4bbb4c000db8, table_list=<optimized out>, fields=..., values=...,
    conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551613, ignore=<optimized out>,
    found_return=<optimized out>, updated_return=<optimized out>) at /test/10.7_dbg/sql/sql_update.cc:1081
#24 0x000055903d599b8c in mysql_execute_command (thd=thd@entry=0x4bbb4c000db8,
    is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/10.7_dbg/sql/sql_limit.h:83
#25 0x000055903d584c5f in mysql_parse (thd=thd@entry=0x4bbb4c000db8,
    rawbuf=rawbuf@entry=0x4bbb4c013dd0 "UPDATE IGNORE `table0_innodb_int_autoinc` AS X SET `col_int_key` = 1  /* QNO 12891 CON_ID 27 */",
    length=length@entry=95, parser_state=parser_state@entry=0x7aca75f3e400) at /test/10.7_dbg/sql/sql_parse.cc:8036
#26 0x000055903d5845e5 in wsrep_mysql_parse (thd=thd@entry=0x4bbb4c000db8,
    rawbuf=0x4bbb4c013dd0 "UPDATE IGNORE `table0_innodb_int_autoinc` AS X SET `col_int_key` = 1  /* QNO 12891 CON_ID 27 */", length=95,
    parser_state=parser_state@entry=0x7aca75f3e400) at /test/10.7_dbg/sql/sql_parse.cc:7849
#27 0x000055903d593943 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x4bbb4c000db8,
    packet=packet@entry=0x4bbb4c00b739 "UPDATE IGNORE `table0_innodb_int_autoinc` AS X SET `col_int_key` = 1  /* QNO 12891 CON_ID 27 */ ",
    packet_length=packet_length@entry=96, blocking=blocking@entry=true) at /test/10.7_dbg/sql/sql_class.h:1358
#28 0x000055903d596d3a in do_command (thd=0x4bbb4c000db8, blocking=blocking@entry=true) at /test/10.7_dbg/sql/sql_parse.cc:1404
#29 0x000055903d70e8a7 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x7112a2f0d58,
    put_in_cache=put_in_cache@entry=true) at /test/10.7_dbg/sql/sql_connect.cc:1418
--Type <RET> for more, q to quit, c to continue without paging--
#30 0x000055903d70eeaf in handle_one_connection (arg=arg@entry=0x7112a2f0d58) at /test/10.7_dbg/sql/sql_connect.cc:1312
#31 0x000055903db7ad13 in pfs_spawn_thread (arg=0x7112a294ee8) at /test/10.7_dbg/storage/perfschema/pfs.cc:2201
#32 0x0000721231582609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#33 0x00003dce6cd07293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95



 Comments   
Comment by Alice Sherepa [ 2022-06-20 ]

I am getting the same assertion on 10.10, but without involving Galera

preview-10.10-optimizer f332260c9872a428f68e0461329bb5fa29461592

mysqld: /10.10/storage/innobase/lock/lock0lock.cc:5992: bool lock_trx_has_expl_x_lock(const trx_t&, const dict_table_t&, page_id_t, ulint): Assertion `lock_rec_has_expl(LOCK_X | 1024U, g.cell(), id, heap_no, &trx)' failed.
220620 10:12:03 [ERROR] mysqld got signal 6 ;
 
Server version: 10.10.0-MariaDB-debug-log
 
linux/raise.c:51(__GI_raise)[0x7fbe176027bb]
stdlib/abort.c:81(__GI_abort)[0x7fbe175ed535]
intl/loadmsgcat.c:1177(_nl_load_domain)[0x7fbe175ed40f]
??:0(__assert_fail)[0x7fbe175fb102]
lock/lock0lock.cc:5991(lock_trx_has_expl_x_lock(trx_t const&, dict_table_t const&, page_id_t, unsigned long))[0x561bc554faa8]
row/row0upd.cc:2663(row_upd_clust_step(upd_node_t*, que_thr_t*))[0x561bc5800f85]
row/row0upd.cc:2792(row_upd(upd_node_t*, que_thr_t*))[0x561bc5801e89]
row/row0upd.cc:2934(row_upd_step(que_thr_t*))[0x561bc5802d63]
row/row0mysql.cc:1687(row_update_for_mysql(row_prebuilt_t*))[0x561bc576067f]
handler/ha_innodb.cc:8585(ha_innobase::update_row(unsigned char const*, unsigned char const*))[0x561bc53a3847]
sql/handler.cc:7604(handler::ha_update_row(unsigned char const*, unsigned char const*))[0x561bc497aafd]
sql/sql_update.cc:1095(mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, bool, unsigned long long*, unsigned long long*))[0x561bc43e58d6]
sql/sql_parse.cc:4405(mysql_execute_command(THD*, bool))[0x561bc408671e]
sql/sql_parse.cc:8036(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x561bc409fd8b]
sql/sql_parse.cc:1896(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x561bc4076437]
sql/sql_parse.cc:1407(do_command(THD*, bool))[0x561bc407316d]
sql/sql_connect.cc:1418(do_handle_one_connection(CONNECT*, bool))[0x561bc4511a77]
sql/sql_connect.cc:1314(handle_one_connection)[0x561bc45112fc]
perfschema/pfs.cc:2203(pfs_spawn_thread)[0x561bc51afdc7]
nptl/pthread_create.c:487(start_thread)[0x7fbe17abafa3]
x86_64/clone.S:97(clone)[0x7fbe176c3eff]
 
Query (0x6290006db2a8): UPDATE X SET f4 = 1 WHERE v4 NOT BETWEEN 6 AND 0

Comment by Sergei Golubchik [ 2022-07-12 ]

see also MDEV-29081, it has more info and supposedly a reproducer

Comment by Marko Mäkelä [ 2022-07-27 ]

ramesh, I still cannot replay the trace. I used the following commands:

ssh galapq
rr replay /qa-dev/MDEV-26809/rr/latest-trace
continue

The trace will fail to replay. A likely cause is that some of the code was changed, as identified by the output:

[ERROR /home/roc/rr/rr/src/TraceStream.cc:1122:read_mapped_region()] Metadata of /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 changed: replay divergence likely, but continuing anyway. inode: 114561653/114558216; mode: 33188/33188; uid: 0/0; gid: 0/0; size: 1956992/1952928; mtime: 1622274567/1596888770
[FATAL /home/roc/rr/rr/src/Registers.cc:405:compare_register_files()] 
 (task 5145 (rec:1342138) at time 196)
 -> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:14623/14693)
=== End rr backtrace
Launch gdb with
  gdb '-l' '10000' '-ex' 'set sysroot /' '-ex' 'target extended-remote 127.0.0.1:5145' /qa-dev/MDEV-26809/rr/mysqld-0/mmap_hardlink_8_mariadbd

Can you please create a new trace, possibly using the script in MDEV-29081?

Comment by Marko Mäkelä [ 2022-11-04 ]

ramesh, sorry, I had missed your update. There is no /test/mtest/10.10 directory on the server anymore.

Does this assertion still fail on the latest 10.6 or later branches? MDEV-29081 was claimed to be a duplicate of this.

Comment by Ramesh Sivaraman [ 2022-11-05 ]

marko Could not reproduce the issue on the latest 10.6 or 10.10 build using the script provided in MDEV-29081. And the RQG test case provided in the description now hits the Galera bug MDEV-28472.

Generated at Thu Feb 08 09:48:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.