Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26809

Assertion `lock_rec_has_expl(LOCK_X | 1024U, g.cell(), id, heap_no, &trx)' failed

Details

    Description

      Testcase

      Start 3 node cluster and run following RQG test to reproduce the crash
      # RQG data load
      perl gendata.pl   --dsn=dbi:mysql:host=127.0.0.1:port=35130:user=root:database=test   --spec=conf/galera/galera_stress.zz
       
      # Initiate stress test on Node1
      perl gentest.pl   --dsn=dbi:mysql:host=127.0.0.1:port=35130:user=root:database=test   --grammar=conf/galera/galera_stress.yy --duration=10300 --queries=100000000 --threads=32 &
       
      # Initiate stress test on Node2
      perl gentest.pl   --dsn=dbi:mysql:host=127.0.0.1:port=17115:user=root:database=test   --grammar=conf/galera/galera_stress.yy --duration=10300 --queries=100000000 --threads=32 & 
      

      10.7.0 76149650764ea9660ae05d4987ea6c91534851ab (Debug)

      #0  0x0000000070000002 in ?? ()
      #1  0x0000071129e6b766 in _raw_syscall () at /home/roc/rr/rr/src/preload/raw_syscall.S:120
      #2  0x0000071129e6704e in traced_raw_syscall (call=<optimized out>) at /home/roc/rr/rr/src/preload/syscallbuf.c:272
      #3  0x0000071129e6a4d1 in syscall_hook_internal (call=0x1475199fcfa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3295
      #4  syscall_hook (call=0x1475199fcfa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3329
      #5  0x0000071129e66e50 in _syscall_hook_trampoline () at /home/roc/rr/rr/src/preload/syscall_hook.S:313
      #6  0x0000071129e66eaf in __morestack () at /home/roc/rr/rr/src/preload/syscall_hook.S:458
      #7  0x0000071129e66f08 in _syscall_hook_trampoline_89_c2_f7_da () at /home/roc/rr/rr/src/preload/syscall_hook.S:504
      #8  0x000072123158af0c in __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      #9  0x000055903e0995f6 in my_write_core (sig=sig@entry=6) at /test/10.7_dbg/mysys/stacktrace.c:424
      #10 0x000055903d8912d9 in handle_fatal_signal (sig=6) at /test/10.7_dbg/sql/signal_handler.cc:345
      #11 <signal handler called>
      #12 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #13 0x00003dce6cc0a859 in __GI_abort () at abort.c:79
      #14 0x00003dce6cc0a729 in __assert_fail_base (fmt=0x3dce6cda0588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
          assertion=0x55903e493160 "lock_rec_has_expl(LOCK_X | 1024U, g.cell(), id, heap_no, &trx)",
          file=0x55903e4921b8 "/test/10.7_dbg/storage/innobase/lock/lock0lock.cc", line=5773, function=<optimized out>) at assert.c:92
      #15 0x00003dce6cc1bf36 in __GI___assert_fail (
          assertion=assertion@entry=0x55903e493160 "lock_rec_has_expl(LOCK_X | 1024U, g.cell(), id, heap_no, &trx)",
          file=file@entry=0x55903e4921b8 "/test/10.7_dbg/storage/innobase/lock/lock0lock.cc", line=line@entry=5773,
          function=function@entry=0x55903e4930e0 "bool lock_trx_has_expl_x_lock(const trx_t&, const dict_table_t&, page_id_t, ulint)")
          at assert.c:101
      #16 0x000055903dcc954a in lock_trx_has_expl_x_lock (trx=..., table=..., id=..., heap_no=heap_no@entry=42)
          at /test/10.7_dbg/storage/innobase/include/lock0lock.h:949
      #17 0x000055903de1a401 in row_upd_clust_step (node=node@entry=0x104c2003c320, thr=thr@entry=0x104c2003c6a8)
          at /test/10.7_dbg/storage/innobase/include/buf0buf.h:821
      #18 0x000055903de1d258 in row_upd (thr=0x104c2003c6a8, node=0x104c2003c320) at /test/10.7_dbg/storage/innobase/row/row0upd.cc:2907
      #19 row_upd_step (thr=thr@entry=0x104c2003c6a8) at /test/10.7_dbg/storage/innobase/row/row0upd.cc:3051
      #20 0x000055903ddc231b in row_update_for_mysql (prebuilt=0x104c2003b7e8) at /test/10.7_dbg/storage/innobase/row/row0mysql.cc:1717
      #21 0x000055903dc46a66 in ha_innobase::update_row (this=0x104c2003a670, old_row=0x104c2003a210 <incomplete sequence \351>,
          new_row=0x104c2003a1e8 <incomplete sequence \351>) at /test/10.7_dbg/storage/innobase/handler/ha_innodb.cc:8588
      #22 0x000055903d8a6357 in handler::ha_update_row (this=0x104c2003a670, old_data=0x104c2003a210 <incomplete sequence \351>,
          new_data=0x104c2003a1e8 <incomplete sequence \351>) at /test/10.7_dbg/sql/handler.cc:7569
      #23 0x000055903d6b1b50 in mysql_update (thd=thd@entry=0x4bbb4c000db8, table_list=<optimized out>, fields=..., values=...,
          conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551613, ignore=<optimized out>,
          found_return=<optimized out>, updated_return=<optimized out>) at /test/10.7_dbg/sql/sql_update.cc:1081
      #24 0x000055903d599b8c in mysql_execute_command (thd=thd@entry=0x4bbb4c000db8,
          is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/10.7_dbg/sql/sql_limit.h:83
      #25 0x000055903d584c5f in mysql_parse (thd=thd@entry=0x4bbb4c000db8,
          rawbuf=rawbuf@entry=0x4bbb4c013dd0 "UPDATE IGNORE `table0_innodb_int_autoinc` AS X SET `col_int_key` = 1  /* QNO 12891 CON_ID 27 */",
          length=length@entry=95, parser_state=parser_state@entry=0x7aca75f3e400) at /test/10.7_dbg/sql/sql_parse.cc:8036
      #26 0x000055903d5845e5 in wsrep_mysql_parse (thd=thd@entry=0x4bbb4c000db8,
          rawbuf=0x4bbb4c013dd0 "UPDATE IGNORE `table0_innodb_int_autoinc` AS X SET `col_int_key` = 1  /* QNO 12891 CON_ID 27 */", length=95,
          parser_state=parser_state@entry=0x7aca75f3e400) at /test/10.7_dbg/sql/sql_parse.cc:7849
      #27 0x000055903d593943 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x4bbb4c000db8,
          packet=packet@entry=0x4bbb4c00b739 "UPDATE IGNORE `table0_innodb_int_autoinc` AS X SET `col_int_key` = 1  /* QNO 12891 CON_ID 27 */ ",
          packet_length=packet_length@entry=96, blocking=blocking@entry=true) at /test/10.7_dbg/sql/sql_class.h:1358
      #28 0x000055903d596d3a in do_command (thd=0x4bbb4c000db8, blocking=blocking@entry=true) at /test/10.7_dbg/sql/sql_parse.cc:1404
      #29 0x000055903d70e8a7 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x7112a2f0d58,
          put_in_cache=put_in_cache@entry=true) at /test/10.7_dbg/sql/sql_connect.cc:1418
      --Type <RET> for more, q to quit, c to continue without paging--
      #30 0x000055903d70eeaf in handle_one_connection (arg=arg@entry=0x7112a2f0d58) at /test/10.7_dbg/sql/sql_connect.cc:1312
      #31 0x000055903db7ad13 in pfs_spawn_thread (arg=0x7112a294ee8) at /test/10.7_dbg/storage/perfschema/pfs.cc:2201
      #32 0x0000721231582609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #33 0x00003dce6cd07293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      
      

      Attachments

        Issue Links

          Activity

            alice Alice Sherepa added a comment -

            I am getting the same assertion on 10.10, but without involving Galera

            preview-10.10-optimizer f332260c9872a428f68e0461329bb5fa29461592

            mysqld: /10.10/storage/innobase/lock/lock0lock.cc:5992: bool lock_trx_has_expl_x_lock(const trx_t&, const dict_table_t&, page_id_t, ulint): Assertion `lock_rec_has_expl(LOCK_X | 1024U, g.cell(), id, heap_no, &trx)' failed.
            220620 10:12:03 [ERROR] mysqld got signal 6 ;
             
            Server version: 10.10.0-MariaDB-debug-log
             
            linux/raise.c:51(__GI_raise)[0x7fbe176027bb]
            stdlib/abort.c:81(__GI_abort)[0x7fbe175ed535]
            intl/loadmsgcat.c:1177(_nl_load_domain)[0x7fbe175ed40f]
            ??:0(__assert_fail)[0x7fbe175fb102]
            lock/lock0lock.cc:5991(lock_trx_has_expl_x_lock(trx_t const&, dict_table_t const&, page_id_t, unsigned long))[0x561bc554faa8]
            row/row0upd.cc:2663(row_upd_clust_step(upd_node_t*, que_thr_t*))[0x561bc5800f85]
            row/row0upd.cc:2792(row_upd(upd_node_t*, que_thr_t*))[0x561bc5801e89]
            row/row0upd.cc:2934(row_upd_step(que_thr_t*))[0x561bc5802d63]
            row/row0mysql.cc:1687(row_update_for_mysql(row_prebuilt_t*))[0x561bc576067f]
            handler/ha_innodb.cc:8585(ha_innobase::update_row(unsigned char const*, unsigned char const*))[0x561bc53a3847]
            sql/handler.cc:7604(handler::ha_update_row(unsigned char const*, unsigned char const*))[0x561bc497aafd]
            sql/sql_update.cc:1095(mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, bool, unsigned long long*, unsigned long long*))[0x561bc43e58d6]
            sql/sql_parse.cc:4405(mysql_execute_command(THD*, bool))[0x561bc408671e]
            sql/sql_parse.cc:8036(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x561bc409fd8b]
            sql/sql_parse.cc:1896(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x561bc4076437]
            sql/sql_parse.cc:1407(do_command(THD*, bool))[0x561bc407316d]
            sql/sql_connect.cc:1418(do_handle_one_connection(CONNECT*, bool))[0x561bc4511a77]
            sql/sql_connect.cc:1314(handle_one_connection)[0x561bc45112fc]
            perfschema/pfs.cc:2203(pfs_spawn_thread)[0x561bc51afdc7]
            nptl/pthread_create.c:487(start_thread)[0x7fbe17abafa3]
            x86_64/clone.S:97(clone)[0x7fbe176c3eff]
             
            Query (0x6290006db2a8): UPDATE X SET f4 = 1 WHERE v4 NOT BETWEEN 6 AND 0
            

            alice Alice Sherepa added a comment - I am getting the same assertion on 10.10, but without involving Galera preview-10.10-optimizer f332260c9872a428f68e0461329bb5fa29461592 mysqld: /10.10/storage/innobase/lock/lock0lock.cc:5992: bool lock_trx_has_expl_x_lock(const trx_t&, const dict_table_t&, page_id_t, ulint): Assertion `lock_rec_has_expl(LOCK_X | 1024U, g.cell(), id, heap_no, &trx)' failed. 220620 10:12:03 [ERROR] mysqld got signal 6 ;   Server version: 10.10.0-MariaDB-debug-log   linux/raise.c:51(__GI_raise)[0x7fbe176027bb] stdlib/abort.c:81(__GI_abort)[0x7fbe175ed535] intl/loadmsgcat.c:1177(_nl_load_domain)[0x7fbe175ed40f] ??:0(__assert_fail)[0x7fbe175fb102] lock/lock0lock.cc:5991(lock_trx_has_expl_x_lock(trx_t const&, dict_table_t const&, page_id_t, unsigned long))[0x561bc554faa8] row/row0upd.cc:2663(row_upd_clust_step(upd_node_t*, que_thr_t*))[0x561bc5800f85] row/row0upd.cc:2792(row_upd(upd_node_t*, que_thr_t*))[0x561bc5801e89] row/row0upd.cc:2934(row_upd_step(que_thr_t*))[0x561bc5802d63] row/row0mysql.cc:1687(row_update_for_mysql(row_prebuilt_t*))[0x561bc576067f] handler/ha_innodb.cc:8585(ha_innobase::update_row(unsigned char const*, unsigned char const*))[0x561bc53a3847] sql/handler.cc:7604(handler::ha_update_row(unsigned char const*, unsigned char const*))[0x561bc497aafd] sql/sql_update.cc:1095(mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, bool, unsigned long long*, unsigned long long*))[0x561bc43e58d6] sql/sql_parse.cc:4405(mysql_execute_command(THD*, bool))[0x561bc408671e] sql/sql_parse.cc:8036(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x561bc409fd8b] sql/sql_parse.cc:1896(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x561bc4076437] sql/sql_parse.cc:1407(do_command(THD*, bool))[0x561bc407316d] sql/sql_connect.cc:1418(do_handle_one_connection(CONNECT*, bool))[0x561bc4511a77] sql/sql_connect.cc:1314(handle_one_connection)[0x561bc45112fc] perfschema/pfs.cc:2203(pfs_spawn_thread)[0x561bc51afdc7] nptl/pthread_create.c:487(start_thread)[0x7fbe17abafa3] x86_64/clone.S:97(clone)[0x7fbe176c3eff]   Query (0x6290006db2a8): UPDATE X SET f4 = 1 WHERE v4 NOT BETWEEN 6 AND 0

            see also MDEV-29081, it has more info and supposedly a reproducer

            serg Sergei Golubchik added a comment - see also MDEV-29081 , it has more info and supposedly a reproducer

            ramesh, I still cannot replay the trace. I used the following commands:

            ssh galapq
            rr replay /qa-dev/MDEV-26809/rr/latest-trace
            continue
            

            The trace will fail to replay. A likely cause is that some of the code was changed, as identified by the output:

            …
            [ERROR /home/roc/rr/rr/src/TraceStream.cc:1122:read_mapped_region()] Metadata of /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 changed: replay divergence likely, but continuing anyway. inode: 114561653/114558216; mode: 33188/33188; uid: 0/0; gid: 0/0; size: 1956992/1952928; mtime: 1622274567/1596888770
            …
            [FATAL /home/roc/rr/rr/src/Registers.cc:405:compare_register_files()] 
             (task 5145 (rec:1342138) at time 196)
             -> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:14623/14693)
            …
            === End rr backtrace
            Launch gdb with
              gdb '-l' '10000' '-ex' 'set sysroot /' '-ex' 'target extended-remote 127.0.0.1:5145' /qa-dev/MDEV-26809/rr/mysqld-0/mmap_hardlink_8_mariadbd
            

            Can you please create a new trace, possibly using the script in MDEV-29081?

            marko Marko Mäkelä added a comment - ramesh , I still cannot replay the trace. I used the following commands: ssh galapq rr replay /qa-dev/MDEV-26809/rr/latest-trace continue The trace will fail to replay. A likely cause is that some of the code was changed, as identified by the output: … [ERROR /home/roc/rr/rr/src/TraceStream.cc:1122:read_mapped_region()] Metadata of /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 changed: replay divergence likely, but continuing anyway. inode: 114561653/114558216; mode: 33188/33188; uid: 0/0; gid: 0/0; size: 1956992/1952928; mtime: 1622274567/1596888770 … [FATAL /home/roc/rr/rr/src/Registers.cc:405:compare_register_files()] (task 5145 (rec:1342138) at time 196) -> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:14623/14693) … === End rr backtrace Launch gdb with gdb '-l' '10000' '-ex' 'set sysroot /' '-ex' 'target extended-remote 127.0.0.1:5145' /qa-dev/MDEV-26809/rr/mysqld-0/mmap_hardlink_8_mariadbd Can you please create a new trace, possibly using the script in MDEV-29081 ?

            ramesh, sorry, I had missed your update. There is no /test/mtest/10.10 directory on the server anymore.

            Does this assertion still fail on the latest 10.6 or later branches? MDEV-29081 was claimed to be a duplicate of this.

            marko Marko Mäkelä added a comment - ramesh , sorry, I had missed your update. There is no /test/mtest/10.10 directory on the server anymore. Does this assertion still fail on the latest 10.6 or later branches? MDEV-29081 was claimed to be a duplicate of this.

            marko Could not reproduce the issue on the latest 10.6 or 10.10 build using the script provided in MDEV-29081. And the RQG test case provided in the description now hits the Galera bug MDEV-28472.

            ramesh Ramesh Sivaraman added a comment - marko Could not reproduce the issue on the latest 10.6 or 10.10 build using the script provided in MDEV-29081 . And the RQG test case provided in the description now hits the Galera bug MDEV-28472 .

            People

              marko Marko Mäkelä
              ramesh Ramesh Sivaraman
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.