Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15326

InnoDB: Failing assertion: !other_lock || wsrep_thd_is_BF(lock->trx->mysql_thd, FALSE) || wsrep_thd_is_BF(other_lock->trx->mysql_thd, FALSE)

Details

    Description

      https://api.travis-ci.org/v3/job/341571075/log.txt

      10.2 c6e35276f180bdcf5517bb682c46ba7df8a08186

      2018-02-15 04:52:54 0x7fb344d2a700  InnoDB: Assertion failure in file /home/travis/src/storage/innobase/lock/lock0lock.cc line 6243
      InnoDB: Failing assertion: !other_lock || wsrep_thd_is_BF(lock->trx->mysql_thd, FALSE) || wsrep_thd_is_BF(other_lock->trx->mysql_thd, FALSE)
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/
      InnoDB: about forcing recovery.
      180215  4:52:54 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
      Server version: 10.2.14-MariaDB-debug-log
      key_buffer_size=2147483648
      read_buffer_size=131072
      max_used_connections=8
      max_threads=153
      thread_count=14
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2433429 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
      Thread pointer: 0x7fb28c000f00
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7fb344d29e48 thread_stack 0x20000
      /home/travis/server/bin/mysqld(my_print_stacktrace+0x35)[0x110f3ab]
      handler/handler0alter.cc:1922(innobase_row_to_mysql(TABLE*, dict_table_t const*, dtuple_t const*))[0x9bd338]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fb347564cb0]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7fb3469b8035]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7fb3469bb79b]
      sql/log_event.cc:7913(Gtid_list_log_event::do_apply_event(rpl_group_info*))[0xee7f4b]
      sql/sql_yacc.yy:12978(MYSQLparse(THD*))[0xd6132a]
      sql/sql_yacc.yy:13820(MYSQLparse(THD*))[0xd6381e]
      sql/item_func.cc:5672(Item_user_var_as_out_param::fix_fields(THD*, Item**))[0xe4804d]
      sql/item_func.cc:3983(Interruptible_wait::set_timeout(unsigned long long))[0xe4fb82]
      sql/opt_index_cond_pushdown.cc:131(uses_index_fields_only(Item*, TABLE*, unsigned int, bool))[0xce5aa7]
      sql/opt_index_cond_pushdown.cc:275(make_cond_remainder)[0xce609b]
      /home/travis/server/bin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x1c3)[0x9c3fe9]
      sql/field.h:3376(Field_blob::set_value(unsigned char*))[0xb2dd83]
      /home/travis/server/bin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_ordery15enum_duplicatesbPySB_+0x1fce)[0x832440]
      /home/travis/server/bin/mysqld(_Z21mysql_execute_commandP3THD+0x3d3b)[0x74aab6]
      /home/travis/server/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x2a4)[0x7562c6]
      /home/travis/server/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0xfeb)[0x743efe]
      /home/travis/server/bin/mysqld(_Z10do_commandP3THD+0x705)[0x742846]
      /home/travis/server/bin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1c8)[0x88f93e]
      /home/travis/server/bin/mysqld(handle_one_connection+0x30)[0x88f6c0]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fb34755ce9a]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb346a782ed]
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7fb28c0114c8): UPDATE `view_table1_innodb` SET `col_decimal_key_default_null` = 'lightly' WHERE `col_float_unsigned_not_null_key` < 5  /* QNO 14993 CON_ID 20 */
      Connection ID (thread ID): 20
      Status: NOT_KILLED
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
      

      #5  0x00007fb3469bb79b in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #6  0x0000000000ee7f4b in ut_dbg_assertion_failed (expr=0x133c6e0 "!other_lock || wsrep_thd_is_BF(lock->trx->mysql_thd, FALSE) || wsrep_thd_is_BF(other_lock->trx->mysql_thd, FALSE)", file=0x133a3f0 "/home/travis/src/storage/innobase/lock/lock0lock.cc", line=6243) at /home/travis/src/storage/innobase/ut/ut0dbg.cc:61
      #7  0x0000000000d6132a in lock_rec_queue_validate (locked_lock_trx_sys=0, block=0x7fb2aba06660, rec=0x7fb2ac80841f "", index=0x7fb268071e98, offsets=0x7fb344d26d80) at /home/travis/src/storage/innobase/lock/lock0lock.cc:6241
      #8  0x0000000000d6381e in lock_clust_rec_read_check_and_lock (flags=0, block=0x7fb2aba06660, rec=0x7fb2ac80841f "", index=0x7fb268071e98, offsets=0x7fb344d26d80, mode=LOCK_X, gap_mode=0, thr=0x7fb290599dc0) at /home/travis/src/storage/innobase/lock/lock0lock.cc:7061
      #9  0x0000000000e4804d in sel_set_rec_lock (pcur=0x7fb290598910, rec=0x7fb2ac80841f "", index=0x7fb268071e98, offsets=0x7fb344d26d80, mode=3, type=0, thr=0x7fb290599dc0, mtr=0x7fb344d27920) at /home/travis/src/storage/innobase/row/row0sel.cc:1250
      #10 0x0000000000e4fb82 in row_search_mvcc (buf=0x7fb2901b5878 "\377\377\377\376\377\377\377\200", mode=PAGE_CUR_G, prebuilt=0x7fb290598748, match_mode=0, direction=1) at /home/travis/src/storage/innobase/row/row0sel.cc:4969
      #11 0x0000000000ce5aa7 in ha_innobase::general_fetch (this=0x7fb290fa6208, buf=0x7fb2901b5878 "\377\377\377\376\377\377\377\200", direction=1, match_mode=0) at /home/travis/src/storage/innobase/handler/ha_innodb.cc:10024
      #12 0x0000000000ce609b in ha_innobase::rnd_next (this=0x7fb290fa6208, buf=0x7fb2901b5878 "\377\377\377\376\377\377\377\200") at /home/travis/src/storage/innobase/handler/ha_innodb.cc:10237
      #13 0x00000000009c3fe9 in handler::ha_rnd_next (this=0x7fb290fa6208, buf=0x7fb2901b5878 "\377\377\377\376\377\377\377\200") at /home/travis/src/sql/handler.cc:2576
      #14 0x0000000000b2dd83 in rr_sequential (info=0x7fb344d28080) at /home/travis/src/sql/records.cc:485
      #15 0x0000000000832440 in mysql_update (thd=0x7fb28c000f00, table_list=0x7fb28c0116f0, fields=..., values=..., conds=0x7fb28c012160, order_num=0, order=0x0, limit=18446744073709551598, handle_duplicates=DUP_ERROR, ignore=false, found_return=0x7fb344d287b0, updated_return=0x7fb344d287b8) at /home/travis/src/sql/sql_update.cc:735
      #16 0x000000000074aab6 in mysql_execute_command (thd=0x7fb28c000f00) at /home/travis/src/sql/sql_parse.cc:4273
      #17 0x00000000007562c6 in mysql_parse (thd=0x7fb28c000f00, rawbuf=0x7fb28c0114c8 "UPDATE `view_table1_innodb` SET `col_decimal_key_default_null` = 'lightly' WHERE `col_float_unsigned_not_null_key` < 5  /* QNO 14993 CON_ID 20 */", length=145, parser_state=0x7fb344d29080, is_com_multi=false, is_next_command=false) at /home/travis/src/sql/sql_parse.cc:7902
      #18 0x0000000000743efe in dispatch_command (command=COM_QUERY, thd=0x7fb28c000f00, packet=0x7fb28c008c51 " UPDATE `view_table1_innodb` SET `col_decimal_key_default_null` = 'lightly' WHERE `col_float_unsigned_not_null_key` < 5  /* QNO 14993 CON_ID 20 */ ", packet_length=147, is_com_multi=false, is_next_command=false) at /home/travis/src/sql/sql_parse.cc:1806
      #19 0x0000000000742846 in do_command (thd=0x7fb28c000f00) at /home/travis/src/sql/sql_parse.cc:1360
      #20 0x000000000088f93e in do_handle_one_connection (connect=0x3712c90) at /home/travis/src/sql/sql_connect.cc:1335
      #21 0x000000000088f6c0 in handle_one_connection (arg=0x3712c90) at /home/travis/src/sql/sql_connect.cc:1241
      #22 0x00007fb34755ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      #23 0x00007fb346a782ed in clone () from /lib/x86_64-linux-gnu/libc.so.6
      #24 0x0000000000000000 in ?? ()
      

      experimental 6b97ede2ef98c5ef90c2387528266f7fcde1685d

      perl /home/travis/rqg/runall-new.pl --views --validators=TransformerNoComparator --mysqld=--max-statement-time=30 --seed=1518670089 --threads=6 --duration=350 --reporters=Backtrace,ErrorLog,Deadlock --transformers=ExecuteAsUnion,ExecuteAsView --redefine=conf/mariadb/federated.yy --mysqld=--default-storage-engine=MyISAM --mysqld=--query_cache_type=0 --mysqld=--query_cache_size=0 --mysqld=--key_buffer_size=2G --mysqld=--thread_stack=192 --mysqld=--thread_cache_size=8 --mysqld=--myisam_recover_options=BACKUP --mysqld=--character-set-server=utf8mb4 --mysqld=--collation-server=utf8mb4_general_ci --grammar=conf/replication/replication.yy --gendata=conf/replication/replication-5.1.zz --engine=InnoDB --basedir1=/home/travis/server --vardir=/home/travis/logs/vardir1_1
      

      Not reproducible right away.

      Attachments

        Issue Links

          Activity

            For the record, here is my partial fix for 10.3.

            marko Marko Mäkelä added a comment - For the record, here is my partial fix for 10.3 .

            Here is an analysis of a 10.3 crash with the partial fix:

            10.3 12cd8d54e373f5e8de98ed2cb5bc6921049a38d2

            #5  0x00005647d21e00d0 in lock_rec_add_to_queue (type_mode=1059, 
                block=0x7fc32b1cf170, heap_no=2, index=0x7fc2d00f7608, 
                trx=0x7fc32ba8e288, caller_owns_trx_mutex=<optimized out>)
                at /mariadb/10.3/storage/innobase/lock/lock0lock.cc:1853
            #6  0x00005647d21f07f3 in lock_rec_convert_impl_to_expl_for_trx (
                block=<optimized out>, rec=<optimized out>, index=<optimized out>, 
                trx=<optimized out>, heap_no=2)
                at /mariadb/10.3/storage/innobase/lock/lock0lock.cc:5431
            #7  lock_rec_convert_impl_to_expl (caller_trx=<optimized out>, block=0x7fc32b1cf170, rec=<optimized out>, index=<optimized out>, offsets=<optimized out>) at /mariadb/10.3/storage/innobase/lock/lock0lock.cc:5591
            #8  0x00005647d21f2e00 in lock_clust_rec_read_check_and_lock (flags=0, block=0x7fc32b1cf170, rec=0x7fc32b51807d "\200", index=0x7fc2d00f7608, offsets=0x7fc3280dfec0, mode=LOCK_X, gap_mode=1024, thr=0x7fc2d4056548) at /mariadb/10.3/storage/innobase/lock/lock0lock.cc:5854
            #9  0x00005647d232a1f9 in sel_set_rec_lock (pcur=0x7fc2d4055ce8, rec=<optimized out>, index=0x7fc2d00f7608, offsets=<optimized out>, mode=<optimized out>, type=<optimized out>, thr=0x7fc2d4056548, mtr=0x7fc3280e01e8) at /mariadb/10.3/storage/innobase/row/row0sel.cc:1260
            #10 0x00005647d2326abb in row_search_mvcc (buf=<optimized out>, mode=<optimized out>, prebuilt=0x7fc2d4055b18, match_mode=<optimized out>, direction=0) at /mariadb/10.3/storage/innobase/row/row0sel.cc:5062
            …
            #22 0x00005647d1d3f53d in mysql_parse (thd=0x7fc2b8000cf8, rawbuf=0x7fc2b80109f0 "DELETE FROM t3 WHERE col1 = 7", length=<optimized out>, parser_state=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /mariadb/10.3/sql/sql_parse.cc:7829
            

            The current transaction is 0x3d1b, and the other_lock->trx->id is 0x3d1c.
            The clustered index root page contains (col1,DB_TRX_ID,DB_ROLL_PTR,col_text)=(4,0,1<<55,77),(7,0x3d19,insert,77).
            The transaction 0x3d19 is waiting for lock_sys.mutex at the start of lock_trx_release_locks().
            Because even with the partial fix, we will already have removed the transaction from trx_sys.rw_trx_hash at this point of commit, it is very possible that the other_lock->trx thread observed that the record is no longer implicitly locked. That thread is waiting for the clustered index leaf page latch, which our crashing thread is holding.

            I developed a follow-up fix that allows us to remove the transaction from trx_sys.rw_trx_hash after the locks have been released. The MVCC read view creation still needs some work, because the test innodb.innodb-read-view is returning wrong result.

            marko Marko Mäkelä added a comment - Here is an analysis of a 10.3 crash with the partial fix: 10.3 12cd8d54e373f5e8de98ed2cb5bc6921049a38d2 #5 0x00005647d21e00d0 in lock_rec_add_to_queue (type_mode=1059, block=0x7fc32b1cf170, heap_no=2, index=0x7fc2d00f7608, trx=0x7fc32ba8e288, caller_owns_trx_mutex=<optimized out>) at /mariadb/10.3/storage/innobase/lock/lock0lock.cc:1853 #6 0x00005647d21f07f3 in lock_rec_convert_impl_to_expl_for_trx ( block=<optimized out>, rec=<optimized out>, index=<optimized out>, trx=<optimized out>, heap_no=2) at /mariadb/10.3/storage/innobase/lock/lock0lock.cc:5431 #7 lock_rec_convert_impl_to_expl (caller_trx=<optimized out>, block=0x7fc32b1cf170, rec=<optimized out>, index=<optimized out>, offsets=<optimized out>) at /mariadb/10.3/storage/innobase/lock/lock0lock.cc:5591 #8 0x00005647d21f2e00 in lock_clust_rec_read_check_and_lock (flags=0, block=0x7fc32b1cf170, rec=0x7fc32b51807d "\200", index=0x7fc2d00f7608, offsets=0x7fc3280dfec0, mode=LOCK_X, gap_mode=1024, thr=0x7fc2d4056548) at /mariadb/10.3/storage/innobase/lock/lock0lock.cc:5854 #9 0x00005647d232a1f9 in sel_set_rec_lock (pcur=0x7fc2d4055ce8, rec=<optimized out>, index=0x7fc2d00f7608, offsets=<optimized out>, mode=<optimized out>, type=<optimized out>, thr=0x7fc2d4056548, mtr=0x7fc3280e01e8) at /mariadb/10.3/storage/innobase/row/row0sel.cc:1260 #10 0x00005647d2326abb in row_search_mvcc (buf=<optimized out>, mode=<optimized out>, prebuilt=0x7fc2d4055b18, match_mode=<optimized out>, direction=0) at /mariadb/10.3/storage/innobase/row/row0sel.cc:5062 … #22 0x00005647d1d3f53d in mysql_parse (thd=0x7fc2b8000cf8, rawbuf=0x7fc2b80109f0 "DELETE FROM t3 WHERE col1 = 7", length=<optimized out>, parser_state=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /mariadb/10.3/sql/sql_parse.cc:7829 The current transaction is 0x3d1b, and the other_lock->trx->id is 0x3d1c. The clustered index root page contains (col1,DB_TRX_ID,DB_ROLL_PTR,col_text)=(4,0,1<<55,77),(7,0x3d19,insert,77). The transaction 0x3d19 is waiting for lock_sys.mutex at the start of lock_trx_release_locks() . Because even with the partial fix, we will already have removed the transaction from trx_sys.rw_trx_hash at this point of commit, it is very possible that the other_lock->trx thread observed that the record is no longer implicitly locked. That thread is waiting for the clustered index leaf page latch, which our crashing thread is holding. I developed a follow-up fix that allows us to remove the transaction from trx_sys.rw_trx_hash after the locks have been released. The MVCC read view creation still needs some work, because the test innodb.innodb-read-view is returning wrong result.

            This patch makes the 10.2 fix equivalent to the problematic 10.3 fix, introducing a failure of the test {{innodb.innodb-read-view}:

            diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
            index 6c2361207cf..b005baa18f7 100644
            --- a/storage/innobase/trx/trx0trx.cc
            +++ b/storage/innobase/trx/trx0trx.cc
            @@ -1629,13 +1629,6 @@ trx_commit_in_memory(
             		DBUG_LOG("trx", "Autocommit in memory: " << trx);
             		trx->state = TRX_STATE_NOT_STARTED;
             	} else {
            -		if (trx->id > 0) {
            -			/* For consistent snapshot, we need to remove current
            -			transaction from running transaction id list for mvcc
            -			before doing commit and releasing locks. */
            -			trx_erase_mvcc(trx, serialised);
            -		}
            -
             		lock_trx_release_locks(trx);
             		ut_ad(trx->read_only || !trx->rsegs.m_redo.rseg || trx->id);
             		ut_ad(!trx_is_referenced(trx));
            @@ -1672,7 +1665,10 @@ trx_commit_in_memory(
             			trx_sys_mutex_exit();
             		}
             
            -		trx->id = 0;
            +		if (trx->id > 0) {
            +			trx_erase_mvcc(trx, serialised);
            +			trx->id = 0;
            +		}
             	}
             
             	ut_ad(!trx->rsegs.m_redo.update_undo);
            

            svoj had the idea that we could assign trx->state=TRX_STATE_COMMITTED_IN_MEMORY; as the first step, before removing the transaction from trx_sys, and then proceeding to free the explicit locks. That sounds promising to me, and it would allow to keep the trx_erase_lists() in one atomic piece in 10.2.

            marko Marko Mäkelä added a comment - This patch makes the 10.2 fix equivalent to the problematic 10.3 fix, introducing a failure of the test {{innodb.innodb-read-view}: diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc index 6c2361207cf..b005baa18f7 100644 --- a/storage/innobase/trx/trx0trx.cc +++ b/storage/innobase/trx/trx0trx.cc @@ -1629,13 +1629,6 @@ trx_commit_in_memory( DBUG_LOG("trx", "Autocommit in memory: " << trx); trx->state = TRX_STATE_NOT_STARTED; } else { - if (trx->id > 0) { - /* For consistent snapshot, we need to remove current - transaction from running transaction id list for mvcc - before doing commit and releasing locks. */ - trx_erase_mvcc(trx, serialised); - } - lock_trx_release_locks(trx); ut_ad(trx->read_only || !trx->rsegs.m_redo.rseg || trx->id); ut_ad(!trx_is_referenced(trx)); @@ -1672,7 +1665,10 @@ trx_commit_in_memory( trx_sys_mutex_exit(); } - trx->id = 0; + if (trx->id > 0) { + trx_erase_mvcc(trx, serialised); + trx->id = 0; + } } ut_ad(!trx->rsegs.m_redo.update_undo); svoj had the idea that we could assign trx->state=TRX_STATE_COMMITTED_IN_MEMORY; as the first step, before removing the transaction from trx_sys , and then proceeding to free the explicit locks. That sounds promising to me, and it would allow to keep the trx_erase_lists() in one atomic piece in 10.2.

            I think that we should try to fix this serious corruption bug before the next releases. My 10.2 fix was promising in RQG tests, but it is causing a performance regression, so the fix will need some more work.

            marko Marko Mäkelä added a comment - I think that we should try to fix this serious corruption bug before the next releases. My 10.2 fix was promising in RQG tests, but it is causing a performance regression, so the fix will need some more work.
            marko Marko Mäkelä added a comment - - edited

            I implemented svoj’s idea on top of 10.2 (3 commits), and I merged that to 10.3.

            marko Marko Mäkelä added a comment - - edited I implemented svoj ’s idea on top of 10.2 (3 commits), and I merged that to 10.3 .

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.