  1. MariaDB Server
  2. MDEV-5189

Replication failure and subsequent assertion failure on concurrent DML flow with slave-parallel-threads > 1 with gtid_domain_id per thread


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • 10.0.5
    • None
    • None

    Description

      The DML flow consists of simple INSERT / UPDATE / DELETE statements and BEGIN/COMMIT, executed in several threads. Each thread sets its session value of gtid_domain_id to CONNECTION_ID(), so the domain ID is unique for every thread. Row-based replication promptly fails, and an assertion failure follows. I assume the assertion failure is one of the known issues with error handling, but the replication failure itself shouldn't be happening in the first place.

      The result is the same whether or not the slave uses GTID. A standard RQG run does not use GTID; to try with it, either start the servers separately or apply the RQG patch provided below.
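The workload described above can be pictured without RQG. The following is only a rough sketch of the statement stream one client thread sends, not the RQG generator itself. It assumes the 28 alternatives of the `query` rule (1 `transaction`, 27 DML) are picked uniformly; the table name `AA` is taken from the error log, while the column `col_int` and the function name `session_statements` are hypothetical.

```python
import random

# Statement templates loosely mirroring the grammar's rules. The table name
# `AA` comes from the error log; the column `col_int` is a hypothetical name.
DML = [
    "INSERT INTO AA (`pk`) VALUES (NULL)",
    "UPDATE AA SET col_int = 5 ORDER BY pk LIMIT 8",
    "DELETE FROM AA ORDER BY pk LIMIT 1",
]
TRANSACTION = ["START TRANSACTION", "COMMIT"]


def session_statements(count, seed=0):
    """Return the SQL one client thread would send, in order."""
    rng = random.Random(seed)
    # query_init: runs once per connection, giving each thread its own domain.
    statements = ["SET gtid_domain_id = CONNECTION_ID()"]
    for _ in range(count):
        # Assumption: 1 `transaction` alternative out of 28, chosen uniformly.
        if rng.randrange(28) == 0:
            statements.append(rng.choice(TRANSACTION))
        else:
            statements.append(rng.choice(DML))
    return statements
```

Calling `session_statements(20, seed=n)` with a different `n` per thread gives each simulated session its own mix of DML, with occasional transaction boundaries, after the one-time `SET gtid_domain_id`.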

      RQG grammar (parallel-replication-2.yy):

      query_init:
      	SET gtid_domain_id = CONNECTION_ID() ;
       
      query:
      	transaction |
      	insert_replace | update | delete |
      	insert_replace | update | delete |
      	insert_replace | update | delete |
      	insert_replace | update | delete |
      	insert_replace | update | delete |
      	insert_replace | update | delete |
      	insert_replace | update | delete |
      	insert_replace | update | delete |
      	insert_replace | update | delete ;
       
      set_domain_id:
      	SET gtid_domain_id = _digit ;
       
      transaction:
      	START TRANSACTION |
      	COMMIT ;
       
      insert_replace:
      	INSERT INTO _table (`pk`) VALUES (NULL) ;
       
      update:
      	UPDATE _table SET _field_no_pk = value where ORDER BY _field_list LIMIT large_digit ;
       
      delete:
      	DELETE FROM _table where_delete ORDER BY _field_list LIMIT small_digit ;
       
      where:
      	|
      	WHERE _field_key < value | 	
      	WHERE _field_key IN ( value , value , value , value , value ) |
      	WHERE _field_key BETWEEN small_digit AND large_digit |
      	WHERE _field_key BETWEEN _tinyint_unsigned AND _int_unsigned ;
       
      where_delete:
      	|
      	WHERE _field_key = value |
      	WHERE _field_key IN ( value , value , value , value , value ) |
      	WHERE _field_key BETWEEN small_digit AND large_digit ;
       
      large_digit:
      	5 | 6 | 7 | 8 ;
       
      small_digit:
      	1 | 2 | 3 | 4 ;
       
      value:
      	_digit | _tinyint_unsigned | _varchar(1) | _int_unsigned ;

      RQG command line:

      perl ./runall-new.pl --grammar=parallel-replication-2.yy --threads=10 --duration=600 --queries=100M --basedir=<your basedir> --engine=InnoDB --vardir=<your location for logs> --rpl_mode=row --mysqld=--slave-parallel-threads=5

      RQG patch to use GTID:

      === modified file 'lib/DBServer/MySQL/ReplMySQLd.pm'
      --- lib/DBServer/MySQL/ReplMySQLd.pm	2012-06-11 08:23:46 +0000
      +++ lib/DBServer/MySQL/ReplMySQLd.pm	2013-10-25 10:24:51 +0000
      @@ -192,6 +192,7 @@
                          " MASTER_PORT = ".$self->master->port.",".
                          " MASTER_HOST = '127.0.0.1',".
                          " MASTER_USER = 'root',".
      +                   " MASTER_USE_GTID = current_pos,".
                          " MASTER_CONNECT_RETRY = 1");
           
       	$slave_dbh->do("START SLAVE");
       

      131025 14:26:32 [ERROR] Slave SQL: Could not execute Update_rows event on table test.AA; Can't find record in 'AA', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000001, end_log_pos 105548, Internal MariaDB error code: 1032
      mysqld: /sql/sql_base.cc:5731: bool lock_tables(THD*, TABLE_LIST*, uint, uint): Assertion `thd->lock == 0' failed.
      131025 14:26:32 [ERROR] mysqld got signal 6 ;
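For triage, the fields of the slave SQL error line above can be pulled out mechanically. This is a minimal sketch whose pattern was written against the single sample line in this report, so other variants of the message may not match; the names `SLAVE_ROW_ERROR` and `parse_slave_row_error` are hypothetical.

```python
import re

# Pattern for the row-event failure line the slave SQL thread writes to the
# error log. Written against the one sample in this report only.
SLAVE_ROW_ERROR = re.compile(
    r"Slave SQL: Could not execute (?P<event>\w+) event on table "
    r"(?P<table>[\w.]+); .*?Error_code: (?P<error_code>\d+); "
    r"handler error (?P<handler_error>\w+); "
    r"the event's master log (?P<binlog>[^,\s]+), "
    r"end_log_pos (?P<end_log_pos>\d+)"
)


def parse_slave_row_error(line):
    """Return the fields of a slave row-event error line, or None."""
    match = SLAVE_ROW_ERROR.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or sorted.
    fields["error_code"] = int(fields["error_code"])
    fields["end_log_pos"] = int(fields["end_log_pos"])
    return fields
```

The extracted binlog name and end_log_pos identify the failing event in the master's binary log, which is where to look for the row the slave could not find.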

      #4  0x00007fddba5e7425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
      #5  0x00007fddba5eab8b in __GI_abort () at abort.c:91
      #6  0x00007fddba5e00ee in __assert_fail_base (fmt=<optimized out>, assertion=0xd5ac18 "thd->lock == 0", file=0xd59b00 "/sql/sql_base.cc", line=<optimized out>, function=<optimized out>) at assert.c:94
      #7  0x00007fddba5e0192 in __GI___assert_fail (assertion=0xd5ac18 "thd->lock == 0", file=0xd59b00 "/sql/sql_base.cc", line=5731, function=0xd5c2a0 "bool lock_tables(THD*, TABLE_LIST*, uint, uint)") at assert.c:103
      #8  0x00000000005c16a1 in lock_tables (thd=0x7fdd5c000b00, tables=0x7fdd98ff8680, count=1, flags=0) at /sql/sql_base.cc:5731
      #9  0x00000000005c1229 in open_and_lock_tables (thd=0x7fdd5c000b00, tables=0x7fdd98ff8680, derived=false, flags=0, prelocking_strategy=0x7fdd98ff8510) at /sql/sql_base.cc:5572
      #10 0x00000000005b46b1 in open_and_lock_tables (thd=0x7fdd5c000b00, tables=0x7fdd98ff8680, derived=false, flags=0) at /sql/sql_base.h:562
      #11 0x0000000000783944 in rpl_slave_state::record_gtid (this=0x15155a0, thd=0x7fdd5c000b00, gtid=0x7fdd98ff8cb0, sub_id=10, in_transaction=true, in_statement=false) at /sql/rpl_gtid.cc:342
      #12 0x00000000008e0c3a in Xid_log_event::do_apply_event (this=0x7fdd5801c280, rgi=0x7fdd5801abb0) at /sql/log_event.cc:6932
      #13 0x0000000000597096 in Log_event::apply_event (this=0x7fdd5801c280, rgi=0x7fdd5801abb0) at /sql/log_event.h:1322
      #14 0x000000000058dfff in apply_event_and_update_pos (ev=0x7fdd5801c280, thd=0x7fdd5c000b00, rgi=0x7fdd5801abb0, rpt=0x3d60cc8) at /sql/slave.cc:3102
      #15 0x0000000000786888 in rpt_handle_event (qev=0x7fdd5801c370, rpt=0x3d60cc8) at /sql/rpl_parallel.cc:62
      #16 0x0000000000786e77 in handle_rpl_parallel_thread (arg=0x3d60cc8) at /sql/rpl_parallel.cc:223
      #17 0x00007fddbb3b0e9a in start_thread (arg=0x7fdd98ff9700) at pthread_create.c:308
      #18 0x00007fddba6a4cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
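The backtrace above shows the assertion being reached from a parallel worker thread applying an Xid event, through rpl_slave_state::record_gtid. To condense such a trace into a call chain, something like the following sketch may help; it assumes gdb-style frame lines as shown here, and the names `FRAME` and `call_chain` are hypothetical.

```python
import re

# Matches gdb backtrace frames such as
#   #8  0x00000000005c16a1 in lock_tables (thd=...) at /sql/sql_base.cc:5731
FRAME = re.compile(
    r"^\s*#(?P<num>\d+)\s+(?:0x[0-9a-fA-F]+ in )?(?P<func>\S+) \("
)


def call_chain(backtrace):
    """Return function names from outermost caller to innermost frame."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME.match(line)
        if match:
            frames.append((int(match.group("num")), match.group("func")))
    # gdb numbers the innermost frame lowest, so sort by frame number,
    # highest first, to list callers before callees.
    return [func for _, func in sorted(frames, reverse=True)]
```

Joining the result with " -> " gives a one-line summary that is easier to compare against other crash reports than the full trace.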

      revision-id: knielsen@knielsen-hq.org-20131024065348-t37zcjiw9mdta4kd
      date: 2013-10-24 08:53:48 +0200
      build-date: 2013-10-25 14:25:12 +0400
      revno: 3683
      branch-nick: 10.0-knielsen

            People

              Assignee: Kristian Nielsen (knielsen)
              Reporter: Elena Stepanova (elenst)