  1. MariaDB Server
  2. MDEV-34010

[ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid ..., Internal MariaDB error code: 1964

Details

    Description

      Likely related: MDEV-9386, MDEV-32372 (related assertion), MDEV-33761, though duplication is not confirmed with any of them.
      Possibly related: MDEV-33909.

      During testing I regularly see this error:

      [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 0-1-48, Internal MariaDB error code: 1964
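When many instances run in parallel, it can help to scan each slave's error log for this message rather than reading the logs by hand. The following is a minimal sketch only: the helper names `count_1964_errors` and `list_1964_gtids` are hypothetical, the log path is whatever your setup uses, and the match pattern is taken from the error text quoted above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MariaDB): scan one error log for the
# "earlier commit failed" error (1964) quoted in this report.

# Print the number of matching lines in the log file given as $1.
count_1964_errors() {
    # grep -c prints the match count; it exits non-zero when the count is 0,
    # so "|| true" keeps the function usable under "set -e".
    grep -c 'Slave SQL: Commit failed due to failure of an earlier commit.*error code: 1964' "$1" || true
}

# Print the GTID named in each matching line, one per line.
list_1964_gtids() {
    grep 'Slave SQL: Commit failed due to failure of an earlier commit' "$1" \
        | sed -n 's/.*Gtid \([0-9]*-[0-9]*-[0-9]*\),.*/\1/p'
}
```

Usage would be along the lines of `count_1964_errors /path/to/slave.err`, with the path being an assumption about where the slave writes its error log.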
      

      I have reduced several testcases for this sporadic bug. Note that tables are InnoDB by default.

      set sql_mode='';
      CREATE TABLE t (c YEAR KEY);
      INSERT INTO t VALUES (ST_GEOMFROMTEXT ('POINT(1 1)'));
      DELETE FROM t WHERE c IN (SELECT 1) LIMIT 2;
      INSERT INTO t VALUES (1);
      

      SET sql_mode='';
      CREATE TABLE t (f INT) ENGINE=mrg_innodb UNION (t);
      CREATE TEMPORARY TABLE t (i INT) ENGINE=InnoDB;
      CREATE TABLE t2 (c1 CHAR(1)) ENGINE=InnoDB;
      DROP TABLE IF EXISTS t,mysqlt,mysqlt2;
      INSERT INTO t VALUES (1);
      DELETE FROM t WHERE f=29;
      INSERT INTO t2 VALUES (1);
      

      SET sql_mode='';
      CREATE TABLE t (c INT);
      INSERT INTO t VALUES (0);
      UPDATE t SET c=REPEAT (0,0) WHERE c=REPEAT (0,0);
      INSERT INTO t VALUES (0);
      

      CREATE TABLE t0 (a BLOB) ;
      CREATE TABLE t3 (a TINYINT,b INT,c CHAR(0),d VARCHAR(0),e VARCHAR(0),f VARBINARY(0),g MEDIUMBLOB,h BLOB,id BIGINT,KEY(b),KEY(e)) ;
      SET SESSION autocommit=0;
      DELETE FROM t0;
      CREATE TABLE t3 (a INT,b INT UNSIGNED,c BINARY (0),d VARCHAR(0),e VARBINARY(0),f VARCHAR(0),g MEDIUMBLOB,h MEDIUMBLOB,id BIGINT,KEY(b),KEY(e)) ;
      INSERT INTO mysql.user SELECT * FROM t0;
      INSERT INTO t0 VALUES (0xCFD0);
      SELECT REPEAT (
      ALTER TABLE t CHANGE COLUMN c c BINARY (0);
      INSERT INTO t0 VALUES (1);
      ALTER TABLE t CHANGE COLUMN a a BINARY (0);
      

      The bug reproduces in roughly 1 in 3 runs when 150 or more mariadbd instances are started simultaneously (i.e. system load may be a factor).

      All testcases require MBR or SBR to be active (but not RBR!) with slave_parallel_threads > 1; setting it to 10 makes the issue reproduce more readily. Also set --slave_skip_errors=ALL.

      Here are the options I used:

      MASTER_EXTRA="--log_bin=binlog --binlog_format=MIXED --server_id=1"  # or =STATEMENT
      SLAVE_EXTRA="--slave-parallel-threads=10 --slave_skip_errors=ALL --server_id=2"
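The preconditions stated above (MIXED or STATEMENT binlog format, more than one parallel slave thread) can be checked before a run. The following is a sketch under assumptions: `check_repro_options` is a hypothetical helper, it only inspects the option strings rather than a running server, and it expects the format value in upper case as written in this report.

```shell
#!/bin/sh
# Option strings as given in this report.
MASTER_EXTRA="--log_bin=binlog --binlog_format=MIXED --server_id=1"
SLAVE_EXTRA="--slave-parallel-threads=10 --slave_skip_errors=ALL --server_id=2"

# Hypothetical pre-flight check: returns 0 when the option strings meet the
# preconditions for reproducing the bug, 1 otherwise.
check_repro_options() {
    master_opts=$1
    slave_opts=$2
    # binlog_format must be MIXED or STATEMENT, not ROW.
    # Both dash and underscore spellings of the option name are accepted.
    case "$master_opts" in
        *binlog[-_]format=MIXED*|*binlog[-_]format=STATEMENT*) ;;
        *) echo "binlog_format must be MIXED or STATEMENT" >&2; return 1 ;;
    esac
    # Extract N from slave-parallel-threads=N; empty if the option is absent.
    threads=$(printf '%s\n' "$slave_opts" \
        | sed -n 's/.*slave[-_]parallel[-_]threads=\([0-9][0-9]*\).*/\1/p')
    if [ -z "$threads" ] || [ "$threads" -le 1 ]; then
        echo "slave_parallel_threads must be > 1" >&2
        return 1
    fi
    return 0
}

check_repro_options "$MASTER_EXTRA" "$SLAVE_EXTRA" && echo "options look suitable"
```

The check deliberately does not require --slave_skip_errors=ALL, since the report lists it as an additional setting rather than a stated precondition for the bug.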
      

      The bug was manually confirmed in 10.5, 10.6 and 11.5 using trunk builds a few days old. I have also added the in-between versions, as they are almost certainly affected as well.

      As the error happens on the slave and not the master, it is highly likely to leave the slave out of sync with the master, probably without being noticed by the main application writing to the master.

      This, combined with a very commonplace testcase like the third one and the issues seen in the wild, makes this bug a top priority to fix for parallel slaves, hence the Critical priority.

      NTS: ~/MDEV-34010

            People

              Andrei Elkin
              Roel Van de Paar