Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affected versions: 10.5, 10.6, 10.11, 11.1 (EOL), 11.2 (EOL), 11.3 (EOL), 11.4, 11.5 (EOL)
Description
Likely related: MDEV-9386, MDEV-32372 (related assertion), MDEV-33761, though duplication is not confirmed with any of them.
Possibly related: MDEV-33909.
During testing I regularly see this error:
[ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 0-1-48, Internal MariaDB error code: 1964
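To spot these failures in a slave error log without reading it by hand, a minimal bash sketch such as the hypothetical `gtids_failed_1964` helper below could list the GTIDs of the transactions that failed with error 1964. It assumes the exact line format quoted above (`Gtid D-S-N, Internal MariaDB error code: 1964`); other messages for the same code, such as the `Error_code: 1964` form, would need an extra pattern.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a slave error log on stdin and print the GTID of every
# transaction reported with error 1964, one per line.
# Assumes the "..., Gtid D-S-N, Internal MariaDB error code: 1964" format quoted above.
gtids_failed_1964() {
  # Keep only the 1964 lines, then extract the "D-S-N" GTID that follows "Gtid "
  grep -- 'error code: 1964' |
    sed -n 's/.*Gtid \([0-9]*-[0-9]*-[0-9]*\).*/\1/p'
}
```

Usage, with a placeholder path: `gtids_failed_1964 < /path/to/slave-error.log`.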
I have reduced several testcases for this sporadic bug. Note that tables use InnoDB by default.
Testcase #1:
set sql_mode='';
CREATE TABLE t (c YEAR KEY);
INSERT INTO t VALUES (ST_GEOMFROMTEXT ('POINT(1 1)'));
DELETE FROM t WHERE c IN (SELECT 1) LIMIT 2;
INSERT INTO t VALUES (1);
Testcase #2:
SET sql_mode='';
CREATE TABLE t (f INT) ENGINE=mrg_innodb UNION (t);
CREATE TEMPORARY TABLE t (i INT) ENGINE=InnoDB;
CREATE TABLE t2 (c1 CHAR(1)) ENGINE=InnoDB;
DROP TABLE IF EXISTS t,mysqlt,mysqlt2;
INSERT INTO t VALUES (1);
DELETE FROM t WHERE f=29;
INSERT INTO t2 VALUES (1);
Testcase #3:
SET sql_mode='';
CREATE TABLE t (c INT);
INSERT INTO t VALUES (0);
UPDATE t SET c=REPEAT (0,0) WHERE c=REPEAT (0,0);
INSERT INTO t VALUES (0);
Testcase #4:
CREATE TABLE t0 (a BLOB) ;
CREATE TABLE t3 (a TINYINT,b INT,c CHAR(0),d VARCHAR(0),e VARCHAR(0),f VARBINARY(0),g MEDIUMBLOB,h BLOB,id BIGINT,KEY(b),KEY(e)) ;
SET SESSION autocommit=0;
DELETE FROM t0;
CREATE TABLE t3 (a INT,b INT UNSIGNED,c BINARY (0),d VARCHAR(0),e VARBINARY(0),f VARCHAR(0),g MEDIUMBLOB,h MEDIUMBLOB,id BIGINT,KEY(b),KEY(e)) ;
INSERT INTO mysql.user SELECT * FROM t0;
INSERT INTO t0 VALUES (0xCFD0);
SELECT REPEAT (
ALTER TABLE t CHANGE COLUMN c c BINARY (0);
INSERT INTO t0 VALUES (1);
ALTER TABLE t CHANGE COLUMN a a BINARY (0);
Sporadicity is about 1 in 3 when 150 or more mariadbd instances are started simultaneously, so system load may be a factor.
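If the roughly 1-in-3 rate is read as an independent per-run probability p ≈ 1/3 (an assumption; runs sharing one loaded machine are probably not fully independent), the probability P_n of seeing the failure at least once in n runs, and the number of runs needed to reach 99%, work out as below. The same figure suggests roughly how many clean runs to require before trusting a candidate fix at that level.

```latex
\[ P_n = 1 - (1 - p)^n = 1 - \left(\tfrac{2}{3}\right)^n \]
\[ P_n \ge 0.99 \iff n \ge \frac{\ln 0.01}{\ln(2/3)} \approx \frac{-4.605}{-0.4055} \approx 11.36 \;\Rightarrow\; n = 12, \qquad \left(\tfrac{2}{3}\right)^{12} \approx 0.0077 \]
```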
All testcases require mixed- or statement-based binary logging (MBR or SBR, but not RBR!) and slave_parallel_threads > 1; setting it to 10 makes the issue reproduce more readily. Also set --slave_skip_errors=ALL.
Here are the options I used:
MASTER_EXTRA="--log_bin=binlog --binlog_format=MIXED --server_id=1"  # or =STATEMENT
SLAVE_EXTRA="--slave-parallel-threads=10 --slave_skip_errors=ALL --server_id=2"
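The prerequisites above can be turned into a quick sanity check on the option strings before a run. A minimal bash sketch (the function name `repro_prereqs_ok` is hypothetical): it requires binary logging on the master with MIXED or STATEMENT format, more than one parallel slave thread, and `slave_skip_errors=ALL`. It normalizes `-`/`_` and letter case, since MariaDB accepts both spellings of option names, and it treats an absent `--binlog_format` as MIXED, assuming the server default of the affected versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check option strings against the reproduction prerequisites.
# Usage: repro_prereqs_ok "$MASTER_EXTRA" "$SLAVE_EXTRA"   (returns 0 if all are met)
repro_prereqs_ok() {
  # MariaDB accepts '-' and '_' in option names; normalize to '_' and upper case
  local master=${1//-/_} slave=${2//-/_}
  master=${master^^}
  slave=${slave^^}
  local fmt threads skip
  # Pull out each option value (empty if the option is not present)
  fmt=$(grep -o '__BINLOG_FORMAT=[A-Z]*' <<<"$master" | cut -d= -f2)
  threads=$(grep -o '__SLAVE_PARALLEL_THREADS=[0-9]*' <<<"$slave" | cut -d= -f2)
  skip=$(grep -o '__SLAVE_SKIP_ERRORS=[A-Z]*' <<<"$slave" | cut -d= -f2)
  fmt=${fmt:-MIXED}  # assumption: MIXED is the server default when the option is absent
  if [[ $master != *__LOG_BIN* ]]; then
    echo "master: binary logging (--log_bin) is not enabled"; return 1
  fi
  if [[ $fmt != MIXED && $fmt != STATEMENT ]]; then
    echo "master: binlog_format must be MIXED or STATEMENT (not ROW), got '$fmt'"; return 1
  fi
  if (( ${threads:-0} <= 1 )); then
    echo "slave: slave_parallel_threads must be > 1, got '${threads:-unset}'"; return 1
  fi
  if [[ $skip != ALL ]]; then
    echo "slave: slave_skip_errors must be ALL, got '${skip:-unset}'"; return 1
  fi
  return 0
}
```

With the MASTER_EXTRA and SLAVE_EXTRA values above, `repro_prereqs_ok "$MASTER_EXTRA" "$SLAVE_EXTRA"` returns 0; switching the master to `--binlog_format=ROW` makes it fail with a message.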
The bug was manually confirmed in 10.5, 10.6 and 11.5 using trunk builds a few days old. I have also listed the in-between versions, as they are almost certainly affected as well.
As the error happens on the slave and not on the master, it is highly likely to leave the slave's data diverged from the master's, probably without the application writing to the master noticing.
Combined with how commonplace some of the testcases are (the third one in particular) and the issues seen in the wild, this makes the bug a top priority to fix for parallel slaves, hence the Critical priority.
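Because the slave keeps applying events with --slave_skip_errors=ALL, this kind of divergence only surfaces if someone compares the data. A minimal bash sketch, assuming identical table definitions and server versions on both sides and the tab-separated `db.table<TAB>checksum` lines printed by `mariadb -N -B -e "CHECKSUM TABLE ..."`: the hypothetical helper `diff_checksums` takes the master's and the slave's output, prints every table whose checksum differs or that exists on one side only, and returns 1 if it found any.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare CHECKSUM TABLE output taken on the master ($1) and slave ($2).
# Each argument holds lines of the form "db.table<TAB>checksum".
diff_checksums() {
  awk -F'\t' '
    NF == 0 { next }                    # ignore blank lines
    NR == FNR { m[$1] = $2; next }      # first input: master checksums
    { s[$1] = $2 }                      # second input: slave checksums
    END {
      bad = 0
      for (t in m) if (!(t in s) || m[t] != s[t]) { print t; bad = 1 }
      for (t in s) if (!(t in m)) { print t; bad = 1 }
      exit bad
    }' <(printf '%s\n' "$1") <(printf '%s\n' "$2")
}
```

For example, with hypothetical host names and only after the slave has caught up with the master: `diff_checksums "$(mariadb -h master-host -N -B -e 'CHECKSUM TABLE test.t')" "$(mariadb -h slave-host -N -B -e 'CHECKSUM TABLE test.t')"`.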
NTS: ~/MDEV-34010
Attachments
Issue Links
- relates to
  - MDEV-9386 Commit failed due to failure of an earlier commit on which this one depends (Open)
  - MDEV-32372 Assertion `thd->transaction->stmt.is_empty() || thd->in_sub_stmt || (thd->state_flags & Open_tables_state::BACKUPS_AVAIL)' failed in close_thread_tables upon ALTER (Open)
  - MDEV-33761 WSREP: Parallel slave worker failed at wsrep_before_command() hook (Open)
  - MDEV-34346 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964 on parallel replica (Confirmed)
  - MDEV-33954 --slave-skip-errors=all Incompatible with Optimistic Parallel Replication (Confirmed)
Activity
Field | Original Value | New Value |
---|---|---|
Link | | This issue relates to TODO-4062 |
Link | | This issue relates to MDEV-33761 |
Link | | This issue relates to MDEV-32372 |
Attachment | | MDEV-34010.tar.xz |
Labels | parallelslave sporadic | divergent_slave parallelslave sporadic |
Comment |
MTR seems to fail to reproduce the issue:
{code:sql}
--source include/have_innodb.inc
--source include/have_binlog_format_statement.inc
--source include/master-slave.inc
SET sql_mode='';
CREATE TABLE t (c INT) ENGINE=InnoDB;
INSERT INTO t VALUES (0);
UPDATE t SET c=REPEAT (0,0) WHERE c=REPEAT (0,0);
--sync_slave_with_master
INSERT INTO t VALUES (0);
--source include/rpl_end.inc
{code}
Executed as:
{code:bash}
# Loop MTR on main/test.test until it fails
LOG="$(mktemp)"; echo "Logfile: ${LOG}"
LOOP=0
while true; do
  LOOP=$((LOOP + 1)); echo "Loop: ${LOOP}"
  # Append both stdout and stderr of the MTR run to the log
  ./mtr --mysqld=--slave_skip_errors=ALL --mysqld=--slave-parallel-threads=10 test >>"${LOG}" 2>&1
  if grep -q '1964' "${LOG}"; then break; fi
done
{code}
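Since the original reproduction ran 150+ servers at once while a serial MTR loop does not reproduce the issue, one hedged idea is to run several loops concurrently to add system load. A minimal bash sketch of a hypothetical `parallel_repro` helper: it starts WORKERS background loops of the same command, each appending to its own log and exporting its index as `WORKER`, stops a worker as soon as its log contains PATTERN, and finally lists the logs that matched. Like the loop above, it only greps the command's own output; whether concurrent MTR instances need further isolation (ports, build threads) is not addressed here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one reproduction command in several concurrent workers.
# Usage: parallel_repro WORKERS ATTEMPTS PATTERN COMMAND [ARGS...]
parallel_repro() {
  local workers=$1 attempts=$2 pattern=$3
  shift 3
  local dir w i
  dir=$(mktemp -d)
  for ((w = 1; w <= workers; w++)); do
    (
      for ((i = 1; i <= attempts; i++)); do
        # Each worker sees its own index in $WORKER and appends to its own log
        WORKER=$w "$@" >>"$dir/worker$w.log" 2>&1
        # Stop this worker as soon as its log contains the pattern
        if grep -q -- "$pattern" "$dir/worker$w.log"; then
          break
        fi
      done
    ) &
  done
  wait
  # List the matching logs; exit status is 0 only if at least one log matched
  grep -l -- "$pattern" "$dir"/worker*.log
}
```

For example (the `--vardir` location is a placeholder): `parallel_repro 20 50 1964 sh -c './mtr --vardir=/tmp/mtr-var-$WORKER --mysqld=--slave_skip_errors=ALL --mysqld=--slave-parallel-threads=10 test'`. The single quotes leave `$WORKER` for the inner `sh` to expand, so each worker gets its own MTR var directory.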
Link | This issue relates to MDEV-33954 [ MDEV-33954 ] |
Fix Version/s | 11.3 [ 28565 ] |
Priority | Critical [ 2 ] | Major [ 3 ] |
Link | This issue relates to MDEV-34346 [ MDEV-34346 ] |
Fix Version/s | 11.1 [ 28549 ] |
Fix Version/s | 11.2(EOL) [ 28603 ] |
Using reducer with pquery (as uploaded in MDEV-34010.tar.xz), I am able to reliably reproduce the issue about 5-15 times for every 30-40 reducer threads.