Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 10.5, 10.6, 10.11, 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4, 11.5(EOL)
Description
Likely related: MDEV-9386, MDEV-32372 (related assertion), MDEV-33761, though duplication is not confirmed with any of them.
Possibly related: MDEV-33909.
During testing I regularly see this error:
[ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 0-1-48, Internal MariaDB error code: 1964
I have reduced several testcases for this sporadic bug. Note that tables are InnoDB by default.
set sql_mode='';
CREATE TABLE t (c YEAR KEY);
INSERT INTO t VALUES (ST_GEOMFROMTEXT ('POINT(1 1)'));
DELETE FROM t WHERE c IN (SELECT 1) LIMIT 2;
INSERT INTO t VALUES (1)

SET sql_mode='';
CREATE TABLE t (f INT) ENGINE=mrg_innodb UNION (t);
CREATE TEMPORARY TABLE t (i INT) ENGINE=InnoDB;
CREATE TABLE t2 (c1 CHAR(1)) ENGINE=InnoDB;
DROP TABLE IF EXISTS t,mysqlt,mysqlt2;
INSERT INTO t VALUES (1);
DELETE FROM t WHERE f=29;
INSERT INTO t2 VALUES (1);

SET sql_mode='';
CREATE TABLE t (c INT);
INSERT INTO t VALUES (0);
UPDATE t SET c=REPEAT (0,0) WHERE c=REPEAT (0,0);
INSERT INTO t VALUES (0);

CREATE TABLE t0 (a BLOB) ;
CREATE TABLE t3 (a TINYINT,b INT,c CHAR(0),d VARCHAR(0),e VARCHAR(0),f VARBINARY(0),g MEDIUMBLOB,h BLOB,id BIGINT,KEY(b),KEY(e)) ;
SET SESSION autocommit=0;
DELETE FROM t0;
CREATE TABLE t3 (a INT,b INT UNSIGNED,c BINARY (0),d VARCHAR(0),e VARBINARY(0),f VARCHAR(0),g MEDIUMBLOB,h MEDIUMBLOB,id BIGINT,KEY(b),KEY(e)) ;
INSERT INTO mysql.user SELECT * FROM t0;
INSERT INTO t0 VALUES (0xCFD0);
SELECT REPEAT (
ALTER TABLE t CHANGE COLUMN c c BINARY (0);
INSERT INTO t0 VALUES (1);
ALTER TABLE t CHANGE COLUMN a a BINARY (0);
The bug reproduces in roughly 1 out of 3 runs when 150 or more mariadbd instances are started simultaneously (i.e. system load may be a factor).
All testcases require mixed (MBR) or statement-based (SBR) binary logging to be active, not row-based (RBR), together with slave_parallel_threads > 1. Setting it to 10 makes the issue reproduce more readily. Also set --slave_skip_errors=ALL.
Here are the options I used:
MASTER_EXTRA="--log_bin=binlog --binlog_format=MIXED --server_id=1" # or =STATEMENT
SLAVE_EXTRA="--slave-parallel-threads=10 --slave_skip_errors=ALL --server_id=2"
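Since the testcases only reproduce under the conditions described above, it may help to sanity-check the option strings before starting a run. The following is a minimal sketch, not part of the original test setup: `opt_value` and `check_repro_options` are hypothetical helper names, and the check assumes every option is given explicitly in `--name=value` form.

```shell
#!/bin/sh
# Option strings as used in this report.
MASTER_EXTRA="--log_bin=binlog --binlog_format=MIXED --server_id=1"
SLAVE_EXTRA="--slave-parallel-threads=10 --slave_skip_errors=ALL --server_id=2"

# Hypothetical helper: print the value of option $1 found in option string $2.
# Dashes and underscores in option names are treated as equivalent.
opt_value() {
  name=$(printf '%s' "$1" | tr '_' '-')
  for word in $2; do
    key=$(printf '%s' "${word%%=*}" | tr '_' '-')
    if [ "$key" = "--$name" ]; then
      printf '%s\n' "${word#*=}"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: $1 = master options, $2 = slave options.
# Prints "ok" if the preconditions from this report are met.
check_repro_options() {
  fmt=$(opt_value binlog-format "$1") || fmt=""
  threads=$(opt_value slave-parallel-threads "$2") || threads=0
  skip=$(opt_value slave-skip-errors "$2") || skip=""
  case "$fmt" in
    MIXED|STATEMENT) ;;
    *) echo "binlog_format must be MIXED or STATEMENT, got '$fmt'"; return 1 ;;
  esac
  if [ "$threads" -le 1 ]; then
    echo "slave_parallel_threads must be > 1, got '$threads'"; return 1
  fi
  case "$skip" in
    ALL|all) ;;
    *) echo "slave_skip_errors should be ALL, got '$skip'"; return 1 ;;
  esac
  echo "ok"
}

check_repro_options "$MASTER_EXTRA" "$SLAVE_EXTRA"
```

The check is deliberately strict: an option that is left at its server default is reported as missing, so the run configuration stays explicit.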
The bug was manually confirmed in 10.5, 10.6 and 11.5 using trunk builds a few days old. I have also added the in-between versions, as they are almost certainly affected too.
Because the error happens on the slave and not on the master, it very likely leads to a slave whose data diverges from the master, probably without the main application writing to the master noticing.
This, combined with a very commonplace testcase like the 3rd one and the issues seen in the wild, makes this bug a top priority to fix for parallel slaves, hence the Critical priority.
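Because the failure only shows up in the slave's error log, one way to notice it is to scan that log for the 1964 message and collect the affected GTIDs. This is a minimal sketch under that assumption: `failed_commit_gtids` is a hypothetical helper name, and the pattern is derived only from the single error line quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper: read an error log on stdin and print the GTID of every
# "Commit failed due to failure of an earlier commit" line with error code 1964.
failed_commit_gtids() {
  sed -n 's/.*Slave SQL: Commit failed due to failure of an earlier commit.*Gtid \([0-9]\{1,\}-[0-9]\{1,\}-[0-9]\{1,\}\),.*error code: 1964.*/\1/p'
}

# Demonstration on the error line quoted in this report.
sample='[ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 0-1-48, Internal MariaDB error code: 1964'
printf '%s\n' "$sample" | failed_commit_gtids   # prints 0-1-48
```

Against a real slave it would be fed the error log, e.g. `failed_commit_gtids < "$ERROR_LOG"`, where `ERROR_LOG` is an assumed variable holding the slave's error log path.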
NTS: ~/MDEV-34010
Issue Links
- relates to
  - MDEV-9386 Commit failed due to failure of an earlier commit on which this one depends (Open)
  - MDEV-32372 Assertion `thd->transaction->stmt.is_empty() || thd->in_sub_stmt || (thd->state_flags & Open_tables_state::BACKUPS_AVAIL)' failed in close_thread_tables upon ALTER (Open)
  - MDEV-33761 WSREP: Parallel slave worker failed at wsrep_before_command() hook (Open)
  - MDEV-34346 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964 on parallel replica (Confirmed)
  - MDEV-33954 --slave-skip-errors=all Incompatible with Optimistic Parallel Replication (Confirmed)