[MDEV-24622] Replication does not support bulk insert into empty table Created: 2021-01-19  Updated: 2024-02-01

Status: Open
Project: MariaDB Server
Component/s: Replication, Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: regression-10.6

Issue Links:
Problem/Incident
is caused by MDEV-515 innodb bulk insert Closed
Relates
relates to MDEV-27751 InnoDB: Failing assertion: !cursor->i... Closed
relates to MDEV-33350 Bulk InnoDB LOAD DATA breaks replicat... Confirmed

 Description   

MDEV-515 will replace row-level locking with table-level locking in the special case that an INSERT is performed into an empty table or partition.

This causes a few replication tests to fail. We have adjusted some tests by inserting an extra record into the table, so that the normal row-level locking and undo logging will be used. But, some tests will require deeper studying to assess whether such a work-around is acceptable or whether some improvements to the replication logic will be needed. Some failures are nondeterministic:

multi_source.gtid_ignore_duplicates 'innodb' [ pass ]   5774
multi_source.gtid_ignore_duplicates 'innodb' [ 2 pass ]   4660
multi_source.gtid_ignore_duplicates 'innodb' [ 3 pass ]   5678
multi_source.gtid_ignore_duplicates 'innodb' [ 4 fail ]  Found warnings/errors in server log file!
        Test ended at 2021-01-19 09:33:32
line
2021-01-19  9:33:28 92 [Warning] Slave: Connection was killed Error_code: 1927
2021-01-19  9:33:28 92 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
2021-01-19  9:33:28 92 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2021-01-19  9:33:28 89 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
2021-01-19  9:33:28 89 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
2021-01-19  9:33:28 89 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964



 Comments   
Comment by Andrei Elkin [ 2021-01-19 ]

alice: I am suspecting a parallel slave replay happens behind the scene, which is fine. So hopefully it's just a matter of mtr.suppression. Could you please investigate more.

Comment by Alice Sherepa [ 2021-01-20 ]

It seems to be broken with parallel replication. Here is the test to demonstrate it (derived from gtid_ignore_duplicates.test, shorter version, + no primary key), (add gtid_ignore_duplicates.cnf file ):

--source include/not_embedded.inc
--source include/have_innodb.inc
--source include/have_debug.inc
 
--echo *** Test all-to-all replication with --gtid-ignore-duplicates ***
 
--connect (server_1,127.0.0.1,root,,,$SERVER_MYPORT_1)
--connect (server_2,127.0.0.1,root,,,$SERVER_MYPORT_2)
--connect (server_3,127.0.0.1,root,,,$SERVER_MYPORT_3)
 
# Setup A <-> B, B <-> C, C <-> A
 
--connection server_1
SET @old_parallel= @@GLOBAL.slave_parallel_threads;
SET GLOBAL slave_parallel_threads=5;
SET @old_ignore_duplicates= @@GLOBAL.gtid_ignore_duplicates;
SET GLOBAL gtid_ignore_duplicates=1;
SET GLOBAL gtid_domain_id= 1;
SET SESSION gtid_domain_id= 1;
--replace_result $SERVER_MYPORT_2 MYPORT_2
eval CHANGE MASTER 'b2a' TO master_port=$SERVER_MYPORT_2, master_host='127.0.0.1', master_user='root', master_use_gtid=slave_pos;
--replace_result $SERVER_MYPORT_3 MYPORT_3
eval CHANGE MASTER 'c2a' TO master_port=$SERVER_MYPORT_3, master_host='127.0.0.1', master_user='root', master_use_gtid=slave_pos;
set default_master_connection = 'b2a';
START SLAVE;
--source include/wait_for_slave_to_start.inc
set default_master_connection = 'c2a';
START SLAVE;
--source include/wait_for_slave_to_start.inc
set default_master_connection = '';
 
--connection server_2
SET @old_parallel= @@GLOBAL.slave_parallel_threads;
SET GLOBAL slave_parallel_threads=5;
SET @old_ignore_duplicates= @@GLOBAL.gtid_ignore_duplicates;
SET GLOBAL gtid_ignore_duplicates=1;
SET GLOBAL gtid_domain_id= 2;
SET SESSION gtid_domain_id= 2;
--replace_result $SERVER_MYPORT_1 MYPORT_1
eval CHANGE MASTER 'a2b' TO master_port=$SERVER_MYPORT_1, master_host='127.0.0.1', master_user='root', master_use_gtid=slave_pos;
--replace_result $SERVER_MYPORT_3 MYPORT_3
eval CHANGE MASTER 'c2b' TO master_port=$SERVER_MYPORT_3, master_host='127.0.0.1', master_user='root', master_use_gtid=slave_pos;
set default_master_connection = 'a2b';
START SLAVE;
--source include/wait_for_slave_to_start.inc
set default_master_connection = 'c2b';
START SLAVE;
--source include/wait_for_slave_to_start.inc
set default_master_connection = '';
 
--connection server_3
SET @old_parallel= @@GLOBAL.slave_parallel_threads;
SET GLOBAL slave_parallel_threads=5;
SET @old_ignore_duplicates= @@GLOBAL.gtid_ignore_duplicates;
SET GLOBAL gtid_ignore_duplicates=1;
SET GLOBAL gtid_domain_id= 3;
SET SESSION gtid_domain_id= 3;
--replace_result $SERVER_MYPORT_1 MYPORT_1
eval CHANGE MASTER 'a2c' TO master_port=$SERVER_MYPORT_1, master_host='127.0.0.1', master_user='root', master_use_gtid=slave_pos;
--replace_result $SERVER_MYPORT_2 MYPORT_2
eval CHANGE MASTER 'b2c' TO master_port=$SERVER_MYPORT_2, master_host='127.0.0.1', master_user='root', master_use_gtid=slave_pos;
set default_master_connection = 'a2c';
START SLAVE;
--source include/wait_for_slave_to_start.inc
set default_master_connection = 'b2c';
START SLAVE;
--source include/wait_for_slave_to_start.inc
set default_master_connection = '';
 
--connection server_1
ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
CALL mtr.add_suppression("This change will not take full effect until all SQL threads have been restarted");
CREATE TABLE t1 (a INT ) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
start transaction;
INSERT INTO t1 VALUES (2);
INSERT INTO t1 VALUES (3);
commit;
INSERT INTO t1 VALUES (4);
INSERT INTO t1 VALUES (10);
 
--source include/save_master_gtid.inc
 
--connection server_2
--source include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
 
--connection server_3
--source include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
 
--connection server_1
--source include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
 
# Clean up.
--connection server_1
SET GLOBAL gtid_domain_id=0;
--sorted_result
STOP ALL SLAVES;
SET GLOBAL slave_parallel_threads= @old_parallel;
SET GLOBAL gtid_ignore_duplicates= @old_ignore_duplicates;
 
--connection server_2
SET GLOBAL gtid_domain_id=0;
--sorted_result
STOP ALL SLAVES;
SET GLOBAL slave_parallel_threads= @old_parallel;
SET GLOBAL gtid_ignore_duplicates= @old_ignore_duplicates;
 
--connection server_3
SET GLOBAL gtid_domain_id=0;
--sorted_result
STOP ALL SLAVES;
SET GLOBAL slave_parallel_threads= @old_parallel;
SET GLOBAL gtid_ignore_duplicates= @old_ignore_duplicates;
 
--connection server_1
DROP TABLE t1;
ALTER TABLE mysql.gtid_slave_pos ENGINE=Aria;
--source include/reset_master_slave.inc
--disconnect server_1
 
--connection server_2
DROP TABLE t1;
ALTER TABLE mysql.gtid_slave_pos ENGINE=Aria;
--source include/reset_master_slave.inc
--disconnect server_2
 
--connection server_3
DROP TABLE t1;
ALTER TABLE mysql.gtid_slave_pos ENGINE=Aria;
--source include/reset_master_slave.inc
--disconnect server_3

as a result on server_2 and 3 all different variations of data (run it with repeat=N):

SELECT * FROM t1 ORDER BY a;
a
1
2
3
4
10
connection server_3;
include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
a
1
1
2
2
3
3
4
4
10
10
---------------
connection server_2;
include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
a
1
1
2
2
3
3
4
4
10
10
connection server_3;
include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
a
1
2
3
4
10
----------------------
connection server_2;
include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
a
1
2
3
4
4
10
10
connection server_3;
include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
a
1
2
2
3
3
4
10
---------------------------------
connection server_2;
include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
a
1
2
3
4
10
connection server_3;
include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
a
1
2
2
3
3
4
4
10

Comment by Marko Mäkelä [ 2021-01-20 ]

Elkin, you suggested that adding call mtr.add_suppressions() for deadlock warnings could be acceptable so that the tests could remain enabled? But what about test failures, such as lock wait timeouts? I suppose that all these are more or less nondeterministic. Here is the output of a recent run:

./mtr --enable-disabled --force --big-test --max-test-fail=0 multi_source.{multi_parallel,gtid_ignore_duplicates} rpl.rpl_parallel_{multilevel,no_log_slave_updates}

run from my system:

multi_source.gtid_ignore_duplicates 'innodb' [ pass ]   5832
multi_source.multi_parallel 'innodb'     [ fail ]
        Test ended at 2021-01-20 17:52:43
 
CURRENT_TEST: multi_source.multi_parallel
--- /mariadb/10.6/mysql-test/suite/multi_source/multi_parallel.result	2020-07-14 16:19:22.666935155 +0300
+++ /mariadb/10.6/mysql-test/suite/multi_source/multi_parallel.reject	2021-01-20 17:52:43.337650439 +0200
@@ -44,7 +44,7 @@
 connection slave;
 Slave_DDL_Groups= 40;
 Slave_Non_Transactional_Groups= 20;
-Slave_Transactional_Groups= 20;
+Slave_Transactional_Groups= 31;
 stop all slaves;
 Warnings:
 Note	1938	SLAVE 'master2' stopped
 
mysqltest: Result content mismatch
 
 - saving '/dev/shm/10.6/mysql-test/var/log/multi_source.multi_parallel-innodb/' to '/dev/shm/10.6/mysql-test/var/log/multi_source.multi_parallel-innodb/'
rpl.rpl_parallel_no_log_slave_updates 'innodb,stmt' [ fail ]
        Test ended at 2021-01-20 17:57:46
 
CURRENT_TEST: rpl.rpl_parallel_no_log_slave_updates
mysqltest: In included file "./include/wait_for_slave_param.inc": 
included from ./include/wait_for_slave_sql_error.inc at line 41:
included from /mariadb/10.6/mysql-test/suite/rpl/t/rpl_parallel_no_log_slave_updates.test at line 136:
At line 119: Timeout in include/wait_for_slave_param.inc
 
The result from queries just before the failure was:
< snip >
master-bin.000001	1065	Xid	1	1096	COMMIT /* xid=124 */
master-bin.000001	1096	Gtid	1	1140	BEGIN GTID 0-1-5 cid=124
master-bin.000001	1140	Query	1	1299	use `test`; INSERT INTO t3 VALUES (32, foo(32,
'ha_write_row_end SIGNAL t2_query WAIT_FOR t2_cont',
''))
master-bin.000001	1299	Query	1	1501	use `test`; INSERT INTO t3 VALUES (33, foo(33,
'wait_for_prior_commit_waiting SIGNAL t2_waiting',
'wait_for_prior_commit_killed SIGNAL t2_killed'))
master-bin.000001	1501	Xid	1	1532	COMMIT /* xid=130 */
master-bin.000001	1532	Gtid	1	1576	BEGIN GTID 0-1-6 cid=124
master-bin.000001	1576	Query	1	1686	use `test`; INSERT INTO t3 VALUES (34, foo(34,
'',
''))
master-bin.000001	1686	Xid	1	1717	COMMIT /* xid=138 */
 
**** SHOW RELAYLOG EVENTS on server_1 ****
relaylog_name = 'No such row'
SHOW RELAYLOG EVENTS IN 'No such row';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
connection server_2;
 
More results from queries before failure can be found in /dev/shm/10.6/mysql-test/var/log/rpl_parallel_no_log_slave_updates.log
 
 - saving '/dev/shm/10.6/mysql-test/var/log/rpl.rpl_parallel_no_log_slave_updates-innodb,stmt/' to '/dev/shm/10.6/mysql-test/var/log/rpl.rpl_parallel_no_log_slave_updates-innodb,stmt/'
rpl.rpl_parallel_multilevel 'innodb,mix' [ fail ]
        Test ended at 2021-01-20 17:57:49
 
CURRENT_TEST: rpl.rpl_parallel_multilevel
--- /mariadb/10.6/mysql-test/suite/rpl/r/rpl_parallel_multilevel.result	2020-07-14 16:19:22.810936833 +0300
+++ /mariadb/10.6/mysql-test/suite/rpl/r/rpl_parallel_multilevel.reject	2021-01-20 17:57:48.745294330 +0200
@@ -57,7 +57,7 @@
 2	11
 3	1
 status
-Ok, no retry
+ERROR: 2 retries during replication (was 0 now 2)
 connection server_3;
 include/start_slave.inc
 include/sync_with_master_gtid.inc
@@ -67,7 +67,7 @@
 2	11
 3	1
 status
-Ok, no retry
+ERROR: 1 retries during replication (was 0 now 1)
 connection server_4;
 include/start_slave.inc
 include/sync_with_master_gtid.inc
 
mysqltest: Result length mismatch
 
 - saving '/dev/shm/10.6/mysql-test/var/log/rpl.rpl_parallel_multilevel-innodb,mix/' to '/dev/shm/10.6/mysql-test/var/log/rpl.rpl_parallel_multilevel-innodb,mix/'
rpl.rpl_parallel_multilevel 'innodb,row' [ pass ]   1452
rpl.rpl_parallel_multilevel 'innodb,stmt' [ fail ]
        Test ended at 2021-01-20 17:57:53
 
CURRENT_TEST: rpl.rpl_parallel_multilevel
--- /mariadb/10.6/mysql-test/suite/rpl/r/rpl_parallel_multilevel.result	2020-07-14 16:19:22.810936833 +0300
+++ /mariadb/10.6/mysql-test/suite/rpl/r/rpl_parallel_multilevel.reject	2021-01-20 17:57:53.669353064 +0200
@@ -57,7 +57,7 @@
 2	11
 3	1
 status
-Ok, no retry
+ERROR: 2 retries during replication (was 0 now 2)
 connection server_3;
 include/start_slave.inc
 include/sync_with_master_gtid.inc
@@ -67,7 +67,7 @@
 2	11
 3	1
 status
-Ok, no retry
+ERROR: 1 retries during replication (was 0 now 1)
 connection server_4;
 include/start_slave.inc
 include/sync_with_master_gtid.inc
@@ -77,7 +77,7 @@
 2	11
 3	1
 status
-Ok, no retry
+ERROR: 2 retries during replication (was 0 now 2)
 *** MDEV-6676: Test that the FL_WAITED flag in GTID is preserved in slave binlog ***
 connection server_2;
 include/stop_slave.inc
 
mysqltest: Result length mismatch
 
 - saving '/dev/shm/10.6/mysql-test/var/log/rpl.rpl_parallel_multilevel-innodb,stmt/' to '/dev/shm/10.6/mysql-test/var/log/rpl.rpl_parallel_multilevel-innodb,stmt/'
--------------------------------------------------------------------------
The servers were restarted 5 times
Spent 7.284 of 321 seconds executing testcases
 
Completed: Failed 4/6 tests, 33.33% were successful.
 
Failing test(s): multi_source.multi_parallel rpl.rpl_parallel_no_log_slave_updates rpl.rpl_parallel_multilevel

Comment by Alice Sherepa [ 2021-01-21 ]

1) multi_parallel.test test is written to check counters and expects Slave_Transactional_Groups to be 20 (10 times loop * 2 inserts in the innodb table), in relaylog file everything is correct, but Slave_Transactional_Groups counter in the test is jumping between 27..38.
Failure not related to multi_source, below is the test to demonstrate the problem:

--source include/master-slave.inc
--source include/have_innodb.inc
 
--connection slave
--source include/stop_slave.inc
SET @old_parallel_threads= @@GLOBAL.slave_parallel_threads;
SET @@global.slave_parallel_threads= 2;  
--source include/start_slave.inc
 
--connection master
 
--let $loop_counter=10
--disable_query_log
while ($loop_counter)
{
  create table t1(a int) engine=innodb;
  insert into t1 values(1);
  insert into t1 values(2);
  drop table t1;
  dec $loop_counter;
}
--enable_query_log
 
--sync_slave_with_master
 
--let $status= query_get_value(show slave status, Slave_Transactional_Groups, 1)
--echo Slave_Transactional_Groups= $status;
 
#Cleanup
--connection slave
--source include/stop_slave.inc
SET @@global.slave_parallel_threads= @old_parallel_threads;
#SET @@global.slave_parallel_mode   = @old_parallel_mode;
--source include/start_slave.inc
--source include/rpl_end.inc

Please run it with --repeat=N --force-restart

2) In some cases test fails as "sync_slave_with_master failed"
from mysqld.2.err

2021-01-21 14:24:51 14 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-3' for key 'PRIMARY', Gtid 0-1-3, Internal MariaDB error code: 1942
2021-01-21 14:24:51 14 [ERROR] Slave (additional info): Duplicate entry '0-3' for key 'PRIMARY' Error_code: 1062
2021-01-21 14:24:51 14 [Warning] Slave: Duplicate entry '0-3' for key 'PRIMARY' Error_code: 1062
2021-01-21 14:24:51 14 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 638
2021-01-21 14:24:51 12 [Note] Error reading relay log event: slave SQL thread was killed
2021-01-21 14:24:51 13 [ERROR] Slave SQL: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'DROP TABLE `t1` /* generated by server */', Gtid 0-1-4, Internal MariaDB error code: 1964
2021-01-21 14:24:51 13 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2021-01-21 14:24:51 13 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 638

3)in the test table mysql.gtid_slave_pos is of engine=Aria (should it be altered to InnoDB?)

Comment by Marko Mäkelä [ 2021-01-21 ]

Even after today’s update to MDEV-515 (with refactored undo log and ROLLBACK changes), the same tests fail.

Comment by Marko Mäkelä [ 2021-01-22 ]

Based on seppo’s suggestion in MDEV-24623, I disabled the MDEV-515 logic on replicas (by introducing an accessor thd_is_slave(), which should be removed as part of fixing this).

Generated at Thu Feb 08 09:31:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.