[MDEV-19157] Optimistic Parallel Replication fails when gtid_slave_pos table is non transnactional and binlog events are transactional Created: 2019-04-03  Updated: 2023-11-14

Status: Stalled
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: Sachin Setiya (Inactive) Assignee: Brandon Nesterenko
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Arch Linux


Issue Links:
Blocks
is blocked by MDEV-25768 ALTER gtid_slave_pos table storage en... Stalled

 Description   

--source include/have_binlog_format_row.inc
--source include/have_innodb.inc
--source include/master-slave.inc
 
--sync_slave_with_master
--source include/stop_slave.inc
 
SET GLOBAL slave_parallel_threads=7;
set global slave_parallel_mode=optimistic;
set global slave_transaction_retries=1000;
alter table mysql.gtid_slave_pos engine=myisam;
 
# Create some conflicting  events on master
connection master;
CREATE TABLE t1(a INT NOT NULL AUTO_INCREMENT PRIMARY KEY) engine=innodb;
INSERT INTO t1 VALUES (1),(2),(3),(4);
--disable_query_log
--let $count=5
while ($count)
{
update t1 set a=a+5 where a=1;
update t1 set a=a+5 where a=2;
update t1 set a=a+5 where a=3;
update t1 set a=a+5 where a=4;
update t1 set a=a-5 where a=5;
update t1 set a=a-5 where a=6;
--dec $count
}
--enable_query_log
select * from t1;
drop table t1;
 
--connection slave
start slave;
--connection master
--sync_slave_with_master
stop slave;
SET GLOBAL slave_parallel_threads=0;
set global slave_parallel_mode=conservative;
set global slave_transaction_retries=10;
start slave;
--source include/rpl_end.inc

It fails with following error

Last_Errno      1942
Last_Error      Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-11' for key 'PRIMARY'



 Comments   
Comment by Sachin Setiya (Inactive) [ 2019-09-24 ]

If we reduce the $counter to 3 . this test case can trigger MDEV-10761 (Although non-deterministically)

Comment by Andrei Elkin [ 2020-02-23 ]

The blocker status reflects the fact that optimistic is default in 10.5. So it must be fixed in 10.5 until GA.

Comment by Andrei Elkin [ 2020-02-23 ]

The Fix Version is updated to 10.3 as fixes certainly depend on MDEV-12179.

Comment by Andrei Elkin [ 2020-05-27 ]

sachin.setiya.007 Consider to log a warning when slave starts optimistic - while having non-transactional gtid_slave_pos when gtid-pos-auto-engines option is default (NULL). The warning should suggest to either not use optimistic or ALTER the gtid table, or provide a list of engines to gtid-pos-auto-engines. I am not sure about a combination of non-transactional gtid_slave_pos and transactional engines in the option though. Needs investigating.

Comment by Sachin Setiya (Inactive) [ 2020-06-02 ]

HI Elkin, Actually combination of Transactional Storage Engine and Non Transactional gtid_ slave_pos is creating issue, If SE is non transactional we wont have any issue in optimistic replication .

We need different fix for 10.2 and 10.3 onwards(gtid-pos-auto-engines)

Comment by Sachin Setiya (Inactive) [ 2020-06-11 ]

bb-10.2-sachin bb-10.5-sachin

Comment by Andrei Elkin [ 2020-12-15 ]

sachin.setiya.007: what do you mean we won't in

If SE is non transactional we wont have any issue in optimistic replication

If the replicated trx or gtid_slave_pos is non-transactional the optimistic mode error-stop does not guarantee to resume.
Oth, when both are transactional but their SE are different the resume must be fine (though I've not tested that).

Comment by Sachin Setiya (Inactive) [ 2021-04-01 ]

Hi Elkin

Patch for 10.2 https://github.com/MariaDB/server/commit/73d84fe45a2c25b81fa5ea0cb60935f463c5633e
I am working on porting patch from 10.5 to 10.3. At the moment I am getting server crash

#0  0x00007f47ce05fef5 in raise () from /usr/lib/libc.so.6
#1  0x00007f47ce049862 in abort () from /usr/lib/libc.so.6
#2  0x00007f47ce049747 in __assert_fail_base.cold () from /usr/lib/libc.so.6
#3  0x00007f47ce058646 in __assert_fail () from /usr/lib/libc.so.6
#4  0x000055ffddd11b3f in ilink::assert_linked (this=0x7f47b40418c0) at sql/sql_list.h:642
#5  0x000055ffddd12ec6 in unlink_not_visible_thd (thd=0x7f47b40418c0) at sql/sql_class.h:4912
#6  0x000055ffde0240da in handle_rpl_parallel_thread (arg=0x7f47c401aad0) at sql/rpl_parallel.cc:1471
#7  0x000055ffde956508 in pfs_spawn_thread (arg=0x7f47c4039850) at storage/perfschema/pfs.cc:1869
#8  0x00007f47cea49299 in start_thread () from /usr/lib/libpthread.so.0
#9  0x00007f47ce122053 in clone () from /usr/lib/libc.so.6

Comment by Sachin Setiya (Inactive) [ 2021-04-05 ]

10.2 patch https://github.com/MariaDB/server/commit/96c9e1819e1d923f4682db93d940c425beaa15cd
10.3 patch https://github.com/MariaDB/server/commit/db9f8e44c9d65d271548735d0f2ae0c1041ffe22

Comment by Andrei Elkin [ 2021-04-21 ]

Comments provided via slack.
Waiting for a new patch.

Comment by Sachin Setiya (Inactive) [ 2021-05-20 ]

Patch Status:- Old comments have been addressed.

New Review Request:- In addition to warning, can optimistic replication be dialed down so that we do not get duplicate gtid error even if gtid_slave_pos is Non Transactional.

Possible Solution :- In ::do_event if we find that gtid_slave_pos is not transactional , then treat optimistic replication as conservative.

Comment by Sachin Setiya (Inactive) [ 2021-05-21 ]

Patch addressing review comments
https://github.com/MariaDB/server/commit/e8efa0fe82cafd5e855744be4e5206d17a3e2624

Comment by Sachin Setiya (Inactive) [ 2021-05-25 ]

Review can be delayed till MDEV-25768 is fixed, As of now tests in buildbot fails , because START SLAVE acquires lock on the gtid_slave_pos (for reading storage engine info)

Comment by Andrei Elkin [ 2021-08-12 ]

The latest patch was reviewed with a critical note which is MDEV-25768. The patch therefore has to wait
until the latter one's resolution.

Comment by Alice Sherepa [ 2022-01-13 ]

--source include/have_binlog_format_row.inc
--source include/have_innodb.inc
--source include/master-slave.inc
 
--sync_slave_with_master
--source include/stop_slave.inc
 
SET GLOBAL slave_parallel_threads=2;
set global slave_parallel_mode=optimistic;
alter table mysql.gtid_slave_pos engine=myisam;
 
--connection slave
start slave;
 
-- connection master
CREATE TABLE t1 (c1 int, c2 int) Engine=InnoDB;
INSERT INTO t1 VALUES (1,1);
UPDATE t1 SET c1=5 where c2=1;
show binlog events;
 
-- sync_slave_with_master

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
include/master-slave.inc
[connection master]
connection slave;
include/stop_slave.inc
SET GLOBAL slave_parallel_threads=2;
set global slave_parallel_mode=optimistic;
alter table mysql.gtid_slave_pos engine=myisam;
Warnings:
Warning	1478	Table storage engine 'MyISAM' does not support the create option 'TRANSACTIONAL=0'
connection slave;
start slave;
connection master;
CREATE TABLE t1 (c1 int, c2 int) Engine=InnoDB;
INSERT INTO t1 VALUES (1,1);
UPDATE t1 SET c1=5 where c2=1;
show binlog events;
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	4	Format_desc	1	256	Server ver: 10.8.0-MariaDB-debug-log, Binlog ver: 4
master-bin.000001	256	Gtid_list	1	285	[]
master-bin.000001	285	Binlog_checkpoint	1	329	master-bin.000001
master-bin.000001	329	Gtid	1	371	GTID 0-1-1
master-bin.000001	371	Query	1	493	use `test`; CREATE TABLE t1 (c1 int, c2 int) Engine=InnoDB
master-bin.000001	493	Gtid	1	535	BEGIN GTID 0-1-2
master-bin.000001	535	Annotate_rows	1	585	INSERT INTO t1 VALUES (1,1)
master-bin.000001	585	Table_map	1	631	table_id: 32 (test.t1)
master-bin.000001	631	Write_rows_v1	1	673	table_id: 32 flags: STMT_END_F
master-bin.000001	673	Xid	1	704	COMMIT /* xid=148 */
master-bin.000001	704	Gtid	1	746	BEGIN GTID 0-1-3
master-bin.000001	746	Annotate_rows	1	798	UPDATE t1 SET c1=5 where c2=1
master-bin.000001	798	Table_map	1	844	table_id: 32 (test.t1)
master-bin.000001	844	Update_rows_v1	1	896	table_id: 32 flags: STMT_END_F
master-bin.000001	896	Xid	1	927	COMMIT /* xid=149 */
main.1_my 'innodb,row'                   [ fail ]
        Test ended at 2022-01-13 16:19:45
 
CURRENT_TEST: main.1_my
analyze: sync_with_master
mysqltest: At line 22: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 927, 300, '')' returned NULL indicating slave SQL thread failure
 
 
 == /home/alice/git/10.7/mysql-test/var/tmp/analyze-sync_with_master-mysqld.2.err ==
 
############################## default ##############################
 
**** SHOW WARNINGS on default ****
SHOW WARNINGS;
Level	Code	Message
 
**** SELECT replication-related variables on default ****
SELECT NOW(), @@SERVER_ID;
NOW()	@@SERVER_ID
2022-01-13 16:19:44	2
 
**** SHOW SLAVE STATUS on default ****
SHOW SLAVE STATUS;
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	16000
Connect_Retry	1
Master_Log_File	master-bin.000001
Read_Master_Log_Pos	927
Relay_Log_File	slave-relay-bin.000003
Relay_Log_Pos	931
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	Yes
Slave_SQL_Running	No
Replicate_Do_DB	
Replicate_Ignore_DB	
Replicate_Do_Table	
Replicate_Ignore_Table	
Replicate_Wild_Do_Table	
Replicate_Wild_Ignore_Table	
Last_Errno	1942
Last_Error	Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-3' for key 'PRIMARY'
Skip_Counter	0
Exec_Master_Log_Pos	704
Relay_Log_Space	1836
Until_Condition	None
Until_Log_File	
Until_Log_Pos	0
Master_SSL_Allowed	No
Master_SSL_CA_File	
Master_SSL_CA_Path	
Master_SSL_Cert	
Master_SSL_Cipher	
Master_SSL_Key	
Seconds_Behind_Master	NULL
Master_SSL_Verify_Server_Cert	No
Last_IO_Errno	0
Last_IO_Error	
Last_SQL_Errno	1942
Last_SQL_Error	Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-3' for key 'PRIMARY'
Replicate_Ignore_Server_Ids	
Master_Server_Id	1
Master_SSL_Crl	
Master_SSL_Crlpath	
Using_Gtid	No
Gtid_IO_Pos	
Replicate_Do_Domain_Ids	
Replicate_Ignore_Domain_Ids	
Parallel_Mode	optimistic
SQL_Delay	0
SQL_Remaining_Delay	NULL
Slave_SQL_Running_State	
Slave_DDL_Groups	1
Slave_Non_Transactional_Groups	0
Slave_Transactional_Groups	3
 
**** SHOW MASTER STATUS on default ****
SHOW MASTER STATUS;
File	slave-bin.000001
Position	867
Binlog_Do_DB	
Binlog_Ignore_DB	
 
**** SHOW SLAVE HOSTS on default ****
SHOW SLAVE HOSTS;
 
**** SHOW PROCESSLIST on default ****
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info	Progress
13	system user		NULL	Slave_IO	0	Waiting for master to send event	NULL	0.000
17	root	localhost	NULL	Query	0	starting	SHOW PROCESSLIST	0.000
 
**** SHOW BINARY LOGS on default ****
SHOW BINARY LOGS;
Log_name	File_size
slave-bin.000001	867
 
**** SHOW BINLOG EVENTS on default ****
binlog_name = 'slave-bin.000001'
SHOW BINLOG EVENTS IN 'slave-bin.000001';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
slave-bin.000001	4	Format_desc	2	256	Server ver: 10.8.0-MariaDB-debug-log, Binlog ver: 4
slave-bin.000001	256	Gtid_list	2	285	[]
slave-bin.000001	285	Binlog_checkpoint	2	328	slave-bin.000001
slave-bin.000001	328	Gtid	2	370	GTID 0-2-1
slave-bin.000001	370	Query	2	492	use `test`; alter table mysql.gtid_slave_pos engine=myisam
slave-bin.000001	492	Gtid	1	534	GTID 0-1-1
slave-bin.000001	534	Query	1	656	use `test`; CREATE TABLE t1 (c1 int, c2 int) Engine=InnoDB
slave-bin.000001	656	Gtid	1	698	BEGIN GTID 0-1-2
slave-bin.000001	698	Annotate_rows	1	748	INSERT INTO t1 VALUES (1,1)
slave-bin.000001	748	Table_map	1	794	table_id: 34 (test.t1)
slave-bin.000001	794	Write_rows_v1	1	836	table_id: 34 flags: STMT_END_F
slave-bin.000001	836	Xid	1	867	COMMIT /* xid=249 */
 
**** SHOW RELAYLOG EVENTS on default ****
relaylog_name = 'slave-relay-bin.000003'
SHOW RELAYLOG EVENTS IN 'slave-relay-bin.000003';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
slave-relay-bin.000003	4	Format_desc	2	256	Server ver: 10.8.0-MariaDB-debug-log, Binlog ver: 4
slave-relay-bin.000003	256	Rotate	1	0	master-bin.000001;pos=329
slave-relay-bin.000003	304	Format_desc	1	0	Server ver: 10.8.0-MariaDB-debug-log, Binlog ver: 4
slave-relay-bin.000003	556	Gtid	1	371	GTID 0-1-1
slave-relay-bin.000003	598	Query	1	493	use `test`; CREATE TABLE t1 (c1 int, c2 int) Engine=InnoDB
slave-relay-bin.000003	720	Gtid	1	535	BEGIN GTID 0-1-2
slave-relay-bin.000003	762	Annotate_rows	1	585	INSERT INTO t1 VALUES (1,1)
slave-relay-bin.000003	812	Table_map	1	631	table_id: 32 (test.t1)
slave-relay-bin.000003	858	Write_rows_v1	1	673	table_id: 32 flags: STMT_END_F
slave-relay-bin.000003	900	Xid	1	704	COMMIT /* xid=148 */
slave-relay-bin.000003	931	Gtid	1	746	BEGIN GTID 0-1-3
slave-relay-bin.000003	973	Annotate_rows	1	798	UPDATE t1 SET c1=5 where c2=1
slave-relay-bin.000003	1025	Table_map	1	844	table_id: 32 (test.t1)
slave-relay-bin.000003	1071	Update_rows_v1	1	896	table_id: 32 flags: STMT_END_F
slave-relay-bin.000003	1123	Xid	1	927	COMMIT /* xid=149 */
connection default;
 
 == /home/alice/git/10.7/mysql-test/var/tmp/analyze-sync_with_master-mysqld.1.err ==
 
############################## default ##############################
 
**** SHOW WARNINGS on default ****
SHOW WARNINGS;
Level	Code	Message
 
**** SELECT replication-related variables on default ****
SELECT NOW(), @@SERVER_ID;
NOW()	@@SERVER_ID
2022-01-13 16:19:44	1
 
**** SHOW SLAVE STATUS on default ****
SHOW SLAVE STATUS;
 
**** SHOW MASTER STATUS on default ****
SHOW MASTER STATUS;
File	master-bin.000001
Position	927
Binlog_Do_DB	
Binlog_Ignore_DB	
 
**** SHOW SLAVE HOSTS on default ****
SHOW SLAVE HOSTS;
Server_id	2
Host	127.0.0.1
Port	16001
Master_id	1
 
**** SHOW PROCESSLIST on default ****
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info	Progress
11	root	localhost:43296	NULL	Binlog Dump	0	Master has sent all binlog to slave; waiting for more updates	NULL	0.000
12	root	localhost	NULL	Query	0	starting	SHOW PROCESSLIST	0.000
 
**** SHOW BINARY LOGS on default ****
SHOW BINARY LOGS;
Log_name	File_size
master-bin.000001	927
 
**** SHOW BINLOG EVENTS on default ****
binlog_name = 'master-bin.000001'
SHOW BINLOG EVENTS IN 'master-bin.000001';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	4	Format_desc	1	256	Server ver: 10.8.0-MariaDB-debug-log, Binlog ver: 4
master-bin.000001	256	Gtid_list	1	285	[]
master-bin.000001	285	Binlog_checkpoint	1	329	master-bin.000001
master-bin.000001	329	Gtid	1	371	GTID 0-1-1
master-bin.000001	371	Query	1	493	use `test`; CREATE TABLE t1 (c1 int, c2 int) Engine=InnoDB
master-bin.000001	493	Gtid	1	535	BEGIN GTID 0-1-2
master-bin.000001	535	Annotate_rows	1	585	INSERT INTO t1 VALUES (1,1)
master-bin.000001	585	Table_map	1	631	table_id: 32 (test.t1)
master-bin.000001	631	Write_rows_v1	1	673	table_id: 32 flags: STMT_END_F
master-bin.000001	673	Xid	1	704	COMMIT /* xid=148 */
master-bin.000001	704	Gtid	1	746	BEGIN GTID 0-1-3
master-bin.000001	746	Annotate_rows	1	798	UPDATE t1 SET c1=5 where c2=1
master-bin.000001	798	Table_map	1	844	table_id: 32 (test.t1)
master-bin.000001	844	Update_rows_v1	1	896	table_id: 32 flags: STMT_END_F
master-bin.000001	896	Xid	1	927	COMMIT /* xid=149 */
 
**** SHOW RELAYLOG EVENTS on default ****
relaylog_name = 'No such row'
SHOW RELAYLOG EVENTS IN 'No such row';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
connection default;

Comment by Brandon Nesterenko [ 2023-11-14 ]

Review incomplete, moving ticket to Stalled because it depends on MDEV-25768, which in turn depends on MDEV-28373.

Generated at Thu Feb 08 08:49:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.