[MDEV-27849] rpl.rpl_start_alter_7 (and 8, mysqbinlog_2) fail in buildbot, [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos Created: 2022-02-15  Updated: 2023-12-11  Resolved: 2023-12-11

Status: Closed
Project: MariaDB Server
Component/s: Replication, Tests
Affects Version/s: 10.8, 10.9, 10.10, 10.11, 11.0
Fix Version/s: 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2

Type: Bug Priority: Major
Reporter: Angelique Sklavounos (Inactive) Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: None

Attachments: File MDEV-27849.cnf     File MDEV-27849.test     File logs.tar.gz    
Issue Links:
Relates
relates to MDEV-31120 Duplicate entry allowed into a UNIQUE... Closed

 Description   

https://buildbot.mariadb.org/#/builders/195/builds/4566/steps/7/logs/stdio
https://buildbot.mariadb.org/#/builders/195/builds/4419/steps/7/logs/stdio

Occurs for rpl.rpl_start_alter_7, rpl.rpl_start_alter_8, rpl.rpl_start_alter_mysqlbinlog_2, rpl.rpl_start_alter_4, rpl.rpl_start_alter_3, rpl.rpl_start_alter_6, rpl.rpl_start_alter_5

rpl.rpl_start_alter_7 'innodb'           w1 [ fail ]
        Test ended at 2022-02-15 02:25:31
CURRENT_TEST: rpl.rpl_start_alter_7
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /buildbot/amd64-ubuntu-1804-msan/build/mysql-test/suite/rpl/t/rpl_start_alter_7.test at line 83:
At line 48: Failed to sync with master
The result from queries just before the failure was:
< snip >
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
connection server_2;
drop database s2;
select @@gtid_binlog_pos;
@@gtid_binlog_pos
12-2-412
connection server_3;
start all slaves;
Warnings:
Note	1937	SLAVE 'm2' started
Note	1937	SLAVE 'm1' started
set default_master_connection = 'm1';
include/wait_for_slave_to_start.inc
set default_master_connection = 'm2';
include/wait_for_slave_to_start.inc
set default_master_connection = 'm1';
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('11-1-412', 120), current slave GTID position is: 11-1-291,12-2-412.

Not checked with the original failures, but the replica error log in the multi-master setup shows:

2022-11-24  4:29:00 31 [Note] Master 'm1': Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './mysqld-relay-bin-m1.000001' position: 4; GTID position '12-2-2'
2022-11-24  4:29:02 24 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '11-605' for key 'PRIMARY', Gtid 11-1-281, Internal MariaDB error code: 1942
2022-11-24  4:29:02 24 [ERROR] Slave (additional info): Duplicate entry '11-605' for key 'PRIMARY' Error_code: 1062
2022-11-24  4:29:02 24 [Warning] Slave: Duplicate entry '11-605' for key 'PRIMARY' Error_code: 1062
2022-11-24  4:29:02 24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 49198; GTID position '11-1-280,12-2-334'
2022-11-24  4:29:02 31 [Note] Master 'm1': Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 49198; GTID position '11-1-280,12-2-334', master: 127.0.0.1:16040



 Comments   
Comment by Andrei Elkin [ 2022-02-15 ]

angelique.sklavounos, let me ask you to take over at least to clear out that's its test's slowness to blame on
in MSAN environment. The first thing to do is to try out a manual timeout increase

diff --git a/mysql-test/suite/rpl/t/rpl_start_alter_7.test b/mysql-test/suite/rpl/t/rpl_start_alter_7.test
index 7225c075ea7..79acf1f2c69 100644
--- a/mysql-test/suite/rpl/t/rpl_start_alter_7.test
+++ b/mysql-test/suite/rpl/t/rpl_start_alter_7.test
@@ -79,9 +79,11 @@ set default_master_connection = 'm2';
 
 set default_master_connection = 'm1';
 --let $master_pos= $master_pos_1
+--let $slave_timeout= 600
 --source include/sync_with_master_gtid.inc
 set default_master_connection = 'm2';
 --let $master_pos= $master_pos_2
+--let $slave_timeout= 600
 --source include/sync_with_master_gtid.inc
 
 --echo # cleanup

and if that helps (expected), go to improve include/sync_with_master_gtid.inc that always has had
a valgrind branch that now needs to be generalized to include MSAN.

Comment by Angelique Sklavounos (Inactive) [ 2022-03-10 ]

The fail has not been seen on the newer amd64-ubuntu-2004-msan builder, but is still occasionally seen on kvm-ubsan, amd64-ubuntu-1804-valgrind, kvm-asan, and kvm-fulltest2.

Comment by Angelique Sklavounos (Inactive) [ 2022-11-25 ]

The replica error log shows a duplicate key error (mysqld.3.err):

2022-11-24  4:29:00 31 [Note] Master 'm1': Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './mysqld-relay-bin-m1.000001' position: 4; GTID position '12-2-2'
2022-11-24  4:29:02 24 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '11-605' for key 'PRIMARY', Gtid 11-1-281, Internal MariaDB error code: 1942
2022-11-24  4:29:02 24 [ERROR] Slave (additional info): Duplicate entry '11-605' for key 'PRIMARY' Error_code: 1062
2022-11-24  4:29:02 24 [Warning] Slave: Duplicate entry '11-605' for key 'PRIMARY' Error_code: 1062
2022-11-24  4:29:02 24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 49198; GTID position '11-1-280,12-2-334'
2022-11-24  4:29:02 31 [Note] Master 'm1': Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 49198; GTID position '11-1-280,12-2-334', master: 127.0.0.1:16040

Error logs attached as logs.tar.gz.

11.0 81c1f526a

rpl.rpl_start_alter_mysqlbinlog_2 'innodb' w4 [ fail ]
        Test ended at 2022-11-24 04:31:01
 
CURRENT_TEST: rpl.rpl_start_alter_mysqlbinlog_2
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /mnt/buildbot/build/mariadb-11.0.1/mysql-test/suite/rpl/t/rpl_start_alter_mysqlbinlog_2.test at line 117:
At line 48: Failed to sync with master
 
The result from queries just before the failure was:
< snip >
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
connection server_2;
DROP DATABASE s2;
connection server_3;
START ALL SLAVES;
Warnings:
Note	1937	SLAVE 'm2' started
Note	1937	SLAVE 'm1' started
SET default_master_connection = 'm1';
include/wait_for_slave_to_start.inc
SET default_master_connection = 'm2';
include/wait_for_slave_to_start.inc
SET default_master_connection = 'm1';
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('11-1-412', 120), current slave GTID position is: 11-1-280,12-2-412.

Comment by Angelique Sklavounos (Inactive) [ 2023-05-25 ]

https://buildbot.mariadb.net/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/10374

11.0 d5d7c8ba9

rpl.rpl_start_alter_4 'innodb,stmt'      w1 [ fail ]
        Test ended at 2023-02-21 15:45:30
 
CURRENT_TEST: rpl.rpl_start_alter_4
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /usr/share/mysql-test/suite/rpl/include/start_alter_concurrent.inc at line 193:
included from /usr/share/mysql-test/suite/rpl/include/start_alter_include.inc at line 55:
included from /usr/share/mysql-test/suite/rpl/t/rpl_start_alter_4.test at line 37:
At line 48: Failed to sync with master
 
The result from queries just before the failure was:
< snip >
include/sync_with_master_gtid.inc
# Concurrent DML
# Sync slave
include/save_master_gtid.inc
include/sync_with_master_gtid.inc
# Rollback tests
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
# Sync slave
include/save_master_gtid.inc
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('0-1-36,11-1-135,12-1-137', 120), current slave GTID position is: 0-1-36,11-1-130,12-1-124.

https://buildbot.mariadb.net/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/10374/steps/mtr/logs/mysqld.2.err.1

2023-02-21 15:43:28 41 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '12-701' for key 'PRIMARY', Gtid 12-1-125, Internal MariaDB error code: 1942
2023-02-21 15:43:28 41 [ERROR] Slave (additional info): Duplicate entry '12-701' for key 'PRIMARY' Error_code: 1062
2023-02-21 15:43:28 41 [Warning] Slave: Duplicate entry '12-701' for key 'PRIMARY' Error_code: 1062
2023-02-21 15:43:28 41 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 52213; GTID position '0-1-36,11-1-130,12-1-124'

Comment by Angelique Sklavounos (Inactive) [ 2023-05-25 ]

https://buildbot.mariadb.org/#/builders/168/builds/21371

bb-11.2-midenok-MDEV-27180 86c327775

rpl.rpl_start_alter_3 'innodb,mix'       w7 [ fail ]
        Test ended at 2023-05-18 11:04:58
 
CURRENT_TEST: rpl.rpl_start_alter_3
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /buildbot/amd64-ubuntu-1804-clang10-asan/build/mysql-test/suite/rpl/include/start_alter_concurrent.inc at line 193:
included from /buildbot/amd64-ubuntu-1804-clang10-asan/build/mysql-test/suite/rpl/include/start_alter_include.inc at line 55:
included from /buildbot/amd64-ubuntu-1804-clang10-asan/build/mysql-test/suite/rpl/t/rpl_start_alter_3.test at line 38:
At line 48: Failed to sync with master
 
The result from queries just before the failure was:
< snip >
include/sync_with_master_gtid.inc
# Concurrent DML
# Sync slave
include/save_master_gtid.inc
include/sync_with_master_gtid.inc
# Rollback tests
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
# Sync slave
include/save_master_gtid.inc
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('0-1-308', 120), current slave GTID position is: 0-1-281.

https://ci.mariadb.org/35116/logs/amd64-ubuntu-1804-clang10-asan/mysqld.2.err.7

2023-05-18 11:02:56 13 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-282' for key 'PRIMARY', Gtid 0-1-282, Internal MariaDB error code: 1942
2023-05-18 11:02:56 13 [ERROR] Slave (additional info): Duplicate entry '0-282' for key 'PRIMARY' Error_code: 1062
2023-05-18 11:02:56 13 [Warning] Slave: Duplicate entry '0-282' for key 'PRIMARY' Error_code: 1062
2023-05-18 11:02:56 13 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 50683; GTID position '0-1-281'

Comment by Angelique Sklavounos (Inactive) [ 2023-05-25 ]

https://buildbot.mariadb.org/#/builders/537/builds/6057

refs/pull/2570/merge 11d7b1c61

rpl.rpl_start_alter_6 'innodb,row'       w31 [ fail ]
        Test ended at 2023-05-23 01:08:10
 
CURRENT_TEST: rpl.rpl_start_alter_6
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /home/buildbot/ppc64le-ubuntu-2004-debug/build/mysql-test/suite/rpl/include/start_alter_concurrent.inc at line 193:
included from /home/buildbot/ppc64le-ubuntu-2004-debug/build/mysql-test/suite/rpl/include/start_alter_include.inc at line 55:
included from /home/buildbot/ppc64le-ubuntu-2004-debug/build/mysql-test/suite/rpl/t/rpl_start_alter_6.test at line 40:
At line 48: Failed to sync with master
 
The result from queries just before the failure was:
< snip >
include/sync_with_master_gtid.inc
# Concurrent DML
# Sync slave
include/save_master_gtid.inc
include/sync_with_master_gtid.inc
# Rollback tests
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
# Sync slave
include/save_master_gtid.inc
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('0-1-36,11-1-135,12-1-137', 120), current slave GTID position is: 0-1-36,11-1-133,12-1-126.

https://ci.mariadb.org/35225/logs/ppc64le-ubuntu-2004-debug/mysqld.2.err.31

2023-05-23  1:06:09 44 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062:  Duplicate entry '12-701' for key 'PRIMARY', Error_code: 1062; the event's master log master-bin.000001, end_log_pos 55999, Gtid 12-1-127, Internal MariaDB error code: 1942
2023-05-23  1:06:09 44 [ERROR] Slave (additional info): Duplicate entry '12-701' for key 'PRIMARY' Error_code: 1062
2023-05-23  1:06:09 44 [Warning] Slave: Duplicate entry '12-701' for key 'PRIMARY' Error_code: 1062
2023-05-23  1:06:09 44 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 57733; GTID position '0-1-36,11-1-133,12-1-126'

Comment by Daniel Black [ 2023-07-14 ]

MDEV-31120 contains the repeatable test case that produces:

Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062:  Duplicate entry '0-1212750' for key 'PRIMARY'

Comment by Daniel Black [ 2023-07-17 ]

From monty in MDEV-31120, mtr changing mysql.gtid_slave_pos to Aria is exposing the problem. As InnoDB, the default user experience, it doesn't generate errors like on the attached test.

Comment by Marko Mäkelä [ 2023-08-02 ]

The test rpl.rpl_start_alter_6 is occasionally failing like this:

10.8 600751e7693dcf6236d3e6b64fa24d19

rpl.rpl_start_alter_6 'innodb,stmt'      w2 [ fail ]
        Test ended at 2022-06-02 06:08:49
 
CURRENT_TEST: rpl.rpl_start_alter_6
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /home/buildbot/buildbot/build/mariadb-10.8.4/mysql-test/suite/rpl/include/start_alter_concurrent.inc at line 193:
included from /home/buildbot/buildbot/build/mariadb-10.8.4/mysql-test/suite/rpl/include/start_alter_include.inc at line 55:
included from /home/buildbot/buildbot/build/mariadb-10.8.4/mysql-test/suite/rpl/t/rpl_start_alter_6.test at line 40:
At line 48: Failed to sync with master
 
The result from queries just before the failure was:
< snip >
include/sync_with_master_gtid.inc
# Concurrent DML
# Sync slave
include/save_master_gtid.inc
include/sync_with_master_gtid.inc
# Rollback tests
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
# Sync slave
include/save_master_gtid.inc
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('0-1-36,11-1-135,12-1-137', 120), current slave GTID position is: 0-1-36,11-1-127,12-1-123.

Comment by Marko Mäkelä [ 2023-08-02 ]

Here is a more recent failure, a mandatory staging build of 10.9:

10.9 8760fe11b89450d56269ed062b36b4238d45707f

rpl.rpl_start_alter_6 'innodb,mix'       w21 [ fail ]
        Test ended at 2023-08-02 09:17:22
CURRENT_TEST: rpl.rpl_start_alter_6
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /home/buildbot/amd64-debian-11-msan/build/mysql-test/suite/rpl/include/start_alter_concurrent.inc at line 193:
included from /home/buildbot/amd64-debian-11-msan/build/mysql-test/suite/rpl/include/start_alter_include.inc at line 55:
included from /home/buildbot/amd64-debian-11-msan/build/mysql-test/suite/rpl/t/rpl_start_alter_6.test at line 40:
At line 48: Failed to sync with master
The result from queries just before the failure was:
< snip >
include/sync_with_master_gtid.inc
# Concurrent DML
# Sync slave
include/save_master_gtid.inc
include/sync_with_master_gtid.inc
# Rollback tests
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
ERROR 23000: Duplicate entry '2' for key 'b'
# Sync slave
include/save_master_gtid.inc
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('0-1-36,11-1-135,12-1-137', 120), current slave GTID position is: 0-1-36,11-1-130,12-1-124.
More results from queries before failure can be found in /home/buildbot/amd64-debian-11-msan/build/mysql-test/var/21/log/rpl_start_alter_6.log

mysqld.2.err.21 looks like the replica server was killed while the test execution was in progress. There is also a var.tar.gz available for download.

Comment by Kristian Nielsen [ 2023-11-15 ]

Agree with Daniel, the direct cause of these failures seem to be that mysql.gtid_slave_pos is using engine=Aria, not InnoDB.
Optimistic parallel replication requires using InnoDB so that updates of the mysql.gtid_slave_pos can be rolled back in case a retry is needed after a deadlock.

I'm not sure exactly what is being retried here. It looks like it must be the START or ROLLBACK ALTER that's being deadlock killed after it has reached the wait_for_prior_commit.

But I think ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB will fix these failures.

Comment by Marko Mäkelä [ 2023-11-17 ]

knielsen, thank you. Do you know why mtr is altering the table from InnoDB to Aria in the first place? The SQL bootstrap script would initially create the table in InnoDB.

If there are any non-InnoDB replication tests, maybe only those tests should use the table in non-InnoDB format?

Comment by Kristian Nielsen [ 2023-11-17 ]

marko, no I don't know. I believe the database installation script creates the table in InnoDB if that engine is available and MyISAM (or these days maybe Aria) if not. Maybe the mtr behavior is historical, or maybe there was some reason for it.

Before blindly pushing the change to the mysql.gtid_slave_pos table for these test cases though, I want to understand why these DDL statements are being rolled back for retry, and if it's correct/safe to do so, so that I do not hide a bug in the START ALTER code.

Comment by Andrei Elkin [ 2023-11-18 ]

elenst, maybe you can cast light upon
> Do you know why mtr is altering the table from InnoDB to Aria in the first place?
Thanks anyway!

Comment by Elena Stepanova [ 2023-11-18 ]

My git blame is as good as everyone's:

commit e22c3810f059e4f6e3ec52f09d35486e0ff80fb6
Author: Sergei Golubchik
Date:   Thu Jun 5 09:04:43 2014 +0200
 
    MDEV-6243 mysql_install_db or mysql_upgrade fails when default_engine=archive
    
    don't use the default storage engine for mysql.gtid_slave_pos, prefer innodb.
    but alter it to myisam in mtr, because many tests run without innodb.

And naturally MyISAM was further switched to Aria along with the system table engine change, by some Monty's commit.

Comment by Kristian Nielsen [ 2023-11-20 ]

I guess mtr uses the same mysql_install_db template for all tests, regardless of restart options (I didn't check)? Then we cannot have innodb tables in there. Maybe we could change it in include/have_innodb.inc.

Anyway, I debugged the rpl_start_alter_6 failure. It turns out the replication deadlock kill+retry is caused by persistent statistics inside InnoDB, dict_stats_save(). This function creates an internal transaction that gets assigned the user-query's THD. The function then goes on to take table(?) locks on the dict tables. When these locks conflict, they cause parallel replication to deadlock kill the later transaction because it has the replication THD assigned. I think this is a redundant/false-alarm retry, since the dict system transaction is internal and should not be able to cause a deadlock.

I'm wondering if it's correct to assign the query/replication THD to the separate InnoDB-internal trx that updates the dict tables. In this case, it causes unnecessary and unexpected transaction rollback and retry in parallel replication.

On the other hand, maybe having a NULL trx->mysql_thd would cause problems in other places. It's not a fatal problem to get spurious rollback+retry in parallel replication (it's handled on the upper layer); but would still be preferable to avoid if it is possible. The user might carefully have arranged for no conflicts to be possible between replicated transactions, and then be surprised / experience problems when such conflicts occur from persistent stats updates.

Comment by Marko Mäkelä [ 2023-11-22 ]

Starting with MDEV-16678, InnoDB pretty much requires a valid trx_t::mysql_thd. The assignment of THD is outside the control of InnoDB, other than that there is some preallocation of purge-related THD objects related to MDEV-16264 and MDEV-11024. But those cannot be used by dict_stats_save().

Comment by Kristian Nielsen [ 2023-11-22 ]

Thanks, Marko.I'll change the tests to use InnoDB for the mysql.gtid_slave_pos table.

For now, I don't think we need to do anything else. Some transaction rollback+retry is expected in in-order parallel replication, and presumably conflicts due to dict stats update will be rare. Let's just keep in mind that this can be a source of otherwise unexpected conflicts in parallel replication, in case we see it in other tests or user reports.

Comment by Kristian Nielsen [ 2023-12-11 ]

Fix pushed to 10.11. Bunch of rpl_start_alter_*.test now change the mysql.gtid_slave_pos table to use InnoDB.

Generated at Thu Feb 08 09:56:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.