Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects version: 10.11.18
Description
We previously reported a problem with Galera and async replication (MDEV-38257) but could not pinpoint the cause. We have since set up a test scenario to investigate further.
We have two Galera clusters (A and B), each with a dedicated write node (A1 and B1). The clusters are connected via bidirectional async replication between nodes A2 and B2 (A3 and B3 are omitted for clarity):
A1 <--- WSREP ---> A2 <---- async repl. ----> B2 <--- WSREP ---> B1
Parallel usage of a sequence on A1 and B1 leads to a deadlock between the write node and the replication node. Symptoms are a replica thread stuck in "Commit" and the write node stuck in "Waiting for certification".
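Both symptoms can be checked from an ordinary client session. The following is a minimal sketch, assuming the stuck threads show up in information_schema.PROCESSLIST with the state strings quoted above; the exact wording of the states is taken from this report and may differ between versions.

```sql
-- Run on the write nodes (A1, B1) and on the replication nodes (A2, B2).
-- Lists threads whose state matches the two symptoms described in this report.
SELECT ID, USER, COMMAND, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE STATE LIKE 'Commit%'
   OR STATE LIKE 'Waiting for certification%'
ORDER BY TIME DESC;

-- Galera view of the node: state and cluster membership.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';

-- On A2 and B2 only: state of the async replication threads.
SHOW SLAVE STATUS;
```

A thread whose TIME value keeps growing while its STATE stays the same is a likely candidate for the hang.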
We reproduced the situation with this setup:
Some relevant settings:
wsrep_on = ON
wsrep_provider_options="gmcast.peer_timeout=PT10S;evs.inactive_timeout=PT20S;evs.suspect_timeout=PT10S"
binlog_format = ROW
log_slave_updates=ON
wsrep_gtid_mode=ON
innodb_autoinc_lock_mode = 2
wsrep_slave_threads=1
wsrep_restart_slave=1
auto_increment_increment=10
auto_increment_offset=7
wsrep_auto_increment_control=0
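To confirm that a node actually runs with these values, the settings can be read back at runtime. This is a sketch only; it assumes all listed names are exposed as global system variables on the version in use.

```sql
-- Read back the settings listed above from the running server.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN (
    'wsrep_on',
    'wsrep_provider_options',
    'binlog_format',
    'log_slave_updates',
    'wsrep_gtid_mode',
    'innodb_autoinc_lock_mode',
    'wsrep_slave_threads',
    'wsrep_restart_slave',
    'auto_increment_increment',
    'auto_increment_offset',
    'wsrep_auto_increment_control'
);
```

Running this on every node makes it easy to spot a node whose configuration differs from the others.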
Necessary DDL:
CREATE SEQUENCE IF NOT EXISTS event_seq
    START WITH 1 INCREMENT BY 0 NOCACHE;

CREATE TABLE IF NOT EXISTS event_log (
    id BIGINT NOT NULL,
    event_type VARCHAR(64),
    PRIMARY KEY (id)
) ENGINE=InnoDB;
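A stored procedure then inserts rows in a loop, taking each id from the sequence:
-- DELIMITER is only needed when pasting into the mariadb command-line client.
DELIMITER //
CREATE OR REPLACE PROCEDURE generate_sequence_load()
BEGIN
    DECLARE i INT DEFAULT 0;
    -- One transaction per inserted row, 10000 rows in total.
    WHILE i < 10000 DO
        START TRANSACTION;
        INSERT INTO event_log (id, event_type)
        VALUES (NEXT VALUE FOR event_seq, 'test');
        COMMIT;
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;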
Finally, start this procedure in parallel on nodes A1 and B1.
This was reproducible on 10.11.11, 10.11.15 and 10.11.18. Our production system currently runs 10.11.10, which seems to be unaffected, but we are running into MDEV-35447 there, so we would like to upgrade.
We are happy to provide logs or any further information that would help.
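The parallel run can be sketched as follows, assuming one client session each on A1 and B1 plus a separate monitoring session; the column aliases are illustrative only.

```sql
-- Session on A1 and session on B1, started at roughly the same time:
CALL generate_sequence_load();

-- Separate monitoring session on any node, repeated while the load runs:
SELECT COUNT(*) AS rows_inserted, MAX(id) AS highest_id
FROM event_log;
```

If rows_inserted stops growing while both CALL statements are still running, the nodes have probably reached the hang described above.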
Issue Links
- relates to MDEV-38257 Galera hangs in "Waiting for certification" when Async Replication is in use (Closed)