Details
- Bug
- Status: Closed
- Critical
- Resolution: Fixed
- 10.6.18
- None
Description
After moving from MariaDB Server 10.6.16-MariaDB-log to 10.6.18-MariaDB-log,
we are experiencing the primary stalling when a DDL statement is executed while using
semi-sync replication with wait point AFTER_SYNC (it does not happen with AFTER_COMMIT).
With a workload of many concurrent users executing DMLs,
the primary can get stuck if a DDL is executed at a specific moment of the semi-sync steps.
The problem persists until the master timeout is reached and replication switches to asynchronous.
However, we use a high value for this timeout, and when the problem
occurs the primary can reach the maximum number of connections.
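For reference, the primary-side settings this description depends on could be applied and inspected roughly as follows. This is only a sketch: the timeout value is a placeholder rather than the reporter's actual setting, and it assumes the replicas already have semi-sync enabled.

```sql
-- Sketch of the primary-side semi-sync settings described above.
-- Assumption: replicas already run with rpl_semi_sync_slave_enabled=ON.
SET GLOBAL rpl_semi_sync_master_enabled = ON;
SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC';

-- Timeout is in milliseconds. 600000 is a hypothetical "high value",
-- not the value used in the original report.
SET GLOBAL rpl_semi_sync_master_timeout = 600000;

-- Inspect the resulting configuration and the connection limit
-- that a long stall can exhaust.
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master%';
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status';
SHOW GLOBAL VARIABLES LIKE 'max_connections';
```

Switching the wait point to 'AFTER_COMMIT' is the configuration the report says does not show the stall.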
To reproduce this scenario quickly,
you can execute the following steps as a simple example of what we are facing:
Tables:
CREATE OR REPLACE TABLE table1 (id INT NOT NULL, col1 int, PRIMARY KEY (id)) ENGINE=InnoDB;
INSERT INTO table1 VALUES (1,0);
CREATE OR REPLACE TABLE table3 (id int auto_increment primary key, col1 int) ENGINE=InnoDB;
Execute the following three intensive loops in parallel:
--- thread 1
delimiter //
for i in 1..500 do CREATE OR REPLACE TABLE table2(id int primary key); end for; //

--- thread 2
delimiter //
for i in 1..2000 do set autocommit=0; update table1 set col1=i where id=1; select * from table1 where id=1; commit; select sleep(0.2); end for; //

--- thread 3
delimiter //
for i in 1..1000 do insert into table3 values (null,i); select sleep(0.1); end for //
After a few seconds, the processlist shows that
all COMMIT and SHOW commands that use the binlog are stuck:
+-------+-------------+------+------------------------------------------------------+---------------------------------------------------------------+
| id    | command     | time | info                                                 | state                                                         |
+-------+-------------+------+------------------------------------------------------+---------------------------------------------------------------+
| 11369 | Query       |   48 | SHOW BINARY LOGS                                     | starting                                                      |
| 11341 | Query       |   58 | create or replace table table2(id int primary key)   | Waiting for semi-sync ACK from slave                          |
|  9596 | Query       |   54 | commit                                               | Commit                                                        |
|  9546 | Query       |   30 | show master status                                   | starting                                                      |
|  7285 | Query       |   54 | insert into table3 values (null, NAME_CONST('i',88)) | Commit                                                        |
|    53 | Binlog Dump | 4161 | NULL                                                 | Master has sent all binlog to slave; waiting for more updates |
|    51 | Binlog Dump | 4166 | NULL                                                 | Master has sent all binlog to slave; waiting for more updates |
|    47 | Daemon      | 4197 | NULL                                                 | Waiting for next activation                                   |
+-------+-------------+------+------------------------------------------------------+---------------------------------------------------------------+
Connection 9596 belongs to the thread 2 loop above.
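While the stall is in progress, a narrower view of the affected sessions might be obtained with something like the queries below. This is an untested sketch: the state strings are copied from the processlist output above, and it is an assumption that these statements do not themselves block on the same binlog or semi-sync locks.

```sql
-- Sessions stuck in the commit / semi-sync wait path, longest first.
-- State strings are taken from the processlist output in this report.
SELECT id, command, time, state, info
FROM information_schema.PROCESSLIST
WHERE state IN ('Commit', 'Waiting for semi-sync ACK from slave')
ORDER BY time DESC;

-- Semi-sync counters on the primary: sessions currently waiting for an
-- ACK and the number of connected semi-sync replicas.
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_wait_sessions';
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_clients';

-- Connection pressure relative to max_connections.
SHOW GLOBAL STATUS LIKE 'Threads_connected';
```

Comparing Threads_connected against max_connections over time would show how quickly the stall approaches the connection limit mentioned in the description.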
I repeated the same stress test in versions 10.6.15 and 10.6.16 without ever encountering any problems.
Issue Links
- causes: MDEV-35477 rpl_semi_sync_no_missed_ack_after_add_slave fails after MDEV-35109 (Closed)