MariaDB Server / MDEV-35109

Semi-sync Replication stalling Primary using wait point=AFTER_SYNC

Details

    Description

      After moving MariaDB Server from 10.6.16-MariaDB-log to 10.6.18-MariaDB-log,
      we are seeing the primary stall when a DDL statement is executed under
      semi-sync replication with wait point AFTER_SYNC (it does not happen with AFTER_COMMIT).

      With a workload of many concurrent users executing DML statements,
      the primary can get stuck if a DDL statement is executed at a specific moment in the semi-sync steps.

      The problem persists until the master timeout is reached and replication switches to asynchronous.
      However, we use a high value for this timeout, so when the problem
      occurs the primary can reach the maximum number of connections.
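
      The settings and counters involved can be inspected on the primary with statements along these lines. This is a minimal sketch, not output from the affected server; it assumes the built-in semi-sync implementation of MariaDB 10.3 and later, where the variables use the rpl_semi_sync_master prefix.

      ```sql
      -- Semi-sync configuration on the primary: enabled flag, timeout (ms), wait point.
      SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master%';

      -- Runtime state: Rpl_semi_sync_master_status turns OFF once the primary
      -- has fallen back to asynchronous replication after the timeout.
      SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master%';

      -- Connection headroom while sessions pile up behind the stall.
      SHOW GLOBAL VARIABLES LIKE 'max_connections';
      SHOW GLOBAL STATUS LIKE 'Threads_connected';
      ```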

      To reproduce this scenario quickly,
      you can run the following steps, a simple example that hits the same problem we face:

      Tables:

      CREATE OR REPLACE TABLE table1 (id INT NOT NULL, col1 int, PRIMARY KEY (id)) ENGINE=InnoDB;
      INSERT INTO table1 VALUES (1,0);
      CREATE OR REPLACE TABLE table3 (id int auto_increment primary key, col1 int) ENGINE=InnoDB;
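
      The report does not list the exact semi-sync settings used. The following is a sketch of a setup that would match the description, assuming built-in semi-sync (MariaDB 10.3+) and an already configured replica; the timeout value is a hypothetical stand-in for the "high value" mentioned above.

      ```sql
      -- On the primary (assumed settings, not taken from the report):
      SET GLOBAL rpl_semi_sync_master_enabled    = ON;
      SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC';
      SET GLOBAL rpl_semi_sync_master_timeout    = 600000;  -- milliseconds; hypothetical value

      -- On each replica: enable semi-sync, then restart the I/O thread
      -- so it reconnects to the primary as a semi-sync replica.
      SET GLOBAL rpl_semi_sync_slave_enabled = ON;
      STOP SLAVE IO_THREAD;
      START SLAVE IO_THREAD;
      ```

      For the comparison run, only rpl_semi_sync_master_wait_point would change, to 'AFTER_COMMIT'.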
      

      Execute the following 3 intensive loops in parallel:

      -- thread 1
      delimiter //
      for i in 1..500 do CREATE OR REPLACE TABLE table2(id int primary key);  end for; //

      -- thread 2
      delimiter //
      for i in 1..2000 do set autocommit=0; update table1 set col1=i where id=1; select * from table1 where id=1; commit; select sleep(0.2);  end for; //

      -- thread 3
      delimiter //
      for i in 1..1000 do insert into table3 values (null,i); select sleep(0.1); end for //
      
      

      After a few seconds, the processlist shows that
      all commits and SHOW commands that use the binlog are stuck:

      +-------+-------------+------+------------------------------------------------------+---------------------------------------------------------------+
      | id    | command     | time | info                                                 | state                                                         |
      +-------+-------------+------+------------------------------------------------------+---------------------------------------------------------------+
      | 11369 | Query       |   48 | SHOW BINARY LOGS                                     | starting                                                      |
      | 11341 | Query       |   58 | create or replace table table2(id int primary key)   | Waiting for semi-sync ACK from slave                          |
      |  9596 | Query       |   54 | commit                                               | Commit                                                        |
      |  9546 | Query       |   30 | show master status                                   | starting                                                      |
      |  7285 | Query       |   54 | insert into table3 values (null, NAME_CONST('i',88)) | Commit                                                        |
      |    53 | Binlog Dump | 4161 | NULL                                                 | Master has sent all binlog to slave; waiting for more updates |
      |    51 | Binlog Dump | 4166 | NULL                                                 | Master has sent all binlog to slave; waiting for more updates |
      |    47 | Daemon      | 4197 | NULL                                                 | Waiting for next activation                                   |
      +-------+-------------+------+------------------------------------------------------+---------------------------------------------------------------+
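
      A listing with the same columns can be obtained with a query like the one below. This is a sketch; the filter on idle connections and the sort order are assumptions about how the table above was produced.

      ```sql
      -- Active sessions on the primary, newest connection first.
      SELECT id, command, time, info, state
      FROM information_schema.PROCESSLIST
      WHERE command <> 'Sleep'          -- assumed: idle connections left out
        AND id <> CONNECTION_ID()       -- skip the session running this query
      ORDER BY id DESC;
      ```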
      

      Thread 9596 comes from the thread 2 loop above.

      I repeated the same stress test in versions 10.6.15 and 10.6.16 without ever encountering any problems.

            People

              bnestere Brandon Nesterenko
              andrea.ponzo Andrea Ponzo