MariaDB Server / MDEV-40062

Galera + Async Repl. hangs in "Waiting for certification" on Sequence Conflict

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.11.18
    • Fix Version/s: None
    • Component/s: Galera, Replication
    • Labels: None

    Description

      We reported a problem with Galera and async replication earlier but could not pinpoint it exactly (MDEV-38257). We have since set up a test scenario to investigate further.

      We have two Galera clusters (A and B), each with a dedicated write node (A1 and B1). The clusters are connected via bidirectional async replication between nodes A2 and B2 (A3 and B3 are left out for clarity):

      A1 <--- WSREP ---> A2 <---- async repl. ----> B2 <--- WSREP ---> B1
      

      Parallel usage of a sequence on A1 and B1 leads to a deadlock between the write node and the replication node. Symptoms are a replica thread stuck in "Commit" and the write node stuck in "Waiting for certification".
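
The symptoms described above can be checked with a few standard statements. This is only a sketch: the LIKE patterns are assumptions based on the state names quoted in this report, and the exact state strings may differ between versions.

```sql
-- On the write node (A1 or B1): look for sessions waiting on certification.
-- On the replication node (A2 or B2): look for a replica thread stuck in commit.
SELECT ID, USER, COMMAND, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE STATE LIKE '%certification%'
   OR STATE LIKE 'Commit%'
ORDER BY TIME DESC;

-- Galera view of the node: sync state and length of the apply queue.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';

-- State of the async replication threads on A2 / B2.
SHOW ALL SLAVES STATUS;
```

A TIME value that keeps growing for the same ID across repeated runs would suggest the thread is actually hung rather than just slow.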

      We reproduced the situation with this setup:

      Some relevant settings:

      wsrep_on = ON
      wsrep_provider_options="gmcast.peer_timeout=PT10S;evs.inactive_timeout=PT20S;evs.suspect_timeout=PT10S"
      binlog_format = ROW
      log_slave_updates=ON
      wsrep_gtid_mode=ON
      innodb_autoinc_lock_mode = 2
      wsrep_slave_threads=1
      wsrep_restart_slave=1
      auto_increment_increment=10
      auto_increment_offset=7
      wsrep_auto_increment_control=0
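
To confirm that a node is actually running with the settings listed above, the effective values can be read back from the server. This is a minimal sketch; values such as auto_increment_offset may legitimately differ from node to node.

```sql
-- Run on each node (A1, A2, B1, B2) and compare the output.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN (
    'wsrep_on',
    'wsrep_provider_options',
    'binlog_format',
    'log_slave_updates',
    'wsrep_gtid_mode',
    'innodb_autoinc_lock_mode',
    'wsrep_slave_threads',
    'wsrep_restart_slave',
    'auto_increment_increment',
    'auto_increment_offset',
    'wsrep_auto_increment_control'
);
```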
      

      Necessary DDL:

      CREATE SEQUENCE IF NOT EXISTS event_seq
          START WITH 1 INCREMENT BY 0 NOCACHE;
       
      CREATE TABLE IF NOT EXISTS event_log (
          id         BIGINT NOT NULL,
          event_type VARCHAR(64),
          PRIMARY KEY (id)
      ) ENGINE=InnoDB;
      

      A stored procedure then draws values from the sequence in a loop:

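      -- DELIMITER is only needed when pasting into the mariadb command-line client
      DELIMITER //
      CREATE OR REPLACE PROCEDURE generate_sequence_load()
      BEGIN
          DECLARE i INT DEFAULT 0;
          -- 10000 single-row transactions, each taking one value from event_seq
          WHILE i < 10000 DO
              START TRANSACTION;
              INSERT INTO event_log (id, event_type)
              VALUES (NEXT VALUE FOR event_seq, 'test');
              COMMIT;
              SET i = i + 1;
          END WHILE;
      END //
      DELIMITER ;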
      

      Then start this procedure in parallel on nodes A1 and B1.
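
As a sketch of that last step, the procedure would be invoked from one client session per write node at roughly the same time. The progress query is an addition for illustration and not part of the original reproduction.

```sql
-- Session 1, connected to A1; session 2, connected to B1. Start both together.
CALL generate_sequence_load();

-- From a third session on any node: if the row count stops growing while
-- both CALLs are still running, the hang has probably been triggered.
SELECT COUNT(*) AS rows_inserted, MAX(id) AS highest_id
FROM event_log;
```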

      This was reproducible on 10.11.11, 10.11.15 and 10.11.18. Our production system currently runs 10.11.10, which seems to be unaffected, but we are running into MDEV-35447 there, so we would like to be able to upgrade.

      We are happy to provide logs or any further information that would help.

            People

              Assignee: Seppo Jaakola
              Reporter: Andreas Vogler