MariaDB Server / MDEV-20645

Replication consistency is broken as workers miss the error notification from an earlier failed group.

Details

    Description

      Enable parallel replication on the slave with
      slave_parallel_mode='optimistic'.

      Table Structure:
      CREATE TABLE t2 (a int PRIMARY KEY) ENGINE=InnoDB;

      Execute the following DML operations on the master.

      INSERT INTO t2 VALUES (32);
      INSERT INTO t2 VALUES (33);
      INSERT INTO t2 VALUES (34);

      The above three transactions are scheduled for parallel execution on the slave.
      The first insert fails on the slave with a duplicate key error. On that error
      the remaining workers should abort, but transaction 34 still gets committed.
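
The expected behaviour described above can be modelled outside the server. What follows is a minimal sketch, not MariaDB's implementation: `ErrorMarker`, `commit_group` and `replay_example` are hypothetical names, the three inserts are given made-up sequence numbers 1 to 3, and the slave is assumed to already hold row 32 (which is why the first insert hits a duplicate key). Each group checks, before committing, whether any group ordered before it has failed.

```cpp
#include <cstdint>
#include <limits>
#include <set>
#include <vector>

// Toy model only: every name here is hypothetical, not MariaDB's.
enum class Outcome { Committed, Failed, Aborted };

// Shared between workers: the lowest sequence number that has failed so far.
struct ErrorMarker {
  std::uint64_t failed_seq_no = std::numeric_limits<std::uint64_t>::max();

  void report_failure(std::uint64_t seq_no) {
    if (seq_no < failed_seq_no) failed_seq_no = seq_no;
  }
  // A group must abort if any group ordered before it has failed.
  bool must_abort(std::uint64_t seq_no) const {
    return failed_seq_no < seq_no;
  }
};

// Commit step of one worker, called in commit order.
Outcome commit_group(std::uint64_t seq_no, int key, std::set<int>& table,
                     ErrorMarker& marker) {
  if (marker.must_abort(seq_no)) return Outcome::Aborted;
  if (!table.insert(key).second) {  // duplicate primary key
    marker.report_failure(seq_no);
    return Outcome::Failed;
  }
  return Outcome::Committed;
}

// The scenario from the description: INSERTs of 32, 33 and 34 in commit order.
std::vector<Outcome> replay_example(std::set<int>& slave_table) {
  ErrorMarker marker;
  std::vector<Outcome> outcomes;
  const int keys[] = {32, 33, 34};
  std::uint64_t seq_no = 1;
  for (int key : keys)
    outcomes.push_back(commit_group(seq_no++, key, slave_table, marker));
  return outcomes;
}
```

Calling `replay_example` on a table that already contains 32 yields one failed group followed by two aborted ones, and the table is left unchanged. The bug reported here corresponds to the third group skipping the `must_abort` check and committing 34.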

        Activity

          sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment -

          Hello Andrei,

          Please review the fix for MDEV-20645.
          https://github.com/MariaDB/server/commit/e07caf401c26cf8144899336d103e4c7aafd3d7a
          http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.1-sujatha

          Thank you.

          Elkin Andrei Elkin added a comment -

          Sujatha, the patch looks nice, and thanks for a good piece of work!

          I was only unhappy about cluttering up the code block with the 2nd
          simulation

          @@ -1096,6 +1102,13 @@ handle_rpl_parallel_thread(void *arg)
                 bool did_enter_cond= false;
                 PSI_stage_info old_stage;
           
          +      DBUG_EXECUTE_IF("hold_worker_on_schedule", {
          +          if (rgi->current_gtid.domain_id == 0 &&
          +              rgi->current_gtid.seq_no == 100) {
          +            debug_sync_set_action(thd,
          +              STRING_WITH_LEN("now SIGNAL reached_pause WAIT_FOR continue_worker"));
          +          }
          +        });
                 DBUG_EXECUTE_IF("rpl_parallel_scheduled_gtid_0_x_100", {
                     if (rgi->current_gtid.domain_id == 0 &&
                         rgi->current_gtid.seq_no == 100) {

          which largely copies the 1st one. We could actually keep just one generic
          simulation block and use the statement format's user variables to carry
          various things to the worker, such as the gtid and the reaction string.
          Eventually it would be something like this pseudo-code:

              if ((event_type= qev->ev->get_type_code()) == GTID_EVENT)
              {
                ...
                DBUG_EXECUTE_IF("hold_worker_before_gco",
                  {
                    if (rgi->current_gtid.seq_no == "@gtid_for_hold_worker_before_gco")
                      debug_sync_set_action(thd, "@action_for_hold_worker_before_gco");
                  });
              }
          

          I am just throwing in the idea, without urging that we discuss and implement it yet. We would certainly benefit from having this sort of simulation policy, although it is bound to STATEMENT format a bit too much for my taste (a tentative feeling; should we have user variables logged along with Rows_log_events ...).
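
To make the generic-block idea concrete, here is a minimal standalone sketch of the lookup such a block would perform. It is an illustration under stated assumptions, not server code: `UserVars` and `action_for_hook` are hypothetical names, a plain map stands in for the session's user variables, and the `gtid_for_<hook>` / `action_for_<hook>` naming simply follows the pseudo-code above.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for user variables such as
// @gtid_for_hold_worker_before_gco; this is not a MariaDB API.
using UserVars = std::map<std::string, std::string>;

// Returns the debug-sync action configured for `hook` at this GTID sequence
// number, or an empty string when the hook is not configured for it.
std::string action_for_hook(const UserVars& vars, const std::string& hook,
                            std::uint64_t seq_no) {
  auto gtid = vars.find("gtid_for_" + hook);
  auto action = vars.find("action_for_" + hook);
  if (gtid == vars.end() || action == vars.end()) return "";
  if (gtid->second != std::to_string(seq_no)) return "";
  return action->second;
}
```

With `gtid_for_hold_worker_before_gco` set to `100` and `action_for_hold_worker_before_gco` set to the `now SIGNAL reached_pause WAIT_FOR continue_worker` string from the patch, the function returns that action only for sequence number 100. A single block driven this way could replace both hard-coded simulations, at the cost of depending on how user variables reach the worker.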


          sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment -

          The fix for the issue is implemented in 10.1.42.

          The patch has been tested on higher versions.

          10.2 patch: There is a minor change in the test for 10.2. The 'enable_connect_log' and 'disable_connect_log' are not required there, so they are removed.
          https://github.com/MariaDB/server/commit/62c05dd14a37b7c4dff3bf9069eca6dd1deb9235

          10.3 patch:
          https://github.com/MariaDB/server/commit/7d7d741cc33fb51a0f3f226728769255d4e73c1e

          10.4 patch:
          https://github.com/MariaDB/server/commit/fa0fc3e38af44b685872b7846beb631999ea01b5

          People

            sujatha.sivakumar Sujatha Sivakumar (Inactive)
            sujatha.sivakumar Sujatha Sivakumar (Inactive)