Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13915

STOP SLAVE takes very long time on a busy system

Details

    Description

      150     system user             NULL    Slave_IO        66361   Waiting for master to send event        NULL    0.000
      152     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      153     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      154     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      155     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      156     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      157     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      158     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      159     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      160     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      162     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      161     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      163     system user             NULL    Slave_worker    0       Write_rows_log_event::write_row(-1)     NULL    0.000
      151     system user             NULL    Slave_SQL       1090    Reading event from the relay log        NULL    0.000
      2415    root    127.0.0.1:42392 NULL    Query   835     Killing slave   STOP SLAVE      0.000
      

      slave machine configuration:

      sync_master_info          = 500000
      sync_relay_log            = 100000
      sync_relay_log_info       = 500000
      slave_parallel_max_queued = 67108864
      slave_parallel_mode       = optimistic
      slave_parallel_threads    = 12
      

      Is that normal?

      Attachments

        Issue Links

          Activity

            mxu Michael Xu created issue -
            elenst Elena Stepanova added a comment - - edited

            Indeed, parallel replication can make STOP SLAVE much longer if sizable transactions are running. The test case below demonstrates that. It updates many rows on master in a table without PK, so that RBR is really slow, then waits till the updates start running on the slave, executes STOP SLAVE, waits till it's finished and checks what's happened to the contents of the table on slave.

            If the test is run without parallel replication, STOP SLAVE finishes very fast, and the contents of the table remains unchanged – that is, updates on the slave are interrupted and not committed (rolled back).
            If the test is run with parallel replication, even if it's slave-parallel-threads=1, STOP SLAVE takes a long time, and at the end the contents of the table is updated – that is, the slave executes the transaction to the end and commits it before stopping.

            I expect it to be a design choice of parallel replication, hopefully Elkin will check it and confirm (or not).

            Note: the test case below is for reproducing only, do not put it into the regression suite!

            # Optionally run with --mysqld=--slave-parallel-threads=N
             
            --source include/have_innodb.inc
            --source include/master-slave.inc
            --source include/have_binlog_format_row.inc
             
            create table t1 (i int, c varchar(255)) engine=InnoDB;
            begin;
            insert into t1 values (1,'foo'),(2,'bar'),(3,'abc'),(4,'def');
            insert into t1 select * from t1;
            insert into t1 select * from t1;
            insert into t1 select * from t1;
            insert into t1 select * from t1;
            insert into t1 select * from t1;
            insert into t1 select * from t1;
            insert into t1 select * from t1;
            insert into t1 select * from t1;
            insert into t1 select * from t1;
            commit;
             
            --sync_slave_with_master
             
            --connection master
            begin;
            update t1 set c = 'foobar';
            update t1 set c = 'qux';
            commit;
             
            --connection slave
             
            --let $show_statement= SHOW PROCESSLIST
            --let $field= State
            --let $condition= like 'Update_rows_log_event::find_row%'
            --source include/wait_show_condition.inc
             
            send stop slave;
             
            --connection slave1
             
            while (`SELECT COUNT(*) FROM information_schema.processlist where state like '%Killing slave%'`)
            {
              select user,command,time,state,info from information_schema.processlist where command like 'slave%' or state like '%slave%';
              --sleep 5
            }
             
            select distinct(c) from t1;
             
            # Cleanup
             
            --connection slave
            --reap
             
            drop table t1;
             
            --connection master
            drop table t1;
            

            elenst Elena Stepanova added a comment - - edited Indeed, parallel replication can make STOP SLAVE much longer if sizable transactions are running. The test case below demonstrates that. It updates many rows on master in a table without PK, so that RBR is really slow, then waits till the updates start running on the slave, executes STOP SLAVE , waits till it's finished and checks what's happened to the contents of the table on slave. If the test is run without parallel replication, STOP SLAVE finishes very fast, and the contents of the table remains unchanged – that is, updates on the slave are interrupted and not committed (rolled back). If the test is run with parallel replication, even if it's slave-parallel-threads=1 , STOP SLAVE takes a long time, and at the end the contents of the table is updated – that is, the slave executes the transaction to the end and commits it before stopping. I expect it to be a design choice of parallel replication, hopefully Elkin will check it and confirm (or not). Note: the test case below is for reproducing only, do not put it into the regression suite! # Optionally run with --mysqld=--slave-parallel-threads=N   --source include/have_innodb.inc --source include/master-slave.inc --source include/have_binlog_format_row.inc   create table t1 (i int , c varchar (255)) engine=InnoDB; begin ; insert into t1 values (1, 'foo' ),(2, 'bar' ),(3, 'abc' ),(4, 'def' ); insert into t1 select * from t1; insert into t1 select * from t1; insert into t1 select * from t1; insert into t1 select * from t1; insert into t1 select * from t1; insert into t1 select * from t1; insert into t1 select * from t1; insert into t1 select * from t1; insert into t1 select * from t1; commit ;   --sync_slave_with_master   --connection master begin ; update t1 set c = 'foobar' ; update t1 set c = 'qux' ; commit ;   --connection slave   --let $show_statement= SHOW PROCESSLIST --let $field= State --let $condition= like 'Update_rows_log_event::find_row%' --source include/wait_show_condition.inc   send stop slave;   --connection slave1   while (` SELECT COUNT (*) FROM information_schema.processlist where state like '%Killing slave%' `) { select user ,command, time ,state,info from information_schema.processlist where command like 'slave%' or state like '%slave%' ; --sleep 5 }   select distinct (c) from t1;   # Cleanup   --connection slave --reap   drop table t1;   --connection master drop table t1;
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            Fix Version/s 10.1 [ 16100 ]
            Fix Version/s 10.2 [ 14601 ]
            Affects Version/s 10.0 [ 16000 ]
            Affects Version/s 10.1 [ 16100 ]
            Affects Version/s 10.2 [ 14601 ]
            Assignee Andrei Elkin [ elkin ]
            elenst Elena Stepanova made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.1 [ 16100 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 82798 ] MariaDB v4 [ 143972 ]
            angelique.sklavounos Angelique Sklavounos (Inactive) made changes -
            Affects Version/s 10.8.2 [ 26816 ]

            I ran the test case above against 10.8.2-debug (commit 12cd3dc78d2a58a15377000a7a8adb92d4fa74fb) and could see the same 10x increase in time with

            --mysqld=--slave-parallel-threads=1
            

            as with 10.2.43-debug (commit 941bc7053616d5ca8c9e6538828c4a65802e8c3d).

            angelique.sklavounos Angelique Sklavounos (Inactive) added a comment - - edited I ran the test case above against 10.8.2-debug (commit 12cd3dc78d2a58a15377000a7a8adb92d4fa74fb) and could see the same 10x increase in time with --mysqld=--slave-parallel-threads=1 as with 10.2.43-debug (commit 941bc7053616d5ca8c9e6538828c4a65802e8c3d).
            alice Alice Sherepa made changes -
            Affects Version/s 10.3 [ 22126 ]
            Affects Version/s 10.4 [ 22408 ]
            Affects Version/s 10.5 [ 23123 ]
            Affects Version/s 10.6 [ 24028 ]
            Affects Version/s 10.7 [ 24805 ]
            Affects Version/s 10.8 [ 26121 ]
            Affects Version/s 10.8.2 [ 26816 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.2 [ 14601 ]
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Brandon Nesterenko [ JIRAUSER48702 ]
            bnestere Brandon Nesterenko made changes -

            Hi woosang, sure. It won't be done in time for the release this week, but we are planning on having it done in the next release (planned for June 8). Do they need a custom build to have that sooner?

            bnestere Brandon Nesterenko added a comment - Hi woosang , sure. It won't be done in time for the release this week, but we are planning on having it done in the next release (planned for June 8). Do they need a custom build to have that sooner?
            bnestere Brandon Nesterenko made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]

            Hi Andrei!

            This is ready for review: PR-2534.

            bnestere Brandon Nesterenko added a comment - Hi Andrei! This is ready for review: PR-2534 .
            bnestere Brandon Nesterenko made changes -
            Assignee Brandon Nesterenko [ JIRAUSER48702 ] Andrei Elkin [ elkin ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            Elkin Andrei Elkin made changes -
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Elkin Andrei Elkin added a comment -

            A round of review is done. Waiting for a new patch.

            Elkin Andrei Elkin added a comment - A round of review is done. Waiting for a new patch.
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Brandon Nesterenko [ JIRAUSER48702 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            chanjong.yu Chanjong Yu added a comment - - edited

            bnestere

            The customer is asking if we have tested this fix with the same status of master as they encountered at Feb 2nd 13:33:07.
            They need it to acknowledge that the fix will work in case of they encounter the same issue.

            IMHO, these are available options.
            1. Perform tests with the same status of master as they encountered
            2. Provide additional message helps the customer acknowledge that this fix will work for them. (e.g. explain that the fix works regardless of the status of master)
            3. Provide the status of master where you tested for this fix.

            Please refer to the SR (CS0537360)
            When the maxscale tried to perform automatic failover, the master was starting-up after threw signal-6 at that time.
            maxcale log entries (attachment : maxscale.log_mbhapp01) :
            2023-02-07 13:33:07 notice : [mariadbmon] Performing automatic failover to replace failed master 'server1'.
            master log entries (attachment : maria.err_mbsdbp01):
            2023-02-07 13:33:13 0 [Note] InnoDB: Read redo log up to LSN=3516540925952

            woosangcc

            chanjong.yu Chanjong Yu added a comment - - edited bnestere The customer is asking if we have tested this fix with the same status of master as they encountered at Feb 2nd 13:33:07. They need it to acknowledge that the fix will work in case of they encounter the same issue. IMHO, these are available options. 1. Perform tests with the same status of master as they encountered 2. Provide additional message helps the customer acknowledge that this fix will work for them. (e.g. explain that the fix works regardless of the status of master) 3. Provide the status of master where you tested for this fix. Please refer to the SR (CS0537360) When the maxscale tried to perform automatic failover, the master was starting-up after threw signal-6 at that time. maxcale log entries (attachment : maxscale.log_mbhapp01) : 2023-02-07 13:33:07 notice : [mariadbmon] Performing automatic failover to replace failed master 'server1'. master log entries (attachment : maria.err_mbsdbp01): 2023-02-07 13:33:13 0 [Note] InnoDB: Read redo log up to LSN=3516540925952 woosang cc

            Hi chanjong.yu!

            I'm still working on the fix, but generally speaking, it should work with the version of their master, as it is change to the replica-side only.

            bnestere Brandon Nesterenko added a comment - Hi chanjong.yu ! I'm still working on the fix, but generally speaking, it should work with the version of their master, as it is change to the replica-side only.

            Adjustments for the first round of review have been made PR 2534

            bnestere Brandon Nesterenko added a comment - Adjustments for the first round of review have been made PR 2534
            bnestere Brandon Nesterenko made changes -
            Assignee Brandon Nesterenko [ JIRAUSER48702 ] Andrei Elkin [ elkin ]
            Status Stalled [ 10000 ] In Review [ 10002 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.8 [ 26121 ]
            Elkin Andrei Elkin added a comment -

            The overall patch looks pretty good.
            Thanks for creative and tedious work done!

            Elkin Andrei Elkin added a comment - The overall patch looks pretty good. Thanks for creative and tedious work done!
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Brandon Nesterenko [ JIRAUSER48702 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            julien.fritsch Julien Fritsch made changes -
            bnestere Brandon Nesterenko made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            bnestere Brandon Nesterenko added a comment - - edited

            Pushed into 10.4 as 0a99d457b

            Merge conflicts observed through manual cherry picking in 10.5 and 10.9, with fixes in branches bb-10.5-MDEV-13915-mergefix and bb-10.9-MDEV-13915-mergefix, respectively

            bnestere Brandon Nesterenko added a comment - - edited Pushed into 10.4 as 0a99d457b Merge conflicts observed through manual cherry picking in 10.5 and 10.9, with fixes in branches bb-10.5-MDEV-13915-mergefix and bb-10.9-MDEV-13915-mergefix , respectively
            bnestere Brandon Nesterenko made changes -
            issue.field.resolutiondate 2023-06-06 00:52:32.0 2023-06-06 00:52:32.943
            bnestere Brandon Nesterenko made changes -
            Fix Version/s 11.1.1 [ 28704 ]
            Fix Version/s 11.0.2 [ 28706 ]
            Fix Version/s 10.4.30 [ 28912 ]
            Fix Version/s 10.5.21 [ 28913 ]
            Fix Version/s 10.6.14 [ 28914 ]
            Fix Version/s 10.9.7 [ 28916 ]
            Fix Version/s 10.10.5 [ 28917 ]
            Fix Version/s 10.11.4 [ 28918 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            knielsen Kristian Nielsen added a comment - - edited

            Please don't do this. And certainly not in a GA release!
            STOP SLAVE isn't expected to roll back any transactions, it completes what has started and then stops.
            This has always been so, it's the same in non-parallel replication.
            If users want to stop the slave faster due to a long-running query, they can kill the specific query.

            Edited: Ok, apparently we do that for non-parallel, but only when reading the next event.
            In any case, you cannot do this in a GA release. How will 10.4 users get security fixes without also getting this change?

            knielsen Kristian Nielsen added a comment - - edited Please don't do this. And certainly not in a GA release! STOP SLAVE isn't expected to roll back any transactions, it completes what has started and then stops. This has always been so, it's the same in non-parallel replication. If users want to stop the slave faster due to a long-running query, they can kill the specific query. Edited: Ok, apparently we do that for non-parallel, but only when reading the next event. In any case, you cannot do this in a GA release. How will 10.4 users get security fixes without also getting this change?
            Elkin Andrei Elkin added a comment - - edited

            knielsen, salve. First my apologies that you were not explicitly notified on the patch for reviewing. It's not terminally late now I hope, provided you have time.

            It turns out 10.4 push was our mistake in the version that the support case needs. We then are free to play safe and revert it from 10.4.

            To the technical part of the patch, yes its idea is to make the sequential and parallel STOP:s work consistently,
            including concern of non-transactional "transactions" that must complete to "commit".
            To my reasoning there's nothing that the user of the patch can find unusual (apart from quick stopping perhaps).
            Indeed, the user cannot see what has been scheduled to workers. So any quicker stop may happen without the patch.
            In other words, the patch emulates slow scheduling.

            The patch is being reverted in 10.4 and rebased now on the top 10.5.
            I hope you'll find that acceptable at least for reviewing and your risk assessment.

            bnestere and myself can discuss it interactively on zulip if you'll need that.
            We found in our team github review is somewhat useful, so feel free to comment in detail on the patch over there.
            A new PR that is being arranged to be published soon.

            Cheers,
            Andrei

            Elkin Andrei Elkin added a comment - - edited knielsen , salve. First my apologies that you were not explicitly notified on the patch for reviewing. It's not terminally late now I hope, provided you have time. It turns out 10.4 push was our mistake in the version that the support case needs. We then are free to play safe and revert it from 10.4. To the technical part of the patch, yes its idea is to make the sequential and parallel STOP:s work consistently, including concern of non-transactional "transactions" that must complete to "commit". To my reasoning there's nothing that the user of the patch can find unusual (apart from quick stopping perhaps). Indeed, the user cannot see what has been scheduled to workers. So any quicker stop may happen without the patch. In other words, the patch emulates slow scheduling. The patch is being reverted in 10.4 and rebased now on the top 10.5. I hope you'll find that acceptable at least for reviewing and your risk assessment. bnestere and myself can discuss it interactively on zulip if you'll need that. We found in our team github review is somewhat useful, so feel free to comment in detail on the patch over there. A new PR that is being arranged to be published soon. Cheers, Andrei

            Re-opening to address additional findings

            bnestere Brandon Nesterenko added a comment - Re-opening to address additional findings
            bnestere Brandon Nesterenko made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            dbart Daniel Bartholomew made changes -
            Fix Version/s 10.4.31 [ 29010 ]
            Fix Version/s 10.5.22 [ 29011 ]
            Fix Version/s 10.6.15 [ 29013 ]
            Fix Version/s 10.9.8 [ 29015 ]
            Fix Version/s 10.10.6 [ 29017 ]
            Fix Version/s 10.11.5 [ 29019 ]
            Fix Version/s 10.4.30 [ 28912 ]
            Fix Version/s 10.5.21 [ 28913 ]
            Fix Version/s 10.6.14 [ 28914 ]
            Fix Version/s 10.9.7 [ 28916 ]
            Fix Version/s 10.10.5 [ 28917 ]
            Fix Version/s 10.11.4 [ 28918 ]
            dbart Daniel Bartholomew made changes -
            Fix Version/s 11.0.3 [ 28920 ]
            Fix Version/s 11.1.2 [ 28921 ]
            Fix Version/s 11.1.1 [ 28704 ]
            Fix Version/s 11.0.2 [ 28706 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0.3 [ 28920 ]
            Fix Version/s 11.1.2 [ 28921 ]
            Fix Version/s 10.4.31 [ 29010 ]
            Fix Version/s 10.5.22 [ 29011 ]
            Fix Version/s 10.6.15 [ 29013 ]
            Fix Version/s 10.9.8 [ 29015 ]
            Fix Version/s 10.10.6 [ 29017 ]
            Fix Version/s 10.11.5 [ 29019 ]

            Something that is missing from the discussion here is that the main reason STOP SLAVE is slow in parallel replication is not because it doesn't roll back running transactions. The main problem is that in many cases parallel replication will replicate all queued events (@@slave_parallel_max_queued).

            I think this is a left-over of when only conservative mode existed. The current STOP SLAVE mechanism is seen in do_gco_wait(), it continues until the current GCO is completed (wait_count > entry->stop_count). But in optimistic mode, the GCO can be very large, potentially all queued events, thus stop is delayed longer than needed.

            I think a much simpler solution is to fix this, so that stop_count is initialised to largest_started_sub_id, and compared against rgi->gtid_sub_id.

            This will not rollback an existing long-running transaction, but I think that's actually good. Forcing stop immediately will cause massive rollback when many threads are configured (Jean Francois Gagné tested using > 1000 threads), which seems undesirable. And forcing stop does not guarantee fast stop anyway, a long-running statement will not be aborted.

            knielsen Kristian Nielsen added a comment - Something that is missing from the discussion here is that the main reason STOP SLAVE is slow in parallel replication is not because it doesn't roll back running transactions. The main problem is that in many cases parallel replication will replicate all queued events (@@slave_parallel_max_queued). I think this is a left-over of when only conservative mode existed. The current STOP SLAVE mechanism is seen in do_gco_wait(), it continues until the current GCO is completed (wait_count > entry->stop_count). But in optimistic mode, the GCO can be very large, potentially all queued events, thus stop is delayed longer than needed. I think a much simpler solution is to fix this, so that stop_count is initialised to largest_started_sub_id, and compared against rgi->gtid_sub_id. This will not rollback an existing long-running transaction, but I think that's actually good. Forcing stop immediately will cause massive rollback when many threads are configured (Jean Francois Gagné tested using > 1000 threads), which seems undesirable. And forcing stop does not guarantee fast stop anyway, a long-running statement will not be aborted.

            I implemented a much simpler fix for this in the branch knielsen_faster_stop_slave:

            https://github.com/MariaDB/server/commits/knielsen_faster_stop_slave

            knielsen Kristian Nielsen added a comment - I implemented a much simpler fix for this in the branch knielsen_faster_stop_slave: https://github.com/MariaDB/server/commits/knielsen_faster_stop_slave
            knielsen Kristian Nielsen added a comment - And a better fix for the MDEV-31448 fix included in Brandon's patch for this issue: https://github.com/MariaDB/server/commits/knielsen_faster_stop_slave https://github.com/MariaDB/server/commit/6ce9c839997c1fc78c2989540dd21155a96fb419
            bnestere Brandon Nesterenko made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            knielsen Kristian Nielsen added a comment - - edited

            I now pushed another patch to the branch knielsen_faster_stop_slave:

            https://github.com/MariaDB/server/commits/knielsen_faster_stop_slave
            https://github.com/MariaDB/server/commit/c20ce0dff404890df59f1fd305bd85be0ada86f8

            This implements a STOP SLAVE FORCE option, which can be used to optionally force a quick slave stop by rolling back all active transactions on the next event or row operation, if no non-transactional event groups are blocking it. I included the test cases from Brandon's pull request, they are passing with this patch (replacing STOP SLAVE with STOP SLAVE FORCE).

            I think this is the way to go if we want this rollback functionality, leaving the option to not roll back a lot of work needlessly. I also think this is a new feature and should only go to development version (11.1?). Comments welcome, of course.

            The simple fix earlier on my branch (without forcing rollback) should be sufficient to solve the user/customers problem. I think that can go in 10.5 if we want (or even 10.4, it should be safe). Some of the other bug fixes on the branch related to stop may also be appropriate for some earlier branch, suggestions wellcome.

            knielsen Kristian Nielsen added a comment - - edited I now pushed another patch to the branch knielsen_faster_stop_slave: https://github.com/MariaDB/server/commits/knielsen_faster_stop_slave https://github.com/MariaDB/server/commit/c20ce0dff404890df59f1fd305bd85be0ada86f8 This implements a STOP SLAVE FORCE option, which can be used to optionally force a quick slave stop by rolling back all active transactions on the next event or row operation, if no non-transactional event groups are blocking it. I included the test cases from Brandon's pull request, they are passing with this patch (replacing STOP SLAVE with STOP SLAVE FORCE). I think this is the way to go if we want this rollback functionality, leaving the option to not roll back a lot of work needlessly. I also think this is a new feature and should only go to development version (11.1?). Comments welcome, of course. The simple fix earlier on my branch (without forcing rollback) should be sufficient to solve the user/customers problem. I think that can go in 10.5 if we want (or even 10.4, it should be safe). Some of the other bug fixes on the branch related to stop may also be appropriate for some earlier branch, suggestions wellcome.

            I have pushed the fix for slow STOP SLAVE to 10.4 (The STOP SLAVE FORCE feature can go to a development branch later if we wish).
            So I suggest to close this bug?

            - Kristian.

            knielsen Kristian Nielsen added a comment - I have pushed the fix for slow STOP SLAVE to 10.4 (The STOP SLAVE FORCE feature can go to a development branch later if we wish). So I suggest to close this bug? - Kristian.
            Elkin Andrei Elkin made changes -
            Fix Version/s 10.4.31 [ 29010 ]
            Fix Version/s 10.5.22 [ 29011 ]
            Fix Version/s 10.6.15 [ 29013 ]
            Fix Version/s 10.9.8 [ 29015 ]
            Fix Version/s 10.10.6 [ 29017 ]
            Fix Version/s 10.11.5 [ 29019 ]
            Fix Version/s 11.0.3 [ 28920 ]
            Fix Version/s 11.1.2 [ 28921 ]
            Fix Version/s 11.2.1 [ 29034 ]
            Fix Version/s 10.8.8 [ 28518 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 187098

            People

              bnestere Brandon Nesterenko
              mxu Michael Xu
              Votes:
              3 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.