MariaDB Server / MDEV-8291

Parallel replication causes slave threads to not pick up new global config after restart

Details

    Description

      This issue only occurs when parallel replication is enabled. After a global configuration variable is changed on the slave, restarting the slave does not appear to make the replication threads pick up the new value.

      Reproduce:
      On slave:
      Configuration:

      sql_mode='STRICT_TRANS_TABLES'
      group_concat_max_len=1024

      On master:

      SET SESSION binlog_format=statement;
      drop table if exists testbreak; drop table if exists testdata;
      create table testbreak (big text not null) engine=MyISAM;
      create table testdata (part varchar(1024) not null) engine=MyISAM;
      insert into testdata VALUES (REPEAT('a', 1024));
      insert into testdata VALUES (REPEAT('a', 1024));
      insert into testdata VALUES (REPEAT('a', 1024));
      set session group_concat_max_len=4096;
      insert into testbreak SELECT group_concat(part) FROM testdata;

      On slave, witness:

      Last_SQL_Error: Error 'Row 2 was cut by GROUP_CONCAT()' on query. Default database: 'mariadb_test'. Query: 'insert into testbreak SELECT group_concat(part) FROM testdata'

      On slave, execute:

      STOP SLAVE;
      SET GLOBAL group_concat_max_len=4096;
      START SLAVE;

      On slave, witness:

      Last_SQL_Error: Error 'Row 2 was cut by GROUP_CONCAT()' on query. Default database: 'mariadb_test'. Query: 'insert into testbreak SELECT group_concat(part) FROM testdata'

      On slave, execute:

      STOP SLAVE;
      SET GLOBAL slave_parallel_threads=0;
      START SLAVE;

      Error goes away.
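
      A quick way to confirm that the global values on the slave really are what the steps above set is to read them back directly. This is only a sketch added for illustration, assuming a client session on the slave; it is not part of the original report's output.

      ```sql
      -- Run on the slave after SET GLOBAL. The value read here is the global one;
      -- threads that were already running keep the session copy they started with.
      SELECT @@GLOBAL.group_concat_max_len, @@GLOBAL.slave_parallel_threads;
      ```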

          Activity

            elenst Elena Stepanova added a comment (edited)

            Technically, the reason is understandable: even if the slave stops, the parallel worker threads don't, so they are not restarted and cannot pick up the changed value.
            But from the users' perspective, the complaint is valid. I also have a vague (and maybe wrong) feeling that it's been discussed before, but I don't remember the outcome. Assigning to knielsen for further feedback.

            See also MDEV-8294 regarding parallel threads that are not terminated.


            knielsen Kristian Nielsen added a comment

            Duplicate of MDEV-5289, I think. No version was given in the report, but
            here is what it looks like on a recent version of 10.0:

            MariaDB [test]> change master to master_host='127.0.0.1', master_port=3310, master_user='root';
            Query OK, 0 rows affected (0.05 sec)
             
            MariaDB [test]> show processlist;
            +----+------+-----------+------+---------+------+-------+------------------+----------+
            | Id | User | Host      | db   | Command | Time | State | Info             | Progress |
            +----+------+-----------+------+---------+------+-------+------------------+----------+
            |  2 | root | localhost | test | Query   |    0 | init  | show processlist |    0.000 |
            +----+------+-----------+------+---------+------+-------+------------------+----------+
            1 row in set (0.00 sec)
             
            MariaDB [test]> start slave;
            Query OK, 0 rows affected (0.01 sec)
             
            MariaDB [test]> show processlist;
            +----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
            | Id | User        | Host      | db   | Command | Time | State                                                                       | Info             | Progress |
            +----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
            |  2 | root        | localhost | test | Query   |    0 | init                                                                        | show processlist |    0.000 |
            |  3 | system user |           | NULL | Connect |    1 | Waiting for master to send event                                            | NULL             |    0.000 |
            |  4 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
            |  5 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
            |  6 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
            |  7 | system user |           | NULL | Connect |   53 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
            |  8 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
            |  9 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
            | 10 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
            | 11 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
            | 12 | system user |           | NULL | Connect |    1 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL             |    0.000 |
            +----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
            11 rows in set (0.00 sec)
             
            MariaDB [test]> stop slave;
            Query OK, 0 rows affected (0.02 sec)
             
            MariaDB [test]> show processlist;
            +----+------+-----------+------+---------+------+-------+------------------+----------+
            | Id | User | Host      | db   | Command | Time | State | Info             | Progress |
            +----+------+-----------+------+---------+------+-------+------------------+----------+
            |  2 | root | localhost | test | Query   |    0 | init  | show processlist |    0.000 |
            +----+------+-----------+------+---------+------+-------+------------------+----------+
            1 row in set (0.00 sec)
             
            MariaDB [test]> 

            Worker threads are stopped when all slave threads are stopped.
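
            One way to check this on a given server, sketched here on the assumption that the worker threads show up with the same User and State values as in the processlist output above, is to count them via information_schema:

            ```sql
            -- Count parallel replication worker threads that are currently idle.
            -- The 'system user' name and the State text are taken from the SHOW PROCESSLIST output above.
            SELECT COUNT(*) AS idle_workers
              FROM information_schema.PROCESSLIST
             WHERE USER = 'system user'
               AND STATE = 'Waiting for work from SQL thread';
            ```

            A non-zero count after STOP SLAVE would suggest that workers from before the stop are still alive and may still hold old session values.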

            In earlier versions (or in case of hitting MDEV-8294, perhaps?), a
            work-around is to just change the number of threads; this will cause the
            worker threads to be re-spawned:

              SET GLOBAL <config>, <config>;
              SET GLOBAL slave_parallel_threads=0;
              SET GLOBAL slave_parallel_threads=10;
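
            Applied to the variable from this report, the sequence might look like the sketch below. The value 4096 and the thread count 10 come from the report and the example above; the surrounding STOP SLAVE and START SLAVE follow the reporter's steps and assume the slave has to be stopped while slave_parallel_threads is changed.

            ```sql
            STOP SLAVE;
            -- New global value that the worker threads should pick up.
            SET GLOBAL group_concat_max_len=4096;
            -- Changing the thread count makes the worker threads respawn.
            SET GLOBAL slave_parallel_threads=0;
            SET GLOBAL slave_parallel_threads=10;
            START SLAVE;
            ```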


            elenst Elena Stepanova added a comment

            Something is not right: it can't be a duplicate of MDEV-5289, because I checked it against the current 10.0 (which is why no particular minor version was given). I can double-check, but please note that it only happens if the replication aborted with an error – that is, if MDEV-8294 is hit. Basically, it's a consequence of MDEV-8294, which, as I can see, you've just fixed.

            knielsen Kristian Nielsen added a comment

            Ok, let's call it a duplicate of MDEV-5289, then.

            Basically, before MDEV-5289, it was necessary to change the value of
            @@slave_parallel_threads to make the parallel replication worker threads
            respawn and pick up new configuration settings.

            After MDEV-5289, STOP SLAVE (for all slaves in case of multi-source) is
            enough to respawn the worker threads.

            However, there was a bug in the MDEV-5289 implementation (MDEV-8294):
            a slave stopping with an error would leave the worker threads still
            running, with old session variable values. A subsequent STOP SLAVE was
            then not effective either (because the slave was already stopped), so a
            successful START SLAVE followed by a normal STOP SLAVE (not an error
            stop) was needed to respawn the worker threads.

            After the fix for MDEV-8294, it should (hopefully) be enough to stop and
            start slaves to respawn the worker threads, even if the stop happens due
            to an error.

            Note that in the case of multi-source, all slaves must be stopped at once
            for worker threads to be respawned, as worker threads are shared among
            multi-source connections. As long as at least one SQL thread is running,
            worker threads will remain using old configuration values in their session
            variables.
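
            For a multi-source setup, stopping every connection in one go could look like the following sketch. It assumes a server version that has the STOP ALL SLAVES and START ALL SLAVES statements, and reuses group_concat_max_len from the report purely as an example variable.

            ```sql
            -- Stop all replication connections so that no SQL thread keeps the shared workers alive.
            STOP ALL SLAVES;
            SET GLOBAL group_concat_max_len=4096;
            -- Worker threads spawned after this point start with the new global value.
            START ALL SLAVES;
            ```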

            elenst Elena Stepanova added a comment

            Closing as fixed in 10.0.20 because it should go away after a fix for the root cause – MDEV-8294.

            People

              knielsen Kristian Nielsen
              michaeldg Michaël de groot