[MDEV-8291] Parallel replication causes slave threads to not pick up new global config after restart
Created: 2015-06-09  Updated: 2015-06-10  Resolved: 2015-06-10

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0, 10.1
Fix Version/s: 10.0.20

Type: Bug
Priority: Major
Reporter: Michaël de groot
Assignee: Kristian Nielsen
Resolution: Fixed
Votes: 0
Labels: parallelslave
Environment:

CentOS 6.6


Issue Links:
Relates
relates to MDEV-8292 group_concat_max_len should be stored... Open
relates to MDEV-8294 Inconsistent behavior of slave parall... Closed

 Description   

This issue only occurs when parallel replication is enabled. New values of global variables do not seem to be picked up by the slave threads after the slave is stopped and started again.

Reproduce:
On slave:
Configuration:

sql_mode='STRICT_TRANS_TABLES'
group_concat_max_len=1024

On master:

SET SESSION binlog_format=statement;
drop table if exists testbreak; drop table if exists testdata;
create table testbreak (big text not null) engine=MyISAM;
create table testdata (part varchar(1024) not null) engine=MyISAM;
insert into testdata VALUES (REPEAT('a', 1024));
insert into testdata VALUES (REPEAT('a', 1024));
insert into testdata VALUES (REPEAT('a', 1024));
set session group_concat_max_len=4096;
insert into testbreak SELECT group_concat(part) FROM testdata;

On slave, observe:

Last_SQL_Error: Error 'Row 2 was cut by GROUP_CONCAT()' on query. Default database: 'mariadb_test'. Query: 'insert into testbreak SELECT group_concat(part) FROM testdata'

On slave, execute:

STOP SLAVE;
SET GLOBAL group_concat_max_len=4096;
START SLAVE;

On slave, observe the same error again:

Last_SQL_Error: Error 'Row 2 was cut by GROUP_CONCAT()' on query. Default database: 'mariadb_test'. Query: 'insert into testbreak SELECT group_concat(part) FROM testdata'

On slave, execute:

STOP SLAVE;
SET GLOBAL slave_parallel_threads=0;
START SLAVE;

Error goes away.



 Comments   
Comment by Elena Stepanova [ 2015-06-10 ]

Technically, the reason is understandable: even if the slave stops, the parallel worker threads don't, so they are not restarted and cannot pick up the changed value.
But from the users' perspective, the complaint is valid. I also have a vague (and maybe wrong) feeling it's been discussed before, but I don't remember the outcome. Assigning to knielsen for further feedback.

See also MDEV-8294 regarding parallel threads that are not terminated.

Comment by Kristian Nielsen [ 2015-06-10 ]

Duplicate of MDEV-5289, I think. No version was given in the report, but
here is what it looks like on a recent version of 10.0:

MariaDB [test]> change master to master_host='127.0.0.1', master_port=3310, master_user='root';
Query OK, 0 rows affected (0.05 sec)
 
MariaDB [test]> show processlist;
+----+------+-----------+------+---------+------+-------+------------------+----------+
| Id | User | Host      | db   | Command | Time | State | Info             | Progress |
+----+------+-----------+------+---------+------+-------+------------------+----------+
|  2 | root | localhost | test | Query   |    0 | init  | show processlist |    0.000 |
+----+------+-----------+------+---------+------+-------+------------------+----------+
1 row in set (0.00 sec)
 
MariaDB [test]> start slave;
Query OK, 0 rows affected (0.01 sec)
 
MariaDB [test]> show processlist;
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                                                                       | Info             | Progress |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|  2 | root        | localhost | test | Query   |    0 | init                                                                        | show processlist |    0.000 |
|  3 | system user |           | NULL | Connect |    1 | Waiting for master to send event                                            | NULL             |    0.000 |
|  4 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  5 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  6 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  7 | system user |           | NULL | Connect |   53 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  8 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  9 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
| 10 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
| 11 | system user |           | NULL | Connect |    1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
| 12 | system user |           | NULL | Connect |    1 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL             |    0.000 |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
11 rows in set (0.00 sec)
 
MariaDB [test]> stop slave;
Query OK, 0 rows affected (0.02 sec)
 
MariaDB [test]> show processlist;
+----+------+-----------+------+---------+------+-------+------------------+----------+
| Id | User | Host      | db   | Command | Time | State | Info             | Progress |
+----+------+-----------+------+---------+------+-------+------------------+----------+
|  2 | root | localhost | test | Query   |    0 | init  | show processlist |    0.000 |
+----+------+-----------+------+---------+------+-------+------------------+----------+
1 row in set (0.00 sec)
 
MariaDB [test]> 

Worker threads are stopped when all slave threads are stopped.
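
A quick way to confirm this without reading the whole process list is to count the idle worker threads by their state string. This is only a sketch: it assumes the information_schema.PROCESSLIST table is available, and the state text is copied from the SHOW PROCESSLIST output above.

```sql
-- Count parallel replication worker threads that are currently idle.
-- Busy workers report other states, so this check is only meaningful
-- on a slave that is not applying events at that moment.
SELECT COUNT(*) AS idle_workers
  FROM information_schema.PROCESSLIST
 WHERE USER = 'system user'
   AND STATE = 'Waiting for work from SQL thread';
```

If the worker threads were stopped together with the slave threads, the count should drop to 0 after STOP SLAVE.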

In earlier versions (or in case of hitting MDEV-8294, perhaps?), a
work-around is to just change the number of threads; this will cause the
worker threads to be re-spawned:

  SET GLOBAL <config>, <config>;
  SET GLOBAL slave_parallel_threads=0;
  SET GLOBAL slave_parallel_threads=10;
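
Applied to the case in this report, the work-around might look like the sketch below. It assumes a single replication connection; the variable and value come from the description, and the pool size of 10 is taken from the example above, so substitute the value actually configured on the slave.

```sql
-- Make sure all slave threads are stopped; if the slave already stopped
-- on the error, this only produces a warning.
STOP SLAVE;

-- The new global value that the worker threads should pick up.
SET GLOBAL group_concat_max_len = 4096;

-- Changing the number of threads causes the workers to be re-spawned.
SET GLOBAL slave_parallel_threads = 0;
SET GLOBAL slave_parallel_threads = 10;  -- assumed original pool size

START SLAVE;
```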

Comment by Elena Stepanova [ 2015-06-10 ]

Something is not right: it can't be a duplicate of MDEV-5289, because I checked it against the current 10.0 (which is why no particular minor version was given). I can double-check, but please note that it only happens if the replication aborted with an error – that is, if MDEV-8294 is hit. Basically, it's a consequence of MDEV-8294 which, as I can see, you've just fixed.

Comment by Kristian Nielsen [ 2015-06-10 ]

Ok, let's call it a duplicate of MDEV-5289, then.

Basically, before MDEV-5289, it was necessary to change the value of
@@slave_parallel_threads to make the parallel replication worker threads
respawn and pick up new configuration settings.

After MDEV-5289, STOP SLAVE (for all slaves in case of multi-source) is
enough to respawn the worker threads.

However, there was a bug in the MDEV-5289 implementation (MDEV-8294): a
slave stopping with an error would leave the worker threads running, with
old session variable values. A subsequent STOP SLAVE was then not effective
either (because the slave was already stopped). In that case, a successful
START SLAVE followed by a normal STOP SLAVE (not an error stop) was needed
to re-spawn the worker threads.

After the fix for MDEV-8294, it should (hopefully) be enough to stop and
start the slaves to respawn the worker threads, even if the stop happens due
to an error.

Note that in the case of multi-source, all slaves must be stopped at once
for worker threads to be respawned, as worker threads are shared among
multi-source connections. As long as at least one SQL thread is running,
worker threads will remain using old configuration values in their session
variables.
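
For a multi-source setup, the sequence described above could be sketched as follows. It assumes a server version that includes the MDEV-8294 fix; the variable and value are taken from the description of this issue and stand in for whatever setting was changed.

```sql
-- Stop every replication connection at once, so that no SQL thread keeps
-- the shared worker threads alive.
STOP ALL SLAVES;

-- Change the global value while no worker threads exist.
SET GLOBAL group_concat_max_len = 4096;

-- Newly spawned worker threads start with the new global value.
START ALL SLAVES;
```

Stopping the connections one at a time and restarting each immediately would not be equivalent, because at least one SQL thread would be running at every point.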

Comment by Elena Stepanova [ 2015-06-10 ]

Closing as fixed in 10.0.20 because it should go away after a fix for the root cause – MDEV-8294.
