[MDEV-10863] parallel replication tries to continue from wrong position after stopping and restarting slave threads - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.0.26
Fix Version/s: 10.0.29, 10.1.20
Component/s: Replication
Labels:
- parallelslave
- replication
Environment:
linux

Description

When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file.

How to reproduce:

Master config:

[mysqld]

server-id=1

log-bin=master-bin

binlog-format=statement

Slave config:

[mysqld]

server-id=2

slave_parallel_threads=20

max_relay_log_size=98304

Create replication user on the master:

CREATE USER 'repl'@'%' IDENTIFIED BY 'password';

GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';

RESET MASTER;

Start slave:

SET GLOBAL gtid_slave_pos='0-1-2';

CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;

START SLAVE;

On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)

<?php

$domain_id = $argv[1]%3 + 1;

$table     = $argv[2];

echo "SET gtid_domain_id=$domain_id;";

echo "DROP TABLE IF EXISTS test.$table;";

echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));";

$n=1;

while (true) {

  echo "BEGIN;\n";

  for ($i = 1; $i < 1000; $i++) {

    echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";

    $n++;

  echo "COMMIT;\n";

Then run several instances of it in parallel:

for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done

Now on the slave run this script to repeatedly stop and restart the SQL thread:

( sudo mysql <<< "select version(); start slave sql_thread"; while date; do

sudo mysql <<< 'stop slave sql_thread; show slave status\G'

gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped";

sudo mysql <<< 'start slave sql_thread;'; sleep 5;

sss="$(sudo mysql <<< 'show slave status\G')"

gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp";

if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"; then echo; echo "sleeping..."; sleep 5; else echo "things are broken..."; break; fi; done; ) | tee test.log

This usually fails with a duplicate key error after a small number of iterations already

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

binlogs.tar.gz
2016-09-21 23:29
599 kB
Hartmut Holzgraefe
test.log
2016-09-21 23:05
14 kB
Hartmut Holzgraefe

Issue Links

relates to

MDEV-9138 Relay log position corrupted with parallel replication after interrupting LOAD DATA on master

Closed

Activity

People

Assignee:: Kristian Nielsen

Reporter:: Hartmut Holzgraefe

Votes:: 2 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 2016-09-21 22:41

Updated:: 2024-07-08 01:11

Resolved:: 2016-11-15 12:02

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration