MariaDB Server / MDEV-5941

Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on DML with slave parallel threads, SBR

Details

    Description

      revision-id: knielsen@knielsen-hq.org-20140321123055-8ch69qymsj1xqmw4
      date: 2014-03-21 13:30:55 +0100
      build-date: 2014-03-24 18:54:08 +0200
      revno: 4076
      branch-nick: 10.0

      The timeout error was encountered while running a concurrent test (12 threads on the master) with SBR using GTID and slave-parallel-threads=10.

      140321 20:08:47 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:10730',replication starts at GTID position ''
      140321 20:12:58 [ERROR] mysqld: Lock wait timeout exceeded; try restarting transaction
      140321 20:12:58 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5', Internal MariaDB error code: 1205
      140321 20:12:58 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      140321 20:12:58 [Warning] Slave: Sort aborted: Lock wait timeout exceeded; try restarting transaction Error_code: 1028
      140321 20:12:58 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000033' position 20863; GTID position '0-1-669'

      The test mainly executes DML, but involves DDL and FLUSH LOGS as well.
      The test flow includes a lot of statements that are unsafe for SBR, or are considered to be such, but as far as I can see from the slave log, nothing particularly wrong was done on the slave around the time when the problem occurred.

      The slave is not restarted during the test (neither the server nor the logical slave).

      I encountered the same error several times later, so it is not a one-off occurrence, but it is sporadic and not easily reproducible. Also, I could not reproduce it by feeding the same master binary logs to a clean slave, so it seems to be a true race condition.

      The following logs are attached:

      • master_binlogs.tar.gz - all master binary logs
      • slave_error.log
      • slave_general.log
      • my.cnf (for reference; the only meaningful option in it is slave-parallel-threads=10); identical for master and slave

      Server command lines (the semisync plugins are listed, but they were not available at server startup time, so the error about not finding them can be seen in the slave error log):

      <basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<master_vardir>/my.cnf --basedir=<basedir> --datadir=<master_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<master_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10730 --socket=<master_vardir>/mysql.sock --pid-file=<master_vardir>/mysql.pid --general-log --general-log-file=<master_vardir>/mysql.log --server_id=1 --log-bin=mysql-bin --report-host=127.0.0.1 --report_port=10730 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505

      <basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<slave_vardir>/my.cnf --basedir=<basedir> --datadir=<slave_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<slave_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10732 --socket=<slave_vardir>/mysql.sock --pid-file=<slave_vardir>/mysql.pid --general-log --general-log-file=<slave_vardir>/mysql.log --server_id=2 --report-host=127.0.0.1 --report_port=10732 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505

      Attachments

        1. master_binlogs.tar.gz
          71 kB
          Elena Stepanova
        2. my.cnf
          0.2 kB
          Elena Stepanova
        3. slave_error.log
          6 kB
          Elena Stepanova
        4. slave_general.log
          243 kB
          Elena Stepanova

        Issue Links

          Activity

            knielsen Kristian Nielsen added a comment - - edited

            It seems to be these two statements that conflict with one another; these
            statements were group-committed on the master, so the slave will attempt to
            run them in parallel:

            UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5;
             
            UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 4113170432  ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 7;

            Here is the content of the table after replaying the binlogs up to this point:

            col_char_12_key   col_int   col_int_key   pk   col_char_12
            NULL              NULL      NULL           2   139
            NULL              NULL      NULL           3   139
            NULL              NULL      NULL           4   139
            NULL              NULL      NULL           5   139
            NULL              NULL      NULL           6   139
            NULL              108       NULL           7   86
            NULL              0         NULL           8   5
            NULL              NULL      NULL           9   139
            NULL              NULL      NULL          23   NULL
            knielsen Kristian Nielsen added a comment - - edited

            The problem is probably caused by a different execution plan on the slave
            compared to the master.

            The first UPDATE can apparently use two different execution plans: one that
            uses the `col_int_key` index to check the IN ( ... ) predicate, and one that
            does not.

            When using the index, the first UPDATE does not conflict with the second,
            since no rows in the index match any of the values in the IN ( ... ) list.

            However, when the index is not used, there is a conflict with the second
            UPDATE, as a table scan must be used.

            This can be seen by adding IGNORE INDEX (col_int_key) to the first UPDATE; in
            this case running the second update first blocks the first update:

            --connect (s3,127.0.0.1,root,,test,$SLAVE_MYPORT,)
            BEGIN;
            UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 4113170432  ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 7;
             
            --connect (s1,127.0.0.1,root,,test,$SLAVE_MYPORT,)
            BEGIN;
            UPDATE  `table10_innodb_int_autoinc` AS X ignore index (col_int_key) SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5;
            # This UPDATE hangs here and eventually gets a lock wait timeout.

            Unfortunately, this is a very serious issue, and I am not sure how it could be
            solved. It is a fundamental assumption of parallel replication that if two
            statements ran without lock conflicts on the master, then they will also be
            able to run without lock conflicts on the slave. Without this, it is not safe
            to enforce the same commit order on the slave, as it may cause deadlocks, as
            seen in this example.

            I am not sure whether unsafe statements (like the LIMIT in this case) are
            required to trigger this; I will try to investigate further.


            elenst Elena Stepanova added a comment -

            As a side note, these UPDATEs are probably not really unsafe statements; they look more like statements that are currently marked unsafe because it is easier to do so. They have ORDER BY clauses which include, among other fields, pk, so the LIMIT should be deterministic and hence safe.
            There are open upstream bugs about this, e.g. #62537 and #71105.

            jonaso Jonas Oreland added a comment -

            Hi Kristian,

            May I propose a solution to this problem (one that I think might also be useful for a different scenario):
            1) Record the relay log position when starting a set of parallel transactions
            2) Process the transactions with the given parallelism
            3) IF a lock wait time-out occurs, restart this set with parallelism set to 1
            4a) IF it fails even then (with parallelism = 1), stop the slave, report to the user, etc.
            4b) IF it works this way, restore parallelism to its original value

            Add counters so that the DBA can see whether this is a frequently occurring scenario.

            (One can also imagine having more heuristics for how to handle the parallelism,
            assuming a solid framework for retries exists; see the sketch below.)

            —

            Another scenario where a framework like the above would be useful is allowing for
            optimistic slave concurrency, i.e. in certain scenarios guess that transactions can
            be processed in parallel (still committing in order), and dynamically adjust parallelism
            down to 1 if this guess turns out to be wrong.

            Ways of guessing could be:

            • if transactions are created by different connections on the master
              (this would need to be recorded in the binlog)
            • if transactions use different databases
            • etc. etc.

            What do you think?

            /Jonas

            knielsen Kristian Nielsen added a comment - - edited

            Hi Jonas,

            > May I propose a solution to this problem (that I think might be useful also for a different scenario).
            > 1) Record relay log position for when starting a set of parallel transactions
            > 2) process transactions given parallelism
            > 3) IF a lock wait time-out occurs, restart this set with parallelism set to 1.
            > 4a) IF it fails even then too (with parallelism = 1) stop slave, report to user etc
            > 4b) IF it works like this, reset parallelism to original value

            The main problem is the need for a timeout. This timeout needs to be small
            enough that it doesn't cause too much slave delay, yet large enough that it
            doesn't cause spurious failure due to transactions taking too long without any
            deadlocks.

            However, we could actually detect the deadlock immediately, without
            timeouts. Whenever InnoDB is about to make transaction T1 wait for a
            lock owned by T2, we can check if T1 needs to commit before T2 - if so,
            instead of letting T1 wait, we fail and roll back T2 with a deadlock
            error. We then proceed to re-try T2 as you describe.

            I think this is an interesting solution worth pursuing. I believe this is also
            more or less what Galera does.
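
            As a rough standalone model (hypothetical struct and field names, not actual InnoDB or replication code), the commit-order check at lock-wait time could look like this:

            // Hypothetical model of the rule described above.
            #include <cassert>
            #include <cstdint>

            struct ReplTrx {
              bool     in_parallel_replication;  // applied by a parallel slave worker
              uint32_t domain_id;                // GTID replication domain
              uint64_t sub_id;                   // position in the enforced commit order
            };

            // Conceptually consulted when T1 is about to wait for a lock held by T2.
            // Returns true if T2 should be rolled back (and re-tried later) instead of
            // letting T1 block, because T1 must commit before T2.
            bool must_kill_lock_holder(const ReplTrx &t1_waiter, const ReplTrx &t2_holder)
            {
              return t1_waiter.in_parallel_replication &&
                     t2_holder.in_parallel_replication &&
                     t1_waiter.domain_id == t2_holder.domain_id &&
                     t1_waiter.sub_id < t2_holder.sub_id;
            }

            int main()
            {
              ReplTrx earlier{true, 0, 668}, later{true, 0, 669};
              assert(must_kill_lock_holder(earlier, later));   // earlier waits on later
              assert(!must_kill_lock_holder(later, earlier));  // later waits on earlier: OK
              return 0;
            }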

            > An other scenario where a framework like above would be useful is to allow for
            > optimistic slave concurrency. I.e in certain scenarios guess that transactions can
            > be processed in parallel (still committing in order), and dynamically adjust parallelism down to 1
            > if this guess turns out to be wrong.
            >
            > Way of guessing could be
            >
            > if transactions are created by different connections on master
            > (would need to record this in binlog)
            > if transactions use different databases
            > etc etc...

            Yes, I agree. Since we already know the desired commit order, we can just
            start them in parallel and immediately detect any conflict, seen as a later
            transaction owning a row or gap lock needed by an earlier transaction. In
            case of conflict, we can roll back the later transaction and re-try it. As
            long as rollbacks are sufficiently infrequent, this should be a win...

            It is an interesting idea for sure. Maybe the time to try it out is getting
            nearer.

            - Kristian.

            knielsen Kristian Nielsen added a comment - - edited

            So here is how I think this bug should be fixed:

            1. First, MDEV-5262 should be fixed, so that transactions that fail during
            parallel replication due to deadlock or other temporary error can be re-tried
            and eventually succeed.

            2. Second, when InnoDB is about to let transaction T1 wait for a lock owned by
            transaction T2, it should invoke a callback into the upper layer. This
            callback will return a status code saying what to do: either let T1 wait; roll
            back T2 with a deadlock error and let T1 proceed; or roll back T1 with a
            deadlock error. For parallel replication, we will then check if T1 comes
            before T2 in commit order (same Relay_log_info and domain_id, and sub_id of T1
            smaller than that of T2); if so, we ask to roll back T2 so that T1 can proceed
            and T2 can be re-tried.

            This should solve this and other similar deadlocks caused by the enforced
            commit order. Later, we can also use this to implement a more opportunistic
            search for possible parallelism, as suggested by Jonas.
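
            A rough sketch of what such a callback and the lock-wait path acting on its status code might look like; the names, signatures and enum values are hypothetical illustrations of the description above, not the real storage-engine API, and the ordering test is the same one modelled in the earlier sketch:

            // Hypothetical interface sketch; names and signatures are illustrative only.
            #include <cstdio>

            enum WaitDecision {
              LET_T1_WAIT,   // no ordering relation: ordinary lock wait handling
              ROLLBACK_T2,   // T1 must commit first: kill the lock holder, re-try T2 later
              ROLLBACK_T1    // give up on the waiter with a deadlock error instead
            };

            struct WorkerTrx {
              const void        *rli;        // same Relay_log_info => same replication stream
              unsigned           domain_id;  // GTID domain
              unsigned long long sub_id;     // smaller sub_id commits earlier
            };

            // Upper-layer callback the storage engine would invoke before blocking T1 on T2.
            WaitDecision rpl_lock_wait_callback(const WorkerTrx *t1, const WorkerTrx *t2)
            {
              if (t1 && t2 && t1->rli == t2->rli &&
                  t1->domain_id == t2->domain_id && t1->sub_id < t2->sub_id)
                return ROLLBACK_T2;          // waiting would deadlock with the commit order
              return LET_T1_WAIT;
            }

            // Engine side: act on the decision before enqueueing the lock wait.
            void on_lock_conflict(const WorkerTrx *waiter, const WorkerTrx *holder)
            {
              switch (rpl_lock_wait_callback(waiter, holder)) {
              case ROLLBACK_T2:
                std::puts("mark holder as deadlock victim; waiter proceeds, holder is re-tried");
                break;
              case ROLLBACK_T1:
                std::puts("return a deadlock error to the waiter");
                break;
              default:
                std::puts("enqueue the waiter as a normal lock wait");
              }
            }

            int main()
            {
              int same_rli = 0;                       // stand-in for one Relay_log_info
              WorkerTrx earlier{&same_rli, 0, 668};
              WorkerTrx later  {&same_rli, 0, 669};
              on_lock_conflict(&earlier, &later);     // earlier waits on later: kill later
              on_lock_conflict(&later, &earlier);     // later waits on earlier: just wait
              return 0;
            }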


            knielsen Kristian Nielsen added a comment -

            Temporarily re-assigned to Serg for review.

            knielsen Kristian Nielsen added a comment -

            Pushed to 10.0.13.

            People

              Assignee: knielsen Kristian Nielsen
              Reporter: elenst Elena Stepanova
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved:
