[MDEV-5941] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on DML with slave parallel threads, SBR Created: 2014-03-24  Updated: 2014-07-11  Resolved: 2014-07-11

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.10
Fix Version/s: 10.0.13

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: parallelslave, replication

Attachments: File master_binlogs.tar.gz     File my.cnf     File slave_error.log     File slave_general.log    
Issue Links:
Relates
relates to MDEV-5262 Missing retry after temp error in par... Closed
relates to MDEV-5914 Parallel replication deadlock due to ... Closed

 Description   

revision-id: knielsen@knielsen-hq.org-20140321123055-8ch69qymsj1xqmw4
date: 2014-03-21 13:30:55 +0100
build-date: 2014-03-24 18:54:08 +0200
revno: 4076
branch-nick: 10.0

The timeout error was encountered while running a concurrent test (12 threads on master) with SBR replication using GTID, and slave-parallel-threads=10.

140321 20:08:47 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:10730',replication starts at GTID position ''
140321 20:12:58 [ERROR] mysqld: Lock wait timeout exceeded; try restarting transaction
140321 20:12:58 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5', Internal MariaDB error code: 1205
140321 20:12:58 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
140321 20:12:58 [Warning] Slave: Sort aborted: Lock wait timeout exceeded; try restarting transaction Error_code: 1028
140321 20:12:58 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000033' position 20863; GTID position '0-1-669'

The test mainly executes DML, but involves DDL and FLUSH LOGS as well.
The test flow includes a lot of statements that are unsafe for SBR, or are considered to be such, but as far as I can see from the slave log, nothing particularly wrong was done to the slave around the time the problem occurred.

Slave is not restarted during the test (neither the server, nor the logical slave).

I encountered the same error several times later, so it's not a one-off occurrence, but it's sporadic and not easily reproducible. Also, I couldn't reproduce it by feeding the same master binary logs to a clean slave, so it seems to be a true race condition.

The following logs are attached:

  • master_binlogs.tar.gz - all master binary logs
  • slave_error.log
  • slave_general.log
  • my.cnf (for reference; the only meaningful option in there is slave-parallel-threads=10); identical for master and slave

Server command lines (the semisync plugins are listed, but were not available at server startup, so the error about not finding them can be seen in the slave error log):

<basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<master_vardir>/my.cnf --basedir=<basedir> --datadir=<master_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<master_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10730 --socket=<master_vardir>/mysql.sock --pid-file=<master_vardir>/mysql.pid --general-log --general-log-file=<master_vardir>/mysql.log --server_id=1 --log-bin=mysql-bin --report-host=127.0.0.1 --report_port=10730 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505

<basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<slave_vardir>/my.cnf --basedir=<basedir> --datadir=<slave_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<slave_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10732 --socket=<slave_vardir>/mysql.sock --pid-file=<slave_vardir>/mysql.pid --general-log --general-log-file=<slave_vardir>/mysql.log --server_id=2 --report-host=127.0.0.1 --report_port=10732 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505



 Comments   
Comment by Kristian Nielsen [ 2014-03-27 ]

It seems to be these two statements that conflict with one another; these
statements were group-committed on the master, so the slave will attempt to
run them in parallel:

UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5;
 
UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 4113170432  ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 7;

Here is the content of the table after replaying the binlogs up to this point:

col_char_12_key  col_int  col_int_key  pk  col_char_12
NULL             NULL     NULL         2   139
NULL             NULL     NULL         3   139
NULL             NULL     NULL         4   139
NULL             NULL     NULL         5   139
NULL             NULL     NULL         6   139
NULL             108      NULL         7   86
NULL             0        NULL         8   5
NULL             NULL     NULL         9   139
NULL             NULL     NULL         23  NULL
Comment by Kristian Nielsen [ 2014-03-27 ]

The problem is probably caused by a different execution plan on the slave
compared to the master.

The first UPDATE can apparently use two different execution plans: one that
uses the `col_int_key` index to check the IN ( ... ) predicate, and one that
does not.

When using the index, the first UPDATE does not conflict with the second,
since no rows in the index match any of the values in the IN ( ... ) list.

However, when the index is not used, a table scan must be performed, and the
first UPDATE conflicts with the second.

This can be seen by adding IGNORE INDEX (col_int_key) to the first UPDATE; in
this case running the second update first blocks the first update:

--connect (s3,127.0.0.1,root,,test,$SLAVE_MYPORT,)
BEGIN;
UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 4113170432  ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 7;
 
--connect (s1,127.0.0.1,root,,test,$SLAVE_MYPORT,)
BEGIN;
UPDATE  `table10_innodb_int_autoinc` AS X ignore index (col_int_key) SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5;
# This UPDATE hangs here and eventually gets a lock wait timeout.

Unfortunately, this is a very serious issue, and I am not sure how it could be
solved. It is a fundamental assumption of parallel replication that if two
statements ran without lock conflicts on the master, then they will also be
able to run without lock conflicts on the slave. Without this, it is not safe
to enforce the same commit order on the slave, as it may cause deadlocks, as
seen in this example.

I am not sure whether this requires unsafe statements (like the LIMIT in this
case) to trigger; I will try to investigate this further.

Comment by Elena Stepanova [ 2014-03-27 ]

As a side note, these UPDATEs are probably not genuinely unsafe statements; they look more like statements that are currently marked unsafe because it is easier to do so. They have ORDER BY clauses which include, among other fields, pk, so the LIMIT should be deterministic and hence safe.
There are open upstream bugs about it, e.g. #62537 and #71105

Comment by Jonas Oreland [ 2014-04-04 ]

Hi Kristian,

May I propose a solution to this problem (one that I think might also be useful for a different scenario)?
1) Record the relay log position when starting a set of parallel transactions
2) Process the transactions with the given parallelism
3) IF a lock wait timeout occurs, restart this set with parallelism set to 1
4a) IF it fails even then (with parallelism = 1), stop the slave, report to the user, etc.
4b) IF it works, reset parallelism to the original value

Add counters so that DBA can see if this is a frequently occurring scenario.

(one can also imagine more heuristics for how to adjust the parallelism,
assuming a solid framework for retries exists)
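The proposed fallback could be sketched roughly as below; this is only an illustration of the control flow, where `apply_batch`, `apply_parallel`, `apply_serial`, and `LockWaitTimeout` are hypothetical names standing in for the real slave applier:

```python
# Hypothetical sketch of the proposed retry scheme: run a batch of
# transactions in parallel; on a lock wait timeout, re-apply the batch
# serially from the recorded relay log position.

class LockWaitTimeout(Exception):
    """Stand-in for the slave applier's lock wait timeout error."""
    pass

def apply_batch(transactions, apply_parallel, apply_serial, stats):
    """apply_parallel/apply_serial are callables standing in for the
    real applier; stats lets the DBA see how often the serial
    fallback is taken."""
    try:
        # 2) process the transactions with the configured parallelism
        apply_parallel(transactions)
    except LockWaitTimeout:
        # 3) restart this set with parallelism reduced to 1
        stats["serial_retries"] += 1
        # 4a) if this fails too, the error propagates and the slave
        #     stops, reporting the problem to the user
        apply_serial(transactions)
    # 4b) on success, the caller continues with the original parallelism
```

The essential point is that the relay log position recorded in step 1 gives a safe rewind point, so the serial retry sees exactly the same batch.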

Another scenario where a framework like the above would be useful is to allow for
optimistic slave concurrency, i.e. in certain scenarios guess that transactions can
be processed in parallel (still committing in order), and dynamically adjust parallelism down to 1
if this guess turns out to be wrong.

Way of guessing could be

  • if transactions are created by different connections on master
    (would need to record this in binlog)
  • if transactions use different databases
    etc etc...

What do you think ?

/Jonas

Comment by Kristian Nielsen [ 2014-04-08 ]

Hi Jonas,

> May I propose a solution to this problem (that I think might be useful also for a different scenario).
> 1) Record relay log position for when starting a set of parallel transactions
> 2) process transactions given parallelism
> 3) IF a lock wait time-out occurs, restart this set with parallelism set to 1.
> 4a) IF it fails even then too (with parallelism = 1) stop slave, report to user etc
> 4b) IF it works like this, reset parallelism to original value

The main problem is the need for a timeout. This timeout needs to be small
enough that it doesn't cause too much slave delay, yet large enough that it
doesn't cause spurious failure due to transactions taking too long without any
deadlocks.

However, we could actually detect the deadlock immediately, without
timeouts. Whenever InnoDB is about to let transaction T1 wait for a
lock owned by T2, we can check if T1 needs to commit before T2 - if so,
instead of letting T1 wait, we fail and roll back T2 with a deadlock
error. We then proceed to re-try T2 as you describe.

I think this is an interesting solution worth pursuing. I believe this is also
more or less what Galera does.

> An other scenario where a framework like above would be useful is to allow for
> optimistic slave concurrency. I.e in certain scenarios guess that transactions can
> be processed in parallel (still committing in order), and dynamically adjust parallelism down to 1
> if this guess turns out to be wrong.
>
> Way of guessing could be
>
> if transactions are created by different connections on master
> (would need to record this in binlog)
> if transactions use different databases
> etc etc...

Yes, I agree. Since we already know the desired commit order, we can just
start them in parallel and immediately detect if there is any conflict, seen
by a later transaction owning a row or gap lock needed by an earlier
transaction. In case of conflict, we can roll back the later transaction and
re-try it. As long as rollbacks are sufficiently infrequent, this should be a
win...

It is an interesting idea for sure. Maybe the time to try it out is getting
nearer.

- Kristian.

Comment by Kristian Nielsen [ 2014-04-11 ]

So here is how I think this bug should be fixed:

1. First, MDEV-5262 should be fixed, so that transactions that fail during
parallel replication due to deadlock or other temporary error can be re-tried
and eventually succeed.

2. Second, when InnoDB is about to let transaction T1 wait for a lock owned by
transaction T2, it should invoke a callback into the upper layer. This
callback will return a status code about what to do - either let T1 wait; roll
back T2 with deadlock error and let T1 proceed; or roll back T1 with deadlock
error. For parallel replication, we will then check if T1 comes before T2 in
commit order (same Relay_log_info and domain_id, and sub_id of T1 smaller than
T2); if so we ask to roll back T2 so that T1 can proceed and T2 can be
re-tried.

This should solve this and other similar deadlocks due to the enforced commit
order. Later, we can also use this to implement a more opportunistic search for
possible parallelism, as suggested by Jonas.
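The commit-order check in step 2 could look roughly like the sketch below. The `rli`, `domain_id`, and `sub_id` fields follow the description above; the function name, status constants, and `Trx` class are hypothetical, not the actual server code:

```python
# Hypothetical sketch of the lock-wait callback described above.
# When InnoDB wants to make T1 wait on a lock held by T2, the upper
# layer decides whether to let T1 wait, or to break the cycle by
# rolling back T2 (T2 then gets re-tried, per MDEV-5262).

from dataclasses import dataclass

LET_T1_WAIT = "let T1 wait for the lock"
KILL_T2 = "roll back T2 with a deadlock error"

@dataclass
class Trx:
    rli: str        # Relay_log_info the transaction belongs to
    domain_id: int  # replication domain
    sub_id: int     # monotonically increasing; smaller commits first

def lock_wait_decision(t1, t2):
    # Commit order is only enforced between transactions in the same
    # Relay_log_info and domain.
    if (t1.rli == t2.rli and t1.domain_id == t2.domain_id
            and t1.sub_id < t2.sub_id):
        # T1 must commit before T2, so letting T1 wait on T2's lock
        # would deadlock against the commit-order wait; kill T2.
        return KILL_T2
    return LET_T1_WAIT
```

The point of the check is that the deadlock is detected at lock-wait time, so no timeout has to expire before T2 is rolled back and retried.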

Comment by Kristian Nielsen [ 2014-06-10 ]

Temporarily re-assigned to Serg for review

Comment by Kristian Nielsen [ 2014-07-11 ]

Pushed to 10.0.13.

Generated at Thu Feb 08 07:08:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.