Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.0.10
-
None
Description
revision-id: knielsen@knielsen-hq.org-20140321123055-8ch69qymsj1xqmw4
|
date: 2014-03-21 13:30:55 +0100
|
build-date: 2014-03-24 18:54:08 +0200
|
revno: 4076
|
branch-nick: 10.0
|
The timeout error was encountered while running a concurrent test (12 threads on master) with SBR replication using GTID, and slave-parallel-threads=10.
140321 20:08:47 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:10730',replication starts at GTID position ''
|
140321 20:12:58 [ERROR] mysqld: Lock wait timeout exceeded; try restarting transaction
|
140321 20:12:58 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'UPDATE `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5', Internal MariaDB error code: 1205
|
140321 20:12:58 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
|
140321 20:12:58 [Warning] Slave: Sort aborted: Lock wait timeout exceeded; try restarting transaction Error_code: 1028
|
140321 20:12:58 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000033' position 20863; GTID position '0-1-669'
|
The test mainly executes DML, but involves DDL and FLUSH LOGS as well.
The test flow includes a lot of statements that are unsafe for SBR or are considered to be such, but as far as I can see from the slave log, there was nothing particularly wrong done to the slave around the time when the problem occured.
Slave is not restarted during the test (neither the server, nor the logical slave).
I encountered the same error several times later, so it's not a unique occasion, but it's sporadic and not easily reproducible. Also, I couldn't reproduce it while feeding the same master binary logs to a clean slave, so it seems to be a true race condition.
The following logs are attached:
- master_binlogs.tar.gz - all master binary logs
- slave_error.log
- slave_general.log
- my.cnf (for the reference, only one meaningful option in there, slave-parallel-threads=10); identical for master and slave
Server command lines (although semisync plugins are there, they were not available at the time of server startup, so the error of not finding them can be seen in the slave error log):
<basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<master_vardir>/my.cnf --basedir=<basedir> --datadir=<master_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<master_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10730 --socket=<master_vardir>/mysql.sock --pid-file=<master_vardir>/mysql.pid --general-log --general-log-file=<master_vardir>/mysql.log --server_id=1 --log-bin=mysql-bin --report-host=127.0.0.1 --report_port=10730 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505
|
<basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<slave_vardir>/my.cnf --basedir=<basedir> --datadir=<slave_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<slave_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10732 --socket=<slave_vardir>/mysql.sock --pid-file=<slave_vardir>/mysql.pid --general-log --general-log-file=<slave_vardir>/mysql.log --server_id=2 --report-host=127.0.0.1 --report_port=10732 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505
|