Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5941

Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on DML with slave parallel threads, SBR

    XMLWordPrintable

    Details

      Description

      revision-id: knielsen@knielsen-hq.org-20140321123055-8ch69qymsj1xqmw4
      date: 2014-03-21 13:30:55 +0100
      build-date: 2014-03-24 18:54:08 +0200
      revno: 4076
      branch-nick: 10.0

      The timeout error was encountered while running a concurrent test (12 threads on master) with SBR replication using GTID, and slave-parallel-threads=10.

      140321 20:08:47 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:10730',replication starts at GTID position ''
      140321 20:12:58 [ERROR] mysqld: Lock wait timeout exceeded; try restarting transaction
      140321 20:12:58 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5', Internal MariaDB error code: 1205
      140321 20:12:58 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      140321 20:12:58 [Warning] Slave: Sort aborted: Lock wait timeout exceeded; try restarting transaction Error_code: 1028
      140321 20:12:58 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000033' position 20863; GTID position '0-1-669'

      The test mainly executes DML, but involves DDL and FLUSH LOGS as well.
      The test flow includes a lot of statements that are unsafe for SBR or are considered to be such, but as far as I can see from the slave log, there was nothing particularly wrong done to the slave around the time when the problem occured.

      Slave is not restarted during the test (neither the server, nor the logical slave).

      I encountered the same error several times later, so it's not a unique occasion, but it's sporadic and not easily reproducible. Also, I couldn't reproduce it while feeding the same master binary logs to a clean slave, so it seems to be a true race condition.

      The following logs are attached:

      • master_binlogs.tar.gz - all master binary logs
      • slave_error.log
      • slave_general.log
      • my.cnf (for the reference, only one meaningful option in there, slave-parallel-threads=10); identical for master and slave

      Server command lines (although semisync plugins are there, they were not available at the time of server startup, so the error of not finding them can be seen in the slave error log):

      <basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<master_vardir>/my.cnf --basedir=<basedir> --datadir=<master_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<master_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10730 --socket=<master_vardir>/mysql.sock --pid-file=<master_vardir>/mysql.pid --general-log --general-log-file=<master_vardir>/mysql.log --server_id=1 --log-bin=mysql-bin --report-host=127.0.0.1 --report_port=10730 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505

      <basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<slave_vardir>/my.cnf --basedir=<basedir> --datadir=<slave_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<slave_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10732 --socket=<slave_vardir>/mysql.sock --pid-file=<slave_vardir>/mysql.pid --general-log --general-log-file=<slave_vardir>/mysql.log --server_id=2 --report-host=127.0.0.1 --report_port=10732 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505

        Attachments

        1. master_binlogs.tar.gz
          71 kB
        2. my.cnf
          0.2 kB
        3. slave_error.log
          6 kB
        4. slave_general.log
          243 kB

          Issue Links

            Activity

              People

              Assignee:
              knielsen Kristian Nielsen
              Reporter:
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: