Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5941

Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on DML with slave parallel threads, SBR

Details

    Description

      revision-id: knielsen@knielsen-hq.org-20140321123055-8ch69qymsj1xqmw4
      date: 2014-03-21 13:30:55 +0100
      build-date: 2014-03-24 18:54:08 +0200
      revno: 4076
      branch-nick: 10.0

      The timeout error was encountered while running a concurrent test (12 threads on master) with SBR replication using GTID, and slave-parallel-threads=10.

      140321 20:08:47 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:10730',replication starts at GTID position ''
      140321 20:12:58 [ERROR] mysqld: Lock wait timeout exceeded; try restarting transaction
      140321 20:12:58 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5', Internal MariaDB error code: 1205
      140321 20:12:58 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      140321 20:12:58 [Warning] Slave: Sort aborted: Lock wait timeout exceeded; try restarting transaction Error_code: 1028
      140321 20:12:58 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000033' position 20863; GTID position '0-1-669'

      The test mainly executes DML, but involves DDL and FLUSH LOGS as well.
      The test flow includes a lot of statements that are unsafe for SBR or are considered to be such, but as far as I can see from the slave log, there was nothing particularly wrong done to the slave around the time when the problem occured.

      Slave is not restarted during the test (neither the server, nor the logical slave).

      I encountered the same error several times later, so it's not a unique occasion, but it's sporadic and not easily reproducible. Also, I couldn't reproduce it while feeding the same master binary logs to a clean slave, so it seems to be a true race condition.

      The following logs are attached:

      • master_binlogs.tar.gz - all master binary logs
      • slave_error.log
      • slave_general.log
      • my.cnf (for the reference, only one meaningful option in there, slave-parallel-threads=10); identical for master and slave

      Server command lines (although semisync plugins are there, they were not available at the time of server startup, so the error of not finding them can be seen in the slave error log):

      <basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<master_vardir>/my.cnf --basedir=<basedir> --datadir=<master_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<master_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10730 --socket=<master_vardir>/mysql.sock --pid-file=<master_vardir>/mysql.pid --general-log --general-log-file=<master_vardir>/mysql.log --server_id=1 --log-bin=mysql-bin --report-host=127.0.0.1 --report_port=10730 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505

      <basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<slave_vardir>/my.cnf --basedir=<basedir> --datadir=<slave_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<slave_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10732 --socket=<slave_vardir>/mysql.sock --pid-file=<slave_vardir>/mysql.pid --general-log --general-log-file=<slave_vardir>/mysql.log --server_id=2 --report-host=127.0.0.1 --report_port=10732 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505

      Attachments

        1. master_binlogs.tar.gz
          71 kB
        2. my.cnf
          0.2 kB
        3. slave_error.log
          6 kB
        4. slave_general.log
          243 kB

        Issue Links

          Activity

            elenst Elena Stepanova created issue -
            knielsen Kristian Nielsen made changes -
            Field Original Value New Value
            Status Open [ 1 ] In Progress [ 3 ]
            knielsen Kristian Nielsen made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.0.11 [ 15200 ]
            Fix Version/s 10.0.10 [ 14500 ]
            elenst Elena Stepanova made changes -
            Affects Version/s 10.0.10 [ 14500 ]
            knielsen Kristian Nielsen made changes -
            knielsen Kristian Nielsen made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            knielsen Kristian Nielsen made changes -
            knielsen Kristian Nielsen made changes -
            Priority Critical [ 2 ] Minor [ 4 ]
            knielsen Kristian Nielsen made changes -
            Priority Minor [ 4 ] Major [ 3 ]
            knielsen Kristian Nielsen made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            knielsen Kristian Nielsen made changes -
            Labels parallelslave
            serg Sergei Golubchik made changes -
            Fix Version/s 10.0.12 [ 15201 ]
            Fix Version/s 10.0.11 [ 15200 ]
            knielsen Kristian Nielsen made changes -
            Priority Major [ 3 ] Minor [ 4 ]
            serg Sergei Golubchik made changes -
            Priority Minor [ 4 ] Critical [ 2 ]
            knielsen Kristian Nielsen made changes -
            Assignee Kristian Nielsen [ knielsen ] Sergei Golubchik [ serg ]
            serg Sergei Golubchik made changes -
            Labels parallelslave parallelslave replication
            serg Sergei Golubchik made changes -
            Workflow defaullt [ 37101 ] MariaDB v2 [ 42457 ]
            serg Sergei Golubchik made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            serg Sergei Golubchik made changes -
            Status Stalled [ 10000 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.0.13 [ 16000 ]
            Fix Version/s 10.0.12 [ 15201 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.0.13 [ 16300 ]
            Fix Version/s 10.0 [ 16000 ]
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Kristian Nielsen [ knielsen ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            knielsen Kristian Nielsen made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            knielsen Kristian Nielsen made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Workflow MariaDB v2 [ 42457 ] MariaDB v3 [ 61414 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 61414 ] MariaDB v4 [ 147690 ]

            People

              knielsen Kristian Nielsen
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.