MariaDB Server / MDEV-5941

Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on DML with slave parallel threads, SBR

Details

    Description

      revision-id: knielsen@knielsen-hq.org-20140321123055-8ch69qymsj1xqmw4
      date: 2014-03-21 13:30:55 +0100
      build-date: 2014-03-24 18:54:08 +0200
      revno: 4076
      branch-nick: 10.0

      The timeout error was encountered while running a concurrent test (12 threads on the master) with SBR using GTID and slave-parallel-threads=10.

      140321 20:08:47 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:10730',replication starts at GTID position ''
      140321 20:12:58 [ERROR] mysqld: Lock wait timeout exceeded; try restarting transaction
      140321 20:12:58 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5', Internal MariaDB error code: 1205
      140321 20:12:58 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      140321 20:12:58 [Warning] Slave: Sort aborted: Lock wait timeout exceeded; try restarting transaction Error_code: 1028
      140321 20:12:58 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000033' position 20863; GTID position '0-1-669'

      The test mainly executes DML, but involves DDL and FLUSH LOGS as well.
      The test flow includes a lot of statements that are unsafe for SBR, or are considered to be such, but as far as I can see from the slave log, nothing particularly wrong was done on the slave around the time when the problem occurred.

      The slave is not restarted during the test (neither the server nor the logical slave).

      I encountered the same error several times later, so it is not a one-off occurrence, but it is sporadic and not easily reproducible. Also, I could not reproduce it by feeding the same master binary logs to a clean slave, so it seems to be a true race condition.

      The following logs are attached:

      • master_binlogs.tar.gz - all master binary logs
      • slave_error.log
      • slave_general.log
      • my.cnf (for reference; the only meaningful option in it is slave-parallel-threads=10); identical for master and slave

      Server command lines (the semisync plugins are listed, but they were not available at server startup time, so the error about not finding them can be seen in the slave error log):

      <basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<master_vardir>/my.cnf --basedir=<basedir> --datadir=<master_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<master_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10730 --socket=<master_vardir>/mysql.sock --pid-file=<master_vardir>/mysql.pid --general-log --general-log-file=<master_vardir>/mysql.log --server_id=1 --log-bin=mysql-bin --report-host=127.0.0.1 --report_port=10730 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505

      <basedir>/sql/mysqld --defaults-group-suffix=.runtime --defaults-file=<slave_vardir>/my.cnf --basedir=<basedir> --datadir=<slave_vardir>/data --lc-messages-dir=<basedir>/sql/share --character-sets-dir=<basedir>/sql/share/charsets --tmpdir=<slave_vardir>/tmp --core-file --max-allowed-packet=128Mb --port=10732 --socket=<slave_vardir>/mysql.sock --pid-file=<slave_vardir>/mysql.pid --general-log --general-log-file=<slave_vardir>/mysql.log --server_id=2 --report-host=127.0.0.1 --report_port=10732 --sql-mode=no_engine_substitution --binlog_commit_wait_count=10 --binlog_commit_wait_usec=1000000 --plugin-load-add=semisync_master --plugin-load-add=semisync_slave --loose-rpl_semi_sync_master_timeout=600 --log-output=FILE --slave-skip-errors=1054,1317,1049,1305,1539,1505

      Attachments

        1. master_binlogs.tar.gz
          71 kB
          Elena Stepanova
        2. my.cnf
          0.2 kB
          Elena Stepanova
        3. slave_error.log
          6 kB
          Elena Stepanova
        4. slave_general.log
          243 kB
          Elena Stepanova

        Issue Links

          Activity

            knielsen Kristian Nielsen added a comment - - edited

            It seems to be these two statements that conflict with one another; these
            statements were group-committed on the master, so the slave will attempt to
            run them in parallel:

            UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5;
             
            UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 4113170432  ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 7;

            Here is the content of the table after replaying the binlogs up to this point:

            col_char_12_key   col_int   col_int_key   pk   col_char_12
            NULL              NULL      NULL           2   139
            NULL              NULL      NULL           3   139
            NULL              NULL      NULL           4   139
            NULL              NULL      NULL           5   139
            NULL              NULL      NULL           6   139
            NULL              108       NULL           7   86
            NULL              0         NULL           8   5
            NULL              NULL      NULL           9   139
            NULL              NULL      NULL          23   NULL
            knielsen Kristian Nielsen added a comment - - edited

            The problem is probably caused by a different execution plan on the slave
            compared to the master.

            The first UPDATE can apparently use two different execution plans: one that
            uses the `col_int_key` index to check the IN ( ... ) predicate, and one that
            does not.

            When using the index, the first UPDATE does not conflict with the second,
            since no rows in the index match any of the values in the IN ( ... ) list.

            However, when the index is not used, there is a conflict with the second
            UPDATE, as a table scan must be used.

            This can be seen by adding IGNORE INDEX (col_int_key) to the first UPDATE; in
            this case running the second update first blocks the first update:

            --connect (s3,127.0.0.1,root,,test,$SLAVE_MYPORT,)
            BEGIN;
            UPDATE  `table10_innodb_int_autoinc` AS X SET `col_char_12_key` = 4113170432  ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 7;
             
            --connect (s1,127.0.0.1,root,,test,$SLAVE_MYPORT,)
            BEGIN;
            UPDATE  `table10_innodb_int_autoinc` AS X ignore index (col_int_key) SET `col_char_12_key` = 'j' WHERE X . `col_int_key` IN ( 223 , 4 , 113 , 1674772480 , 242679808 ) ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 5;
            # This UPDATE hangs here and eventually gets a lock wait timeout.

            Unfortunately, this is a very serious issue, and I am not sure how it could be
            solved. It is a fundamental assumption of parallel replication that if two
            statements ran without lock conflicts on the master, then they will also be
            able to run without lock conflicts on the slave. Without this, it is not safe
            to enforce the same commit order on the slave, as it may cause deadlocks, as
            seen in this example.

            I am not sure whether unsafe statements (like the LIMIT in this case) are
            required to trigger this; I will try to investigate further.


            elenst Elena Stepanova added a comment -

            As a side note, these UPDATEs are probably not really unsafe statements; they look more like statements that are currently marked unsafe because it is easier to do so. They have ORDER BY clauses which include, among other fields, pk, so the LIMIT should be deterministic and hence safe.
            There are open upstream bugs about this, e.g. #62537 and #71105.

            jonaso Jonas Oreland added a comment -

            Hi Kristian,

            May I propose a solution to this problem (one that I think might also be useful for a different scenario):
            1) Record the relay log position when starting a set of parallel transactions
            2) Process the transactions with the given parallelism
            3) IF a lock wait time-out occurs, restart this set with parallelism set to 1
            4a) IF it fails even then (with parallelism = 1), stop the slave, report to the user, etc.
            4b) IF it works this way, restore parallelism to its original value

            Add counters so that the DBA can see whether this is a frequently occurring scenario.

            (One can also imagine having more heuristics for how to handle the parallelism,
            assuming a solid framework for retries exists; see the sketch below.)

            —

            Another scenario where a framework like the above would be useful is allowing for
            optimistic slave concurrency, i.e. in certain scenarios guess that transactions can
            be processed in parallel (still committing in order), and dynamically adjust parallelism
            down to 1 if this guess turns out to be wrong.

            Ways of guessing could be:

            • if transactions are created by different connections on the master
              (this would need to be recorded in the binlog)
            • if transactions use different databases
            • etc. etc.

            What do you think?

            /Jonas

            knielsen Kristian Nielsen added a comment - - edited

            Hi Jonas,

            > May I propose a solution to this problem (that I think might be useful also for a different scenario).
            > 1) Record relay log position for when starting a set of parallel transactions
            > 2) process transactions given parallelism
            > 3) IF a lock wait time-out occurs, restart this set with parallelism set to 1.
            > 4a) IF it fails even then too (with parallelism = 1) stop slave, report to user etc
            > 4b) IF it works like this, reset parallelism to original value

            The main problem is the need for a timeout. This timeout needs to be small
            enough that it doesn't cause too much slave delay, yet large enough that it
            doesn't cause spurious failure due to transactions taking too long without any
            deadlocks.

            However, we could actually detect the deadlock immediately, without
            timeouts. Whenever InnoDB is about to make transaction T1 wait for a
            lock owned by T2, we can check if T1 needs to commit before T2 - if so,
            instead of letting T1 wait, we fail and roll back T2 with a deadlock
            error. We then proceed to re-try T2 as you describe.

            I think this is an interesting solution worth pursuing. I believe this is also
            more or less what Galera does.
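
            As a rough standalone model (hypothetical struct and field names, not actual InnoDB or replication code), the commit-order check at lock-wait time could look like this:

            // Hypothetical model of the rule described above.
            #include <cassert>
            #include <cstdint>

            struct ReplTrx {
              bool     in_parallel_replication;  // applied by a parallel slave worker
              uint32_t domain_id;                // GTID replication domain
              uint64_t sub_id;                   // position in the enforced commit order
            };

            // Conceptually consulted when T1 is about to wait for a lock held by T2.
            // Returns true if T2 should be rolled back (and re-tried later) instead of
            // letting T1 block, because T1 must commit before T2.
            bool must_kill_lock_holder(const ReplTrx &t1_waiter, const ReplTrx &t2_holder)
            {
              return t1_waiter.in_parallel_replication &&
                     t2_holder.in_parallel_replication &&
                     t1_waiter.domain_id == t2_holder.domain_id &&
                     t1_waiter.sub_id < t2_holder.sub_id;
            }

            int main()
            {
              ReplTrx earlier{true, 0, 668}, later{true, 0, 669};
              assert(must_kill_lock_holder(earlier, later));   // earlier waits on later
              assert(!must_kill_lock_holder(later, earlier));  // later waits on earlier: OK
              return 0;
            }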

            > An other scenario where a framework like above would be useful is to allow for
            > optimistic slave concurrency. I.e in certain scenarios guess that transactions can
            > be processed in parallel (still committing in order), and dynamically adjust parallelism down to 1
            > if this guess turns out to be wrong.
            >
            > Way of guessing could be
            >
            > if transactions are created by different connections on master
            > (would need to record this in binlog)
            > if transactions use different databases
            > etc etc...

            Yes, I agree. Since we already know the desired commit order, we can just
            start them in parallel and immediately detect any conflict, seen as a later
            transaction owning a row or gap lock needed by an earlier transaction. In
            case of conflict, we can roll back the later transaction and re-try it. As
            long as rollbacks are sufficiently infrequent, this should be a win...

            It is an interesting idea for sure. Maybe the time to try it out is getting
            nearer.

            - Kristian.

            knielsen Kristian Nielsen added a comment - - edited

            So here is how I think this bug should be fixed:

            1. First, MDEV-5262 should be fixed, so that transactions that fail during
            parallel replication due to deadlock or other temporary error can be re-tried
            and eventually succeed.

            2. Second, when InnoDB is about to let transaction T1 wait for a lock owned by
            transaction T2, it should invoke a callback into the upper layer. This
            callback will return a status code saying what to do: either let T1 wait; roll
            back T2 with a deadlock error and let T1 proceed; or roll back T1 with a
            deadlock error. For parallel replication, we will then check if T1 comes
            before T2 in commit order (same Relay_log_info and domain_id, and sub_id of T1
            smaller than that of T2); if so, we ask to roll back T2 so that T1 can proceed
            and T2 can be re-tried.

            This should solve this and other similar deadlocks caused by the enforced
            commit order. Later, we can also use this to implement a more opportunistic
            search for possible parallelism, as suggested by Jonas.
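
            A rough sketch of what such a callback and the lock-wait path acting on its status code might look like; the names, signatures and enum values are hypothetical illustrations of the description above, not the real storage-engine API, and the ordering test is the same one modelled in the earlier sketch:

            // Hypothetical interface sketch; names and signatures are illustrative only.
            #include <cstdio>

            enum WaitDecision {
              LET_T1_WAIT,   // no ordering relation: ordinary lock wait handling
              ROLLBACK_T2,   // T1 must commit first: kill the lock holder, re-try T2 later
              ROLLBACK_T1    // give up on the waiter with a deadlock error instead
            };

            struct WorkerTrx {
              const void        *rli;        // same Relay_log_info => same replication stream
              unsigned           domain_id;  // GTID domain
              unsigned long long sub_id;     // smaller sub_id commits earlier
            };

            // Upper-layer callback the storage engine would invoke before blocking T1 on T2.
            WaitDecision rpl_lock_wait_callback(const WorkerTrx *t1, const WorkerTrx *t2)
            {
              if (t1 && t2 && t1->rli == t2->rli &&
                  t1->domain_id == t2->domain_id && t1->sub_id < t2->sub_id)
                return ROLLBACK_T2;          // waiting would deadlock with the commit order
              return LET_T1_WAIT;
            }

            // Engine side: act on the decision before enqueueing the lock wait.
            void on_lock_conflict(const WorkerTrx *waiter, const WorkerTrx *holder)
            {
              switch (rpl_lock_wait_callback(waiter, holder)) {
              case ROLLBACK_T2:
                std::puts("mark holder as deadlock victim; waiter proceeds, holder is re-tried");
                break;
              case ROLLBACK_T1:
                std::puts("return a deadlock error to the waiter");
                break;
              default:
                std::puts("enqueue the waiter as a normal lock wait");
              }
            }

            int main()
            {
              int same_rli = 0;                       // stand-in for one Relay_log_info
              WorkerTrx earlier{&same_rli, 0, 668};
              WorkerTrx later  {&same_rli, 0, 669};
              on_lock_conflict(&earlier, &later);     // earlier waits on later: kill later
              on_lock_conflict(&later, &earlier);     // later waits on earlier: just wait
              return 0;
            }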


            knielsen Kristian Nielsen added a comment -

            Temporarily re-assigned to Serg for review.

            knielsen Kristian Nielsen added a comment -

            Pushed to 10.0.13.

            People

              Assignee: knielsen Kristian Nielsen
              Reporter: elenst Elena Stepanova
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved:
