Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20628

If temporary error occurs with optimistic mode of parallel replication, error message has false information

    XMLWordPrintable

    Details

      Description

      With optimistic mode of parallel replication, if a query fails, then you can end up with log messages like this:

      2019-09-17 19:27:07 140190015801088 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2019-09-17 19:27:07 140190015801088 [Warning] Slave: Duplicate entry '...' for key 'name' Error_code: 1062
      2019-09-17 19:27:07 140190015801088 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
      2019-09-17 19:27:07 140190015801088 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2019-09-17 19:27:07 140190015801088 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000001' position 50284765; GTID position '0-1-119489'
      2019-09-17 19:27:07 140190315325184 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2019-09-17 19:27:07 140190315325184 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
      2019-09-17 19:27:07 140190315325184 [Warning] Slave: Connection was killed Error_code: 1927
      2019-09-17 19:27:07 140190315325184 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
      2019-09-17 19:27:07 140190315325184 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2019-09-17 19:27:07 140190315325184 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000001' position 50284765; GTID position '0-1-119489'
      

      These messages are printed in slave_output_error_info. See here:

      https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/slave.cc#L4756

      With optimistic mode of parallel replication, the slave_output_error_info function can be called when a transaction has failed because another transaction that it depends on has failed. See here:

      https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/rpl_parallel.cc#L1221

      Some errors are considered "temporary errors". The only temporary errors are currently just deadlocks and timeouts. See here:

      https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/slave.cc#L3381

      In the case of a "temporary error", the event group can actually be retried. See here:

      https://github.com/MariaDB/server/blob/mariadb-10.2.27/sql/rpl_parallel.cc#L1305

      These retries happen if slave_transaction_retries is greater than 0.

      https://mariadb.com/kb/en/library/replication-and-binary-log-system-variables/#slave_transaction_retries

      In the example shown above, the errors were deadlocks, so the transactions were indeed retried. Notice that the following message was still included in the above log snippet:

      2019-09-17 19:27:07 140190315325184 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000001' position 50284765; GTID position '0-1-119489'

      This message is not quite correct in this case.

      • The SQL thread was not aborted.
      • The SQL thread does not need to be manually restarted.

      It seems like this error message in slave_output_error_info should be slightly different if the SQL thread encounters a temporary error. In that case, maybe it should be a Warning instead of an Error, and maybe it should say something like this instead:

      2019-09-17 19:27:07 140190315325184 [Warning] Temporary error running query. The slave SQL thread will retry the query. We encountered the temporary error at log 'mariadb-bin.000001' position 50284765; GTID position '0-1-119489'

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              sachin.setiya.007 Sachin Setiya
              Reporter:
              GeoffMontee Geoff Montee
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

                Dates

                Created:
                Updated: