Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6020

Slave SQL: "Deadlock found when trying to get lock" with parallel replication, RBR

Details

    Description

      The attached binlog was generated during a normal concurrent test, no tampering has been done.
      In case it is relevant, the master was run with the following options:

      mysqld --loose-slave_ddl_exec_mode=STRICT \
      --loose-rpl_semi_sync_slave_enabled=0 \
      --loose-slave_parallel_threads=12 \
      --core-file \
      --max-allowed-packet=128Mb \
      --general-log \
      --log-bin=mysql-bin \
      --sql-mode=no_engine_substitution \
      --binlog_commit_wait_count=10 \
      --binlog_commit_wait_usec=1000000 \
      --plugin-load-add=semisync_master \
      --plugin-load-add=semisync_slave \
      --loose-rpl_semi_sync_master_timeout=600 \
      --log-output=FILE \
      --ignore-builtin-innodb \
      --plugin-load-add=ha_innodb.so \
      --slave-skip-errors=1049,1305,1539,1505

      Binlog format was set to 'row' globally at runtime.

      Note: Although the master command line contains slave-related options, it is just due to uniformity; master was not configured and used as a slave at any point of time.

      Slave was initially run with identical parameters, but none of them, except for slave_parallel_threads, makes any difference. I reproduced the failure a number of times by feeding the same error log from a master to a new clean slave, and was able to discard all non-defaultoptions except for slave_parallel_threads.

      It does not happen every time upon re-using the binary log, on average on my machine the failure happens once in 4-5 attempts, but it can depend on the machine of course.

      Replication stops in several places, normally between GTID 0-1-604 and 0-1-611, but at least once it happened later. Below are extracts from error logs upon different failures that I encountered.

      The failure happens both with master_use_gtid=current_pos and master_use_gtid=no.

      Additional observations:

      When replication stops, the last position is reported in the error log; but in SHOW SLAVE STATUS Exec_Master_Log_Pos points at the end of the master binlog, it looks wrong. See SHOW SLAVE STATUS output in examples below.

      Sometimes SHOW SLAVE STATUS reports "Last_Error: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT'" instead of the initial error.

      In addition to the binary log, 3 error logs are attached:

      • occurrence of the failure with GTID (slave_gtid.err)
      • occurrence of the failure without GTID (slave_no_gtid.err)
      • occurrence of the failure when the secondary error was reported by SHOW SLAVE STATUS (slave_gtid_wrong_error.err)

      Examples of failure points:

      end_log_pos 237673
      stopped at position 237504; GTID position '0-1-611'
       
      [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237673, Internal MariaDB error code: 1213
      [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 237504; GTID position '0-1-611'
       
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 127.0.0.1
                        Master_User: root
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.000002
                Read_Master_Log_Pos: 326
                     Relay_Log_File: wheezy-64-relay-bin.000003
                      Relay_Log_Pos: 237791
              Relay_Master_Log_File: mysql-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 1213
                         Last_Error: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237673
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 326
                    Relay_Log_Space: 583505
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 1213
                     Last_SQL_Error: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237673
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 1
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Current_Pos
                        Gtid_IO_Pos: 0-1-1556

      end_log_pos 237673
      stopped at position 235796; GTID position '0-1-604'
       
      [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237673, Internal MariaDB error code: 1213
      [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 235796; GTID position '0-1-604'
       
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 127.0.0.1
                        Master_User: root
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.000002
                Read_Master_Log_Pos: 326
                     Relay_Log_File: wheezy-64-relay-bin.000003
                      Relay_Log_Pos: 237791
              Relay_Master_Log_File: mysql-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 1213
                         Last_Error: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237673
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 326
                    Relay_Log_Space: 583505
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 1213
                     Last_SQL_Error: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237673
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 1
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Current_Pos
                        Gtid_IO_Pos: 0-1-1556

      end_log_pos 288531
      stopped at position 288359; GTID position '0-1-738'
       
      [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table1_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 288531, Internal MariaDB error code: 1213
      [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 288359; GTID position '0-1-738'
      [ERROR] Slave SQL: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT', Internal MariaDB error code: 1964
       
      MariaDB [test]> show slave status \G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 127.0.0.1
                        Master_User: root
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.000002
                Read_Master_Log_Pos: 326
                     Relay_Log_File: wheezy-64-relay-bin.000003
                      Relay_Log_Pos: 288646
              Relay_Master_Log_File: mysql-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 1964
                         Last_Error: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT'
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 326
                    Relay_Log_Space: 583505
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 1964
                     Last_SQL_Error: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT'
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 1
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Current_Pos
                        Gtid_IO_Pos: 0-1-1556

      end_log_pos 237435
      stopped at position 237261; GTID position '0-1-610'
       
      [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237435, Internal MariaDB error code: 1213
      [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 237261; GTID position '0-1-610'
      [ERROR] Slave SQL: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT', Internal MariaDB error code: 1964
       
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 127.0.0.1
                        Master_User: root
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.000002
                Read_Master_Log_Pos: 326
                     Relay_Log_File: wheezy-64-relay-bin.000003
                      Relay_Log_Pos: 237548
              Relay_Master_Log_File: mysql-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 1964
                         Last_Error: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT'
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 326
                    Relay_Log_Space: 583505
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 1964
                     Last_SQL_Error: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT'
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 1
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Current_Pos
                        Gtid_IO_Pos: 0-1-1556

      end_log_pos 237435
      stopped at position 236413; GTID position '0-1-606'
       
      [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237435, Internal MariaDB error code: 1213
      [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 236413; GTID position '0-1-606'
      [ERROR] Slave SQL: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT', Internal MariaDB error code: 1964
       
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 127.0.0.1
                        Master_User: root
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.000002
                Read_Master_Log_Pos: 326
                     Relay_Log_File: wheezy-64-relay-bin.000003
                      Relay_Log_Pos: 237548
              Relay_Master_Log_File: mysql-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 1964
                         Last_Error: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT'
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 326
                    Relay_Log_Space: 583505
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 1964
                     Last_SQL_Error: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT'
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 1
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Current_Pos
                        Gtid_IO_Pos: 0-1-1556
       

      end_log_pos 237673
      stopped at position 237504 (GTID disabled)
       
      [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237673, Internal MariaDB error code: 1213
      [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 237504
       
      *************************** 1. row ***************************
                    Slave_IO_State: Waiting for master to send event
                        Master_Host: 127.0.0.1
                        Master_User: root
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.000002
                Read_Master_Log_Pos: 326
                     Relay_Log_File: wheezy-64-relay-bin.000003
                      Relay_Log_Pos: 237791
              Relay_Master_Log_File: mysql-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 1213
                         Last_Error: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237673
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 326
                    Relay_Log_Space: 583505
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 1213
                     Last_SQL_Error: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237673
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 1
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: No
                        Gtid_IO_Pos: 

      end_log_pos 237435
      stopped at position 236836; GTID position '0-1-608'
       
      [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table10_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 237435, Internal MariaDB error code: 1213
      [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 236836; GTID position '0-1-608'
      [ERROR] Slave SQL: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT', Internal MariaDB error code: 1964
       
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 127.0.0.1
                        Master_User: root
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.000002
                Read_Master_Log_Pos: 326
                     Relay_Log_File: wheezy-64-relay-bin.000003
                      Relay_Log_Pos: 237548
              Relay_Master_Log_File: mysql-bin.000002
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 1964
                         Last_Error: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT'
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 326
                    Relay_Log_Space: 583505
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 1964
                     Last_SQL_Error: Error 'Commit failed due to failure of an earlier commit on which this one depends' on query. Default database: 'test'. Query: 'COMMIT'
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 1
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Current_Pos
                        Gtid_IO_Pos: 0-1-1556

      Attachments

        1. master.log.gz
          155 kB
        2. mdev6020-2.tar.gz
          276 kB
        3. mysql-bin.000001
          569 kB
        4. slave_gtid_wrong_error.err
          4 kB
        5. slave_gtid.err
          4 kB
        6. slave_no_gtid.err
          4 kB
        7. slave.log
          325 kB

        Issue Links

          Activity

            It is possible that this (the deadlock errors part) is another symptom of the same bug that causes MDEV-6067. That bug was seriously messing up the marking of transactions as read/write; possibly this could cause stuff to go wrong during commit, perhaps leaving row locks lingering and eventually causing deadlocks. But it is not sure; re-testing will be needed on a tree with MDEV-6067 fixed to be sure.

            knielsen Kristian Nielsen added a comment - It is possible that this (the deadlock errors part) is another symptom of the same bug that causes MDEV-6067 . That bug was seriously messing up the marking of transactions as read/write; possibly this could cause stuff to go wrong during commit, perhaps leaving row locks lingering and eventually causing deadlocks. But it is not sure; re-testing will be needed on a tree with MDEV-6067 fixed to be sure.

            Attached a new set (binary log, master and slave general logs, slave error log) as mdev6020-2.tar.gz. It was collected on 10.0-mdev6067 tree:

            revision-id: knielsen@knielsen-hq.org-20140411071140-xebtd0oxn0qg3gef
            date: 2014-04-11 09:11:40 +0200
            build-date: 2014-04-11 19:30:14 +0300
            revno: 4146
            branch-nick: 10.0-mdev6067

            Same test as before, same server options.

            140411 13:00:24 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table100_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 147943, Internal MariaDB error code: 1213
            140411 13:00:24 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
            140411 13:00:24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 146732; GTID position '0-1-392'
            140411 13:00:24 [Note] Error reading relay log event: slave SQL thread was killed
            140411 13:00:24 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
            140411 13:00:24 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
            140411 13:00:24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 146732; GTID position '0-1-392'

            elenst Elena Stepanova added a comment - Attached a new set (binary log, master and slave general logs, slave error log) as mdev6020-2.tar.gz. It was collected on 10.0-mdev6067 tree: revision-id: knielsen@knielsen-hq.org-20140411071140-xebtd0oxn0qg3gef date: 2014-04-11 09:11:40 +0200 build-date: 2014-04-11 19:30:14 +0300 revno: 4146 branch-nick: 10.0-mdev6067 Same test as before, same server options. 140411 13:00:24 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table100_key_pk_parts_2_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.000001, end_log_pos 147943, Internal MariaDB error code: 1213 140411 13:00:24 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213 140411 13:00:24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 146732; GTID position '0-1-392' 140411 13:00:24 [Note] Error reading relay log event: slave SQL thread was killed 140411 13:00:24 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964 140411 13:00:24 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964 140411 13:00:24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 146732; GTID position '0-1-392'
            knielsen Kristian Nielsen added a comment - - edited

            Ok, seems like this is what is happening:

            The binlog at the point of failure has two different insert transactions into
            the same table table100_key_pk_parts_2_int_autoinc; these two transactions are
            in the same group commit, so can run in parallel on the slave.

            Here is a snippit from InnoDB status about the deadlock:

            *** (2) HOLDS THE LOCK(S):
            TABLE LOCK table `test`.`table100_key_pk_parts_2_int_autoinc` /* Partition `p1` */ trx id 1838 lock mode AUTO-INC
            *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
            TABLE LOCK table `test`.`table100_key_pk_parts_2_int_autoinc` /* Partition `p0` */ trx id 1838 lock mode AUTO-INC waiting

            So it looks like the deadlock is on the auto-increment lock. Apparently, one
            insert transaction takes first the lock for one partition and then the other,
            while the other transaction takes the locks in the opposite order. Thus
            causing the deadlock.

            I am not sure how autoincrement locking should work with partitioning. But it
            seems somewhat unfortunate that the locks are taken and released in this
            way. Maybe one should be released before the next one is taken? This might
            also be the replication code doing something wrong, as it uses a different
            code path for row-based insertion events than normal INSERT statements.

            One can also consider if there might be problems lurking here for
            statement-based replication. If the results of the transactions really depend
            on locking the auto-increment for every partition, I wonder if serialisable
            commit order is really ensured on the master...

            So it is worth it to research if this can be fixed to not deadlock on the
            slave. Though at worst, the proposed fix for MDEV-5941 should also work to fix
            this one.

            knielsen Kristian Nielsen added a comment - - edited Ok, seems like this is what is happening: The binlog at the point of failure has two different insert transactions into the same table table100_key_pk_parts_2_int_autoinc; these two transactions are in the same group commit, so can run in parallel on the slave. Here is a snippit from InnoDB status about the deadlock: *** (2) HOLDS THE LOCK(S): TABLE LOCK table `test`.`table100_key_pk_parts_2_int_autoinc` /* Partition `p1` */ trx id 1838 lock mode AUTO-INC *** (2) WAITING FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `test`.`table100_key_pk_parts_2_int_autoinc` /* Partition `p0` */ trx id 1838 lock mode AUTO-INC waiting So it looks like the deadlock is on the auto-increment lock. Apparently, one insert transaction takes first the lock for one partition and then the other, while the other transaction takes the locks in the opposite order. Thus causing the deadlock. I am not sure how autoincrement locking should work with partitioning. But it seems somewhat unfortunate that the locks are taken and released in this way. Maybe one should be released before the next one is taken? This might also be the replication code doing something wrong, as it uses a different code path for row-based insertion events than normal INSERT statements. One can also consider if there might be problems lurking here for statement-based replication. If the results of the transactions really depend on locking the auto-increment for every partition, I wonder if serialisable commit order is really ensured on the master... So it is worth it to research if this can be fixed to not deadlock on the slave. Though at worst, the proposed fix for MDEV-5941 should also work to fix this one.

            Here is a mysql-test-run test case that gives the problem (may need to be run a few times to trigger). The test requires the mysql-bin.000001 that Elena attached to be copied into mysql-test/std_data/mdev6020-mysql-bin.000001

            --source include/have_innodb.inc
            --source include/have_partition.inc
            --source include/have_binlog_format_mixed_or_row.inc
            --source include/master-slave.inc
             
            --connection slave
            --source include/stop_slave.inc
             
            --connection master
            --let $datadir= `SELECT @@datadir`
             
            --let $rpl_server_number= 1
            --source include/rpl_stop_server.inc
             
            --remove_file $datadir/master-bin.000001
            --remove_file $datadir/master-bin.state
            --copy_file $MYSQL_TEST_DIR/std_data/mdev6020-mysql-bin.000001 $datadir/master-bin.000001
             
            --let $rpl_server_number= 1
            --source include/rpl_start_server.inc
             
            --source include/wait_until_connected_again.inc
             
            --connection slave
            SET @old_engine= @@GLOBAL.default_storage_engine;
            SET GLOBAL default_storage_engine=InnoDB;
            SET @old_parallel= @@GLOBAL.slave_parallel_threads;
            SET GLOBAL slave_parallel_threads=12;
            --replace_result $SERVER_MYPORT_1 SERVER_MYPORT_1
            eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$SERVER_MYPORT_1, master_user='root', master_log_file='master-bin.000001', master_log_pos=4;
            #eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$SERVER_MYPORT_1, master_user='root', master_use_gtid=slave_pos;
            --source include/start_slave.inc
             
            --connection master
            SELECT @@gtid_binlog_pos;
            --save_master_pos
             
            --connection slave
            --sync_with_master
            SELECT @@gtid_binlog_pos;
            SELECT @@gtid_slave_pos;
             
            --source include/stop_slave.inc
            SET GLOBAL default_storage_engine= @old_engine;
            SET GLOBAL slave_parallel_threads=@old_parallel;
            SET sql_log_bin=0;
            DROP TABLE table0_int_autoinc;
            DROP TABLE table0_key_pk_parts_2_int_autoinc;
            DROP TABLE table100_int_autoinc;
            query_vertical SHOW CREATE TABLE table100_key_pk_parts_2_int_autoinc;
            DROP TABLE table100_key_pk_parts_2_int_autoinc;
            DROP TABLE table10_int_autoinc;
            DROP TABLE table10_key_pk_parts_2_int_autoinc;
            DROP TABLE table1_int_autoinc;
            DROP TABLE table1_key_pk_parts_2_int_autoinc;
            DROP TABLE table2_int_autoinc;
            DROP TABLE table2_key_pk_parts_2_int_autoinc;
            SET sql_log_bin=1;
            --source include/start_slave.inc
             
            --connection master
             
            --source include/rpl_end.inc

            knielsen Kristian Nielsen added a comment - Here is a mysql-test-run test case that gives the problem (may need to be run a few times to trigger). The test requires the mysql-bin.000001 that Elena attached to be copied into mysql-test/std_data/mdev6020-mysql-bin.000001 --source include/have_innodb.inc --source include/have_partition.inc --source include/have_binlog_format_mixed_or_row.inc --source include/master-slave.inc   --connection slave --source include/stop_slave.inc   --connection master --let $datadir= `SELECT @@datadir`   --let $rpl_server_number= 1 --source include/rpl_stop_server.inc   --remove_file $datadir/master-bin.000001 --remove_file $datadir/master-bin.state --copy_file $MYSQL_TEST_DIR/std_data/mdev6020-mysql-bin.000001 $datadir/master-bin.000001   --let $rpl_server_number= 1 --source include/rpl_start_server.inc   --source include/wait_until_connected_again.inc   --connection slave SET @old_engine= @@GLOBAL.default_storage_engine; SET GLOBAL default_storage_engine=InnoDB; SET @old_parallel= @@GLOBAL.slave_parallel_threads; SET GLOBAL slave_parallel_threads=12; --replace_result $SERVER_MYPORT_1 SERVER_MYPORT_1 eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$SERVER_MYPORT_1, master_user='root', master_log_file='master-bin.000001', master_log_pos=4; #eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$SERVER_MYPORT_1, master_user='root', master_use_gtid=slave_pos; --source include/start_slave.inc   --connection master SELECT @@gtid_binlog_pos; --save_master_pos   --connection slave --sync_with_master SELECT @@gtid_binlog_pos; SELECT @@gtid_slave_pos;   --source include/stop_slave.inc SET GLOBAL default_storage_engine= @old_engine; SET GLOBAL slave_parallel_threads=@old_parallel; SET sql_log_bin=0; DROP TABLE table0_int_autoinc; DROP TABLE table0_key_pk_parts_2_int_autoinc; DROP TABLE table100_int_autoinc; query_vertical SHOW CREATE TABLE table100_key_pk_parts_2_int_autoinc; DROP TABLE table100_key_pk_parts_2_int_autoinc; DROP TABLE table10_int_autoinc; DROP TABLE table10_key_pk_parts_2_int_autoinc; DROP TABLE table1_int_autoinc; DROP TABLE table1_key_pk_parts_2_int_autoinc; DROP TABLE table2_int_autoinc; DROP TABLE table2_key_pk_parts_2_int_autoinc; SET sql_log_bin=1; --source include/start_slave.inc   --connection master   --source include/rpl_end.inc

            Pushed to 10.0.13.

            knielsen Kristian Nielsen added a comment - Pushed to 10.0.13.

            People

              knielsen Kristian Nielsen
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.