MDEV-17515: GTID Replication in optimistic mode deadlock

Details

    Description

      Hi,

      The setup is a replication chain:
      Relay (MySQL 5.5, MIXED, log_slave_updates)
      -> Relay (MariaDB 10.1.36, MIXED, log_slave_updates)
      -> Slave (MariaDB 10.1.36, MIXED, log_slave_updates, optimistic)
      -> Slave (MariaDB 10.1.36, MIXED, log_slave_updates, conservative)

      • The slave running in optimistic mode deadlocks.
      • The other slaves, running in conservative mode, do not deadlock.
      • Once the deadlock occurs, the slave cannot be stopped.

      The deadlock was triggered by running some OPTIMIZE TABLE statements on the first relay and then restarting MySQL on it.

      • The restart left the relay master without events from its master:

       
                   Slave_IO_Running: Connecting
                  Slave_SQL_Running: Yes
                      Last_IO_Errno: 2003
                      Last_IO_Error: error reconnecting to master 'replication@x.x.x.x:3306' - retry-time: 60  maximum-retries: 86400  message: Can't connect to MySQL server on 'x.x.x.x' (111 "Connection refused")
      

      The deadlock:

      +--------+--------------+----------------------+-------------+---------+--------+-----------------------------------------------+---------------------------+----------+
      | Id     | User         | Host                 | db          | Command | Time   | State                                         | Info                      | Progress |
      +--------+--------------+----------------------+-------------+---------+--------+-----------------------------------------------+---------------------------+----------+
      | 190042 | system user  |                      | NULL        | Connect | 196362 | Waiting for master to send event              | NULL                      |    0.000 |
      | 190043 | system user  |                      | NULL        | Connect | 196362 | Waiting for work from SQL thread              | NULL                      |    0.000 |
      | 190044 | system user  |                      | NULL        | Connect | 196362 | Waiting for work from SQL thread              | NULL                      |    0.000 |
      | 190045 | system user  |                      | NULL        | Connect | 196362 | Waiting for work from SQL thread              | NULL                      |    0.000 |
      | 190046 | system user  |                      | NULL        | Connect | 196362 | Waiting for work from SQL thread              | NULL                      |    0.000 |
      | 190047 | system user  |                      | NULL        | Connect | 196362 | Waiting for work from SQL thread              | NULL                      |    0.000 |
      | 190048 | system user  |                      | NULL        | Connect | 196362 | Waiting for work from SQL thread              | NULL                      |    0.000 |
      | 190049 | system user  |                      | tsce_unedic | Connect | 171856 | Waiting for table metadata lock               | OPTIMIZE TABLE `requetes` |    0.000 |
      | 190050 | system user  |                      | NULL        | Connect | 171856 | Waiting for prior transaction to commit       | NULL                      |    0.000 |
      | 190051 | system user  |                      | NULL        | Connect | 177872 | Waiting for room in worker thread event queue | NULL                      |    0.000 |
      | 307368 | root         | localhost            | NULL        | Killed  | 116880 | Killing slave                                 | stop slave                |    0.000 |
      | 434731 | root         | localhost            | NULL        | Killed  |  30480 | Killing slave                                 | stop slave                |    0.000 |
      | 474959 | root         | localhost            | NULL        | Killed  |   3167 | Killing slave                                 | stop slave                |    0.000 |
      | 479253 | root         | localhost            | NULL        | Query   |      0 | init                                          | show processlist          |    0.000 |
      | 479422 | proxysql     | ?:38626              | NULL        | Sleep   |      0 |                                               | NULL                      |    0.000 |
      | 479451 | proxysql     | ?:42168              | NULL        | Sleep   |      0 |                                               | NULL                      |    0.000 |
      | 479490 | netdata      | localhost            | NULL        | Sleep   |      0 |                                               | NULL                      |    0.000 |
      | 479518 | repl_manager | 192.168.185.44:33222 | NULL        | Sleep   |      1 |                                               | NULL                      |    0.000 |
      +--------+--------------+----------------------+-------------+---------+--------+-----------------------------------------------+---------------------------+----------+
      18 rows in set (0.00 sec)
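
For triage, the blocked replication threads in the listing above can be picked out mechanically. The following is a minimal sketch, not part of the original report: the function names `parse_processlist` and `stuck_replication_threads` are hypothetical, the set of "blocked" states is simply the three wait states visible in this listing, and the parser assumes no `|` character appears inside a cell.

```python
# States taken from the processlist in this report; this is an assumption
# for illustration, not an exhaustive list of replication wait states.
BLOCKED_STATES = (
    "Waiting for table metadata lock",
    "Waiting for prior transaction to commit",
    "Waiting for room in worker thread event queue",
)


def parse_processlist(table_text):
    """Parse the ASCII table printed by the mysql client into a list of dicts."""
    header = None
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and the "N rows in set" footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first |...| line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def stuck_replication_threads(rows):
    """Return the 'system user' rows whose State is one of BLOCKED_STATES."""
    return [
        row for row in rows
        if row["User"] == "system user" and row["State"] in BLOCKED_STATES
    ]
```

Applied to the output above, this would be expected to single out threads 190049, 190050 and 190051, which is the OPTIMIZE TABLE worker plus the two threads queued behind it.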
      

      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: x.x.x.x
                        Master_User: repl
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.000035
                Read_Master_Log_Pos: 800365460
                     Relay_Log_File: relay-bin.000004
                      Relay_Log_Pos: 100745890
              Relay_Master_Log_File: mysql-bin.000032
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 0
                         Last_Error: 
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 100745602
                    Relay_Log_Space: 4809515065
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 0
                     Last_SQL_Error: 
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 21
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Slave_Pos
                        Gtid_IO_Pos: 0-21-28343538
            Replicate_Do_Domain_Ids: 
        Replicate_Ignore_Domain_Ids: 
                      Parallel_Mode: optimistic
      
      

          Activity

            knielsen Kristian Nielsen added a comment:

            A better testcase would be to grep the binlog produced by OPTIMIZE TABLE and check that the statement is marked as DDL.
            Even if the included testcase somehow did not hang, it would still be a bug that OPTIMIZE TABLE is not marked as DDL.
            Anything that takes table locks or other locks without implementing the parallel replication deadlock detection-and-handling mechanism must be marked as DDL (or non-transactional), as appropriate.

            - Kristian.
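
The binlog check suggested in the comment above could be scripted roughly as follows. This is only a sketch under stated assumptions: it expects the text output of `mysqlbinlog` as input, it assumes that each GTID event is printed on a `#` header line ending in `GTID <domain>-<server>-<seq>` followed by optional lowercase flag words such as `ddl` or `trans`, and the function name `optimize_gtid_flags` is hypothetical. The exact header layout may differ between server versions, so the regular expressions should be checked against real output first.

```python
import re

# Assumed header shape: "#<date> <time> server id N  end_log_pos P \tGTID 0-21-100 ddl"
GTID_RE = re.compile(r"\bGTID (\d+-\d+-\d+)((?: [a-z_]+)*)\s*$")
# Matches "OPTIMIZE TABLE ..." with at most one optional keyword in between.
OPTIMIZE_RE = re.compile(r"^\s*OPTIMIZE\s+(?:\w+\s+)?TABLE\b", re.IGNORECASE)


def optimize_gtid_flags(binlog_text):
    """Return (gtid, flags) for every OPTIMIZE TABLE found in mysqlbinlog output.

    flags is the list of words printed after the GTID on its header line;
    an OPTIMIZE TABLE correctly marked as DDL should have "ddl" among them.
    """
    results = []
    current = None  # most recent GTID header seen so far
    for line in binlog_text.splitlines():
        match = GTID_RE.search(line)
        if match and line.startswith("#"):
            current = (match.group(1), match.group(2).split())
            continue
        if current is not None and OPTIMIZE_RE.match(line):
            results.append(current)
    return results


def unflagged_optimize(binlog_text):
    """Return the GTIDs of OPTIMIZE TABLE statements that lack the ddl flag."""
    return [gtid for gtid, flags in optimize_gtid_flags(binlog_text)
            if "ddl" not in flags]
```

A non-empty result from `unflagged_optimize` would indicate the condition described in the comment, independently of whether the slave actually hangs.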
            sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment:

            Hello Elkin,

            Can you please review the changes for MDEV-17515?
            Patch: https://github.com/MariaDB/server/commit/70377dc9be5205286806e01365f33cbfd32e1d27
            BuildBot: http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-sujatha

            Thank you.

            sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment:

            Hello Elkin,

            I made some minor changes on top of the above-mentioned commit. Please find the following commit,
            which includes patches for MDEV-17515 and MDEV-22530. The MDEV-22530 changes are implemented on top of MDEV-17515.
            https://github.com/MariaDB/server/commit/44eb746c226d3eabe93d476790bb0c6bb30106b2

            Please review the top two commits:

            1. commit 44eb746c226d3eabe93d476790bb0c6bb30106b2
            MDEV-22530: Aborting OPTIMIZE TABLE still logs in binary log and replicates to the Slave server.

            2. commit 502a6b9b3ce119f507cb8ccc9ec034a9953ad3cc
            MDEV-17515: GTID Replication in optimistic mode deadlock

            sachin.setiya.007 Sachin Setiya (Inactive) added a comment:

            Okay to push; one comment needs to be addressed.

            sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment:

            MDEV-17515 and MDEV-22530 are related: the fix for MDEV-22530 is implemented on top of the fix for MDEV-17515.

            Hence both were tested and pushed together.

            10.3: http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.3-sujatha
            ====
            commit 07da206ccd21080351363a2c1f080320367c52bb (HEAD -> 10.3, origin/bb-10.3-sujatha)
            MDEV-22530: Aborting OPTIMIZE TABLE still logs in binary log and replicates to the Slave server.

            10.3 cherry-pick testing.

            commit e683e8c010bd2ffa9415857828e89f94b9ac0550
            MDEV-17515: GTID Replication in optimistic mode deadlock

            10.3 cherry-pick testing.

            10.4: http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.4-sujatha
            ====
            commit 0f34165ec1427eccbf18db9f71744537fe6a1354 (HEAD -> 10.4, origin/bb-10.4-sujatha)
            MDEV-22530: Aborting OPTIMIZE TABLE still logs in binary log and replicates to the Slave server.

            10.4 cherry-pick testing.

            commit 4169e6571c342b57a3ce89e7ae45328c06beb788
            MDEV-17515: GTID Replication in optimistic mode deadlock

            10.4 cherry-pick testing.

            Merge conflicts addressed.
            Result file re-recorded.

            10.5: http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.5-sujatha
            ====
            commit 0f9598100aa37f48e8eb93669396e9fac37c493a (HEAD -> 10.5, origin/bb-10.5-sujatha)
            MDEV-22530: Aborting OPTIMIZE TABLE still logs in binary log and replicates to the Slave server.

            10.5 cherry-pick testing.

            No merge conflicts.

            commit f0c220898fbe8f58393d386aa0cb6da6b7cedf9f
            MDEV-17515: GTID Replication in optimistic mode deadlock

            10.5 cherry-pick testing.

            Moved changes to log_event_server.cc.

            People

              sujatha.sivakumar Sujatha Sivakumar (Inactive)
              stephane@skysql.com VAROQUI Stephane