MariaDB Server / MDEV-30423

Deadlock on Replica during BACKUP STAGE BLOCK_COMMIT on XA transactions

Details

    Description

      Note: This bug fix is not complete. A complete fix for this issue also requires the fix for MDEV-35110.

      We are seeing deadlocks on the slave SQL thread during backup, which leaves the slave SQL thread stuck. The affected version is 10.6.11.

      show processlist;

      | Id | User | Host | db | Command | Time | State | Info | Progress |
      | 3994 | system user | | bankfrm | Slave_worker | 47515 | Waiting for prior transaction to commit | XA COMMIT ... | 0.000 |
      | 3996 | system user | | NULL | Slave_worker | 47515 | Waiting for prior transaction to commit | NULL | 0.000 |
      | 3995 | system user | | NULL | Slave_worker | 47515 | Waiting for prior transaction to commit | NULL | 0.000 |
      | 3997 | system user | | NULL | Slave_worker | 47515 | Waiting for prior transaction to commit | NULL | 0.000 |
      | 3991 | system user | | NULL | Slave_SQL | 44523 | Waiting for room in worker thread event queue | NULL | 0.000 |
      | 5114 | ...... | 10.93.99.158:52012 | NULL | Query | 0 | Optimizing | SELECT Event_schema, Event_name FROM information_schema.EVENTS WHERE Status = 'ENABLED' | 0.000 |
      | 715112 | ..oper | localhost | NULL | Query | 47515 | Waiting for backup lock | BACKUP STAGE BLOCK_COMMIT | 0.000 |
      | 724545 | ....frm | 10.93.97.49:44948 | ....frm | Query | 2291 | Waiting for backup lock | XA ROLLBACK ... | 0.000 |
      | 751381 | ....frm | 10.93.97.50:46208 | ....frm | Query | 1310 | Waiting for backup lock | XA ROLLBACK ... | 0.000 |
      | 752581 | myoper | localhost | NULL | Query | 0 | starting | show processlist | 0.000 |
      

      Show replica status\G
       
           Connection_name: 
                     Slave_SQL_State: Slave has read all relay log; waiting for more updates
                      Slave_IO_State: Waiting for master to send event
                         Master_Host: 10.93.99.101
                         Master_User: ......
                         Master_Port: 6603
                       Connect_Retry: 10
                     Master_Log_File: bin_log.001019
                 Read_Master_Log_Pos: 30985295
                      Relay_Log_File: relay_log.000131
                       Relay_Log_Pos: 2570503
               Relay_Master_Log_File: bin_log.001019
                    Slave_IO_Running: Yes
                   Slave_SQL_Running: Yes
                     Replicate_Do_DB: 
                 Replicate_Ignore_DB: 
                  Replicate_Do_Table: 
              Replicate_Ignore_Table: 
             Replicate_Wild_Do_Table: 
         Replicate_Wild_Ignore_Table: 
                          Last_Errno: 0
                          Last_Error: 
                        Skip_Counter: 0
                 Exec_Master_Log_Pos: 2570206
                     Relay_Log_Space: 31275705
                     Until_Condition: None
                      Until_Log_File: 
                       Until_Log_Pos: 0
                  Master_SSL_Allowed: No
                  Master_SSL_CA_File: 
                  Master_SSL_CA_Path: 
                     Master_SSL_Cert: 
                   Master_SSL_Cipher: 
                      Master_SSL_Key: 
               Seconds_Behind_Master: 162
       Master_SSL_Verify_Server_Cert: No
                       Last_IO_Errno: 0
                       Last_IO_Error: 
                      Last_SQL_Errno: 0
                      Last_SQL_Error: 
         Replicate_Ignore_Server_Ids: 
                    Master_Server_Id: 2
                      Master_SSL_Crl: 
                  Master_SSL_Crlpath: 
                          Using_Gtid: Slave_Pos
                         Gtid_IO_Pos: 1-2-6301093730
             Replicate_Do_Domain_Ids: 
         Replicate_Ignore_Domain_Ids: 
                       Parallel_Mode: optimistic
                           SQL_Delay: 0
                 SQL_Remaining_Delay: NULL
             Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
                    Slave_DDL_Groups: 46
      Slave_Non_Transactional_Groups: 19
          Slave_Transactional_Groups: 23249343
                Retried_transactions: 1
                  Max_relay_log_size: 268435456
                Executed_log_entries: 86984839
           Slave_received_heartbeats: 78647
              Slave_heartbeat_period: 5.000
                      Gtid_Slave_Pos: 1-2-6301025975
      

      +------------------------------------------+
      | WhoLocksWho                              |
      +------------------------------------------+
      | Thread 715112 IS LOCKED BY Thread 715112 |
      | Thread 715112 IS LOCKED BY Thread 3994   |
      | Thread 715112 IS LOCKED BY Thread 3993   |
      | Thread 3993 IS LOCKED BY Thread 715112   |
      | Thread 3993 IS LOCKED BY Thread 3994     |
      | Thread 3993 IS LOCKED BY Thread 3993     |
      +------------------------------------------+
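      The WhoLocksWho output can be read as a wait-for graph, where each row is an edge from the waiting thread to the thread blocking it. Read this way, thread 715112 (the session running BACKUP STAGE BLOCK_COMMIT) and thread 3993 are each waiting on the other. The Python sketch below is an illustration only and is not a MariaDB tool. It loads the six rows above and runs a depth-first search for a cycle. It assumes that the self-edges (a thread "locked by" itself) are artifacts of the query that produced the table, so it drops them.

```python
from collections import defaultdict

# Rows copied from the WhoLocksWho output: (waiting thread, blocking thread).
EDGES = [
    (715112, 715112),
    (715112, 3994),
    (715112, 3993),
    (3993, 715112),
    (3993, 3994),
    (3993, 3993),
]


def build_wait_for_graph(edges):
    """Map each waiting thread to the set of threads it waits on."""
    graph = defaultdict(set)
    for waiter, holder in edges:
        # Assumption: self-edges are reporting noise, not real waits.
        if waiter != holder:
            graph[waiter].add(holder)
    return graph


def find_cycle(graph):
    """Return one cycle as a list of thread ids (first id repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = defaultdict(int)      # every thread starts WHITE
    stack = []

    def dfs(node):
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(graph.get(node, ())):
            if color[nxt] == GREY:
                # Back edge: the cycle is the part of the path starting at nxt.
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for start in sorted(graph):
        if color[start] == WHITE:
            found = dfs(start)
            if found:
                return found
    return None


if __name__ == "__main__":
    cycle = find_cycle(build_wait_for_graph(EDGES))
    print(" -> ".join(str(t) for t in cycle))
```

      When run as a script, it prints 3993 -> 715112 -> 3993. A cycle of distinct threads in a wait-for graph means none of them can proceed unless one of the waits is broken from outside.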
      

          Activity

            Elkin Andrei Elkin added a comment -

            Howdy Brandon.

            The patch is pushed as 012c8120399 (HEAD -> bb-10.5-andrei), having passed
            only the regression tests so far.
            Please take it on for review soon; meanwhile I'll be watching the BB processing.

            Cheers,

            Andrei


            bnestere Brandon Nesterenko added a comment -

            Patch is approved. Discussion of findings took place via Slack.

            Thanks, Andrei!

            monty Michael Widenius added a comment -

            I re-opened this because there are still bugs in this code.

            monty Michael Widenius added a comment -

            This solves some issues that the previous patch did not.

            monty Michael Widenius added a comment -

            See MDEV-35110 for the next part of this issue.

            People

              monty Michael Widenius
              pandi.gurusamy Pandikrishnan Gurusamy
              Votes: 0
              Watchers: 6
