  1. MariaDB Server
  2. MDEV-30780

Parallel slave hangs after hitting an error

Details

    Description

      After a parallel worker thread hits an error that must stop the slave,
      SHOW SLAVE STATUS displays the error but still reports Yes for the slave running status, e.g.

      show slave status\G
      *************************** 1. row ***************************
                      Slave_IO_State: Waiting for master to send event
                         Master_Host: 172.31.15.61
                         Master_User: db02replication
                         Master_Port: 3306
                       Connect_Retry: 60
                     Master_Log_File: mysql-bin.028940
                 Read_Master_Log_Pos: 1050157656
                      Relay_Log_File: relay-bin.000134
                       Relay_Log_Pos: 964684321
               Relay_Master_Log_File: mysql-bin.028938
                    Slave_IO_Running: Yes
                   Slave_SQL_Running: Yes
                     Replicate_Do_DB: 
                 Replicate_Ignore_DB: 
                  Replicate_Do_Table: 
              Replicate_Ignore_Table: 
             Replicate_Wild_Do_Table: 
         Replicate_Wild_Ignore_Table: 
                          Last_Errno: 1062
                          Last_Error: Could not execute Write_rows_v1 event on table pingtree.campaignOutboundDupeEmail; Duplicate entry '877-damien_cunningham88@outlook.com' for key 'codePrimaryEmail', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.028938, end_log_pos 964684486
                        Skip_Counter: 0
                 Exec_Master_Log_Pos: 964684022
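
The symptom in this output, a non-zero Last_Errno reported together with Slave_SQL_Running: Yes, can be checked mechanically. The following is only a minimal sketch for monitoring purposes, not part of the server or of the fix; the function names `parse_vertical_status` and `sql_thread_inconsistent` are hypothetical, and the input is assumed to be the vertical-format text shown above.

```python
def parse_vertical_status(text):
    """Parse vertical-format status output into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "***** 1. row *****" separator.
        if not line or line.startswith("*"):
            continue
        # Split on the first colon only; values such as Last_Error contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def sql_thread_inconsistent(fields):
    """True when an error is recorded but the SQL thread still claims to run."""
    errno = int(fields.get("Last_Errno") or 0)
    return errno != 0 and fields.get("Slave_SQL_Running") == "Yes"


if __name__ == "__main__":
    sample = """
        Slave_IO_Running: Yes
        Slave_SQL_Running: Yes
        Last_Errno: 1062
    """
    print(sql_thread_inconsistent(parse_vertical_status(sample)))
```

A true result only flags the combination described in this report; it does not by itself prove that the worker threads are hung.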
      

      The slave threads, however, instead of exiting as expected, may hang like this:

      +------+--------------+--------------------+------+--------------+-------+-----------------------------------------------+------------------+----------+
      | Id   | User         | Host               | db   | Command      | Time  | State                                         | Info             | Progress |
      +------+--------------+--------------------+------+--------------+-------+-----------------------------------------------+------------------+----------+
      |    5 | system user  |                    | NULL | Slave_IO     | 51160 | Waiting for master to send event              | NULL             |    0.000 |
      |   19 | mariadbadmin | 172.31.15.18:58548 | NULL | Sleep        |     5 |                                               | NULL             |    0.000 |
      |   61 | mariadbadmin | 172.31.15.18:46002 | NULL | Sleep        |    10 |                                               | NULL             |    0.000 |
      | 2394 | system user  |                    | NULL | Slave_worker | 50852 | closing tables                                | NULL             |    0.000 |
      | 2395 | system user  |                    | NULL | Slave_worker | 50852 | Waiting for prior transaction to start commit | NULL             |    0.000 |
      | 2396 | system user  |                    | NULL | Slave_worker | 50852 | Waiting for prior transaction to start commit | NULL             |    0.000 |
      | 2397 | system user  |                    | NULL | Slave_worker | 50852 | Waiting for prior transaction to start commit | NULL             |    0.000 |
      | 2398 | system user  |                    | NULL | Slave_worker | 50852 | Waiting for prior transaction to start commit | NULL             |    0.000 |
      | 2399 | system user  |                    | NULL | Slave_worker | 50852 | Waiting for prior transaction to start commit | NULL             |    0.000 |
      | 2400 | system user  |                    | NULL | Slave_worker | 50852 | Waiting for prior transaction to start commit | NULL             |    0.000 |
      | 2401 | system user  |                    | NULL | Slave_worker | 50852 | Waiting for prior transaction to start commit | NULL             |    0.000 |
      | 2402 | system user  |                    | NULL | Slave_worker | 50852 | Waiting for prior transaction to start commit | NULL             |    0.000 |
      | 2403 | system user  |                    | NULL | Slave_worker | 50852 | Waiting for prior transaction to start commit | NULL             |    0.000 |
      | 2404 | system user  |                    | NULL | Slave_worker | 50852 | Waiting for prior transaction to start commit | NULL             |    0.000 |
      | 2405 | system user  |                    | NULL | Slave_worker | 50852 | Waiting for prior transaction to start commit | NULL             |    0.000 |
      | 2393 | system user  |                    | NULL | Slave_SQL    | 50860 | Waiting for room in worker thread event queue | NULL             |    0.000 |
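
A process list like the one above can also be scanned for replication threads that have sat in one state for a long time. The sketch below is a heuristic under stated assumptions: the helper names and the `min_seconds` threshold are hypothetical, the input is assumed to be the ASCII table as printed by the client, and a large Time value alone does not prove a hang (an idle worker can show one too).

```python
def parse_processlist(table_text):
    """Turn a client-rendered ASCII table into a list of row dicts."""
    rows = []
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first pipe-delimited line holds the column names
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def long_running_replication_threads(rows, min_seconds):
    """Return (Id, Command, State) for SQL and worker threads at or above the threshold."""
    found = []
    for row in rows:
        if row.get("Command") in ("Slave_SQL", "Slave_worker"):
            if int(row["Time"]) >= min_seconds:
                found.append((int(row["Id"]), row["Command"], row["State"]))
    return found
```

Applied to the listing above with a threshold of, say, one hour, this would report the Slave_SQL thread and all Slave_worker threads, which is the pattern to cross-check against the Last_Errno value in the slave status.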
      

      Slave_SQL may also hang in a different state.

      Analysis showed that the worker in the closing tables state was trapped in an endless loop
      in mark_start_commit_inner, iterating over already garbage-collected items, including rgi->gco itself.
      The cause of the belated access was identified as possible out-of-order group committing
      in the error branch.

      The issue applies to both the conservative and optimistic modes.
      A patch, to be committed soon, fixes the case by reinforcing group_commit_orderer-based ordering for errored-out workers.
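
The general idea of keeping errored-out workers in order can be illustrated with a toy model. This is not the MariaDB code or the actual patch: `CommitOrderer` and `run_workers` are hypothetical names, and the model only shows the principle that a worker, failed or not, signals the ordering point strictly after every earlier transaction has done so.

```python
import threading
import time


class CommitOrderer:
    """Toy ordering primitive; a stand-in, not MariaDB's group_commit_orderer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_seq = 0   # sequence number allowed to mark next
        self.marked = []     # (seq, failed) in the order marks happened

    def mark_start_commit(self, seq, failed):
        with self._cond:
            # Wait for all earlier transactions, even on the error path.
            while self._next_seq != seq:
                self._cond.wait()
            self.marked.append((seq, failed))
            self._next_seq += 1
            self._cond.notify_all()


def run_workers(n_workers, failing_seq):
    """Run n_workers threads; the one with seq == failing_seq reports an error."""
    orderer = CommitOrderer()

    def worker(seq):
        # Later transactions finish sooner, so they reach the ordering point first.
        time.sleep(0.01 * (n_workers - seq))
        orderer.mark_start_commit(seq, failed=(seq == failing_seq))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return orderer.marked
```

In this model the failing worker cannot overtake its predecessors, so no earlier transaction is left referring to state that a later one has already released.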

            People

              Elkin Andrei Elkin