  1. MariaDB Server
  2. MDEV-34135

Replication slave is stuck without any error

    Description

      I have an issue very similar to MDEV-8311. Since that issue is almost 10 years old and was never resolved for lack of activity, I took the liberty of opening a new one.

      My replica stopped replicating and is stuck, without reporting any error.
      One slave worker thread is stuck in the state "Waiting for prior transaction to start commit".
      The slave SQL thread itself is stuck in the state "Waiting for room in worker thread event queue".

      I cannot run "stop slave;" because it hangs as well.
      I can manually kill the slave PIDs, but if I then run "start slave", it gets stuck again immediately.
      After manually killing the slave PIDs, I can do:

      SET GLOBAL sql_slave_skip_counter = 1;
      start slave;
      

      Then it starts properly ("Write_rows_log_event::write_row(-1) on table `my_table`"), and gets stuck again a few minutes later.

      Please note that I have 2 other replicas replicating from the same master with the exact same config.
      Please also note that all 3 replicas have been running this way for months or years.
      Potentially interesting context: most of my tables use RocksDB as the storage engine. The rest use InnoDB.


      mariadb --version
      mariadb  Ver 15.1 Distrib 10.6.16-MariaDB, for debian-linux-gnu (x86_64) using  EditLine wrapper
      

      Client output when started:

      Server version: 10.6.16-MariaDB-0ubuntu0.22.04.1 Ubuntu 22.04
      

      MariaDB [my_db]> show slave status\G
      *************************** 1. row ***************************
                      Slave_IO_State: Waiting for master to send event
                         Master_Host: 192.168.1.154
                         Master_User: my_user
                         Master_Port: 3306
                       Connect_Retry: 60
                     Master_Log_File: master10-bin.000291
                 Read_Master_Log_Pos: 107402253
                      Relay_Log_File: mysqld-relay-bin.000552
                       Relay_Log_Pos: 12124351
               Relay_Master_Log_File: master10-bin.000265
                    Slave_IO_Running: Yes
                   Slave_SQL_Running: Yes
                     Replicate_Do_DB: 
                 Replicate_Ignore_DB: 
                  Replicate_Do_Table: 
              Replicate_Ignore_Table: 
             Replicate_Wild_Do_Table: 
         Replicate_Wild_Ignore_Table: 
                          Last_Errno: 0
                          Last_Error: 
                        Skip_Counter: 0
                 Exec_Master_Log_Pos: 417796162
                     Relay_Log_Space: 10110622341
                     Until_Condition: None
                      Until_Log_File: 
                       Until_Log_Pos: 0
                  Master_SSL_Allowed: No
                  Master_SSL_CA_File: 
                  Master_SSL_CA_Path: 
                     Master_SSL_Cert: 
                   Master_SSL_Cipher: 
                      Master_SSL_Key: 
               Seconds_Behind_Master: 38815
       Master_SSL_Verify_Server_Cert: No
                       Last_IO_Errno: 0
                       Last_IO_Error: 
                      Last_SQL_Errno: 0
                      Last_SQL_Error: 
         Replicate_Ignore_Server_Ids: 
                    Master_Server_Id: 10
                      Master_SSL_Crl: 
                  Master_SSL_Crlpath: 
                          Using_Gtid: No
                         Gtid_IO_Pos: 
             Replicate_Do_Domain_Ids: 
         Replicate_Ignore_Domain_Ids: 
                       Parallel_Mode: conservative
                           SQL_Delay: 0
                 SQL_Remaining_Delay: NULL
             Slave_SQL_Running_State: Waiting for room in worker thread event queue
                    Slave_DDL_Groups: 0
      Slave_Non_Transactional_Groups: 0
          Slave_Transactional_Groups: 464
      1 row in set (0.025 sec)
      

      MariaDB [my_db]> show processlist;
      +------+-------------+---------------------------------------------------+--------------+--------------+------+-----------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
      | Id   | User        | Host                                              | db           | Command      | Time | State                                         | Info                                                                                                 | Progress |
      +------+-------------+---------------------------------------------------+--------------+--------------+------+-----------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
      | 1544 | system user |                                                   | NULL         | Slave_IO     | 1263 | Waiting for master to send event              | NULL                                                                                                 |    0.000 |
      | 1548 | system user |                                                   | NULL         | Slave_worker | 1187 | Waiting for work from SQL thread              | NULL                                                                                                 |    0.000 |
      | 1546 | system user |                                                   | NULL         | Slave_worker | 1187 | Waiting for prior transaction to start commit | NULL                                                                                                 |    0.000 |
      | 1549 | system user |                                                   | NULL         | Slave_worker | 1187 | Waiting for work from SQL thread              | NULL                                                                                                 |    0.000 |
      | 1547 | system user |                                                   | NULL         | Slave_worker | 1187 | Waiting for work from SQL thread              | NULL                                                                                                 |    0.000 |
      | 1550 | system user |                                                   | NULL         | Slave_worker | 1187 | Waiting for work from SQL thread              | NULL                                                                                                 |    0.000 |
      | 1545 | system user |                                                   | NULL         | Slave_SQL    | 1187 | Waiting for room in worker thread event queue | NULL                                                                                                 |    0.000 |
      [...] Normal queries [...]
      +------+-------------+---------------------------------------------------+--------------+--------------+------+-----------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
      10 rows in set (0.024 sec)
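
      For anyone trying to reproduce or narrow this down, here is a minimal sketch of two checks that seem relevant to the states shown above. It assumes the replication threads appear under the 'system user' account, as they do in the processlist output, and that the parallel replication settings are exposed under the `slave_parallel%` variable prefix. It only reads state and changes nothing.

      ```sql
      -- List only the replication threads (IO, SQL, and worker threads),
      -- with how long each has been in its current state.
      SELECT ID, COMMAND, TIME, STATE
      FROM information_schema.PROCESSLIST
      WHERE USER = 'system user'
      ORDER BY ID;

      -- Show the parallel replication settings in effect on this replica
      -- (Parallel_Mode is reported as "conservative" in the slave status above).
      SHOW GLOBAL VARIABLES LIKE 'slave_parallel%';
      ```

      Running the same two statements on one of the healthy replicas would confirm whether the settings really are identical.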
      

            People

              knielsen Kristian Nielsen
              Vongo Adrian Delmarre