[MDEV-31572] STOP SLAVE hangs on 10.3.39 Created: 2023-06-28  Updated: 2023-07-28  Resolved: 2023-07-28

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.3.39
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Michaël de groot Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-9573 'Stop slave' hangs on replication slave Closed

 Description   

Hi,

I am running the newest MariaDB 10.3. STOP SLAVE has been hanging for 5 minutes now. The server is not very busy.

I am running multi-source replication from two primaries. Both primaries have been shut down.
The primary of the default master connection (without a name) was shut down several months ago, the primary of the other connection was shut down more recently.

Both replication streams have REPLICATE_IGNORE_DB set.

This was the state of the 2 connections before I issued the command:

 
MariaDB [(none)]> show all slaves status\G
*************************** 1. row ***************************
               Connection_name: 
               Slave_SQL_State: Waiting for the next event in relay log
                Slave_IO_State: Connecting to master
                   Master_Host: 10.10.A.B
                   Master_User: ansi_repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-bin.015523
           Read_Master_Log_Pos: 882782146
                Relay_Log_File: prod-d1-mariadb-01-relay-bin.004518
                 Relay_Log_Pos: 938754441
         Relay_Master_Log_File: mysql-bin.015457
              Slave_IO_Running: Connecting
             Slave_SQL_Running: Yes
               Replicate_Do_DB: 
           Replicate_Ignore_DB: mysql,some_schema
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 938754146
               Relay_Log_Space: 71381577192
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File: 
            Master_SSL_CA_Path: 
               Master_SSL_Cert: 
             Master_SSL_Cipher: 
                Master_SSL_Key: 
         Seconds_Behind_Master: NULL
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 2003
                 Last_IO_Error: error connecting to master 'ansi_repl@10.10.A.B.:3306' - retry-time: 60  maximum-retries: 86400  message: Can't connect to MySQL server on '10.10.A.B.' (113 "No route to host")
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 0
                Master_SSL_Crl: 
            Master_SSL_Crlpath: 
                    Using_Gtid: No
                   Gtid_IO_Pos: 
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Waiting for the next event in relay log
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
          Retried_transactions: 0
            Max_relay_log_size: 1073741824
          Executed_log_entries: 0
     Slave_received_heartbeats: 0
        Slave_heartbeat_period: 30.000
                Gtid_Slave_Pos: 0-1245586661-56708316
*************************** 2. row ***************************
               Connection_name: cluster_migration
               Slave_SQL_State: Slave has read all relay log; waiting for the slave I/O thread to update it
                Slave_IO_State: Connecting to master
                   Master_Host: 10.10.C.D
                   Master_User: cluster_migr_repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-cluster-01-bin.000044
           Read_Master_Log_Pos: 723615742
                Relay_Log_File: prod-d1-mariadb-01-relay-bin-cluster_migration.000050
                 Relay_Log_Pos: 4
         Relay_Master_Log_File: mysql-cluster-01-bin.000044
              Slave_IO_Running: Connecting
             Slave_SQL_Running: Yes
               Replicate_Do_DB: 
           Replicate_Ignore_DB: mysql
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 723615742
               Relay_Log_Space: 256
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File: 
            Master_SSL_CA_Path: 
               Master_SSL_Cert: 
             Master_SSL_Cipher: 
                Master_SSL_Key: 
         Seconds_Behind_Master: NULL
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 2003
                 Last_IO_Error: error connecting to master 'cluster_migr_repl@10.10.C.D:3306' - retry-time: 60  maximum-retries: 86400  message: Can't connect to MySQL server on '10.10.C.D' (113 "No route to host")
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 0
                Master_SSL_Crl: 
            Master_SSL_Crlpath: 
                    Using_Gtid: No
                   Gtid_IO_Pos: 
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
          Retried_transactions: 0
            Max_relay_log_size: 1073741824
          Executed_log_entries: 2
     Slave_received_heartbeats: 0
        Slave_heartbeat_period: 30.000
                Gtid_Slave_Pos: 0-1245586661-56708316
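
For triage, the status output above can be condensed to a few fields per connection. The following is only a minimal sketch, not part of the original report: the helper names (parse_slave_status, binlog_index, summarise) are hypothetical, the sample is trimmed to the fields used, and "files_behind" assumes that the numeric binlog suffix increases by one per file. It compares Master_Log_File (the file the I/O thread last read from) with Relay_Master_Log_File (the file holding the event the SQL thread last executed). It does not diagnose the hang itself.

```python
import re

# Trimmed sample: only the fields used below, values copied from the report.
SAMPLE = """
*************************** 1. row ***************************
               Connection_name:
               Master_Log_File: mysql-bin.015523
         Relay_Master_Log_File: mysql-bin.015457
              Slave_IO_Running: Connecting
             Slave_SQL_Running: Yes
               Relay_Log_Space: 71381577192
*************************** 2. row ***************************
               Connection_name: cluster_migration
               Master_Log_File: mysql-cluster-01-bin.000044
         Relay_Master_Log_File: mysql-cluster-01-bin.000044
              Slave_IO_Running: Connecting
             Slave_SQL_Running: Yes
               Relay_Log_Space: 256
"""


def parse_slave_status(text):
    """Split vertical (\\G-style) status output into one dict per row."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("*") and "row" in line:
            rows.append({})  # a row separator starts a new connection
            continue
        if rows and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            rows[-1][key.strip()] = value.strip()
    return rows


def binlog_index(name):
    """Return the numeric suffix of a binlog file name, or None."""
    match = re.search(r"\.(\d+)$", name)
    return int(match.group(1)) if match else None


def summarise(row):
    """Condense one connection's status into a few triage fields."""
    read_idx = binlog_index(row.get("Master_Log_File", ""))
    exec_idx = binlog_index(row.get("Relay_Master_Log_File", ""))
    behind = None
    if read_idx is not None and exec_idx is not None:
        behind = read_idx - exec_idx
    return {
        "connection": row.get("Connection_name") or "(default)",
        "io_running": row.get("Slave_IO_Running"),
        "sql_running": row.get("Slave_SQL_Running"),
        "files_behind": behind,
        "relay_log_gib": round(int(row.get("Relay_Log_Space", 0)) / 2**30, 1),
    }


for status_row in parse_slave_status(SAMPLE):
    print(summarise(status_row))
```

With the values from this report, the default connection comes out 66 binlog files behind with about 66.5 GiB of relay logs, while cluster_migration has nothing left to apply.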

From the processlist I learned that 1 SQL thread was still running, with thread ID 363357. I do not know which of the two master connections it belonged to. I killed that thread and it disappeared, but this did not release the STOP SLAVE command.

Perhaps the SQL thread had already stopped before I issued the STOP SLAVE command, and STOP SLAVE hangs because of that.

I tried to stop the other master connection. This worked without issues, but it did not release the first 'STOP SLAVE' command, which still has not returned.

I believe the server must be restarted to release the hung 'STOP SLAVE', but I am leaving it running in case you want to gather more information from this system.

Earlier, this system refused to shut down. Perhaps that has the same root cause, since stopping MariaDB stops the replica connections as one of its steps.



 Comments   
Comment by Sergei Golubchik [ 2023-06-28 ]

10.3 has already reached its EOL.

Do you have this issue with 10.4 or any later version?
