  1. MariaDB Server
  2. MDEV-39334

"Waiting for the slave SQL thread to free enough relay log space" Causes silent replication failure

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 10.11.14
    • None
    • Replication
    • OS: Debian 12
    • Can result in unexpected behaviour

    Description

      In a MariaDB primary-replica setup, our replica server stopped receiving new data from the primary even though both the Slave_IO_Running and Slave_SQL_Running variables reported "Yes" as their status.

      We noticed the issue when "SHOW SLAVE HOSTS;" on the primary reported only one replica instead of the expected two.

      In an attempt to fix the issue, we ran the "STOP SLAVE" and "START SLAVE" commands on the replica. When we checked the replica status after the restart, it reported the error 1236 shown in the second status output below.

      We believe this replica had been effectively stalled by this issue for multiple months. Because MariaDB never failed outright and never reported the "Slave_IO_Running" or "Slave_SQL_Running" variables as "No", our monitoring software did not detect the problem.
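
      A check that also inspects Slave_IO_State, rather than only the two Running flags, might have caught this condition. The following is a minimal Python sketch and not our actual monitoring; the names parse_slave_status and stall_reasons are hypothetical, and it assumes the "SHOW SLAVE STATUS\G" text has already been captured as a string.

```python
import re


def parse_slave_status(text):
    """Parse vertical-format SHOW SLAVE STATUS output into a dict."""
    status = {}
    for line in text.splitlines():
        # Field lines look like "   Slave_IO_Running: Yes"; other lines are skipped.
        match = re.match(r"\s*(\w+):\s?(.*)$", line)
        if match:
            status[match.group(1)] = match.group(2).strip()
    return status


def stall_reasons(status):
    """Return human-readable reasons why replication looks unhealthy."""
    reasons = []
    for flag in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(flag) != "Yes":
            reasons.append(f"{flag} is {status.get(flag)!r}")
    # Both flags can be "Yes" while the I/O thread is blocked on relay log space.
    if "relay log space" in status.get("Slave_IO_State", ""):
        reasons.append("I/O thread is waiting for relay log space")
    return reasons


# Excerpt of the first status output in this report.
SAMPLE = """
                Slave_IO_State: Waiting for the slave SQL thread to free enough relay log space
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
"""

if __name__ == "__main__":
    for reason in stall_reasons(parse_slave_status(SAMPLE)):
        print(reason)
```

      A monitor built this way would alert on any non-empty list. Matching on the state text is a deliberately narrow heuristic; it only covers the specific wait state seen in this report.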

      We do not know what specific steps are needed to reproduce the issue.

      MariaDB [(none)]> show slave hosts;
      +-----------+----------------+------+-----------+
      | Server_id | Host           | Port | Master_id |
      +-----------+----------------+------+-----------+
      |         2 | db2            | 3306 |         1 |
      +-----------+----------------+------+-----------+
      1 row in set (0.000 sec)
      

      DB3 (Replica) "SHOW SLAVE STATUS;" before/after restart output:

      MariaDB [(none)]> show slave status\G
      *************************** 1. row ***************************
                      Slave_IO_State: Waiting for the slave SQL thread to free enough relay log space
                         Master_Host: db1.domain.com
                         Master_User: rep_user
                         Master_Port: 3306
                       Connect_Retry: 60
                     Master_Log_File: bin-log.000108
                 Read_Master_Log_Pos: 169830505
                      Relay_Log_File: bin-relay.000017
                       Relay_Log_Pos: 169830760
               Relay_Master_Log_File: bin-log.000108
                    Slave_IO_Running: Yes
                   Slave_SQL_Running: Yes
                     Replicate_Do_DB:
                 Replicate_Ignore_DB:
                  Replicate_Do_Table:
              Replicate_Ignore_Table:
             Replicate_Wild_Do_Table:
         Replicate_Wild_Ignore_Table:
                          Last_Errno: 0
                          Last_Error:
                        Skip_Counter: 0
                 Exec_Master_Log_Pos: 169830463
                     Relay_Log_Space: 1077914767
                     Until_Condition: None
                      Until_Log_File:
                       Until_Log_Pos: 0
                  Master_SSL_Allowed: No
                  Master_SSL_CA_File:
                  Master_SSL_CA_Path:
                     Master_SSL_Cert:
                   Master_SSL_Cipher:
                      Master_SSL_Key:
               Seconds_Behind_Master: 0
       Master_SSL_Verify_Server_Cert: No
                       Last_IO_Errno: 0
                       Last_IO_Error:
                      Last_SQL_Errno: 0
                      Last_SQL_Error:
         Replicate_Ignore_Server_Ids:
                    Master_Server_Id: 1
                      Master_SSL_Crl:
                  Master_SSL_Crlpath:
                          Using_Gtid: Slave_Pos
                         Gtid_IO_Pos: 0-1-424511
             Replicate_Do_Domain_Ids:
         Replicate_Ignore_Domain_Ids:
                       Parallel_Mode: optimistic
                           SQL_Delay: 0
                 SQL_Remaining_Delay: NULL
             Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
                    Slave_DDL_Groups: 3441
      Slave_Non_Transactional_Groups: 96
          Slave_Transactional_Groups: 105912
                Replicate_Rewrite_DB:
      1 row in set (0.000 sec)
       
      MariaDB [(none)]> stop slave;
       
      Query OK, 0 rows affected (5 min 7.683 sec)
       
      MariaDB [(none)]> start slave;
      Query OK, 0 rows affected (0.059 sec)
       
      MariaDB [(none)]> show slave status\G
      *************************** 1. row ***************************
                      Slave_IO_State:
                         Master_Host: db1.domain.com
                         Master_User: rep_user
                         Master_Port: 3306
                       Connect_Retry: 60
                     Master_Log_File: bin-log.000108
                 Read_Master_Log_Pos: 169830463
                      Relay_Log_File: bin-relay.000001
                       Relay_Log_Pos: 4
               Relay_Master_Log_File: bin-log.000108
                    Slave_IO_Running: No
                   Slave_SQL_Running: Yes
                     Replicate_Do_DB:
                 Replicate_Ignore_DB:
                  Replicate_Do_Table:
              Replicate_Ignore_Table:
             Replicate_Wild_Do_Table:
         Replicate_Wild_Ignore_Table:
                          Last_Errno: 0
                          Last_Error:
                        Skip_Counter: 0
                 Exec_Master_Log_Pos: 169830463
                     Relay_Log_Space: 296
                     Until_Condition: None
                      Until_Log_File:
                       Until_Log_Pos: 0
                  Master_SSL_Allowed: No
                  Master_SSL_CA_File:
                  Master_SSL_CA_Path:
                     Master_SSL_Cert:
                   Master_SSL_Cipher:
                      Master_SSL_Key:
               Seconds_Behind_Master: NULL
       Master_SSL_Verify_Server_Cert: No
                       Last_IO_Errno: 1236
                       Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find GTID state requested by slave in any binlog files. Probably the slave state is too old and required binlog files have been purged.'
                      Last_SQL_Errno: 0
                      Last_SQL_Error:
         Replicate_Ignore_Server_Ids:
                    Master_Server_Id: 1
                      Master_SSL_Crl:
                  Master_SSL_Crlpath:
                          Using_Gtid: Slave_Pos
                         Gtid_IO_Pos: 0-1-424511
             Replicate_Do_Domain_Ids:
         Replicate_Ignore_Domain_Ids:
                       Parallel_Mode: optimistic
                           SQL_Delay: 0
                 SQL_Remaining_Delay: NULL
             Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
                    Slave_DDL_Groups: 3441
      Slave_Non_Transactional_Groups: 96
          Slave_Transactional_Groups: 105912
                Replicate_Rewrite_DB:
      1 row in set (0.000 sec)
       
      MariaDB [(none)]> 
      

      db3.err, truncated for brevity:

      2026-01-30 21:54:35 356 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
      2026-01-30 21:54:35 356 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000100' at position 804997841; GTID position '0-1-424259', GTID event skip 48998
      2026-01-30 21:54:36 356 [Note] Slave IO thread is reconnected to receive Gtid_log_event 0-1-424260. It is to skip 48998 already received events including the gtid one
      2026-01-30 22:00:50 356 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
      2026-01-30 22:00:50 356 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000104' at position 1077960884; GTID position '0-1-424421', GTID event skip 33015
      2026-01-30 22:00:50 356 [Note] Slave IO thread is reconnected to receive Gtid_log_event 0-1-424422. It is to skip 33015 already received events including the gtid one
      2026-01-30 22:27:50 357 [Note] Error reading relay log event: slave SQL thread was killed
      2026-01-30 22:27:50 357 [Note] Slave SQL thread exiting, replication stopped in log 'bin-log.000104' at position 163705474; GTID position '0-1-424421', master: db1.domain.com:3306
      2026-01-30 22:32:09 356 [ERROR] Slave I/O thread aborted while waiting for relay log space
      2026-01-30 22:32:09 356 [Note] Slave I/O thread exiting, read up to log 'bin-log.000104', position 163705516; GTID position 0-1-424421, master db1.domain.com:3306
      2026-01-30 22:32:15 5212 [Note] Slave I/O thread: Start asynchronous replication to master 'rep_user@db1.domain.com:3306' in log 'bin-log.000104' at position 163705474
      2026-01-30 22:32:15 5213 [Note] Slave SQL thread initialized, starting replication in log 'bin-log.000104' at position 163705474, relay log '/db/mysql/log-relay/bin-relay.000001' position: 4; GTID position '0-1-424421'
      2026-01-30 22:32:15 5212 [Note] Slave I/O thread: connected to master 'rep_user@db1.domain.com:3306',replication starts at GTID position '0-1-424421'
      2026-01-30 22:36:22 5212 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
      2026-01-30 22:36:22 5212 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000105' at position 145331453; GTID position '0-1-424433', GTID event skip 6627
      2026-01-30 22:36:22 5212 [Note] Slave IO thread is reconnected to receive Gtid_log_event 0-1-424434. It is to skip 6627 already received events including the gtid one
      2026-01-30 22:37:55 5212 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
      2026-01-30 22:37:55 5212 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000108' at position 479305461; GTID position '0-1-424511', GTID event skip 9403
      2026-01-30 22:37:55 5212 [Note] Slave IO thread is reconnected to receive Gtid_log_event 0-1-424512. It is to skip 9403 already received events including the gtid one
      2026-01-30 22:44:21 5212 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013)
      2026-01-30 22:44:21 5212 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000108' at position 1077911864; GTID position '0-1-424511', GTID event skip 27933
      2026-01-30 22:44:21 5212 [Note] Slave IO thread is reconnected to receive Gtid_log_event 0-1-424512. It is to skip 27933 already received events including the gtid one
       
      ...
       
      2026-04-14 14:09:45 5213 [Note] Error reading relay log event: slave SQL thread was killed
      2026-04-14 14:09:45 5213 [Note] Slave SQL thread exiting, replication stopped in log 'bin-log.000108' at position 169830463; GTID position '0-1-424511', master: db1.domain.com:3306
      2026-04-14 14:14:52 5212 [ERROR] Slave I/O thread aborted while waiting for relay log space
      2026-04-14 14:14:52 5212 [Note] Slave I/O thread exiting, read up to log 'bin-log.000108', position 169830505; GTID position 0-1-424511, master db1.domain.com:3306
      2026-04-14 14:15:01 29825 [Note] Slave I/O thread: Start asynchronous replication to master 'rep_user@db1.domain.com:3306' in log 'bin-log.000108' at position 169830463
      2026-04-14 14:15:01 29826 [Note] Slave SQL thread initialized, starting replication in log 'bin-log.000108' at position 169830463, relay log '/db/mysql/log-relay/bin-relay.000001' position: 4; GTID position '0-1-424511'
      2026-04-14 14:15:01 29825 [Note] Slave I/O thread: connected to master 'rep_user@db1.domain.com:3306',replication starts at GTID position '0-1-424511'
      2026-04-14 14:15:01 29825 [ERROR] Error reading packet from server: Could not find GTID state requested by slave in any binlog files. Probably the slave state is too old and required binlog files have been purged. (server_errno=1236)
      2026-04-14 14:15:01 29825 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Could not find GTID state requested by slave in any binlog files. Probably the slave state is too old and required binlog files have been purged.', Internal MariaDB error code: 1236
      2026-04-14 14:15:01 29825 [Note] Slave I/O thread exiting, read up to log 'bin-log.000108', position 169830463; GTID position 0-1-424511, master db1.domain.com:3306
      

      MariaDB configuration file for DB3:

      [mysqld]
      datadir                          = '/db/mysql/data'
      socket                           = '/db/mysql/mysql.sock'
      tmpdir                           = '/db/mysql/tmp'
      bind_address                     = '0.0.0.0'
      port                             = '3306'
       
      server-id                        = '3'
      gtid_strict_mode                 = '1'
      gtid_ignore_duplicates           = '1'
      log_slave_updates                = '1'
      relay-log-space-limit            = '1G'
      relay_log_purge                  = '1'
      read_only                        = '1'
      skip-slave-start
      relay-log                        = '/db/mysql/log-relay/bin-relay'
      report_host                      = 'db3'
       
      log-error
      log-warnings
      log_bin                          = '/db/mysql/log-bin/bin-log'
      expire_logs_days                 = '14'
      log_bin_trust_function_creators  = '1'
      max_binlog_size                  = '250M'
      binlog_format                    = 'ROW'
      sync_binlog                      = '1'
      sql_mode                         = ''
       
      thread_cache_size                = '100'
      max_connections                  = '1000'
      tmp_table_size                   = '100M'
      max_heap_table_size              = '100M'
      max_allowed_packet               = '1G'
      query_cache_size                 = '0'
      query_cache_type                 = '0'
      table_definition_cache           = '50000'
      table_open_cache                 = '50000'
      open_files_limit                 = '100000'
      wait_timeout                     = '3600'
       
      user                             = 'mysql'
       
      innodb-buffer-pool-instances     = '4'
      innodb_file_per_table            = '1'
      innodb_data_home_dir             = '/db/mysql/innodb'
      innodb_log_group_home_dir        = '/db/mysql/innodb'
      innodb_buffer_pool_size          = '1G'
      innodb_log_file_size             = '1G'
      innodb_log_files_in_group        = '2'
      innodb_thread_concurrency        = '8'
      innodb_flush_method              = 'O_DIRECT'
      innodb_flush_log_at_trx_commit   = '1'
      innodb_io_capacity               = '2100'
      innodb_open_files                = '50000'
       
      concurrent_insert = 2
       
      # Full Text Search
      ft_min_word_len                  = '3'
      ft_max_word_len                  = '35'
       
      # Encryption
      loose-innodb-encryption-threads        = '4'
      loose-innodb-encryption-rotate-key-age = '1'
       
      # UTF-8 Support
      init_connect='SET collation_connection = utf8_unicode_ci'
      init_connect='SET NAMES utf8'
      character-set-server             = utf8
      collation-server                 = utf8_unicode_ci
      skip-character-set-client-handshake
       
      skip-name-resolve
       
       
       
      !include /etc/mysql/mariadb.conf.d/file_key_management.cnf
       
      [client]
      socket = '/db/mysql/mysql.sock'
      socket = /db/mysql/mysql.sock
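
      As a cross-check of the figures in this report, the Python sketch below compares the Relay_Log_Space value from the first status output with the configured relay-log-space-limit of '1G'. It assumes the 'G' suffix means 1024^3 bytes; the constant and function names are ours, chosen for illustration.

```python
# Values copied from the first SHOW SLAVE STATUS output and the DB3 config.
RELAY_LOG_SPACE_LIMIT = 1 * 1024**3   # relay-log-space-limit = '1G' (assumed 1024^3 bytes)
RELAY_LOG_SPACE = 1077914767          # Relay_Log_Space
READ_MASTER_LOG_POS = 169830505       # Read_Master_Log_Pos
EXEC_MASTER_LOG_POS = 169830463       # Exec_Master_Log_Pos


def bytes_over_limit(space, limit):
    """Bytes by which relay log usage exceeds the limit (negative if under)."""
    return space - limit


over = bytes_over_limit(RELAY_LOG_SPACE, RELAY_LOG_SPACE_LIMIT)
unapplied = READ_MASTER_LOG_POS - EXEC_MASTER_LOG_POS

print(f"Relay log usage exceeds the limit by {over} bytes")
print(f"Primary binlog bytes read but not yet executed: {unapplied}")
```

      Under these assumptions the relay logs are 4172943 bytes over the limit while only 42 bytes of the primary's binlog remain unexecuted. That appears consistent with the I/O thread waiting for space that the SQL thread, which reports having read all of the relay log, never frees.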
      
      

            People

              Assignee: Andrei Elkin (Elkin)
              Reporter: Henry G (harold_garn)