[MDEV-13579] Incorrect value in Relay_Log_Space Created: 2017-08-18  Updated: 2017-10-20

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.22
Fix Version/s: 10.1

Type: Bug Priority: Major
Reporter: Sergey Chernomorets Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: None

Attachments: PNG File dbs01-03.png     PNG File sphm03.png    

 Description   

SHOW SLAVE STATUS currently reports a relay log space of 1.7 GB (Relay_Log_Space: 1722503481), but the relay-bin files on disk total only about 646 MB.

show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: dbs01-03
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: dbs01-03-bin.002200
          Read_Master_Log_Pos: 646662540
               Relay_Log_File: relay-bin.005515
                Relay_Log_Pos: 646632193
        Relay_Master_Log_File: dbs01-03-bin.002200
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: <some tables>
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: sys.date_time,mysql.%
  Replicate_Wild_Ignore_Table: tmp.%
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 646631902
              Relay_Log_Space: 1722503481
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 144
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: No
                  Gtid_IO_Pos: 
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative

$ ls -l /base/binlog/relay-bin.*
-rw-rw---- 1 mysql mysql       886 Aug 18 12:18 /base/binlog/relay-bin.005513
-rw-rw---- 1 mysql mysql       338 Aug 18 12:18 /base/binlog/relay-bin.005514
-rw-rw---- 1 mysql mysql 646511803 Aug 18 12:40 /base/binlog/relay-bin.005515
-rw-rw---- 1 mysql mysql        90 Aug 18 12:18 /base/binlog/relay-bin.index
 
$ cat /base/binlog/relay-bin.index 
/base/binlog/relay-bin.005513
/base/binlog/relay-bin.005514
/base/binlog/relay-bin.005515
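
The mismatch can be quantified directly from the figures above (a quick sketch; the file sizes are copied from the ls -l listing and the counter from SHOW SLAVE STATUS):

```python
# Figures copied from the report above.
relay_log_space = 1722503481     # Relay_Log_Space from SHOW SLAVE STATUS
on_disk = [886, 338, 646511803]  # sizes of relay-bin.005513..005515

total = sum(on_disk)
print(total)                     # 646513027 bytes, ~646 MB actually on disk
print(relay_log_space - total)   # 1075990454 bytes, ~1 GB unaccounted for
```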

graph from monitoring:
Relay log space should be less than 1 GB.
Replication delay is zero, so the slave is not lagging behind.



 Comments   
Comment by Sergey Chernomorets [ 2017-08-18 ]

Another case: server dbs01-03 running version 10.1.24.
Sometimes relay_log_space does not decrease when the binlog switches to the next file, and only some time after the switch does relay_log_space return to the actual value:

dotted line - relay log space
filled peaks - replication lag in minutes
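
Relay_Log_Space is documented as the combined size of all existing relay log files, so in a simplified model (not server code) purging a finished relay log should shrink the counter by exactly that file's size:

```python
# Simplified model of the expected accounting (illustrative, not server code):
# Relay_Log_Space should equal the combined size of all relay log files,
# so purging a finished file must shrink the counter by that file's size.
files = {"relay-bin.005513": 886,
         "relay-bin.005514": 338,
         "relay-bin.005515": 646511803}
relay_log_space = sum(files.values())

purged = files.pop("relay-bin.005513")  # SQL thread finished this file
relay_log_space -= purged               # the counter must follow the purge

assert relay_log_space == sum(files.values())
```

The reported behaviour is that this invariant is violated for a while: the files shrink but the counter does not.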

Comment by Elena Stepanova [ 2017-08-25 ]

Could you please attach your cnf file(s) from the affected server? Or, if you have already attached them to another issue, please point to them.

Are you using multi-source replication (named slaves, in addition to the default one)?

Comment by Sergey Chernomorets [ 2017-08-25 ]

We are not using multi-source replication.

my.cnf:

[mysqld]
datadir=/base/mysql
socket=/base/mysql/mysql.sock
character-set-server=cp1251
stack-trace
core-file
 
ignore_builtin_innodb
plugin_load=innodb=ha_innodb.so
 
 
[mysqld-safe]
core-file-size=unlimited
 
[server]
tmpdir=/base/tmp
datadir=/base/mysql
slave_transaction_retries=10000
 
concurrent_insert=2
 
max_connections=500
max_user_connections=490
max_connect_errors=50
open-files-limit=20000
 
join_buffer_size=2M
key_buffer_size=128M
myisam_sort_buffer_size=64M
read_buffer_size=128K
read_rnd_buffer_size=32M
sort_buffer_size=32M
 
max_allowed_packet=32M
max_heap_table_size=512M
max_tmp_tables=1000
table_cache=10000
tmp_table_size=512M
 
query_cache_limit=0
query_cache_size=0
query_cache_min_res_unit=1536
 
thread_cache_size=400
 
innodb_buffer_pool_size=54G
transaction-isolation=READ-COMMITTED
innodb_flush_log_at_trx_commit=0
innodb_flush_method=O_DIRECT
innodb_data_file_path=ibdata1:128M:autoextend
innodb_log_buffer_size = 128M
innodb_log_file_size = 128M
innodb_log_files_in_group = 2
 
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_io_capacity=5000
innodb_read_io_threads=12
innodb_write_io_threads=12
 
innodb_buffer_pool_instances=12
thread_handling=pool-of-threads
thread_pool_size=16
slave_parallel_mode=conservative
slave_parallel_threads=4
slave_compressed_protocol=on
 
long_query_time=3
log-warnings
back_log=500
 
interactive_timeout=320
wait_timeout=320
net_read_timeout=320
net_write_timeout=420
 
server-id=98
log-bin=/base/binlog/bin
log_slave_updates
relay-log=/base/binlog/relay-bin
skip-name-resolve
expire_logs_days=1
binlog_format=ROW
 
replicate-wild_do-table=sys.date_time
replicate-wild_do-table=mysql.%
 
replicate-do-table=region.base_education
replicate-do-table=sj_common.spelling_dictionary
replicate-do-table=sj_common.regions
replicate-do-table=sj_common.positions_catalog
replicate-do-table=users_ru.reg_users
replicate-do-table=users_ru.reg_users_info
replicate-do-table=users_ru.reg_users_aliases
replicate-do-table=users_ru.reg_users_tags
replicate-do-table=users_ru.tags
 
replicate-wild-ignore-table=tmp.%
 
init-connect='SET NAMES cp1251'
 
performance_schema='off'
[mysql]
socket=/base/mysql/mysql.sock
 
[mysqldump]
socket=/base/mysql/mysql.sock
 
[mysql.server]
datadir=/base/mysql
socket=/base/mysql/mysql.sock
 
[mysqladmin]
socket=/base/mysql/mysql.sock
 
[mysqlshow]
socket=/base/mysql/mysql.sock

Comment by Elena Stepanova [ 2017-09-19 ]

I haven't been able to reproduce it so far. I assume it is some sort of race (or another circumstance-dependent) condition which causes a failure to update the value. Elkin, could you please take a look? Maybe expert code inspection will do more good here. If you can't find the exact reason, perhaps you can at least give me some hints on where the possible weak spots are.
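
To illustrate the kind of race hypothesized above (purely a toy model, not taken from the server source): two concurrent read-modify-write updates of an unguarded counter can lose a decrement, leaving the value stale until some later update happens to rewrite it.

```python
# Toy lost-update interleaving (illustrative only, not server code).
val = 1000            # shared counter, e.g. total relay log space

t1_read = val         # thread 1 reads the counter
t2_read = val         # thread 2 reads before thread 1 writes back
val = t1_read - 300   # thread 1 writes its decrement (a purged file)
val = t2_read - 200   # thread 2 clobbers it: thread 1's purge is lost

print(val)            # 800, although serialized updates would give 500
```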

Generated at Thu Feb 08 08:06:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.