Stuck replication: Slave SQL thread blocked in state "Update_rows_log_event::find_row(-1)"
Environment of the Primary and Replica servers
OS: RHEL 7.6
MariaDB version: MariaDB-server-10.4.12-1.el7
DB size: 6.1 TB
Binary log format: mixed
State before the issue and on the day of the issue:
- replication had been running without problems for several weeks
- replication was stopped for several days due to software maintenance on the Primary side
- nothing changed on the Primary DB; new data continued to be inserted or modified
- replication was restarted after several days
- on the Primary DB, 13 GB of changes in the binlogs were waiting to be replicated
- I started replication (start slave;)
- replication began by downloading changes from the Primary binlogs to the Replica relay logs
- the first changes were applied properly (replication executed several positions from the relay log)
- after some time, however, replication stopped executing changes from the relay logs
- the processlist shows the slave SQL thread in state: Update_rows_log_event::find_row(-1)
- stopping and starting the slave did not solve the issue
- killing the slave SQL thread and then running start slave did not solve the issue either; the same state appeared again
- execution of the relay logs stopped at binlog position 902554004
- extract from the slave status output
- at relay log position 902554004 there is the following ALTER TABLE statement
- the size of the affected table is 175 MB
- at the same time that this replication was started, another replication between two other DB servers, with the same parameters and version, was also started and it has no problem; the only difference is that the DB size in that environment is 500 GB
- I found that a similar issue was reported in MDEV-20398, but it should be fixed since version 10.4.8 and I am using 10.4.12
- no issue was found in the MariaDB log file
- I am not aware of using any unsafe SQL statements, even though mixed binary logging is used instead of the safest, row-based binary logging
- in the first two comments below I am posting the following: full slave status, full processlist, full InnoDB engine status, global status, global variables, and SHOW CREATE TABLE output
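To tell whether the SQL thread is making any progress at all, successive slave status samples can be compared. Below is a minimal Python sketch, assuming the rows of SHOW SLAVE STATUS have already been fetched into dicts (the fetching code, the helper name sql_thread_stalled, and the sample file name are hypothetical). It only checks whether Relay_Master_Log_File and Exec_Master_Log_Pos stay constant while Slave_SQL_Running is Yes. A single long-running row event also keeps the position constant, so a True result means "no progress between samples", not a confirmed hang.

```python
from typing import Mapping, Sequence


def sql_thread_stalled(samples: Sequence[Mapping[str, str]]) -> bool:
    """Return True if the SQL thread reports running but its executed
    position did not change across all samples.

    Assumption: each sample is one SHOW SLAVE STATUS row as a dict
    (column name -> string value), taken at increasing times.
    """
    if len(samples) < 2:
        return False  # at least two samples are needed to compare
    # Only judge the case where the SQL thread claims to be running.
    if any(s.get("Slave_SQL_Running") != "Yes" for s in samples):
        return False
    # Collect distinct (master binlog file, executed position) pairs.
    positions = {
        (s["Relay_Master_Log_File"], int(s["Exec_Master_Log_Pos"]))
        for s in samples
    }
    return len(positions) == 1


if __name__ == "__main__":
    # Hypothetical samples, e.g. taken one minute apart.
    row = {
        "Slave_SQL_Running": "Yes",
        "Relay_Master_Log_File": "mysql-bin.000123",  # hypothetical name
        "Exec_Master_Log_Pos": "902554004",
    }
    print(sql_thread_stalled([row, dict(row), dict(row)]))  # prints True
```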
Do you see any issue in the relay log?
Do you have any tips on what to check next?
Do you have any idea what the cause of the issue might be?
Could the issue be caused by some hiccup that damaged the binlog, the relay log, or some internal record in the DB about the processed command?
Is it possible that MDEV-20398 is not fixed in version 10.4.12, or have I run into a different cause that produces the same error?