Details
- Bug
- Status: Stalled
- Major
- Resolution: Unresolved
- 10.6.16
- Ubuntu 22.04
Description
I have an issue very similar to MDEV-8311. As that issue was never resolved due to lack of activity and is almost 10 years old, I took the liberty of opening a new one.
My replica stopped replicating, and it is stuck.
One slave worker is stuck with the status "Waiting for prior transaction to start commit".
The slave itself is stuck with the status "Waiting for room in worker thread event queue".
I cannot run "stop slave;", as it also gets stuck.
I can manually kill the slave PIDs, but if I then run "start slave", it gets stuck again immediately.
After manually killing the slave PIDs, I can do:
SET GLOBAL sql_slave_skip_counter = 1;
start slave;
Then it starts properly ("Write_rows_log_event::write_row(-1) on table `my_table`"), and gets stuck again a few minutes later.
Please note that I have 2 other replicas replicating from the same master with the exact same config.
Please also note that those 3 replicas have been running like that for months/years.
Potentially interesting context: most of my tables are using RocksDB as an engine. The rest use InnoDB.
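The stall signature described above (one worker in "Waiting for prior transaction to start commit" while the SQL thread sits in "Waiting for room in worker thread event queue") could be checked for with a small script along these lines. This is only a sketch: the function name `is_stalled` and the `(command, state)` input shape are hypothetical, and the rows would have to be pulled from `show processlist` by whatever means is convenient. Both states can also appear briefly on a healthy replica, so a match presumably only means something if it persists.

```python
# Hypothetical helper (not part of MariaDB): detect the stall signature from
# (command, state) pairs copied out of SHOW PROCESSLIST.
WORKER_WAIT = "Waiting for prior transaction to start commit"
SQL_WAIT = "Waiting for room in worker thread event queue"


def is_stalled(threads):
    """Return True if a worker and the SQL thread both show the wait states."""
    worker_blocked = any(
        cmd == "Slave_worker" and state == WORKER_WAIT for cmd, state in threads
    )
    sql_blocked = any(
        cmd == "Slave_SQL" and state == SQL_WAIT for cmd, state in threads
    )
    return worker_blocked and sql_blocked


# Thread states as reported in this issue (idle workers collapsed to one row).
threads = [
    ("Slave_IO", "Waiting for master to send event"),
    ("Slave_worker", "Waiting for work from SQL thread"),
    ("Slave_worker", WORKER_WAIT),
    ("Slave_SQL", SQL_WAIT),
]
print(is_stalled(threads))
```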
mariadb --version
mariadb Ver 15.1 Distrib 10.6.16-MariaDB, for debian-linux-gnu (x86_64) using EditLine wrapper
Client output when started:
Server version: 10.6.16-MariaDB-0ubuntu0.22.04.1 Ubuntu 22.04
MariaDB [my_db]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.154
Master_User: my_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master10-bin.000291
Read_Master_Log_Pos: 107402253
Relay_Log_File: mysqld-relay-bin.000552
Relay_Log_Pos: 12124351
Relay_Master_Log_File: master10-bin.000265
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 417796162
Relay_Log_Space: 10110622341
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 38815
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 10
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Waiting for room in worker thread event queue
Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 464
1 row in set (0.025 sec)
MariaDB [my_db]> show processlist;
+------+-------------+------+------+--------------+------+-----------------------------------------------+------+----------+
| Id   | User        | Host | db   | Command      | Time | State                                         | Info | Progress |
+------+-------------+------+------+--------------+------+-----------------------------------------------+------+----------+
| 1544 | system user |      | NULL | Slave_IO     | 1263 | Waiting for master to send event              | NULL |    0.000 |
| 1548 | system user |      | NULL | Slave_worker | 1187 | Waiting for work from SQL thread              | NULL |    0.000 |
| 1546 | system user |      | NULL | Slave_worker | 1187 | Waiting for prior transaction to start commit | NULL |    0.000 |
| 1549 | system user |      | NULL | Slave_worker | 1187 | Waiting for work from SQL thread              | NULL |    0.000 |
| 1547 | system user |      | NULL | Slave_worker | 1187 | Waiting for work from SQL thread              | NULL |    0.000 |
| 1550 | system user |      | NULL | Slave_worker | 1187 | Waiting for work from SQL thread              | NULL |    0.000 |
| 1545 | system user |      | NULL | Slave_SQL    | 1187 | Waiting for room in worker thread event queue | NULL |    0.000 |
[...] Normal queries [...]
+------+-------------+------+------+--------------+------+-----------------------------------------------+------+----------+
10 rows in set (0.024 sec)
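To put numbers on the lag visible in the status output above, the fields can be parsed and compared. The following is a rough sketch, not a supported tool: `parse_status` and `binlog_seq` are hypothetical helper names, and the sample text contains only the three fields needed, copied from the output above. It shows that the I/O thread has read up to binlog file 291 while the SQL thread is still applying file 265.

```python
import re


def parse_status(text):
    """Parse 'Name: value' lines of vertical SHOW SLAVE STATUS output."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\w+):\s?(.*)$", line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields


def binlog_seq(name):
    """Numeric suffix of a binlog file name, e.g. 'master10-bin.000291' -> 291."""
    return int(name.rsplit(".", 1)[1])


# Only the fields needed for the comparison, taken from the report above.
sample = """\
Master_Log_File: master10-bin.000291
Relay_Master_Log_File: master10-bin.000265
Seconds_Behind_Master: 38815
"""

st = parse_status(sample)
files_behind = binlog_seq(st["Master_Log_File"]) - binlog_seq(st["Relay_Master_Log_File"])
hours_behind = int(st["Seconds_Behind_Master"]) / 3600
print(files_behind, round(hours_behind, 1))  # prints: 26 10.8
```

In other words, the replica keeps receiving events (Slave_IO_Running: Yes) but is 26 binlog files and roughly 10.8 hours behind in applying them, with no error reported in Last_Error or Last_SQL_Error.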
Issue Links
- relates to MDEV-8311 Replication slave is stuck without any error (Closed)