[MDEV-14028] No way to see that I/O thread is blocked by hitting the relay log size limit until it is aborted by timeout Created: 2017-10-09  Updated: 2018-10-30

Status: Open
Project: MariaDB Server
Component/s: Replication
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Nilnandan Joshi Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None


 Description   

When I/O thread is reaching to relay log size limit (relay_log_space_limit), It is getting blocked and aborted by timeout but we can't see this from the error log or show slave status that what happened. On slave, error log shows nothing and show slave status says Slave_IO_Running: Yes but On master, error log shows that aborted connection of slave user and show slave hosts doesn't show that slave. We need better explanation in slave error log.

How to reproduce :

Step1: with any replication setup, set relay_log_space_limit=128K in my.cnf on any slave.
Step2: restart slave and then stop slave. (here, I've one master and two slaves)
Step3: create any table and insert records till relay log filled up to 128K

MariaDB [test]> create table t1 (id int, name varchar(10), city varchar(10));
Query OK, 0 rows affected (0.04 sec)
 
MariaDB [test]> insert into t1 values (1, 'nilnandan', 'ahmedabad');
Query OK, 1 row affected (0.01 sec)
 
MariaDB [test]> insert into t1 select * from t1;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0
 
...
 
MariaDB [test]> insert into t1 select * from t1;
Query OK, 65536 rows affected (0.26 sec)
Records: 65536  Duplicates: 0  Warnings: 0

Step4: check show slave status, it says,

            Slave_IO_Running: Yes
            Slave_SQL_Running: No

In relay log dir, it will stuck with the size,

-rw-rw----. 1 nil nil   132105 Jul  3 19:04 mysql-relay.000009

but nothing in error log. On master error log, you can see msg like

2017-07-03 19:05:31 140123990644480 [Warning] Aborted connection 13 to db: 'unconnected' user: 'rsandbox' host: 'localhost' (Got an error writing communication packets)}}

and show slave hosts shows only one slave.

MariaDB [test]> show slave hosts;
+-----------+----------+-------+-----------+
| Server_id | Host     | Port  | Master_id |
+-----------+----------+-------+-----------+
|       102 | SBslave2 | 13258 |         1 |
+-----------+----------+-------+-----------+
1 row in set (0.00 sec)}}



 Comments   
Comment by Elena Stepanova [ 2017-10-09 ]

The status can be clearly seen in SHOW SLAVE STATUS, where it belongs:

Slave_IO_State	Waiting for the slave SQL thread to free enough relay log space

and in the processlist:

+----+-------------+-----------------+------+----------+------+-----------------------------------------------------------------+------------------+----------+
| Id | User        | Host            | db   | Command  | Time | State                                                           | Info             | Progress |
+----+-------------+-----------------+------+----------+------+-----------------------------------------------------------------+------------------+----------+
|  4 | system user |                 | NULL | Daemon   | NULL | InnoDB purge worker                                             | NULL             |    0.000 |
|  3 | system user |                 | NULL | Daemon   | NULL | InnoDB purge worker                                             | NULL             |    0.000 |
|  1 | system user |                 | NULL | Daemon   | NULL | InnoDB purge coordinator                                        | NULL             |    0.000 |
|  2 | system user |                 | NULL | Daemon   | NULL | InnoDB purge worker                                             | NULL             |    0.000 |
|  5 | system user |                 | NULL | Daemon   | NULL | InnoDB shutdown handler                                         | NULL             |    0.000 |
| 10 | root        | localhost:53551 | test | Sleep    |  123 |                                                                 | NULL             |    0.000 |
| 11 | root        | localhost:53552 | test | Sleep    |  123 |                                                                 | NULL             |    0.000 |
| 12 | system user |                 | NULL | Slave_IO |  123 | Waiting for the slave SQL thread to free enough relay log space | NULL             |    0.000 |
| 14 | root        | localhost:53558 | test | Sleep    |  117 |                                                                 | NULL             |    0.000 |
| 15 | root        | localhost:53559 | test | Sleep    |  122 |                                                                 | NULL             |    0.000 |
| 16 | root        | localhost:53582 | NULL | Query    |    0 | init                                                            | show processlist |    0.000 |
+----+-------------+-----------------+------+----------+------+-----------------------------------------------------------------+------------------+----------+

The error log does not show it because it's not an error in itself.
The aborted connection in the master log is most likely unrelated, if there are no other messages around it. When IO thread is indeed aborted while waiting for the relay space, there is a corresponding error message in the log:

2017-10-09 14:29:25 139678720165632 [ERROR] Slave I/O thread aborted while waiting for relay log space

All in all, I don't see a bug here, please confirm there is still something expected to be fixed.

Comment by Nilnandan Joshi [ 2017-10-09 ]

Hi Elena,

Thanks for checking. Yes, we can see in show processlist of slave that why I/O thread is waiting but I don't see this msg in error log on either master or slave.

2017-10-09 14:29:25 139678720165632 [ERROR] Slave I/O thread aborted while waiting for relay log space

Instead on master, I see this

2017-07-03 19:05:31 140123990644480 [Warning] Aborted connection 13 to db: 'unconnected' user: 'rsandbox' host: 'localhost' (Got an error writing communication packets)
2017-07-03 21:33:24 140123990644480 [Note] Start binlog_dump to slave_server(101), pos(mysql-bin.000015, 4826836)
2017-07-03 21:34:05 140123990644480 [Warning] Aborted connection 15 to db: 'unconnected' user: 'rsandbox' host: 'localhost' (Got an error writing communication packets)

and on slave I see this

2017-07-03 21:33:24 140365114197760 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)

What version you are checking? I'm testing with 10.2.6-MariaDB-log.

Comment by Elena Stepanova [ 2017-10-09 ]

It is all the same with 10.2.6, it's not the point.

You don't see an error message about "waiting for relay log space" because it is not the error, it is a status which is expected to be resolved on its own, when the slave SQL thread executes relayed events and relay logs can be purged. If anything, you need to figure out why your SQL thread is not running and thus not processing relay logs.
You will get the error message about the I/O thread when/if it is really aborted while being in this state, e.g. when you run STOP SLAVE.

"Aborted connection" messages have nothing to do with relay space. IO thread can sometimes lose connection to the master and reconnect automatically.

Comment by Nilnandan Joshi [ 2017-11-13 ]

Hi Elena,

Agree with your points but I think it would be still useful to have a way to get a warning in the slave's error log when wait for limit happens. I don't expect many same warnings, but when we rich the limit for the first time after some normal work, warning would be useful. Maybe at log_verbocity = 2 or something, but it would be really useful if we can find out from log that what happened and when. (consider might be as a feature request?)

Thanks.

Generated at Thu Feb 08 08:10:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.