[MDEV-14028] No way to see that I/O thread is blocked by hitting the relay log size limit until it is aborted by timeout Created: 2017-10-09 Updated: 2018-10-30 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Fix Version/s: | None |
| Type: | Task | Priority: | Minor |
| Reporter: | Nilnandan Joshi | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None |
| Description |
|
When the I/O thread reaches the relay log size limit (relay_log_space_limit), it is blocked and then aborted by a timeout, but neither the error log nor SHOW SLAVE STATUS tells us what happened. On the slave, the error log shows nothing and SHOW SLAVE STATUS says Slave_IO_Running: Yes. On the master, the error log shows an aborted connection for the slave user, and SHOW SLAVE HOSTS does not list that slave. We need a better explanation in the slave error log.
How to reproduce:
Step1: with any replication setup, set relay_log_space_limit=128K in my.cnf on any slave.
Step4: check SHOW SLAVE STATUS; it says,
In the relay log directory, the logs stay stuck at this size,
but there is nothing in the error log. In the master error log, you can see a message like
and SHOW SLAVE HOSTS shows only one slave.
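The "stuck at this size" observation can also be checked from outside the server. Below is a minimal Python sketch, not part of the original report, that sums the relay log files in a directory and compares the total with the configured limit. The helper names (`parse_size`, `relay_log_usage`, `at_space_limit`) and the `relay-bin.NNNNNN` file name pattern are assumptions for illustration; the server's own space accounting may differ slightly, so treat the result as a rough indicator only.

```python
import os
import re


def parse_size(text):
    """Convert a my.cnf-style size such as '128K' or '1M' to bytes."""
    units = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * units[match.group(2)]


def relay_log_usage(relay_dir, pattern=r".*relay-bin\.\d+$"):
    """Sum the sizes of files in relay_dir whose names match pattern.

    The default pattern is an assumption about how relay logs are named;
    pass a different one if relay_log is set to a custom base name.
    """
    name_re = re.compile(pattern)
    total = 0
    for entry in os.scandir(relay_dir):
        if entry.is_file() and name_re.match(entry.name):
            total += entry.stat().st_size
    return total


def at_space_limit(relay_dir, limit="128K"):
    """Heuristic: True when relay log usage has reached the given limit."""
    return relay_log_usage(relay_dir) >= parse_size(limit)
```

Calling `at_space_limit("/path/to/relay/logs", "128K")` on the slave from Step1 would be one way to confirm that the directory has stopped growing at the limit; the path here is a placeholder.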
|
| Comments |
| Comment by Elena Stepanova [ 2017-10-09 ] |
|
The status can be clearly seen in SHOW SLAVE STATUS, where it belongs:
and in the processlist:
The error log does not show it because it's not an error in itself.
All in all, I don't see a bug here; please confirm whether there is still something you expect to be fixed. |
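For readers who want to act on the status mentioned in this comment, here is a small Python sketch that classifies one SHOW SLAVE STATUS row. It is an illustration only: the function name is hypothetical, the row is assumed to have been fetched already as a dict keyed by column name, and the match is a loose substring test on "relay log space" because the exact wording of the state text is not quoted in this issue.

```python
def io_thread_waiting_for_relay_space(status):
    """Return True if the row says the I/O thread runs but waits for relay space.

    `status` is one SHOW SLAVE STATUS row as a dict keyed by column name.
    The substring test is an assumption about the state text, not a
    guaranteed server message.
    """
    state = (status.get("Slave_IO_State") or "").lower()
    io_running = status.get("Slave_IO_Running") == "Yes"
    return io_running and "relay log space" in state


# Example rows; the state strings are illustrative, not captured output.
blocked_row = {
    "Slave_IO_Running": "Yes",
    "Slave_IO_State": "Waiting for the slave SQL thread to free enough relay log space",
}
normal_row = {
    "Slave_IO_Running": "Yes",
    "Slave_IO_State": "Waiting for master to send event",
}
```

This shows why Slave_IO_Running: Yes alone is not enough to tell the two situations apart: both example rows report a running I/O thread, and only the state column distinguishes them.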
| Comment by Nilnandan Joshi [ 2017-10-09 ] |
|
Hi Elena, Thanks for checking. Yes, SHOW PROCESSLIST on the slave shows why the I/O thread is waiting, but I don't see this message in the error log on either the master or the slave.
Instead, on the master, I see this
and on the slave I see this
Which version are you checking? I'm testing with 10.2.6-MariaDB-log. |
| Comment by Elena Stepanova [ 2017-10-09 ] |
|
It is all the same with 10.2.6; that's not the point. You don't see an error message about "waiting for relay log space" because it is not an error. It is a status that is expected to resolve on its own, once the slave SQL thread executes the relayed events and the relay logs can be purged. If anything, you need to figure out why your SQL thread is not running and thus not processing relay logs. "Aborted connection" messages have nothing to do with relay space. The IO thread can sometimes lose its connection to the master and reconnect automatically. |
| Comment by Nilnandan Joshi [ 2017-11-13 ] |
|
Hi Elena, I agree with your points, but I think it would still be useful to have a way to get a warning in the slave's error log when the wait for the limit happens. I don't expect many repeated warnings, but when we reach the limit for the first time after some normal work, a warning would be useful. Maybe at log_verbosity = 2 or something, but it would be really useful if we could find out from the log what happened and when. (Could this be considered as a feature request?) Thanks. |
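Until something like this exists in the server, the "warn once, the first time the limit is hit" behaviour can be approximated by an external monitor. The Python sketch below is an assumption-laden illustration, not a server feature: `RelaySpaceWatcher` is a hypothetical class, the caller is expected to poll the Slave_IO_State value itself, and the warning text is made up for the example.

```python
class RelaySpaceWatcher:
    """Emit one warning each time the I/O thread starts waiting for relay space."""

    def __init__(self):
        self._blocked = False

    def observe(self, slave_io_state, when):
        """Record one polled Slave_IO_State value.

        Returns a warning string on the transition into the waiting state,
        and None otherwise, so repeated polls while blocked stay silent.
        """
        blocked = "relay log space" in (slave_io_state or "").lower()
        message = None
        if blocked and not self._blocked:
            message = f"{when} [Warning] I/O thread started waiting for relay log space"
        self._blocked = blocked
        return message
```

Feeding it one sample per polling interval yields a single log line per blocking episode, which records both what happened and when without flooding the log.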