Details
Task
Status: Confirmed
Major
Resolution: Unresolved
None
Description
When the IO thread is trying to reconnect to a primary, no status updates are provided for as long as the same error persists, until the number of attempts configured by --master-retry-count has been exhausted.
It would be good to provide insight into the reconnection status, specifically how many retries have been attempted so far.
A few options for this:
From Elkin
1. print both counters in the error log message, which also must include something like
'the next N identical reports are skipped', where N may grow faster than linearly (that is, to follow some good logging pattern), or better
2. report (A) and (B) within the text field of SSS, like
_Slave_IO_Running: Re-connecting nth time out of N max
A variant of point 2 would be to add yet another line to the SSS report, which I don't like, as it keeps swelling the whole report while the SSS handling itself is somewhat computationally costly.
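To make the logging pattern from option 1 concrete, here is a minimal sketch of one possible throttling scheme. It assumes a power-of-two schedule (log attempts 1, 2, 4, 8, ...), which is only one way to make N grow faster than linearly. The function names and the message wording are hypothetical and do not come from the server code.

```python
from typing import List


def should_log(attempt: int) -> bool:
    """Return True for attempts 1, 2, 4, 8, ... (powers of two)."""
    return attempt >= 1 and (attempt & (attempt - 1)) == 0


def skipped_after(attempt: int, max_retries: int) -> int:
    """Count the identical reports suppressed after a logged attempt."""
    next_logged = attempt * 2
    # Never count attempts beyond the configured retry limit.
    return min(next_logged, max_retries + 1) - attempt - 1


def retry_messages(max_retries: int) -> List[str]:
    """Build the error-log lines emitted while every retry fails."""
    messages = []
    for attempt in range(1, max_retries + 1):
        if not should_log(attempt):
            continue
        msg = f"reconnect attempt {attempt} of {max_retries} failed"
        skipped = skipped_after(attempt, max_retries)
        if skipped > 0:
            msg += f"; the next {skipped} identical report(s) are skipped"
        messages.append(msg)
    return messages


if __name__ == "__main__":
    for line in retry_messages(10):
        print(line)
```

With a limit of 10 retries this logs attempts 1, 2, 4 and 8 only, so the error log stays short while still showing both counters.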
As well as
3. extend the SHOW PROCESSLIST IO thread description on retry, which already has more detailed information about the state (i.e. what the slave was doing last), where the current options are
Reconnecting after a failed registration on master
Reconnecting after a failed binlog dump request
Reconnecting after a failed master event read
With options (A) and (B) from Andrei's comment above. I imagine that this could be done in 10.5 as well.
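As an illustration of option 3, the sketch below appends retry counters to the three existing state strings. It assumes that (A) and (B) are the number of attempts made so far and the configured --master-retry-count limit; the comment that defines them is not reproduced here, so that reading is an assumption. The "(attempt X of Y)" suffix and the function name are hypothetical, not an existing server format.

```python
# Existing IO thread states on retry, as listed in the description.
RECONNECT_STATES = {
    "registration": "Reconnecting after a failed registration on master",
    "dump": "Reconnecting after a failed binlog dump request",
    "read": "Reconnecting after a failed master event read",
}


def reconnect_state(kind: str, attempt: int, max_retries: int) -> str:
    """Return the processlist state text extended with retry counters.

    attempt     -- assumed meaning of (A): retries attempted so far
    max_retries -- assumed meaning of (B): the configured retry limit
    """
    if not 1 <= attempt <= max_retries:
        raise ValueError("attempt must be between 1 and max_retries")
    # Hypothetical suffix format; the real wording is still to be decided.
    return f"{RECONNECT_STATES[kind]} (attempt {attempt} of {max_retries})"


if __name__ == "__main__":
    print(reconnect_state("read", 3, 10))
```

Keeping the counters in the state text avoids adding another line to the SSS output, which is the concern raised in the quoted comment.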