[MDEV-35304] More Details for Ongoing IO Thread Re-connection Attempts

Details

Type: New Feature
Status: In Testing
Priority: Major
Resolution: Unresolved
Fix Version/s: 12.0.0
Component/s: Replication
Labels: None

Description

While the IO thread is attempting to reconnect to a primary, no updates are provided for as long as the last error persists, until the number of attempts configured by --master-retry-count has been exhausted.

It would be good to provide insight into the reconnection status: (A) how many retries have been attempted so far, and (B) how many are configured in total.
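
A minimal sketch, in Python for illustration only, of a reconnect loop that exposes both counters while retries are still in progress rather than only after the limit is exhausted. The names reconnect_with_retries, connect, on_attempt and retry_interval are hypothetical and do not come from the MariaDB source; only the idea of a limit set by --master-retry-count is taken from this issue.

```python
import time
from typing import Callable


def reconnect_with_retries(
    connect: Callable[[], bool],
    master_retry_count: int,
    retry_interval: float,
    on_attempt: Callable[[int, int], None],
) -> bool:
    """Hypothetical IO-thread reconnect loop (illustration only).

    Calls on_attempt(attempt, master_retry_count) before every try, so that
    (A) attempts so far and (B) the configured limit are observable while
    retries are ongoing, not only once the limit is exhausted.
    """
    for attempt in range(1, master_retry_count + 1):
        on_attempt(attempt, master_retry_count)  # publish "A of B"
        if connect():
            return True  # connected; the caller resets its counters
        if attempt < master_retry_count:
            time.sleep(retry_interval)  # wait before the next try
    return False  # limit exhausted; the caller reports the final error


if __name__ == "__main__":
    outcomes = iter([False, False, True])
    reconnect_with_retries(lambda: next(outcomes), 10, 0.0,
                           lambda a, b: print(f"attempt {a} of {b}"))
```

Here on_attempt stands in for whichever reporting surface is chosen among the options that follow.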

A few options for this:

From Elkin

1. Print both counters in the error log message, which must also include something like 'the next N identical reports are skipped', where N may grow faster than linearly (following some good logging pattern); or, better,
2. Report (A) and (B) within a text field of SSS [SHOW SLAVE STATUS], like
Slave_IO_Running: Re-connecting nth time out of N max
A variant of option 2 would be to add yet another line to the SSS report, which I don't like, as it keeps swelling the whole report while SSS handling itself is somewhat computationally costly.
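
Option 1 could look roughly like the following minimal sketch, assuming a doubling skip count as the "good logging pattern" (any faster-than-linear growth would fit the proposal). The class name, the message wording and the reset policy are hypothetical and are not MariaDB's actual error-log format.

```python
from typing import Optional


class ReconnectLogThrottle:
    """Hypothetical helper deciding which failed reconnects get logged.

    After each logged report, the number of identical reports skipped
    before the next one doubles (0, 1, 2, 4, 8, ...), so the gap between
    reports grows faster than linearly.
    """

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Call when the error changes or a reconnect succeeds.
        self._next_logged = 1  # attempt number of the next report
        self._skip = 0         # reports to suppress after the next one

    def report(self, attempt: int, max_retries: int, error: str) -> Optional[str]:
        """Return the log line for this attempt, or None if suppressed."""
        if attempt < self._next_logged:
            return None
        skip_next = self._skip
        self._next_logged = attempt + skip_next + 1
        self._skip = 1 if skip_next == 0 else skip_next * 2
        msg = f"Reconnect attempt {attempt} of {max_retries} failed: {error}"
        if skip_next:
            msg += f"; the next {skip_next} identical reports are skipped"
        return msg


if __name__ == "__main__":
    throttle = ReconnectLogThrottle()
    for n in range(1, 31):
        line = throttle.report(n, 30, "example error")
        if line is not None:
            print(line)  # logged for attempts 1, 2, 4, 7, 12, 21
```

Doubling keeps the log compact during a long outage while each line still carries both counters (A) and (B).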

As well as

3. Extend the SHOW PROCESSLIST IO thread state description on retry. It already carries more detailed information about the state (i.e., what the slave was doing last); the current values are:

Reconnecting after a failed registration on master

Reconnecting after a failed binlog dump request

Reconnecting after a failed master event read

These descriptions would be extended with (A) and (B) from Andrei's comment above. I imagine that this could be done in 10.5 as well.
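
Options 2 and 3 both come down to rendering the two counters as text. A minimal sketch, assuming the SSS wording proposed in option 2; the processlist suffix " (attempt n of N)" and all class and function names here are hypothetical, not an agreed format.

```python
from dataclasses import dataclass


@dataclass
class ReconnectStatus:
    """Hypothetical holder for the two counters discussed in this issue."""
    attempt: int      # (A) reconnects attempted so far
    max_retries: int  # (B) configured limit (--master-retry-count)


def ordinal(n: int) -> str:
    """Render 1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def sss_io_running(status: ReconnectStatus) -> str:
    """Option 2: value for Slave_IO_Running in SHOW SLAVE STATUS."""
    return f"Re-connecting {ordinal(status.attempt)} time out of {status.max_retries} max"


def processlist_state(base_state: str, status: ReconnectStatus) -> str:
    """Option 3: append the counters to an existing IO thread state string."""
    return f"{base_state} (attempt {status.attempt} of {status.max_retries})"


if __name__ == "__main__":
    s = ReconnectStatus(attempt=3, max_retries=10)
    print(sss_io_running(s))
    print(processlist_state("Reconnecting after a failed master event read", s))
```

Keeping the counters in one small structure lets either surface, or both, be fed from the same values.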

Issue Links

relates to

MDEV-25674 No SQL variable for master_retry_count setting (Stalled)

Activity

(11 older comments not shown)

Brandon Nesterenko added a comment - 2025-02-21 18:52

Thanks ParadoxV5! I've approved with one last minor note. Now passing the review to susil.behera to review the test cases.

Susil Behera added a comment - 2025-02-24 18:19

Added my review comments. ParadoxV5, please see if those make sense.

Susil Behera added a comment - 2025-02-26 16:38

ParadoxV5, I've approved. The accompanying tests look good. MASTER_DELAY and the 1->2->3 topology can be covered at the QA stage.

Brandon Nesterenko added a comment - 2025-02-27 16:20 - edited

Hi ParadoxV5, sorry for the confusion: susil.behera only reviewed the MTR cases you wrote; he still needs to do his own QA testing once the preview branches have been released. So I'm re-opening this and setting it to In Testing.

As the patch was already pushed to the main branch, we will just keep it there, and then if susil.behera has any findings that we can't address before the release (for some reason), we will just revert it.

Jimmy Hú added a comment - 2 days ago

Ouch. multi_source.connects_tried occasionally fails in CI due to what looks like unexpected sleep timing.

People

Assignee: Susil Behera

Reporter: Brandon Nesterenko

Votes: 0

Watchers: 10

Dates

Created: 2024-10-31 16:33

Updated: 2 days ago 16:35

MariaDB Server