  1. MariaDB Server
  2. MDEV-35304

More Details for Ongoing IO Thread Re-connection Attempts

Details

    • New Feature
    • Status: In Testing
    • Major
    • Resolution: Unresolved
    • 12.0.0
    • Replication
    • None

    Description

      When the IO thread is attempting to reconnect to a primary, no progress updates are provided for as long as the last error persists, until the configured value for --master-retry-count has been exhausted.

      It would be good to somehow provide insight into the reconnection status as to (A) how many retries have been attempted so far and (B) out of how many configured.

      A few options for this:

      From Elkin

      1. print both counters in the error log message, which also must include something like 'the next N identical reports are skipped', where N may grow faster than linearly (that is to follow some good logging pattern), or better
      2. report (A) and (B) within the text field of SSS [SHOW SLAVE STATUS], like

      Slave_IO_Running: Re-connecting nth time out of N max
      

      A variant of option 2 would be to add yet another line to the SSS report, which I don't like, as it keeps swelling the whole report while the SSS handling itself is somewhat computationally costly.
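
Option 1 above could be sketched roughly as follows. This is only an illustration of a log-throttling pattern in which the number of skipped reports doubles each time. The function name, the message wording, and the doubling schedule are all assumptions made for this sketch, not what the server does, and it assumes a finite, positive retry limit.

```python
from typing import Iterator


def throttled_reports(max_retries: int) -> Iterator[str]:
    """Yield one message per *reported* failed attempt.

    Hypothetical schedule: attempts 1, 2, 4, 8, ... are reported and
    the attempts in between stay silent.
    """
    attempt = 1
    while attempt <= max_retries:
        next_report = attempt * 2
        # Silent attempts between this report and the next one,
        # capped so they never run past the configured maximum.
        skipped = min(next_report, max_retries + 1) - attempt - 1
        yield (
            f"reconnect attempt {attempt} of {max_retries} failed; "
            f"the next {skipped} identical reports are skipped"
        )
        attempt = next_report


for line in throttled_reports(10):
    print(line)
```

With a limit of 10, this reports attempts 1, 2, 4 and 8, so the log carries both counters (A) and (B) without one line per retry.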

      As well as

      3. extend the SHOW PROCESSLIST IO thread description on retry, which already has more detailed information about the state (i.e. what the slave was doing last), where the current options are

      • Reconnecting after a failed registration on master
      • Reconnecting after a failed binlog dump request
      • Reconnecting after a failed master event read

      The description would include counters (A) and (B) from Andrei's comment above. I imagine that this could be done in 10.5 as well.
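
As a rough sketch of option 3, the existing state text could be suffixed with the two counters. The helper name and the "(attempt n of N)" format are hypothetical. Treating a limit of 0 as "no limit" is also an assumption of this sketch.

```python
def io_thread_state(base_state: str, tried: int, max_tries: int) -> str:
    """Append retry counters (A) and (B) to an IO thread state string.

    Hypothetical format; a max_tries of 0 is assumed to mean "no limit",
    in which case only the attempt number is shown.
    """
    if max_tries == 0:
        return f"{base_state} (attempt {tried})"
    return f"{base_state} (attempt {tried} of {max_tries})"


print(io_thread_state("Reconnecting after a failed master event read", 3, 10))
```

Keeping the original state text as a prefix means anything that matches on the existing descriptions would still find them.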

          Activity

            bnestere Brandon Nesterenko added a comment -

            Thanks ParadoxV5! I've approved with one last minor note. Now passing the review to susil.behera to review the test cases.

            susil.behera Susil Behera added a comment -

            Added my review comments. ParadoxV5, please see if those make sense.

            susil.behera Susil Behera added a comment -

            ParadoxV5, I've approved. The accompanying tests look good. MASTER_DELAY and topology 1->2->3 can be covered at the QA stage.

            bnestere Brandon Nesterenko added a comment - edited

            Hi ParadoxV5, sorry for the confusion. susil.behera only reviewed the MTR cases you wrote; he still needs to do his own QA testing once the preview branches have been released. So I'm re-opening this and setting it to be in-testing.

            As the patch was already pushed to the main branch, we will just keep it there, and then if susil.behera has any findings that we can't address before the release (for some reason), we will just revert it.
            ParadoxV5 Jimmy Hú added a comment -

            Ouch. multi_source.connects_tried occasionally fails in CI due to what looks like unexpected sleep timing.

            People

              susil.behera Susil Behera
              bnestere Brandon Nesterenko
              Votes: 0
              Watchers: 10
