MariaDB Server / MDEV-34737

Replica does not try to reconnect if Primary is not reachable

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.5.26
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None

    Description

      When a replica fails to connect, for example because the Primary is temporarily down or there is a connection issue, it does not retry the connection MASTER_RETRY_COUNT times every MASTER_CONNECT_RETRY seconds as expected.
      It stops after the first attempt.
      This is easy to reproduce: just point the replica at an ip:port with nothing listening.

      2024-08-12 14:32:48 75 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='127.0.0.1', master_port='9999', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='9999', master_log_file='', master_log_pos='4'.
      2024-08-12 14:32:51 76 [Note] Slave I/O thread: Start asynchronous replication to master 'someuser@127.0.0.1:9999' in log '' at position 4
      2024-08-12 14:32:51 77 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './fedora-relay-bin.000001' position: 4
      2024-08-12 14:32:51 76 [ERROR] Slave I/O: error connecting to master 'someuser@127.0.0.1:9999' - retry-time: 3  maximum-retries: 86400  message: Can't connect to server on '127.0.0.1' (111 "Connection refused"), Internal MariaDB error code: 2003
      root@fedora:/run/media/claudio/FedoraData/myharem/instances/10506# date
      Mon 12 Aug 2024, 14:33:03, CEST
      root@fedora:/run/media/claudio/FedoraData/myharem/instances/10506# tail -2 data/error.10506.log 
      2024-08-12 14:32:51 77 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './fedora-relay-bin.000001' position: 4
      2024-08-12 14:32:51 76 [ERROR] Slave I/O: error connecting to master 'someuser@127.0.0.1:9999' - retry-time: 3  maximum-retries: 86400  message: Can't connect to server on '127.0.0.1' (111 "Connection refused"), Internal MariaDB error code: 2003
      

      If anything is listening on the port (for example nc -l 9999), the retry does happen.

      Searching around, it seems there are multiple reported scenarios in which the Replica does not try to reconnect.
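
      The retry behaviour the description expects can be modelled as a bounded retry loop. The following is only a minimal Python sketch of that expectation, not MariaDB's implementation: the function name connect_with_retry and its parameters retry_count and retry_interval are hypothetical stand-ins for MASTER_RETRY_COUNT and MASTER_CONNECT_RETRY, and whether the server counts the first attempt toward the limit is a detail this sketch does not try to match.

```python
import socket
import time


def connect_with_retry(host, port, retry_count, retry_interval, timeout=1.0):
    """Try to open a TCP connection, retrying on failure.

    retry_count and retry_interval are hypothetical stand-ins for
    MASTER_RETRY_COUNT and MASTER_CONNECT_RETRY (an assumption made for
    illustration only). Returns a tuple (connected, attempts_made).
    """
    attempts = 0
    while attempts < retry_count:
        attempts += 1
        try:
            # Open and immediately close the connection; no protocol is spoken.
            with socket.create_connection((host, port), timeout=timeout):
                return True, attempts
        except OSError as exc:
            # "Connection refused" (errno 111 on Linux) ends up here,
            # and so does a connect timeout.
            print(f"attempt {attempts}/{retry_count} failed: {exc}")
            if attempts < retry_count:
                time.sleep(retry_interval)
    return False, attempts
```

      Calling connect_with_retry("127.0.0.1", 9999, retry_count=5, retry_interval=3) against a port with nothing listening should report five failed attempts roughly three seconds apart, which is the pattern the error log above would be expected to show instead of a single error line.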

          People

            Assignee: Unassigned
            Reporter: Claudio Nanni (claudio.nanni)
            Votes: 2
            Watchers: 5
