MariaDB Server / MDEV-34737

Replica does not try to reconnect if Primary is not reachable

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.5.26
    • Fix Version/s: 10.6, 10.11
    • Component/s: Replication
    • Labels: None

    Description

      When a replica fails to connect, for example because the Primary is temporarily down or there is a connection problem, it does not retry the connection MASTER_RETRY_COUNT times at intervals of MASTER_CONNECT_RETRY seconds as expected.
      It stops after the first attempt.
      This is easy to reproduce: just point the replica at an ip:port with nothing listening.
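
For reference, the retry behaviour the report expects can be modelled as a simple loop. This is only a sketch of the expected semantics, not MariaDB's implementation: `connect_with_retries`, `connect`, and `sleep` are hypothetical names, and whether the server counts the first attempt against the retry limit is an assumption (here it does not).

```python
import time
from typing import Callable


def connect_with_retries(
    connect: Callable[[], bool],
    retry_count: int,
    connect_retry: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call connect() until it succeeds or the retries are used up.

    retry_count plays the role of MASTER_RETRY_COUNT and connect_retry
    the role of MASTER_CONNECT_RETRY (seconds between attempts).
    Returns the number of attempts made, the initial one included.
    """
    attempts = 0
    while True:
        attempts += 1
        if connect():
            return attempts      # connected
        if attempts > retry_count:
            return attempts      # initial attempt plus all retries failed
        sleep(connect_retry)     # wait before the next attempt
```

With the values shown in the log below (retry-time 3, maximum-retries 86400), this model would keep attempting every 3 seconds for a long time, whereas the report describes the replica stopping after the first attempt.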

      2024-08-12 14:32:48 75 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='127.0.0.1', master_port='9999', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='9999', master_log_file='', master_log_pos='4'.
      2024-08-12 14:32:51 76 [Note] Slave I/O thread: Start asynchronous replication to master 'someuser@127.0.0.1:9999' in log '' at position 4
      2024-08-12 14:32:51 77 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './fedora-relay-bin.000001' position: 4
      2024-08-12 14:32:51 76 [ERROR] Slave I/O: error connecting to master 'someuser@127.0.0.1:9999' - retry-time: 3  maximum-retries: 86400  message: Can't connect to server on '127.0.0.1' (111 "Connection refused"), Internal MariaDB error code: 2003
      root@fedora:/run/media/claudio/FedoraData/myharem/instances/10506# date
      lun 12 ago 2024, 14:33:03, CEST
      root@fedora:/run/media/claudio/FedoraData/myharem/instances/10506# tail -2 data/error.10506.log 
      2024-08-12 14:32:51 77 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './fedora-relay-bin.000001' position: 4
      2024-08-12 14:32:51 76 [ERROR] Slave I/O: error connecting to master 'someuser@127.0.0.1:9999' - retry-time: 3  maximum-retries: 86400  message: Can't connect to server on '127.0.0.1' (111 "Connection refused"), Internal MariaDB error code: 2003
      

      If anything is listening on the port (for example nc -l 9999), the retries do happen.
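
To tell the two cases apart before starting the replica, the port can be probed from the replica host. The helper below is a hypothetical sketch (the name `probe` and its return strings are not from the report); it only distinguishes a refused connection, as in the error 111 above, from a port that accepts connections.

```python
import socket


def probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Classify a TCP endpoint as 'listening', 'refused' or 'unreachable'."""
    try:
        # A successful connect means something is accepting on the port,
        # which is the situation created by `nc -l 9999`.
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        # Nothing is listening: the case in which the report sees no retry.
        return "refused"
    except OSError:
        # Timeouts, routing failures and other socket errors.
        return "unreachable"
```

For the setup in the report, `probe("127.0.0.1", 9999)` would be expected to return "refused" when nothing is bound to the port and "listening" while `nc -l 9999` is running.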

      A web search suggests that several other scenarios have been reported in which the Replica does not try to reconnect.

          People

            Assignee: Jimmy Hú (ParadoxV5)
            Reporter: Claudio Nanni (claudio.nanni)
            Votes: 2
            Watchers: 8
