MariaDB Server / MDEV-18621

wsrep_sst_mariabackup socat dead connection


Details

    • Type: Bug
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.2.22
    • Fix Version/s: 10.2
    • Component/s: Galera SST
    • Labels: None
    • Environment: debian 9

    Description

      During SST, after the donor finishes the transfer, the socat receiver on the joiner hangs with its TCP connection still in the ESTABLISHED state, even though socat on the donor has already exited. There are three ways out: wait 7200 seconds (the default timeout for dead TCP connections), kill socat on the joiner manually (after which the SST continues), or use my current workaround, which is this configuration in my.cnf on the joiner:

      [sst]
      sockopt=,keepalive,keepidle=10,keepintvl=10,keepcnt=2

      This makes the joiner close the dead TCP connection. I suggest putting these keepalive options directly into /usr/bin/wsrep_sst_mariabackup.
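      The socat options in this workaround are understood to set the standard Linux TCP keepalive socket options (SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT). The Python sketch below sets the same four options on a plain socket so the effect is easier to see. It assumes Linux, where these constants are available. The helper names `enable_keepalive` and `worst_case_detection` are hypothetical and are not part of socat or wsrep_sst_mariabackup. With these values, a silent peer should be detected after roughly keepidle + keepintvl × keepcnt = 10 + 10 × 2 = 30 seconds.

```python
import socket

# Values copied from the workaround above:
#   sockopt=,keepalive,keepidle=10,keepintvl=10,keepcnt=2
KEEPIDLE = 10   # seconds of idle time before the first keepalive probe
KEEPINTVL = 10  # seconds between unanswered probes
KEEPCNT = 2     # unanswered probes before the kernel drops the connection


def enable_keepalive(sock: socket.socket,
                     idle: int = KEEPIDLE,
                     interval: int = KEEPINTVL,
                     count: int = KEEPCNT) -> None:
    """Hypothetical helper: enable TCP keepalive with the given timers (Linux)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


def worst_case_detection(idle: int, interval: int, count: int) -> int:
    """Hypothetical helper: approximate seconds until an idle dead peer is dropped."""
    return idle + interval * count


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s)
        print("TCP_KEEPIDLE:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
    print("approx. detection time:",
          worst_case_detection(KEEPIDLE, KEEPINTVL, KEEPCNT), "s")
```

      Setting the options on the receiving socket means the joiner's kernel itself probes the idle connection. This does not rely on the donor ever sending anything more.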

      I also suggest investigating why socat on the donor does not send a FIN (EOF) over the network to the joiner.
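      Some background on the FIN question: a TCP receiver such as socat on the joiner only sees end-of-stream when the sender's FIN arrives, and at that point a read returns zero bytes. The minimal Python sketch below shows that mechanism over loopback. In it, `sender` stands in for the donor and `receiver` for the joiner. These names and the helpers `read_until_eof` and `demo` are hypothetical, and the sketch does not reproduce the bug itself.

```python
import socket


def read_until_eof(conn: socket.socket, bufsize: int = 8192) -> bytes:
    """Read until the peer's FIN arrives; recv() returns b"" at EOF."""
    chunks = []
    while True:
        data = conn.recv(bufsize)
        if not data:  # empty read: the peer sent FIN, i.e. end of stream
            break
        chunks.append(data)
    return b"".join(chunks)


def demo() -> bytes:
    # Hypothetical loopback pair: "sender" plays the donor, "receiver" the joiner.
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    sender = socket.create_connection(("127.0.0.1", port))
    receiver, _ = listener.accept()

    sender.sendall(b"backup stream")
    sender.shutdown(socket.SHUT_WR)  # sends FIN, so the receiver sees EOF
    payload = read_until_eof(receiver)

    for s in (sender, receiver, listener):
        s.close()
    return payload


if __name__ == "__main__":
    print(demo())
```

      If the FIN never reaches the receiver, the `recv()` call in `read_until_eof` blocks indefinitely. On an idle half-open connection, the kernel has no way to learn that the peer is gone unless keepalive or a socket timeout is set.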

      Here are some logs:

      On the donor:

      Feb 18 04:04:50 s1 -innobackupex-backup: [00] 2019-02-18 04:04:50 completed OK!
      Feb 18 04:04:50 s1 -wsrep-sst-donor: Total time on donor: 0 seconds
      Feb 18 04:04:50 s1 -wsrep-sst-donor: Cleaning up temporary directories

      On the joiner:

      Feb 18 02:20:31 s3 -wsrep-sst-joiner: Waiting for SST streaming to complete!
      Feb 18 04:08:04 s3 -wsrep-sst-joiner: 2019/02/18 04:08:04 socat[20811] E read(7, 0x55845e0c55b0, 8192): Connection timed out
      Feb 18 04:08:04 s3 -wsrep-sst-joiner: [00] 2019-02-18 04:08:04 xb_stream_read_chunk(): my_read() failed.
      Feb 18 04:08:04 s3 -wsrep-sst-joiner: Error while getting data from donor node: exit codes: 1 1
      Feb 18 04:08:04 s3 -wsrep-sst-joiner: Preparing the backup at /data/mysql//.sst
      Feb 18 04:08:04 s3 -wsrep-sst-joiner: Evaluating /usr//bin/mariabackup --innobackupex --apply-log $rebuildcmd ${DATA} 2>&1 | logger -p daemon.err -t -innobackupex-apply
      Feb 18 04:08:04 s3 -innobackupex-apply: 190218 04:08:04 innobackupex: Starting the apply-log operation

      Without the extra socket options (,keepalive,keepidle=10,keepintvl=10,keepcnt=2), the timeout happens only after 2 hours rather than as quickly as shown above.


          People

            sysprg Julius Goryavsky
            festr Martin Vit
            Votes: 1
            Watchers: 5
