  MariaDB Server / MDEV-18621

wsrep_sst_mariabackup socat dead connection



    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.2.22
    • Fix Version/s: 10.2
    • Component/s: Galera SST
    • Labels: None
    • Environment: Debian 9


      When doing an SST, the socat receiver on the joiner hangs once the donor completes the SST transfer: the joiner's TCP connection stays ESTABLISHED although socat on the donor has already exited. Recovering requires either waiting 7200 seconds (the default timeout for dead TCP connections) or killing socat on the joiner manually, after which the SST continues. My current workaround is the following configuration in my.cnf on the joiner:


      which closes the dead TCP connection. I suggest adding this keepalive setting directly to /usr/bin/wsrep_sst_mariabackup.
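      The option string quoted at the end of this report, ,keepalive,keepidle=10,keepintvl=10,keepcnt=2, is a list of socat address options; its leading comma suggests (an assumption here) that the SST script appends it to the joiner's socat listen address. Per the socat documentation, these options map onto the standard socket options SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT. The Python sketch below shows what they do at the socket level; the helpers parse_keepalive_opts() and apply_keepalive() are hypothetical and are not part of socat or wsrep_sst_mariabackup.

```python
import socket

# Hypothetical helpers, not part of socat or wsrep_sst_mariabackup.
# Maps socat option names to the TCP-level socket option constants.
_TCP_KEEPALIVE_OPTS = {
    "keepidle": "TCP_KEEPIDLE",    # idle seconds before the first probe
    "keepintvl": "TCP_KEEPINTVL",  # seconds between unanswered probes
    "keepcnt": "TCP_KEEPCNT",      # unanswered probes before the drop
}


def parse_keepalive_opts(sockopt: str) -> dict:
    """Parse a socat-style suffix like ',keepalive,keepidle=10' into a dict."""
    opts = {}
    for item in filter(None, sockopt.split(",")):  # skip empty fields
        name, _, value = item.partition("=")
        if name == "keepalive":
            opts["keepalive"] = 1
        elif name in _TCP_KEEPALIVE_OPTS:
            opts[name] = int(value)
        else:
            raise ValueError(f"unsupported option in this sketch: {item!r}")
    return opts


def apply_keepalive(sock: socket.socket, opts: dict) -> None:
    """Enable SO_KEEPALIVE and set the per-socket keepalive timers."""
    if opts.get("keepalive"):
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, const_name in _TCP_KEEPALIVE_OPTS.items():
        const = getattr(socket, const_name, None)  # may be absent on some platforms
        if name in opts and const is not None:
            sock.setsockopt(socket.IPPROTO_TCP, const, opts[name])


opts = parse_keepalive_opts(",keepalive,keepidle=10,keepintvl=10,keepcnt=2")
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    apply_keepalive(sock, opts)
```

      Only the four keepalive options are handled; other socat options such as reuseaddr are rejected so the sketch stays focused on the dead-connection timers.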

      I also suggest investigating why socat on the donor does not send a FIN (that is, signal EOF) over the network to the joiner.
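      For reference, this is how EOF normally reaches a TCP receiver: when the sender closes its socket or shuts down its write side, the kernel sends a FIN, and the receiver's blocking read returns 0 bytes. If that FIN never arrives, for example because the peer vanished without closing cleanly, the read keeps blocking, which appears to match what the joiner's socat does here. The Python sketch below performs such a half-close over loopback; the helper names are illustrative and not taken from socat or the SST script, and the receive timeout is an added safeguard, not something socat is known to set.

```python
import socket


def receive_until_eof(conn: socket.socket, timeout: float = 5.0) -> bytes:
    """Read until the peer's FIN arrives, i.e. recv() returns b''."""
    # Safeguard for this sketch: without a timeout (or TCP keepalive), a
    # missing FIN would leave recv() blocked indefinitely.
    conn.settimeout(timeout)
    chunks = []
    while True:
        data = conn.recv(8192)
        if not data:  # orderly shutdown: the peer sent a FIN
            return b"".join(chunks)
        chunks.append(data)


def transfer_over_loopback(payload: bytes) -> bytes:
    """Send payload over a loopback TCP connection, then half-close."""
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as sender:
            sender.sendall(payload)
            # Half-close: the kernel sends a FIN, so the receiver sees EOF
            # even though this socket object is still open.
            sender.shutdown(socket.SHUT_WR)
            conn, _ = server.accept()
            with conn:
                return receive_until_eof(conn)


print(transfer_over_loopback(b"xbstream chunk"))  # b'xbstream chunk'
```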

      Here are some logs:

      On the donor:

      Feb 18 04:04:50 s1 -innobackupex-backup: [00] 2019-02-18 04:04:50 completed OK!
      Feb 18 04:04:50 s1 -wsrep-sst-donor: Total time on donor: 0 seconds
      Feb 18 04:04:50 s1 -wsrep-sst-donor: Cleaning up temporary directories

      On the joiner:

      Feb 18 02:20:31 s3 -wsrep-sst-joiner: Waiting for SST streaming to complete!
      Feb 18 04:08:04 s3 -wsrep-sst-joiner: 2019/02/18 04:08:04 socat[20811] E read(7, 0x55845e0c55b0, 8192): Connection timed out
      Feb 18 04:08:04 s3 -wsrep-sst-joiner: [00] 2019-02-18 04:08:04 xb_stream_read_chunk(): my_read() failed.
      Feb 18 04:08:04 s3 -wsrep-sst-joiner: Error while getting data from donor node: exit codes: 1 1
      Feb 18 04:08:04 s3 -wsrep-sst-joiner: Preparing the backup at /data/mysql//.sst
      Feb 18 04:08:04 s3 -wsrep-sst-joiner: Evaluating /usr//bin/mariabackup --innobackupex --apply-log $rebuildcmd ${DATA} 2>&1 | logger -p daemon.err -t -innobackupex-apply
      Feb 18 04:08:04 s3 -innobackupex-apply: 190218 04:08:04 innobackupex: Starting the apply-log operation

      Without the extra socket options (,keepalive,keepidle=10,keepintvl=10,keepcnt=2), the timeout happens only after about 2 hours rather than this quickly.
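      The gap between the two behaviours can be estimated with the usual keepalive arithmetic: once SO_KEEPALIVE is enabled, an idle connection whose peer never answers is dropped after roughly keepidle + keepcnt * keepintvl seconds. The Python sketch below computes that estimate; it assumes the Linux defaults net.ipv4.tcp_keepalive_time=7200, tcp_keepalive_intvl=75 and tcp_keepalive_probes=9 apply when no per-socket values are set, and it ignores scheduling slack, so the results are approximations.

```python
def keepalive_detection_seconds(keepidle: int, keepintvl: int, keepcnt: int) -> int:
    """Approximate seconds of silence before the kernel drops the connection:
    keepidle seconds idle, then keepcnt unanswered probes keepintvl apart."""
    return keepidle + keepcnt * keepintvl


# Options from this report: ,keepalive,keepidle=10,keepintvl=10,keepcnt=2
print(keepalive_detection_seconds(10, 10, 2))    # 30
# Assumed Linux defaults (tcp_keepalive_time/intvl/probes = 7200/75/9)
print(keepalive_detection_seconds(7200, 75, 9))  # 7875
```

      7875 seconds is a little over two hours, which is consistent with the delay described above when the extra options are not set.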




            Assignee: Julius Goryavsky (sysprg)
            Reporter: Martin Vit (festr)
            Votes: 1
            Watchers: 5


