Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32734

Galera SST using rsync over stunnel fails

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.5.21
    • 10.5, 10.6
    • Galera SST
    • Server version: 10.5.21-MariaDB-0+deb11u1 Debian 11
      uname: Linux 5.10.0-26-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux

    Description

      Before enabling encryption over stunnel, no issues were observed. After stunnel was enabled, SST during startup fails. The problem is not consistent, but seems to happen quite regularly when RTT between donor and joiner is high (> 100ms).

      I did some debugging, and the issue seems to be related to a race condition where stunnel is killed prematurely, before rsync is 100% complete. Below is some relevant lines from the log when the problem happens (with stunnel debug = info):

      Nov  8 15:14:01 xxx rsyncd[35829]: connect from yyy (x.x.x.x)
      Nov  8 15:14:01 xxx rsyncd[35829]: rsync to rsync_sst/ from yyy (x.x.x.x)
      Nov  8 15:14:01 xxx rsyncd[35829]: receiving file list
      Nov  8 15:14:01 xxx mariadbd[35677]: WSREP_SST: [INFO] rsync IST completed on joiner (20231108 15:14:01.897)
      Nov  8 15:14:01 xxx mariadbd[35677]: WSREP_SST: [INFO] Joiner cleanup: rsync PID=0, stunnel PID=35814 (20231108 15:14:01.900)
      Nov  8 15:14:01 xxx stunnel: LOG5[ui]: Terminated
      Nov  8 15:14:01 xxx stunnel: LOG5[ui]: Terminating 2 service thread(s)
      Nov  8 15:14:01 xxx stunnel: LOG5[ui]: Service threads terminated
      Nov  8 15:14:02 xxx rsyncd[35829]: rsync: connection unexpectedly closed (90 bytes received so far) [generator]
      Nov  8 15:14:02 xxx mariadbd[35829]: rsync: connection unexpectedly closed (90 bytes received so far) [generator]
      Nov  8 15:14:02 xxx mariadbd[35829]: rsync error: error in rsync protocol data stream (code 12) at io.c(228) [generator=3.2.3]
      Nov  8 15:14:02 xxx rsyncd[35829]: rsync error: error in rsync protocol data stream (code 12) at io.c(228) [generator=3.2.3]
      

      If adding a "sleep 5" to the end of the wsrep_sst_rsync script, just before the exit trap kills stunnel, the problem is no longer observed:

      --- wsrep_sst_rsync.bak	2023-07-23 23:26:58.000000000 +0200
      +++ wsrep_sst_rsync	2023-11-08 15:59:18.793493850 +0100
      @@ -1003,4 +1003,5 @@
      fi
       
      wsrep_log_info "$WSREP_METHOD $WSREP_TRANSFER_TYPE completed on $WSREP_SST_OPT_ROLE"
      +sleep 5
      exit 0
      

      success:

      Nov  8 16:33:17 xxx rsyncd[63392]: connect from yyy (x.x.x.x)
      Nov  8 16:33:18 xxx rsyncd[63392]: rsync to rsync_sst/ from yyy (x.x.x.x)
      Nov  8 16:33:18 xxx rsyncd[63392]: receiving file list
      Nov  8 16:33:18 xxx mariadbd[63240]: WSREP_SST: [INFO] rsync IST completed on joiner (20231108 16:33:18.489)
      Nov  8 16:33:18 xxx rsyncd[63392]: sent 48 bytes  received 194 bytes  total size 52
      Nov  8 16:33:18 xxx stunnel: LOG6[0]: Read socket closed (readsocket)
      Nov  8 16:33:18 xxx stunnel: LOG6[0]: SSL_shutdown successfully sent close_notify alert
      Nov  8 16:33:18 xxx stunnel: LOG6[ui]: Child process 63392 finished with code 0
      Nov  8 16:33:18 xxx stunnel: LOG6[0]: SSL_read: Socket is closed
      Nov  8 16:33:18 xxx stunnel: LOG6[0]: TLS socket closed (SSL_read)
      Nov  8 16:33:18 xxx stunnel: LOG5[0]: Connection closed: 111 byte(s) sent to TLS, 289 byte(s) sent to socket
      Nov  8 16:33:23 xxx mariadbd[63240]: WSREP_SST: [INFO] Joiner cleanup: rsync PID=0, stunnel PID=63377 (20231108 16:33:23.492)
      Nov  8 16:33:23 xxx stunnel: LOG5[ui]: Terminated
      Nov  8 16:33:23 xxx stunnel: LOG5[ui]: Terminating 1 service thread(s)
      Nov  8 16:33:23 xxx stunnel: LOG5[ui]: Service threads terminated
      

      Attachments

        Activity

          People

            teemu.ollakka Teemu Ollakka
            richardo Richard
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.