MDEV-26019: Upgrading MariaDB from 10.5.10 to 10.5.11 breaks TLS mariabackup SST

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.6.2, 10.2.39, 10.3.30, 10.4.20, 10.5.11
    • Fix Version/s: 10.6.3, 10.2.40, 10.3.31, 10.4.21, 10.5.12
    • Component/s: Galera SST, mariabackup, wsrep
    • Environment: Linux vc-galera01 5.4.114-1-pve #1 SMP PVE 5.4.114-1 (Sun, 09 May 2021 17:13:05 +0200) x86_64 x86_64 x86_64 GNU/Linux

      mysql Ver 15.1 Distrib 10.5.11-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2

    Description

      Upgrading MariaDB from 10.5.10 to 10.5.11 breaks the wsrep_sst_mariabackup script. The cause is the commonname option that is now passed to socat, which, at least on my end, defaults to "localhost". There is also a typo on line 389 (the E and S in UNESCAPED are swapped, giving UNSECAPED) that is most likely breaking something as well.

      elif is_local_ip "$WSREP_SST_OPT_HOST_UNESCAPED"; then
          CN_option=',commonname=localhost'
      else
          CN_option=",commonname='$WSREP_SST_OPT_HOST_UNSECAPED'"  # <- typo is right here: UNSECAPED
      fi
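
      The effect of the misspelled variable name can be reproduced in isolation. The following is a minimal sketch, not taken from the script itself: the host value is a placeholder, and it assumes the shell is not running with nounset (`set -u`), in which case an unset variable silently expands to an empty string.

```shell
#!/bin/sh
# Placeholder value; in the real script this comes from the SST option parser.
WSREP_SST_OPT_HOST_UNESCAPED='node1.mycompany.com'
# Make sure the misspelled name is really unset for this demonstration.
unset WSREP_SST_OPT_HOST_UNSECAPED

# Misspelled name (UNSECAPED): unset, so it expands to nothing.
CN_typo=",commonname='$WSREP_SST_OPT_HOST_UNSECAPED'"
# Correctly spelled name (UNESCAPED).
CN_fixed=",commonname='$WSREP_SST_OPT_HOST_UNESCAPED'"

echo "typo:  $CN_typo"
echo "fixed: $CN_fixed"

# With nounset enabled, the misspelling becomes a hard error instead.
( set -u; : "$WSREP_SST_OPT_HOST_UNSECAPED" ) 2>/dev/null \
    || echo "set -u rejects the misspelled variable"
```

      With the typo, socat would be handed an empty commonname rather than the intended host name.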
      

      To just get the node up I had to make the following change on line 391 of wsrep_sst_mariabackup, which also triggered a very inconvenient SST.

      #tcmd="$tcmd,cert='$tpem',key='$tkey',cafile='$tcert'$CN_option$sockopt"
      tcmd="$tcmd,cert='$tpem',key='$tkey',cafile='$tcert'$sockopt"
      

      Below is my log that led me to looking at the differences between the two versions:

      Jun 24 15:05:03 node1 -wsrep-sst-joiner: Decrypting with cert=/etc/mysql/certs/server-cert.pem, key=/etc/mysql/certs/server-key.pem, cafile=/etc/mysql/certs/ca.pem
      Jun 24 15:05:03 node1 -wsrep-sst-joiner: Evaluating timeout -k 310 300 socat -u openssl-listen:4444,reuseaddr,cert='/etc/mysql/certs/server-cert.pem',key='/etc/mysql/certs/server-key.pem',cafile='/etc/mysql/certs/ca.pem',commonname=localhost stdio | '/usr//bin/mbstream' -x; RC=( ${PIPESTATUS[@]} )
      Jun 24 15:05:03 node1 mariadbd[9378]: 2021-06-24 15:05:03 1 [Note] WSREP: ####### IST uuid:9795eb17-c967-11eb-896e-32dd10aa7427 f: 628742, l: 628743, STRv: 3
      Jun 24 15:05:03 node1 mariadbd[9378]: 2021-06-24 15:05:03 1 [Note] WSREP: IST receiver addr using ssl://node1.mycompany.com:4568
      Jun 24 15:05:03 node1 mariadbd[9378]: 2021-06-24 15:05:03 1 [Note] WSREP: IST receiver using ssl
      Jun 24 15:05:03 node1 mariadbd[9378]: 2021-06-24 15:05:03 1 [Note] WSREP: Prepared IST receiver for 628742-628743, listening at: ssl://node1-ip:4568
      Jun 24 15:05:03 node1 mariadbd[9378]: 2021-06-24 15:05:03 0 [Note] WSREP: Member 2.0 (node1) requested state transfer from '*any*'. Selected 1.0 (node3)(SYNCED) as donor.
      Jun 24 15:05:03 node1 mariadbd[9378]: 2021-06-24 15:05:03 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 628743)
      Jun 24 15:05:03 node1 mariadbd[9378]: 2021-06-24 15:05:03 1 [Note] WSREP: Requesting state transfer: success, donor: 1
      Jun 24 15:05:05 node1 mariadbd[9378]: 2021-06-24 15:05:05 0 [Note] WSREP: (75d699bd-98d2, 'ssl://0.0.0.0:4567') turning message relay requesting off
      Jun 24 15:05:06 node1 -wsrep-sst-joiner: 2021/06/24 15:05:06 socat[9577] E certificate is valid but its commonName does not match hostname
      Jun 24 15:05:06 node1 -wsrep-sst-joiner: Error while getting data from donor node:  exit codes: 1 0
      Jun 24 15:05:06 node1 -wsrep-sst-joiner: Cleanup after exit with status:32
      Jun 24 15:05:06 node1 -wsrep-sst-joiner: Removing the sst_in_progress file
      Jun 24 15:05:06 node1 -wsrep-sst-joiner: Cleaning up temporary directories
      Jun 24 15:05:06 node1 mariadbd[9378]: 2021-06-24 15:05:06 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address 'node1.mycompany.com' --datadir '/var/lib/mysql/' --parent '9378' --mysqld-args --wsrep_start_position=9795eb17-c967-11eb-896e-32dd10aa7427:628741: 32 (Broken pipe)
      Jun 24 15:05:06 node1 mariadbd[9378]: 2021-06-24 15:05:06 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
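
      The socat error above says the certificate's commonName does not match the expected host name, so a first check is to see which CN the certificate actually carries. The helper below is a hypothetical sketch (the name `cert_cn` is not part of any MariaDB or OpenSSL tool): it extracts the CN from a subject line as printed by `openssl x509 -noout -subject`, and it assumes the subject is printed either as `CN = value` separated by commas or as `/CN=value` separated by slashes.

```shell
#!/bin/sh
# cert_cn: read a certificate subject line on stdin and print its CN.
# Hypothetical helper; handles "CN = host" and "/CN=host" layouts.
cert_cn() {
    sed -n 's|.*CN *= *\([^,/]*\).*|\1|p'
}

# Typical use on a node (path taken from the log above):
#   openssl x509 -noout -subject -in /etc/mysql/certs/server-cert.pem | cert_cn
```

      If the printed CN is the node's FQDN while socat was started with commonname=localhost, the mismatch reported in the log is expected.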
      

          Activity

            sysprg Julius Goryavsky added a comment - https://github.com/MariaDB/server/commit/4ad148b148cfbb6f78b33ad9a7662f47c24cb759 and https://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-MDEV-26019-galera

            jplindst Jan Lindström (Inactive) added a comment - Ok to push if buildbot is happy
            sysprg Julius Goryavsky added a comment - Fixed: 10.2: https://github.com/MariaDB/server/commit/4ad148b148cfbb6f78b33ad9a7662f47c24cb759 10.3: https://github.com/MariaDB/server/commit/29098083f7ac3b445ee59c3e765eb634ec70b947

            felix.huettner@mail.schwarz Felix Huettner added a comment - Hello everyone,

            I think the fix does not cover the full issue. The typo is gone now (which allows the donor to actually start up), but on the joiner socat is still started with `commonname=localhost`. This causes SST to fail because the joiner cannot validate the donor's certificate correctly (the certificate definitely does not match localhost).

            This seems to have been introduced here: https://github.com/MariaDB/server/commit/fe7e44d8ad5d7fe9c91f476353a3e1749f18afc6?branch=fe7e44d8ad5d7fe9c91f476353a3e1749f18afc6&diff=split#diff-1f9bb0e7c32584ac58bd554eeb3bb5f5f69b9310e7566d7566e71725926503dbR353 (in the diff of scripts/wsrep_sst_mariabackup.sh). That change removes the previously different behaviour of donor and joiner (where only the donor got `commonname` set) and requires the common name for both.

            It uses the variable `WSREP_SST_OPT_HOST_UNESCAPED` for that, which is always the hostname/IP of the joining node. Therefore the check here (https://github.com/MariaDB/server/blob/d1a948cfaaab67e699674af4c11efad3868a629d/scripts/wsrep_sst_mariabackup.sh#L387) reports for the joiner that it is in fact the local node and sets `commonname=localhost`.

            To fix this I would propose not appending `$CN_option` at https://github.com/MariaDB/server/blob/d1a948cfaaab67e699674af4c11efad3868a629d/scripts/wsrep_sst_mariabackup.sh#L392 if `$WSREP_SST_OPT_ROLE = 'joiner'`.

            Thank you
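
            The proposal in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: `build_cn_option` is a hypothetical helper, its third argument stands in for the result of the script's `is_local_ip` check, and the change that was eventually made under MDEV-26360 may look different.

```shell
#!/bin/sh
# build_cn_option ROLE HOST IS_LOCAL
# Hypothetical helper, not part of wsrep_sst_mariabackup.
#   ROLE      'joiner' or 'donor' (as in $WSREP_SST_OPT_ROLE)
#   HOST      address of the joining node
#   IS_LOCAL  'yes' if HOST is a local address (stand-in for is_local_ip)
build_cn_option() {
    role=$1
    host=$2
    is_local=$3
    if [ "$role" = 'joiner' ]; then
        # Proposed behaviour: the joiner passes no commonname to socat.
        printf '%s' ''
    elif [ "$is_local" = 'yes' ]; then
        printf '%s' ',commonname=localhost'
    else
        printf '%s' ",commonname='$host'"
    fi
}

# Example: the joiner gets no option, the donor pins the joiner's host name.
echo "joiner: [$(build_cn_option joiner node1.mycompany.com yes)]"
echo "donor:  [$(build_cn_option donor node1.mycompany.com no)]"
```

            Branching on the role first keeps the joiner from ever reaching the local-address check, which is where `commonname=localhost` came from.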

            sysprg Julius Goryavsky added a comment - felix.huettner@mail.schwarz Thanks for the comment, this change was added to MDEV-26360

            People

              Assignee: sysprg Julius Goryavsky
              Reporter: ib-mlatin Matthew Latin