MariaDB Server / MDEV-22554

galera.galera_sst_mariabackup fails with "Failed to start mysqld.2"

Details

    Description

      It always fails with:
      Failed to start mysqld.2
      mysqltest failed but provided no output
      14:23
      In the mysqld.2.error log:
      2020-05-13 14:15:54 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
      2020-05-13 14:15:54 3 [Note] WSREP: SST received
      2020-05-13 14:15:54 3 [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
      2020-05-13 14:15:54 2 [ERROR] WSREP: Application received wrong state:
      Received: 00000000-0000-0000-0000-000000000000
      Required: df55f3aa-950a-11ea-8831-0a0b0fb93391
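
The "wrong state" signature above can be checked mechanically when triaging a failed run. The following is a minimal sketch, assuming the log lines have the shape shown in this report; the function name `sst_state_mismatch` is hypothetical and is not part of mtr or Galera.

```shell
#!/bin/sh
# Hypothetical helper: reads a server error log on stdin and reports whether
# the state UUID received by the joiner differs from the required one.
# Exit status: 0 = mismatch found, 1 = states agree, 2 = signature absent.
sst_state_mismatch() {
    awk '
        $1 == "Received:" { received = $2 }
        $1 == "Required:" { required = $2 }
        END {
            if (received == "" || required == "") exit 2
            if (received != required) {
                printf "mismatch received=%s required=%s\n", received, required
                exit 0
            }
            exit 1
        }
    '
}

# Sample input copied from the log excerpt in this report.
sample_log='2020-05-13 14:15:54 2 [ERROR] WSREP: Application received wrong state:
Received: 00000000-0000-0000-0000-000000000000
Required: df55f3aa-950a-11ea-8831-0a0b0fb93391'

result=$(printf '%s\n' "$sample_log" | sst_state_mismatch)
echo "$result"
```

An all-zero received UUID together with seqno -1 means the joiner script reported no usable state, which matches the "Failed to read uuid:seqno" error logged just before it.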

      Attachments

        1. 200429_stdout_10.5e.log
          39 kB
          Stepan Patryshev

        Activity

          stepan.patryshev Stepan Patryshev (Inactive) added a comment - edited

          It also failed on Jenkins 10.5 ES. From stdio.log:

          10.5.2-0 ES, 58cd5f2adc138934150b5cb39b0acf65388f4cc8, Build RelWithDebInfo, debian-9

          galera.galera_sst_mariabackup 'innodb,release' w6 [ fail ]
                  Test ended at 2020-04-29 02:31:10

          CURRENT_TEST: galera.galera_sst_mariabackup

          Failed to start mysqld.2
          mysqltest failed but provided no output

           - saving '/var/tmp/mtr/6/log/galera.galera_sst_mariabackup-innodb,release/' to '/var/tmp/mtr/log/galera.galera_sst_mariabackup-innodb,release/'
          ***Warnings generated in error logs during shutdown after running tests: galera.galera_sst_mariabackup

          2020-04-29  2:31:09 0 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
          2020-04-29  2:31:09 0 [ERROR] WSREP: failed to open gcomm backend connection: 131: have_quorum(current_view_, pc_view_) == true:  (FATAL)
          2020-04-29  2:31:09 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():222: Failed to open backend connection: -131 (State not recoverable)
          2020-04-29  2:31:09 0 [ERROR] WSREP: gcs connect failed: State not recoverable
          2020-04-29  2:31:09 0 [ERROR] Plugin 'wsrep' init function returned error.
          2020-04-29  2:31:09 0 [ERROR] Failed to initialize plugins.
          2020-04-29  2:31:09 0 [ERROR] Aborting

          Server logs.

          sysprg Julius Goryavsky added a comment - The problem is related to the netcat streamer and does not appear on systems where socat is installed. We probably need to add the -N option for netcat. As a local fix, find the second "# Debian netcat" comment in scripts/wsrep_sst_mariabackup.sh and change tcmd="nc ${REMOTEIP} ${TSST_PORT}" to tcmd="nc -N ${REMOTEIP} ${TSST_PORT}". I am now checking whether adding this option is enough or whether a further refinement is needed for a complete solution.
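
The local fix described in this comment amounts to choosing netcat flags according to what the installed netcat supports. The following is a minimal sketch of that idea, not the code from the SST script: the function name `choose_nc_cmd`, the detection pattern, and the sample help strings are all assumptions made for illustration. The help text is passed in as an argument so the logic can be exercised without netcat installed.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the SST script: build the donor-side
# netcat command, adding -N only when the help text lists that option.
# $1 = help text (e.g. output of `nc -h 2>&1`), $2 = joiner host, $3 = port
choose_nc_cmd() {
    help_text=$1
    host=$2
    port=$3
    if printf '%s\n' "$help_text" | grep -Eq -e '(^|[[:space:]])-N([[:space:]]|$)'; then
        # OpenBSD-style netcat: -N shuts down the socket after EOF on stdin
        printf 'nc -N %s %s\n' "$host" "$port"
    else
        printf 'nc %s %s\n' "$host" "$port"
    fi
}

# Placeholder values for illustration only.
REMOTEIP=127.0.0.1
TSST_PORT=4444

# Illustrative help excerpts, one listing -N and one without it.
with_n=$(choose_nc_cmd '  -N  Shutdown the network socket after EOF' "$REMOTEIP" "$TSST_PORT")
without_n=$(choose_nc_cmd 'usage: nc [-options] hostname port' "$REMOTEIP" "$TSST_PORT")
echo "$with_n"
echo "$without_n"
```

In a real script the first argument would come from the installed binary, for example `tcmd=$(choose_nc_cmd "$(nc -h 2>&1)" "$REMOTEIP" "$TSST_PORT")`. Probing the help output rather than hard-coding -N avoids passing an unknown option to netcat variants that do not accept it.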
          sysprg Julius Goryavsky added a comment - Fixed, https://github.com/MariaDB/server/commit/08f3ca8020af50fad80783b87bc70733036e5269

          sysprg Julius Goryavsky added a comment - The problem turned out to be the netcat streamer hanging after the SST had completed successfully. Several experiments showed that the data transmitted during the SST is correct, but netcat does not perform a graceful TCP disconnect when it receives EOF on stdin. To solve this, netcat must be called with the -N option on the donor side. The fix is here: https://github.com/MariaDB/server/commit/08f3ca8020af50fad80783b87bc70733036e5269

          jplindst Jan Lindström (Inactive) added a comment - OK to push, but please push the change to 10.2 as well.

          sysprg Julius Goryavsky added a comment - Fixed & closed after verification

          People

            sysprg Julius Goryavsky
            monty Michael Widenius