[MDEV-22554] galera.galera_sst_mariabackup fails with "Failed to start mysqld.2" Created: 2020-05-14  Updated: 2020-06-03  Resolved: 2020-05-18

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST, mariabackup, Tests
Affects Version/s: 10.5.3
Fix Version/s: 10.5.4, 10.2.33, 10.3.24, 10.4.14

Type: Bug Priority: Blocker
Reporter: Michael Widenius Assignee: Julius Goryavsky
Resolution: Fixed Votes: 0
Labels: None
Environment:

BUILD/compile-pentium64-valgrind-max
OpenSuse 10.5


Attachments: Zip Archive 2000429_galera_sst_mariabackup_10,5e.zip     Text File 200429_stdout_10.5e.log    

 Description   

It always fails with:
Failed to start mysqld.2
mysqltest failed but provided no output
In the mysqld.2.error log:
2020-05-13 14:15:54 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2020-05-13 14:15:54 3 [Note] WSREP: SST received
2020-05-13 14:15:54 3 [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
2020-05-13 14:15:54 2 [ERROR] WSREP: Application received wrong state:
Received: 00000000-0000-0000-0000-000000000000
Required: df55f3aa-950a-11ea-8831-0a0b0fb93391



 Comments   
Comment by Stepan Patryshev (Inactive) [ 2020-05-14 ]

It also failed on Jenkins 10.5 ES:
stdio.log:

10.5.2-0 ES, 58cd5f2adc138934150b5cb39b0acf65388f4cc8, Build RelWithDebInfo, debian-9

galera.galera_sst_mariabackup 'innodb,release' w6 [ fail ]
        Test ended at 2020-04-29 02:31:10
 
CURRENT_TEST: galera.galera_sst_mariabackup
 
 
Failed to start mysqld.2
mysqltest failed but provided no output
 
 
 - saving '/var/tmp/mtr/6/log/galera.galera_sst_mariabackup-innodb,release/' to '/var/tmp/mtr/log/galera.galera_sst_mariabackup-innodb,release/'
***Warnings generated in error logs during shutdown after running tests: galera.galera_sst_mariabackup
 
2020-04-29  2:31:09 0 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
2020-04-29  2:31:09 0 [ERROR] WSREP: failed to open gcomm backend connection: 131: have_quorum(current_view_, pc_view_) == true:  (FATAL)
2020-04-29  2:31:09 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():222: Failed to open backend connection: -131 (State not recoverable)
2020-04-29  2:31:09 0 [ERROR] WSREP: gcs connect failed: State not recoverable
2020-04-29  2:31:09 0 [ERROR] Plugin 'wsrep' init function returned error.
2020-04-29  2:31:09 0 [ERROR] Failed to initialize plugins.
2020-04-29  2:31:09 0 [ERROR] Aborting

Server logs.

Comment by Julius Goryavsky [ 2020-05-15 ]

The problem is related to the netcat streamer and does not appear on systems where socat is installed. We probably need to add the -N option for netcat. As a local fix, find the second "# Debian netcat" comment in the /scripts/wsrep_scripts_mariabackup file (in the scripts directory) and change tcmd="nc ${REMOTEIP} ${TSST_PORT}" to tcmd="nc -N ${REMOTEIP} ${TSST_PORT}". I am now checking whether adding this option is enough, or whether a further refinement is needed for a complete solution.
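Since the failure depends on which streamer is present, a quick check of the host can help when reproducing it. The following is only a minimal sketch: the function name detect_streamer is hypothetical, and it merely reports what is on PATH; it does not claim to mirror how the SST script itself selects a streamer.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SST script): report which
# streaming tool is available on PATH, preferring socat over nc.
detect_streamer() {
    if command -v socat >/dev/null 2>&1; then
        echo socat
    elif command -v nc >/dev/null 2>&1; then
        echo nc
    else
        echo none
    fi
}

detect_streamer
```

If this prints nc, the host has no socat on PATH and is a candidate for reproducing the problem described above.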

Comment by Julius Goryavsky [ 2020-05-15 ]

Fixed, https://github.com/MariaDB/server/commit/08f3ca8020af50fad80783b87bc70733036e5269

Comment by Julius Goryavsky [ 2020-05-15 ]

The problem turned out to be the netcat streamer freezing after the SST completes successfully. Several experiments showed that the data transmitted during the SST is correct, but netcat does not perform a graceful TCP disconnect when it receives EOF on STDIN. To solve this, netcat must be called with the -N option on the donor side. The fix is here: https://github.com/MariaDB/server/commit/08f3ca8020af50fad80783b87bc70733036e5269
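For illustration only, the idea of passing -N on the donor side could be sketched as below. This is not the code from the commit: the function name build_donor_nc_cmd and its arguments are hypothetical, and the sketch assumes that a netcat supporting -N (as the OpenBSD variant does, where it shuts down the socket after EOF on input) lists the option in its help text. The help text is passed in as an argument so the logic can be exercised without a particular netcat installed.

```shell
#!/bin/sh
# Hypothetical helper (not the committed fix): build the donor-side
# netcat command, adding -N only when the help text advertises it.
#   $1 = help text of the local nc, e.g. "$(nc -h 2>&1)"
#   $2 = remote IP, $3 = port
build_donor_nc_cmd() {
    help_text=$1
    remote_ip=$2
    port=$3
    # Match -N as a standalone option token in the help output.
    if printf '%s\n' "$help_text" |
        grep -Eq -- '(^|[[:space:]])-N([[:space:]]|$)'; then
        printf 'nc -N %s %s\n' "$remote_ip" "$port"
    else
        printf 'nc %s %s\n' "$remote_ip" "$port"
    fi
}
```

A caller might use it as tcmd=$(build_donor_nc_cmd "$(nc -h 2>&1)" "$REMOTEIP" "$TSST_PORT"), so that netcat variants without -N are still invoked with the original command line.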

Comment by Jan Lindström (Inactive) [ 2020-05-15 ]

OK to push, but please push the change to 10.2 as well.

Comment by Julius Goryavsky [ 2020-05-18 ]

Fixed & closed after verification

Generated at Thu Feb 08 09:15:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.