[MDEV-27524] Incorrect binlogs after Galera SST using rsync and mariabackup Created: 2022-01-17  Updated: 2023-05-25  Resolved: 2022-02-22

Status: Closed
Project: MariaDB Server
Component/s: Galera SST, mariabackup
Affects Version/s: 10.2.41, 10.3.32, 10.4.22, 10.5.13, 10.6.5, 10.7.1
Fix Version/s: 10.9.0, 10.3.35, 10.4.25, 10.5.16, 10.6.8, 10.7.4, 10.8.3

Type: Bug
Priority: Critical
Reporter: Julius Goryavsky
Assignee: Julius Goryavsky
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-27719 Syntax error in wsrep_sst_common: [: ... Closed
blocks MDEV-27777 Some Galera tests fail on FreeBSD wit... Closed
Issue split
split to MDEV-27534 mariabackup: missing line in the comp... Closed
split to MDEV-27602 Automatic creation of working directo... Stalled
split to MDEV-27692 Waiting for a port before trying to c... Stalled
split to MDEV-27740 Joiner node failed to join the cluste... Closed
split from MDEV-24097 galera_3nodes suite tests in MTR spor... Closed
PartOf
includes MDEV-24365 datadir cannot be changed even when s... Stalled
Problem/Incident
causes MDEV-27777 Some Galera tests fail on FreeBSD wit... Closed
causes MDEV-28758 Mariabackup copies binary logs to bac... Closed
causes MDEV-28781 mariabackup rsync fails to copy /var/... Closed
causes MDEV-29109 mariabackup --rsync option broken sin... Closed
Relates
relates to MDEV-17921 Galera SST successfully finishing wit... Closed
relates to MDEV-19874 Unable to join galera node. SST ERROR... Closed
relates to MDEV-26201 MariaDB 10.5.11 cannot do SST when bi... Closed
relates to MDEV-28583 Galera: binlogs disappear after rsync... Closed
relates to MDEV-28968 mariabackup SST doesn't properly hand... Closed
relates to MDEV-28015 Mariabackup | GTID value is missing, ... Closed

 Description   

In many scenarios, nodes crash after SST if the donor node uses multiple binlog files and SST is performed using rsync or mariabackup. The failure occurs especially often after the binlogs are accessed on the joiner side. These failures happen because currently neither SST with rsync nor SST with mariabackup transfers all binlogs rather than just the latest one, and because the binlog handling code in the SST scripts is very simplistic and does not work correctly in a number of configurations, for example, when the index file contains absolute paths to binlogs. An example of such a failure caused by incorrect transfer of binlogs after SST:

2021-12-28  1:48:20 45 [Warning] Access denied for user 'testuser'@'127.0.0.1' (using password: YES)
2021-12-28  2:11:01 0 [Note] InnoDB: Buffer pool(s) load completed at 211228  2:11:01
2021-12-28  3:31:32 0 [Note] WSREP: Member 1.0 (ddmsmariadb03) desyncs itself from group
2021-12-28  3:31:32 0 [Note] WSREP: Member 1.0 (ddmsmariadb03) resyncs itself to group.
2021-12-28  3:31:32 0 [Note] WSREP: Member 1.0 (ddmsmariadb03) synced with group.
2021-12-28  9:11:13 10 [ERROR] Error in Log_event::read_log_event(): 'Sanity check failed', data_len: 0, event_type: 0
2021-12-28  9:11:13 10 [ERROR] WSREP: applier could not read binlog event, seqno: 196201, len: 18446744072749699206
2021-12-28  9:11:13 0 [Note] WSREP: Member 1(ddmsmariadb03) initiates vote on 68ed27ac-671d-11ec-bed4-924ee3878f93:196201,8c41510586499f3b: 
2021-12-28  9:11:13 0 [Note] WSREP: Votes over 68ed27ac-671d-11ec-bed4-924ee3878f93:196201:
   0000000000000000:   1/3
   8c41510586499f3b:   1/3
Waiting for more votes.
2021-12-28  9:11:13 5 [Note] WSREP: Got vote request for seqno 68ed27ac-671d-11ec-bed4-924ee3878f93:196201
2021-12-28  9:11:13 0 [Note] WSREP: Member 0(ddmsmariadb01) responds to vote on 68ed27ac-671d-11ec-bed4-924ee3878f93:196201,0000000000000000: Success
2021-12-28  9:11:13 0 [Warning] WSREP: Received bogus VOTE message: 196201.0, from node 52e83762-673c-11ec-b091-279e19995029, expected > 196203. Ignoring.
2021-12-28  9:11:13 0 [Note] WSREP: Votes over 68ed27ac-671d-11ec-bed4-924ee3878f93:196201:
   0000000000000000:   1/3
   8c41510586499f3b:   1/3
Waiting for more votes.
2021-12-28  9:11:13 0 [Note] WSREP: Member 2(ddmsmariadb02) initiates vote on 68ed27ac-671d-11ec-bed4-924ee3878f93:196201,8c41510586499f3b: 
2021-12-28  9:11:13 0 [Note] WSREP: Votes over 68ed27ac-671d-11ec-bed4-924ee3878f93:196201:
   0000000000000000:   1/3
   8c41510586499f3b:   2/3
Winner: 8c41510586499f3b
2021-12-28  9:11:13 10 [ERROR] WSREP: Failed to apply write set: gtid: 68ed27ac-671d-11ec-bed4-924ee3878f93:196201 server_id: 52e83762-673c-11ec-b091-279e19995029 client_id: 5863 trx_id: 6901439 flags: 3 (start_transaction | commit)
2021-12-28  9:11:13 10 [Note] WSREP: Closing send monitor...
2021-12-28  9:11:13 10 [Note] WSREP: Closed send monitor.
2021-12-28  9:11:13 10 [Note] WSREP: gcomm: terminating thread
2021-12-28  9:11:13 10 [Note] WSREP: gcomm: joining thread
2021-12-28  9:11:13 10 [Note] WSREP: gcomm: closing backend
2021-12-28  9:11:15 10 [Note] WSREP: view(view_id(NON_PRIM,52e83762-b091,3) memb {
	6e8d8876-803a,0
} joined {
} left {
} partitioned {
	52e83762-b091,0
	5ef9bf5e-bd37,0
})
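
A note on reading the log above: the `len` value in the applier error is just below 2^64, which is what a negative length looks like when it is printed as an unsigned 64-bit integer. This is only an interpretation of the log output, not something stated in the report. The helper name `as_signed_64` is made up for this sketch.

```python
def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value


# 'len' as printed in the "applier could not read binlog event" line of the log.
reported_len = 18446744072749699206
print(as_signed_64(reported_len))  # -959852410
```

A negative length is consistent with the preceding 'Sanity check failed' error with `data_len: 0`: the applier is reading bytes that do not form a valid binlog event.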


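The configuration problem mentioned in the description (a binlog index file that mixes relative and absolute paths) can be illustrated with a minimal sketch. It is not the code from the SST scripts or from the fix. The function name `resolve_binlog_index`, the sample paths, and the choice to resolve relative entries against a caller-supplied base directory are all assumptions made for illustration; POSIX-style paths are assumed.

```python
import posixpath


def resolve_binlog_index(index_lines, base_dir):
    """Return a normalized absolute path for every entry of a binlog index.

    index_lines: lines read from the index file (hypothetical input).
    base_dir:    directory that relative entries are resolved against
                 (an assumption of this sketch).
    """
    paths = []
    for line in index_lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if not posixpath.isabs(entry):
            entry = posixpath.join(base_dir, entry)
        paths.append(posixpath.normpath(entry))
    return paths


# Hypothetical index content: one relative and one absolute entry.
sample = ["./mysql-bin.000001\n", "/data/binlogs/mysql-bin.000002\n", "\n"]
for path in resolve_binlog_index(sample, "/var/lib/mysql"):
    print(path)
```

Handling only the last entry, or assuming every entry is relative to one directory, would miss the second file in this sample, which is the kind of configuration the description says the simplistic handling cannot cope with.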

 Comments   
Comment by Jan Lindström (Inactive) [ 2022-01-18 ]

Code changes are OK, although you could have avoided the whitespace changes in mariabackup. However, based on buildbot, the rsync method still seems to have problems.

Comment by Brandon Nesterenko [ 2022-01-20 ]

The refinements look good. I left a couple of minor suggestions on the new commit; feel free to push after considering them.

Comment by Jan Lindström (Inactive) [ 2022-01-26 ]

mariabackup, test and script changes ok to push

Comment by Jan Lindström (Inactive) [ 2022-01-26 ]

serg, can you review https://github.com/MariaDB/server/commit/6777c260c22e16fb9a76a60f124c2d9337d04585 ? My Perl knowledge is too thin to do any kind of review.

Comment by Brandon Nesterenko [ 2022-01-27 ]

I left one last question, otherwise it looks good. Thanks!

Comment by Julius Goryavsky [ 2022-02-22 ]

Fixed, https://github.com/MariaDB/server/commit/17e0f5224c8339ec08707a6ad0397bbf8c19bbd3
