[MDEV-26759] New cluster member leaves cluster when mysql replication started Created: 2021-10-04  Updated: 2021-12-09

Status: Open
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.5.12
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Justin Bennett Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None


 Description   

Hello

We have hit an issue where a new cluster member will drop from the cluster with a consistency error as soon as MySQL asynchronous replication is started on the donor node.
Another user has hit almost exactly the same problem and documented it here. The differences in our use case are the replication master (Percona XtraDB Cluster 5.7.27 rather than MySQL 5.6) and a slightly different voting outcome:

2021-10-04 14:05:55 16 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 42, event_type: -94
2021-10-04 14:05:55 16 [ERROR] WSREP: applier could not read binlog event, seqno: 245339937, len: 109
2021-10-04 14:05:55 0 [Note] WSREP: Member 1(new node - joiner) initiates vote on b28cc80b-1483-11ec-80bf-63be0f24e523:245339937,d697ce56d7e540c5:
2021-10-04 14:05:55 0 [Note] WSREP: Votes over b28cc80b-1483-11ec-80bf-63be0f24e523:245339937:
   d697ce56d7e540c5:   1/2
Waiting for more votes.
2021-10-04 14:05:55 0 [Note] WSREP: Member 0(replication slave node - donor) responds to vote on b28cc80b-1483-11ec-80bf-63be0f24e523:245339937,0000000000000000: Success
2021-10-04 14:05:55 0 [Note] WSREP: Votes over b28cc80b-1483-11ec-80bf-63be0f24e523:245339937:
   0000000000000000:   1/2
   d697ce56d7e540c5:   1/2
Winner: 0000000000000000
2021-10-04 14:05:55 16 [ERROR] WSREP: Inconsistency detected: Inconsistent by consensus on b28cc80b-1483-11ec-80bf-63be0f24e523:245339937
         at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:process_apply_error():1343
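One detail worth noting in the error above (my own observation, not from the report): binlog event types are a single unsigned byte, so the "event_type: -94" is almost certainly that byte printed through a signed char. A quick reinterpretation shows what the applier actually saw:

```python
# The applier logged "event_type: -94". Binlog event types are one
# unsigned byte; -94 is that byte rendered as a signed char.
# Recover the unsigned value:
signed_type = -94
unsigned_type = signed_type & 0xFF  # equivalently: signed_type + 256
print(unsigned_type)  # 162
```

If MariaDB's usual event numbering holds, 162 falls in the range reserved for MariaDB-specific event types, which a MySQL/Percona master would never emit — consistent with the applier mis-framing the incoming event rather than receiving a genuinely corrupt one.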

We can't go into production with MariaDB until this is resolved.

Thanks
Justin.



 Comments   
Comment by Justin Bennett [ 2021-10-05 ]

I noticed there was a galera log file in the data directory, it contained:

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#120715  7:45:56 server id 1  end_log_pos 107 	Start: binlog v 4, server v 5.5.25-debug-log created 120715  7:45:56 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
BINLOG '
NHUCUA8BAAAAZwAAAGsAAAABAAQANS41LjI1LWRlYnVnLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAA0dQJQEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 402, event_type: 1
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
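As a cross-check (a throwaway script of my own, not part of the report), the base64 blob in the BINLOG statement above can be decoded by hand. The 19-byte v4 event header parses cleanly as the format description event mysqlbinlog printed, so the leftover galera log itself looks intact up to the point of the error:

```python
import base64
import struct
from datetime import datetime, timezone

# Base64 payload copied from the BINLOG statement in the leftover galera log.
blob = base64.b64decode(
    "NHUCUA8BAAAAZwAAAGsAAAABAAQANS41LjI1LWRlYnVnLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAA"
    "AAAAAAAAAAAAAAAAAAA0dQJQEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA=="
)

# v4 binlog event header (19 bytes, little-endian):
# timestamp(4) type(1) server_id(4) event_size(4) log_pos(4) flags(2)
timestamp, ev_type, server_id, ev_size, log_pos, flags = struct.unpack(
    "<IBIIIH", blob[:19]
)

print(datetime.fromtimestamp(timestamp, tz=timezone.utc))  # 2012-07-15 07:45:56+00:00
print(ev_type, server_id, ev_size, log_pos, hex(flags))    # 15 1 103 107 0x1
```

Type 15 is FORMAT_DESCRIPTION_EVENT, matching the "Start: binlog v 4" line; server id 1, event size 103, and end_log_pos 107 all match the dump header; and flag 0x1 (LOG_EVENT_BINLOG_IN_USE_F) matches the "in use or was not closed properly" warning.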

Comment by Justin Bennett [ 2021-10-07 ]

After further investigation I was able to determine that the new Maria cluster node (node 2) was dropping out of the cluster when a TRUNCATE TABLE statement replicated from the Percona cluster was executed by node 1 in the Maria cluster. I was able to replicate this in my test environment. How to replicate:

1. Set up a Percona XtraDB 5.7.27 cluster
2. Set up a MariaDB 10.5.12 cluster, but only bootstrap the first node. Leave other nodes unstarted.
On the Percona cluster:
3. Create a test database and table.
4. Use mysqldump to dump the test database and table, including binlog position.
On the MariaDB cluster:
5. Load the dump containing the test database and table. Configure replication but don't start it.
6. Start the second node so it joins the cluster containing the first node.
7. Start replication on node 1. Confirm replication has started.
On the Percona cluster:
8. Truncate the test table. Second node in the Maria cluster will drop out of the cluster with a consistency error when the TRUNCATE TABLE is processed by the first node. Note the first node processes the TRUNCATE TABLE statement without any problem.

Comment by Justin Bennett [ 2021-10-11 ]

Further testing shows the same error occurs with MySQL Community Server 5.7.35 as the master, no cluster required. Interestingly, when I followed exactly the same process with a single MariaDB 10.5.12 database as the master, there was no error. So it definitely seems to be a problem with Galera replicating certain statements from non-MariaDB replication masters.

Comment by lele forzani [ 2021-12-09 ]

We are having the same issue with MariaDB 10.6.5: a Galera cluster set up as a slave of an old MySQL 5.6.12.
It worked for a while, until every node went inconsistent.

Trying to re-assemble the cluster has every node fail with the same issue as soon as a TRUNCATE TABLE is executed.

2021-12-09 23:09:50 6 [Note] WSREP: Wsrep_high_priority_service::apply_toi: 5015
2021-12-09 23:09:50 6 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 42, event_type: -94
2021-12-09 23:09:50 6 [ERROR] WSREP: applier could not read binlog event, seqno: 5015, len: 107
2021-12-09 23:09:50 6 [Note] WSREP: Error buffer for thd 6 seqno 5015, 0 bytes: '(null)'
2021-12-09 23:09:50 6 [Note] WSREP: Set WSREPXid for InnoDB:  2309de4f-593c-11ec-b7bc-e6380a34c6bb:5015
2021-12-09 23:09:50 0 [Note] WSREP: Member 1(core-mlc) initiates vote on 2309de4f-593c-11ec-b7bc-e6380a34c6bb:5015,ea68121ba4f89b67: 
2021-12-09 23:09:50 0 [Note] WSREP: Votes over 2309de4f-593c-11ec-b7bc-e6380a34c6bb:5015:
   ea68121ba4f89b67:   1/2
Waiting for more votes.
2021-12-09 23:09:50 0 [Note] WSREP: Member 0(core-mlc-slave) responds to vote on 2309de4f-593c-11ec-b7bc-e6380a34c6bb:5015,0000000000000000: Success
2021-12-09 23:09:50 0 [Note] WSREP: Votes over 2309de4f-593c-11ec-b7bc-e6380a34c6bb:5015:
   0000000000000000:   1/2
   ea68121ba4f89b67:   1/2
Winner: 0000000000000000
2021-12-09 23:09:50 6 [ERROR] WSREP: Inconsistency detected: Inconsistent by consensus on 2309de4f-593c-11ec-b7bc-e6380a34c6bb:5015
