[MDEV-22029] Galera Node missing binlog entries from other nodes despite log_slave_updates=ON Created: 2020-03-24  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.3.21
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Ulrich Moser (Inactive) Assignee: Seppo Jaakola
Resolution: Unresolved Votes: 0
Labels: None
Environment:

CentOS Linux release 7.7.1908 (Core)


Attachments: File log-node1.log     File log-node2.log     File log-node3.log    

 Description   

I am using one node of a 3-node cluster as the master for a slave. On all nodes log_slave_updates is set to ON. Still, we occasionally encounter missing records when processing updates from the binary logs on the slave. Analyzing the binary logs on the cluster nodes, I found that most of the changes applied on other nodes are reflected on the node serving as master, but some are not. And it is unpredictable when this happens.

This problem is a real blocker because replication breaks whenever it occurs and the binary logs are not usable for point-in-time recovery.



 Comments   
Comment by Jan Lindström (Inactive) [ 2020-03-24 ]

Do you have a repeatable test case or some examples of what is missing from the binary log? Please add error logs from all nodes and the configuration.

Comment by Ulrich Moser (Inactive) [ 2020-03-24 ]

The error is not repeatable but occurs occasionally. It affects only about 5 to 10 out of 500000 to 700000 transactions. But this breaks the replication and the point-in-time recovery.

The application processes data streams from an external source and populates customer-specific tables. A single business transaction can cause several events in the data streams, which are imported several times a day. The events from the data streams are handled by multiple processes that are distributed across the Galera nodes by a load balancer. Therefore events related to the same business transaction can be executed on different nodes. Usually this works fine: the events are replicated throughout the cluster and therefore, with log_slave_updates set to ON, should appear in the binary logs on every node.

In the cases I encountered, the initial INSERT happened on node 2 and an UPDATE occurred on node 1. Both were correctly replicated within the cluster, but the binary log entry for the INSERT received from node 2 was missing on node 1. As a result, the UPDATE failed on the slave since it never got the INSERT.
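
To illustrate the failure mode described above, here is a minimal sketch in Python. It is a toy model, not MariaDB code: the `apply_events` function, the `RowNotFound` exception, and the event tuples are all hypothetical names invented for this illustration. It only shows why a slave that replays row events in order cannot apply an UPDATE whose preceding INSERT never reached the master's binary log.

```python
class RowNotFound(Exception):
    """Raised when a row event targets a key the replica does not have."""


def apply_events(events):
    """Apply (op, key, value) row events in order to an initially empty table."""
    table = {}
    for op, key, value in events:
        if op == "INSERT":
            table[key] = value
        elif op == "UPDATE":
            # A row-based UPDATE needs the existing row; without it, replay stops.
            if key not in table:
                raise RowNotFound(f"UPDATE on missing row {key!r}")
            table[key] = value
        else:
            raise ValueError(f"unsupported op {op!r}")
    return table


# Hypothetical events: the INSERT ran on node 2, the UPDATE on node 1.
complete_binlog = [("INSERT", 42, "new"), ("UPDATE", 42, "processed")]
node1_binlog = [("UPDATE", 42, "processed")]  # INSERT from node 2 is absent

print(apply_events(complete_binlog))
try:
    apply_events(node1_binlog)
except RowNotFound as exc:
    print("replication stops:", exc)
```

With the complete event list the replay succeeds; with node 1's list the UPDATE has no row to act on, which corresponds to the slave breaking as reported.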

All nodes have the same hardware, OS and MariaDB-Version.

Comment by Ulrich Moser (Inactive) [ 2020-03-24 ]

BTW this reminds me of a similar problem I encountered at another customer which we could not solve. In the end the customer was so p... off that he did not extend the subscription and switched to PostgreSQL.

Comment by Ulrich Moser (Inactive) [ 2020-03-24 ]

Currently I only have a journal for node 1 available. I contacted the hoster for help with the other journals. log-node1.log

Comment by Jan Lindström (Inactive) [ 2020-03-25 ]

We would need the binary logs from node1 and node2 to analyze why the INSERT is not there.

Comment by Ulrich Moser (Inactive) [ 2020-03-25 ]

Hi Jan,
the binary logs contain personally identifying data, so I cannot upload them here.

BTW: I enabled sync_binlog on all three nodes yesterday and got another error of this type.

Comment by Jan Lindström (Inactive) [ 2020-03-25 ]

I would need only one example: a binlog portion where you can see that one binlog has the expected INSERT+UPDATE (you may change the contents to remove personal data) and the corresponding portion where the INSERT is missing.
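
One way to locate such a portion is to reduce each node's decoded binlog to a list of comparable event keys and diff the lists. The following is a rough sketch under stated assumptions: the `missing_events` helper and the `(operation, table, primary key)` tuples are hypothetical, and extracting them from the decoded binlog is left to the reader. Whether GTIDs could serve as keys instead depends on the cluster's GTID configuration, so the sketch avoids relying on them.

```python
from collections import Counter


def missing_events(reference, candidate):
    """Return events from `reference` that `candidate` lacks.

    Order and multiplicity are preserved, so two identical events in
    `reference` need two matching events in `candidate`.
    """
    remaining = Counter(candidate)
    missing = []
    for event in reference:
        if remaining[event] > 0:
            remaining[event] -= 1
        else:
            missing.append(event)
    return missing


# Hypothetical event keys: (operation, table, primary key).
node2_events = [("INSERT", "orders", 42), ("UPDATE", "orders", 42)]
node1_events = [("UPDATE", "orders", 42)]

print(missing_events(node2_events, node1_events))
```

Here the comparison reports the INSERT on `orders` as present on node 2 but absent on node 1, which is the kind of excerpt requested above.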

Comment by Ulrich Moser (Inactive) [ 2020-03-25 ]

Can I send the excerpts directly to you by mail? The Row Image blocks still contain the personal data even if I change it in the annotations.
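
For the row image concern, a possible approach is to drop the base64 `BINLOG '...'` blocks from the decoded text entirely and mask the column values in the `###` annotation lines. The sketch below assumes the usual layout of verbose `mysqlbinlog` output (a `BINLOG '` line, base64 lines, then a line starting with `'`); the `redact` function and the sample lines are hypothetical. It may also be worth checking whether `mysqlbinlog --base64-output=DECODE-ROWS --verbose` already suppresses the row image blocks in your version.

```python
import re

# Matches annotation lines such as "###   @2='Alice'" and captures the prefix.
_VALUE = re.compile(r"^(###\s+@\d+=).*$")


def redact(lines):
    """Drop BINLOG base64 blocks and mask column values in ### annotations."""
    out = []
    in_block = False
    for line in lines:
        if in_block:
            # The base64 block ends at the line starting with the closing quote.
            if line.startswith("'"):
                in_block = False
            continue
        if line.startswith("BINLOG '"):
            in_block = True
            out.append("# [row image removed]")
            continue
        out.append(_VALUE.sub(r"\1<redacted>", line))
    return out


# Hypothetical excerpt; the base64 payload is a placeholder.
sample = [
    "BINLOG '",
    "AAAAAAAAAAAA",
    "'/*!*/;",
    "### INSERT INTO `db`.`t`",
    "### SET",
    "###   @1=42",
    "###   @2='Alice'",
]

print("\n".join(redact(sample)))
```

The statement type and table name survive, which is enough to show whether an INSERT is present, while the values themselves do not leave the host.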
