Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Not a Bug
Affects Version/s: 10.5.21
Fix Version/s: None
Description
The behaviour of SQL_LOG_BIN is completely broken on Galera.
Minimal reproducer:
2 Galera nodes, minimal config, one writer (1), one reader (2)
1 async replica (3) replicating from (2).
gtid_strict_mode=ON on all 3
On the writer (1):
set sql_log_bin=0;
create database test2;
Result:
test2 database will be created on all 3 nodes
The GTID is incremented to 0-1-n+1 on nodes 2 (Galera) and 3 (async), but NOT on node 1, which wrote the transaction!
So sql_log_bin=0 causes the master writer node to corrupt its own GTID position: every other node in the topology believes it is ahead of the writer, even though the writer is the node that actually generated the transaction.
The transaction is written to the binlog on the secondary Galera node, but not locally on the master node that had sql_log_bin=0 set.
Worse, if you then start a new session on the writer with sql_log_bin left enabled and run another transaction, that transaction gets GTID n+1 with the master's server-id on the writer, but GTID n+2 on all other nodes.
In this case the global transaction ID is no longer a globally unique identifier of a transaction, because the same GTID refers to completely different transactions on the master node and on the secondary nodes.
This is dangerously broken behaviour that renders async replication from Galera unsafe.
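The divergence described above can be illustrated with a small toy model. This is only a sketch of the reported behaviour, not MariaDB or Galera code: the Node class, the run_on_writer helper, the starting sequence number of 0 and the second statement (CREATE DATABASE test3) are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one server's GTID state (not a real API)."""
    name: str
    seq: int = 0                                 # last sequence number assigned locally
    gtids: dict = field(default_factory=dict)    # GTID -> statement it refers to

    def record(self, domain, server_id, stmt):
        # Assign the next GTID in domain-server_id-seq form and remember the statement.
        self.seq += 1
        gtid = f"{domain}-{server_id}-{self.seq}"
        self.gtids[gtid] = stmt
        return gtid


def run_on_writer(writer, others, stmt, sql_log_bin=True):
    """Model the behaviour as reported: with sql_log_bin off the writer
    assigns no GTID, while every other node still assigns one."""
    if sql_log_bin:
        writer.record(0, 1, stmt)
    for node in others:
        node.record(0, 1, stmt)


n1, n2, n3 = Node("node1"), Node("node2"), Node("node3")

# Session 1 on the writer: sql_log_bin=0
run_on_writer(n1, [n2, n3], "CREATE DATABASE test2", sql_log_bin=False)
# Session 2 on the writer: sql_log_bin left enabled (hypothetical second statement)
run_on_writer(n1, [n2, n3], "CREATE DATABASE test3", sql_log_bin=True)

# GTIDs that exist on both node 1 and node 2 but name different statements
conflicts = [g for g in n1.gtids if g in n2.gtids and n1.gtids[g] != n2.gtids[g]]
print(conflicts)
```

In this model the writer's first GTID and node 2's first GTID refer to different statements, which is the uniqueness violation the report describes.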
Expected behaviour:
- Transaction should replicate to other galera nodes and have the same gtid on all of them, but it should be OMITTED from the binlogs on ALL galera nodes.
Issue Links
- relates to
  - MDEV-9037 DML statements on a Galera Cluster node with sql_log_bin=OFF still appears in binary log on _other_ nodes (Closed)
  - MDEV-20087 Galera + SET SQL_LOG_BIN=0 on binlog on others nodes (Closed)