[MDEV-24327] wsrep XID checkpointing order compromised in multi-master work loads Created: 2020-12-01  Updated: 2021-06-03  Resolved: 2020-12-17

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.5.8, 10.4
Fix Version/s: 10.2.37, 10.3.28, 10.4.18, 10.5.9, 10.6.0

Type: Bug
Priority: Major
Reporter: Seppo Jaakola
Assignee: Jan Lindström (Inactive)
Resolution: Fixed
Votes: 0
Labels: None


 Description   

The wsrep XID is checkpointed in the InnoDB rollback segment during transaction commit, and this checkpointing is supposed to happen in strict GTID sequence order.

While troubleshooting MDEV-23851 under highly conflicting multi-master workloads, it was observed that the XID checkpointing order can be violated in two scenarios:

  • if MariaDB is configured with binlogging enabled but with log_slave_updates = OFF, XID checkpoint ordering violations happen fairly frequently
  • write sets that failed certification can perform XID checkpointing too early on receiving nodes

These XID checkpointing failures do not cause the issue in MDEV-23851, but they make troubleshooting it harder by hiding the underlying issue.
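
The ordering invariant described above can be illustrated with a minimal sketch. This is not MariaDB or Galera code: the function name and the trace are hypothetical, and it assumes that "strict GTID sequence order" means each checkpointed seqno must be higher than every seqno checkpointed before it.

```python
from typing import Iterable, List, Tuple


def find_checkpoint_order_violations(seqnos: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (position, seqno) pairs where a checkpoint went backwards.

    `seqnos` is a hypothetical trace of the seqno part of each wsrep XID,
    listed in the order the checkpoints were actually written.
    """
    violations = []
    highest_seen = None
    for position, seqno in enumerate(seqnos):
        if highest_seen is not None and seqno <= highest_seen:
            # A lower (or repeated) seqno arrived after a higher one.
            violations.append((position, seqno))
        else:
            highest_seen = seqno
    return violations


# Hypothetical trace: seqno 13 is checkpointed too early, so 12 arrives late.
trace = [10, 11, 13, 12, 14]
print(find_checkpoint_order_violations(trace))  # prints [(3, 12)]
```

Note that a check of this kind flags the checkpoint that arrives late (12), not the one that ran ahead of its turn (13), so a write set that checkpoints too early shows up as a violation on a later write set.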



 Comments   
Comment by Mark Anstice [ 2021-06-02 ]

If I understand correctly, the net effect of the change is that the log_slave_updates option is now always enabled. On a MariaDB 10.3.28 3-node Galera cluster with binary logging enabled, I now see all writes to all nodes appearing in the binary logs on all nodes, which is a change in behaviour. Unfortunately, as the change came from a merge from the 10.2 branch, it has taken me a while to track down. I have seen an impact on disk IO and disk space on my cluster nodes since upgrading from 10.3.9 to 10.3.28. I am concerned about the extra demands on Galera cluster nodes. Is there any way to achieve the previous behaviour?

Comment by Mark Anstice [ 2021-06-03 ]

I think I may have jumped the gun. The change here is about always enabling the binary log irrespective of the log_slave_updates setting; as I enable the binary log explicitly in my configuration, it should have no effect. However, there is a change of behaviour in 10.3.28 where all nodes are logging all updates to all binary logs. I'll see if I can track down where that change happened and raise a new ticket.
