[MDEV-24327] wsrep XID checkpointing order compromised in multi-master work loads - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.5.8, 10.4(EOL)
Fix Version/s: 10.2.37, 10.3.28, 10.4.18, 10.5.9, 10.6.0
Component/s: Galera
Labels:
None

Description

The wsrep XID is checkpointed in the InnoDB rollback segment during transaction commit, and this checkpointing is supposed to happen in strict GTID sequence order.

While troubleshooting ~~MDEV-23851~~ under highly conflicting multi-master workloads, it was observed that the XID checkpointing order can be violated in two scenarios:

if MariaDB is configured with binlogging enabled but with log_slave_updates = OFF, XID checkpoint ordering violations happen fairly frequently
write sets that failed certification can perform XID checkpointing too early on receiving nodes

These XID checkpointing failures do not cause the issue in ~~MDEV-23851~~, but they make troubleshooting ~~MDEV-23851~~ harder by hiding the underlying issue.
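The ordering requirement described above can be illustrated with a small sketch. This is only a toy model and not the server's code: it assumes each checkpoint can be represented by the integer sequence number part of the Galera GTID, and the function name `find_order_violations` is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def find_order_violations(checkpointed: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (highest_so_far, seqno) pairs for checkpoints that arrived out of order.

    `checkpointed` is the sequence of seqnos in the order they were
    checkpointed (an assumed, simplified representation).
    """
    violations: List[Tuple[int, int]] = []
    highest: Optional[int] = None  # highest seqno checkpointed so far
    for seqno in checkpointed:
        if highest is not None and seqno <= highest:
            # A later checkpoint carries an older (or equal) seqno:
            # strict ordering has been violated.
            violations.append((highest, seqno))
        else:
            highest = seqno
    return violations
```

For example, `find_order_violations([1, 2, 4, 3, 5])` reports `[(4, 3)]`, because seqno 3 was checkpointed after seqno 4, while a strictly increasing sequence yields an empty list.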


Activity


Mark Anstice added a comment - 2021-06-02 15:29

If I understand correctly, the net effect of this change is that the log_slave_updates option is now always enabled. On a MariaDB 10.3.28 3-node Galera cluster with binary logging enabled, I now see all writes to all nodes appearing in the binary logs on all nodes, which is a change in behaviour. Unfortunately, because the change came from a merge from the 10.2 branch, it has taken me a while to track down. I have seen an impact on disk IO and disk space on my cluster nodes since upgrading from 10.3.9 to 10.3.28. I am concerned about the extra demands on Galera cluster nodes. Is there any way to achieve the previous behaviour?


Mark Anstice added a comment - 2021-06-03 09:26

I think I may have jumped the gun. The change here is about always enabling the binary log irrespective of the log_slave_updates setting; as I enable the binary log explicitly in my configuration, it should have no effect. However, there is a change of behaviour in 10.3.28 where all nodes are logging all updates to all binary logs. I'll see if I can track down where that change happened and raise a new ticket.


People

Assignee: Jan Lindström (Inactive)

Reporter: Seppo Jaakola

Votes: 0

Watchers: 2

Dates

Created: 2020-12-01 21:14

Updated: 2021-06-03 09:26

Resolved: 2020-12-17 10:49


MariaDB Server