MariaDB Server: MDEV-24327

wsrep XID checkpointing order compromised in multi-master work loads

Details

    Description

      The wsrep XID is checkpointed in the InnoDB rollback segment during transaction commit, and this checkpointing is supposed to happen in strict GTID sequence order.

      While troubleshooting MDEV-23851 under highly conflicting multi-master workloads, we observed that the XID checkpointing order can be violated in two scenarios:

      • If MariaDB is configured with binary logging enabled but with log_slave_updates=OFF, XID checkpoint ordering violations happen fairly frequently.
      • Write sets that fail certification can perform XID checkpointing too early on receiving nodes.
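The ordering requirement described above can be illustrated with a minimal sketch. This is not MariaDB's implementation: `OrderedCheckpointer` and `find_order_violations` are hypothetical names, and the sketch assumes each XID has been reduced to its integer GTID seqno. Commits may finish in any order, but the checkpointed seqno only advances over a gap-free prefix; the detector flags any checkpoint that fails to move strictly forward, which is the kind of violation this issue reports.

```python
from typing import Iterable, List, Optional, Set, Tuple


def find_order_violations(checkpoints: Iterable[int]) -> List[Tuple[int, int, int]]:
    """Return (position, previous_seqno, seqno) for every checkpoint that does
    not advance strictly past the previously checkpointed seqno."""
    violations: List[Tuple[int, int, int]] = []
    previous: Optional[int] = None
    for position, seqno in enumerate(checkpoints):
        if previous is not None and seqno <= previous:
            violations.append((position, previous, seqno))
        previous = seqno
    return violations


class OrderedCheckpointer:
    """Hypothetical helper: transactions may commit out of order, but the
    checkpoint is only written in strict seqno order."""

    def __init__(self, last_checkpointed: int = 0) -> None:
        self.last_checkpointed = last_checkpointed
        self._pending: Set[int] = set()
        # Every seqno in the order it was checkpointed.
        self.history: List[int] = []

    def commit(self, seqno: int) -> None:
        self._pending.add(seqno)
        # Advance only while the next expected seqno has already committed.
        while self.last_checkpointed + 1 in self._pending:
            self.last_checkpointed += 1
            self._pending.remove(self.last_checkpointed)
            self.history.append(self.last_checkpointed)


if __name__ == "__main__":
    cp = OrderedCheckpointer()
    for s in [2, 1, 4, 3]:  # commits finish out of order
        cp.commit(s)
    print(cp.history)                            # [1, 2, 3, 4]
    print(find_order_violations([1, 2, 4, 3]))   # [(3, 4, 3)]
```

Buffering early finishers until the gap closes is what keeps the checkpoint monotonic; a path that checkpoints directly at commit time, skipping the ordering step, would produce a sequence like `[1, 2, 4, 3]` that the detector flags.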

      These XID checkpointing failures do not cause the MDEV-23851 issue, but they make troubleshooting MDEV-23851 harder by masking the underlying problem.


        Activity

          Mark Anstice added a comment:

          If I understand correctly, the net effect of the change is that the log_slave_updates option is now always enabled. On a MariaDB 10.3.28 3-node Galera cluster with binary logging enabled, I now see all writes to all nodes appearing in the binary logs on all nodes, which is a change in behaviour. Unfortunately, because the change came from a merge from the 10.2 branch, it has taken me a while to track down. Since upgrading from 10.3.9 to 10.3.28 I have seen an impact on disk IO and disk space on my cluster nodes. I am concerned about the extra demands on Galera cluster nodes. Is there any way to achieve the previous behaviour?

          Mark Anstice added a comment:

          I think I may have jumped the gun. The change here is about always enabling the binary log irrespective of the log_slave_updates setting; since I enable the binary log explicitly in my configuration, it should have no effect. However, there is a change of behaviour in 10.3.28 where all nodes log all updates to their binary logs. I'll see if I can track down where that change happened and raise a new ticket.


          People

            Jan Lindström (jplindst, Inactive)
            Seppo Jaakola (seppo)
            Votes: 0
            Watchers: 2
