MariaDB Server: MDEV-32830

refactor XA binlogging for better integration with BGC/replication/recovery


Details

    • Q2/2025 Development, Q3/2025 Server Development

    Description

      MDEV-32830 I. refactor XA binlogging for better integration with
      BGC/replication/recovery

      This task is part I of a series of four that addresses MDEV-31949 in
      two main directions: XA parallel slave performance (already remedied by
      MDEV-33668) and XA transaction crash-recovery (MDEV-33168).

      This part improves the XA binlogging of the MDEV-742 design to
      facilitate crash-recovery (the actual binlog-based recovery
      comes in part IV, MDEV-33668).

      Legend:
      XA-COMPLETE is written wherever there is no difference between the COMMIT and
      ROLLBACK actions as far as binlogging is concerned.

      High-level design

      The immediate objective is, when the binlog is ON, to handle the execution of an XA
      transaction, including its binlogging, as uniformly as possible with a normal
      BEGIN-COMMIT transaction, so that MDEV-33668 can extend the existing binlog-based
      recovery over XA (footnote: MDEV-742 intended the roll-forward recovery of MDEV-18989).

      The requirement implies that XA-PREPARE is first prepared in the engines, and only
      after that are the accumulated replication events written to the binary log,
      naturally without any completion event, as the outcome is not yet known.
      When XA-"COMPLETE", that is XA-COMMIT or XA-ROLLBACK, follows later, the binary
      logging of the respective Query event takes place first and is then concluded
      with the engine action.
      One can view this scheme as the logging of a normal transaction split in the
      middle into two parts.
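      The split ordering can be illustrated with a minimal Python model. It is a sketch under stated assumptions, not the server code: the functions xa_prepare() and xa_complete(), the trace list and the event strings are all hypothetical, and real binlog event types, hton calls and error handling are not modeled. It only records the order of engine and binlog actions for the two halves.

```python
def xa_prepare(xid, events, trace):
    """Hypothetical XA-PREPARE: engines first, then the binlog group."""
    # Engine branches are prepared first.
    trace.append(f"engine: prepare {xid}")
    # Then the accumulated replication events are written to the binlog.
    # No completion event is logged: COMMIT or ROLLBACK is not known yet.
    for event in events:
        trace.append(f"binlog: {event}")
    trace.append(f"binlog: XA PREPARE {xid}")


def xa_complete(xid, commit, trace):
    """Hypothetical XA-COMPLETE: the Query event first, then the engine action."""
    verb = "COMMIT" if commit else "ROLLBACK"
    trace.append(f"binlog: XA {verb} {xid}")
    trace.append(f"engine: {verb.lower()} {xid}")


if __name__ == "__main__":
    trace = []
    xa_prepare("x1", ["XA START x1", "Write_rows"], trace)
    xa_complete("x1", True, trace)
    print("\n".join(trace))
```

      Read end to end, the trace is the ordering of a normal transaction (engine prepare, binlog write, engine commit) with a pause between the XA PREPARE and XA COMMIT entries.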

      With the binlog enabled, the logging of both phases goes through binlog group commit.
      There the XA-PREPARE "sub-transaction" merely joins a group for binary logging and
      skips the engine action, while XA-"COMPLETE" does both: the logging and an ordered
      "complete". This binlog grouping behaves the same for completions from the native
      connection and from an external connection. For completion from an external
      connection, which is also the slave's case, the binlog group commit calls
      hton::"complete"_by_xid only when the transaction participant (engine) defines
      commit_ordered.
      (This part of the overall MDEV-31949 assumes that such an engine's
      hton::"complete"_by_xid is logically equivalent to its commit_ordered, which is
      the case for InnoDB.)
      Participating in binlog group commit means that either XA phase is recoverable
      (not implemented here) from the active binlogs determined by the binlog checkpoint.
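      A rough Python sketch of one group commit round under these rules follows. The classes Hton and GroupEntry and the function run_group_commit() are hypothetical simplifications: the real leader/follower queue, the flush/sync/commit stages and the slow completion path are not modeled. It shows two behaviors: XA-PREPARE entries are only logged, and the ordered "complete" is issued only for engines that provide both commit_ordered and "complete"_by_xid.

```python
from dataclasses import dataclass, field


@dataclass
class Hton:
    """Hypothetical stand-in for an engine handlerton's optional methods."""
    name: str
    has_commit_ordered: bool
    has_complete_by_xid: bool


@dataclass
class GroupEntry:
    """Hypothetical stand-in for one transaction queued for group commit."""
    xid: str
    is_xa_prepare: bool              # True: the XA-PREPARE "sub-transaction"
    events: list = field(default_factory=list)


def run_group_commit(queue, htons, binlog):
    """Return the (engine, xid) pairs that got an ordered "complete"."""
    ordered = []
    # Flush stage: the leader writes the events of every queued entry,
    # whether it is an XA-PREPARE group or an XA-COMPLETE Query event.
    for entry in queue:
        binlog.extend(entry.events)
    # Ordered stage, in binlog order: XA-PREPARE entries only wanted the
    # grouped logging, so their engine action is skipped.
    for entry in queue:
        if entry.is_xa_prepare:
            continue
        for hton in htons:
            # The fast, ordered "complete" runs only for engines providing both
            # commit_ordered and "complete"_by_xid; any other engine is
            # completed outside the group commit (not modeled here).
            if hton.has_commit_ordered and hton.has_complete_by_xid:
                ordered.append((hton.name, entry.xid))
    return ordered
```

      Because both phases are written by the same flush stage, both end up in binlog files tracked by the binlog checkpoint, which is what makes them candidates for later recovery.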

      Additionally, the corner case of an engine-read-only XA transaction is addressed.
      Previously it was handled by logging an empty XA-PREPARE group of binlog events
      concluded by an XA-"COMPLETE" Query event.
      Now, when a preparing XA transaction is found to have only read-only engine
      branches, or none at all, it is marked for rollback as the XA_RDONLY optimization:

      • nothing gets logged at prepare time, and an XA_RDONLY note is generated;
      • it is rolled back at disconnect.

      For XA-COMPLETE to tell whether the prepare phase was logged or not, the XID
      state object is extended with a boolean flag, which is part of the internal
      interface for the recovery implementation of MDEV-33668 (that recovery is going to
      raise this flag when a prepared XA is proven recoverable).
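      The prepare-time decision for the read-only case can be sketched as follows. The Session class, its attributes and the note text are hypothetical; branch_readonly is assumed to hold one boolean per engine branch. An empty list (no engine branches) is also treated as read-only, since Python's all([]) is True.

```python
class Session:
    """Hypothetical client connection holding at most one prepared XA."""

    def __init__(self, binlog):
        self.binlog = binlog          # shared list standing in for the binary log
        self.notes = []               # notes returned to the client
        self.rdonly_xid = None        # XA marked for rollback at disconnect
        self.rolled_back = []         # XIDs rolled back at disconnect

    def xa_prepare(self, xid, branch_readonly):
        # Read-only (or no) engine branches: XA_RDONLY optimization.
        if all(branch_readonly):
            self.notes.append(f"XA_RDONLY {xid}")
            self.rdonly_xid = xid     # nothing is binlogged for it
            return
        # Read-write case: the prepare phase is binlogged.
        self.binlog.append(f"XA PREPARE {xid}")

    def disconnect(self):
        # The read-only XA is optimized away by rolling it back here.
        if self.rdonly_xid is not None:
            self.rolled_back.append(self.rdonly_xid)
            self.rdonly_xid = None
```

      Compared with the former behavior, the binlog receives no empty XA-PREPARE group and no matching XA-"COMPLETE" event for such a transaction.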

      While this task is not concerned with transaction coordinators other than the binlog,
      such as TC_LOG_MMAP, their current level of support (if any) is not made worse.

      Notable low-level design points

      1. ha_prepare() has to prepare the engine branches before binlogging, so
      its hton loop, which so far treated all participants alike, takes the binlog
      branch out (repeating the pattern tried in the one-phase binlog_commit() by MDEV-21117).
      2. Conversely, binlog_commit, binlog_rollback, binlog_commit_by_xid and
      binlog_rollback_by_xid have to execute the binlog part (in its role of internal
      2PC coordinator) as the first step.
      3. To skip the engine actions inside binlog group commit, the preparing
      XA enters it (as leader or follower) with its cache_mngr::using_xa
      set to FALSE.
      This has no implication for binlog-checkpoint based recovery, because
      by point 1 XA-PREPARE is already durable, so the binlog checkpoint event is
      generated to account for the safely (no longer doubtfully) prepared state.
      4. Following the normal transaction pattern, XA-COMPLETE executes the engine part
      with cache_mngr::using_xa = true, and it no longer specifies
      binlog_group_entry::need_unlog, which is also necessary for successful crash-recovery.
      5. Following the normal transaction pattern, XA-COMPLETE does not execute
      hton::"complete"_ordered if the engine (hton) does not have it.
      This applies to the external completion by xid as well.
      In other words, the ordered commit, aka the fast part of commit, is invoked only
      when an engine participant has both the commit_ordered() and
      "complete"_by_xid methods.
      6. Logging of XA-PREPARE and XA-COMMIT engages binlog_commit() (as before),
      as the overhead of doing so is very small.
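      Points 1 to 4 amount to an ordering change in the participant loops. The sketch below assumes participants are plain strings and that "binlog" is the name of the binlog participant; both names and the functions ha_prepare()/ha_complete() here are hypothetical simplifications of the server's hton loops, and using_xa is shown only as a label in the trace.

```python
BINLOG = "binlog"  # hypothetical name of the binlog participant


def ha_prepare(participants, trace):
    """Prepare engine branches first; the binlog branch is taken out of the loop."""
    for hton in participants:
        if hton != BINLOG:
            trace.append(f"{hton}: prepare")
    if BINLOG in participants:
        # The prepared XA enters binlog group commit with using_xa = False:
        # it is grouped for logging only, and engine actions are skipped.
        trace.append("binlog: group commit (using_xa=False)")


def ha_complete(participants, trace, commit=True):
    """The binlog, as internal 2PC coordinator, acts first; then the engines."""
    verb = "commit" if commit else "rollback"
    if BINLOG in participants:
        # Completion enters group commit with using_xa = True.
        trace.append(f"binlog: group commit (using_xa=True) {verb}")
    for hton in participants:
        if hton != BINLOG:
            trace.append(f"{hton}: {verb}")
```

      Note that the binlog is moved to the end of the prepare loop and to the front of the completion loop regardless of where it appears in the participant list.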

      To address the read-only XA

      7. ha_prepare() has to recognize the engine-read-only status of the transaction;
      it informs the user of the read-only status with a note and marks the read-only
      XA to be optimized away at disconnect.
      8. For XA-COMPLETE from an external connection to find out whether the XA
      is read-only, XID_cache_element::xap_binlogged_awaiting_xac is introduced
      and manipulated as follows:
      • raised by XA-PREPARE when it is flushed to the binlog (so the XA is read-write);
      • checked by binlog_commit() and binlog_rollback() to decide on the XA-COMPLETE logging.
      The added flag is a piece of the interface with the MDEV-33668 recovery.
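      The lifecycle of the flag across connections can be modeled with a shared dictionary standing in for the XID cache. Only the flag name comes from the text; XidCacheElement as a Python dataclass, xid_cache, flush_xa_prepare() and binlog_complete() are hypothetical, and the real cache's locking and error handling for unknown XIDs are omitted.

```python
from dataclasses import dataclass


@dataclass
class XidCacheElement:
    """Hypothetical model; only the flag named in the text is represented."""
    xid: str
    xap_binlogged_awaiting_xac: bool = False


xid_cache = {}  # xid -> XidCacheElement, shared by all connections


def flush_xa_prepare(xid, binlog):
    # Raised when XA-PREPARE is flushed to the binlog: the XA is read-write.
    binlog.append(f"XA PREPARE {xid}")
    xid_cache.setdefault(xid, XidCacheElement(xid)).xap_binlogged_awaiting_xac = True


def binlog_complete(xid, commit, binlog):
    # Stands in for binlog_commit()/binlog_rollback() deciding whether
    # XA-COMPLETE is logged; it may run in a different connection.
    elem = xid_cache.get(xid)
    if elem is None or not elem.xap_binlogged_awaiting_xac:
        return False  # prepare phase was not logged: log nothing
    binlog.append(f"XA {'COMMIT' if commit else 'ROLLBACK'} {xid}")
    del xid_cache[xid]
    return True
```

      In this model, a future recovery step would only need to set the same flag on elements it reconstructs from the binlog for XA-COMPLETE to be logged consistently.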

      Attachments

        1. bug1.txt (4 kB, Roel Van de Paar)
        2. bug10.txt (6 kB, Roel Van de Paar)
        3. bug2.txt (3 kB, Roel Van de Paar)
        4. bug3.txt (3 kB, Roel Van de Paar)
        5. bug4.txt (0.5 kB, Roel Van de Paar)
        6. bug5.txt (7 kB, Roel Van de Paar)
        7. bug6.txt (5 kB, Roel Van de Paar)
        8. bug7.txt (7 kB, Roel Van de Paar)
        9. bug8.txt (6 kB, Roel Van de Paar)
        10. bug9.txt (5 kB, Roel Van de Paar)


            People

              Brandon Nesterenko (bnestere)
              Andrei Elkin (Elkin)
              Votes: 1
              Watchers: 12
