  1. MariaDB Server
  2. MDEV-31949

slow parallel replication of user xa

Details

    • New Feature
    • Status: Stalled
    • Major
    • Resolution: Unresolved
    • 12.0
    • Replication
    • None

    Description

      With some types of load, such as single-statement "light" transactions, user XA transactions may perform much worse on an optimistic parallel slave than an equivalent normal BEGIN..COMMIT load.

      This ticket aims to fix the slowness while preserving XA high availability.

      The whole work is arranged in two parts:
      1. refactoring of the user XA binary logging to facilitate slave-side parallel execution (MDEV-32830);
      2. a slave-side change to the parallel scheduler to apply round-robin distribution to user XA transactions.
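
      To illustrate what round-robin distribution means in part 2, here is a minimal sketch. It is not the MariaDB scheduler code: the function name, the queue representation and the XID strings are all hypothetical, and it only shows the basic idea of handing each incoming transaction to the next worker in cyclic order.

```python
from itertools import cycle


def distribute_round_robin(transactions, worker_count):
    """Assign each transaction to a worker queue in strict cyclic order.

    Hypothetical helper for illustration only; the real parallel
    scheduler must also handle ordering, conflicts and recovery.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    queues = [[] for _ in range(worker_count)]
    next_worker = cycle(range(worker_count))
    for trx in transactions:
        # Each transaction goes to the next worker, wrapping around.
        queues[next(next_worker)].append(trx)
    return queues


# Hypothetical single-statement XA transactions, identified by XID only.
xids = [f"xid-{n}" for n in range(1, 8)]
queues = distribute_round_robin(xids, worker_count=3)
for worker_id, queue in enumerate(queues):
    print(worker_id, queue)
```

      With seven transactions and three workers, the load spreads as evenly as possible: no worker receives more than one transaction beyond any other.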

          Activity

            Elkin Andrei Elkin added a comment -

            masonmariadb, the whole work now includes the IVth recovery commit. Sergei is reviewing the general binlogging part I, split from this ticket as MDEV-32830, which has been refined to reflect the recovery concern raised by Kristian.
            Part II is on Kristian; I'll notify him once part IV is finally ready for review by Sergei and him (ETA: Mon).
            Part III is about a few InnoDB changes that Marko needs to look at, also once part IV is out.


            knielsen Kristian Nielsen added a comment -

            I've started implementing MDEV-32020 to fix the XA stuff properly.

            Elkin Andrei Elkin added a comment - edited

            The branch is force-updated

            ff8c19e7b64...82aedbe9213 HEAD -> bb-10.6-MDEV-31949 (forced update)
            

            to preserve its structure of four parts.
            Parts I, II and III remain in In-Review status; part IV needs to be integrated with a Xid_log_list_event piece being worked on by Brandon.

            knielsen, 82aedbe9213 combined with the upcoming Xid_log_list_event should be the recovery solution that you prefer.
            Feel free to check it soon.
            I am sure it should no longer hold you back from reviewing part II.

            I'd be really glad to discuss your plan for implementing the xa-prepare deferred execution that you must have been referring to in the MDEV-32020 comment.
            Having your design presented may help me understand what you mean by 'proper'. I have pointed out a number of times already that
            that bug case itself relates to the XA replication framework rather remotely. It deals much more closely with the freaking specifics that Rows events lacking a unique key have manifested in various ways. It is resolvable within a few hours by standard means. While
            deferring XA transaction execution may be a solution too, I do feel uncomfortable about that, especially when it's rather clear that the deferring method must entail a number of design hurdles, especially recovery-related ones.

            Elkin Andrei Elkin added a comment - edited

            Updated the branch in the IV recovery part/HEAD of MDEV-33168:

            82aedbe9213...a71f13489d0 HEAD -> bb-10.6-MDEV-31949 (forced update)
            

            The current HEAD is enriched with a number of mtr tests and fixes a few glitches in the implementation of the business logic.
            Integration with Xid_log_list_event is scheduled for tomorrow.

            Elkin Andrei Elkin added a comment -

            Howdy julien.fritsch!

            The whole issue consists of 3 parts:
            1. refactoring of XA binlogging (MDEV-32830)
            2. parallel slave XA execution performance
            3. XA binlog-recovery (MDEV-33168)

            The first two have been ready for review for some time. serg has provided some notes on part I. There must be more to come after my reply about a week ago. knielsen did so for part II of a previous version of the patch, and his notes have been accounted for in the current branch. He also strongly suggested changing part 3's plan from the MDEV-21649 method to the MDEV-33168 one.
            Part 3 has been under my testing for the last few days. I am about to publish it for review and preliminary QA testing: ETA 1-2 days.


            People

              Elkin Andrei Elkin
              Elkin Andrei Elkin
              Votes: 1
              Watchers: 14
