MariaDB Server / MDEV-34481

Optimize away waiting for a non-unique index record owned by a prepared XA transaction

Details

    • Type: Bug
    • Status: In Testing
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.5, 10.6, 10.11, 11.1(EOL), 11.2(EOL), 11.4
    • Fix Version/s: 10.11, 11.4
    • Component/s: Replication
    • Labels: None

    Description

      As described in MDEV-32020 (the details are not repeated here), two transactions cannot be isolated on the slave because they used different non-unique indexes on the master and on the slave.
      The first of the two is a prepared XA transaction

      --connection slave_worker_1
        XA START 'xid'; /* ... acquire the lock here ... */ XA END 'xid'; XA PREPARE 'xid';
      

      while the second one

      --connection slave_worker_2
        BEGIN; /* ... request the conflicting lock => wait/hang ... error out */
      

      ends up waiting for the conflicting lock (and eventually erroring out), even though the XA transaction did not actually
      lock the record of the non-clustered index that the second transaction targets.

      The hang is really caused by the method used to reach the needed record, which is an index scan.
      Under the orthodox locking protocol the scan cannot step over a record that is rightfully locked by the XA transaction.
      However, since that record cannot be the target of the second transaction (otherwise the two transactions
      would have sensed the conflict back on the master), it is safe not to treat
      a lock-wait timeout reported by the engine as fatal. Instead the scan can simply proceed to the next free index records of the same key value and must ultimately reach the target one.
      More generally, on its way to the target the current transaction need not lock any busy records that belong to earlier (in binlog order) transactions.
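
      For illustration, a minimal sketch of the slave-side conflict (the table, its data and the worker statements below are hypothetical stand-ins for the row events the applier workers actually execute):

        CREATE TABLE t (k INT, v INT, KEY(k)) ENGINE=InnoDB;  -- the only index is non-unique
        INSERT INTO t VALUES (10, 1), (10, 2);                -- two rows share the key value k=10

        --connection slave_worker_1
        XA START 'xid';
        UPDATE t SET v = 11 WHERE k = 10 AND v = 1;  -- locks the first k=10 index record it reaches
        XA END 'xid';
        XA PREPARE 'xid';                            -- the lock is kept until XA COMMIT/ROLLBACK

        --connection slave_worker_2
        BEGIN;
        UPDATE t SET v = 22 WHERE k = 10 AND v = 2;  -- targets the other row, but the scan over the
                                                     -- k=10 records first meets the record locked by
                                                     -- 'xid' and waits, eventually failing with
                                                     -- ER_LOCK_WAIT_TIMEOUT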

      A patch has been implemented to carry out this plan.

      Attachments

      Issue Links

      Activity

            Elkin Andrei Elkin added a comment -

            Howdy Brandon!

            Could you please have a look at bb-10.11-andrei?
            You may not be the only reviewer, but let me pick you first.

            Cheers,
            Andrei

            Elkin Andrei Elkin added a comment -

            marko, thank you for attending to this ticket!

            Sure, the InnoDB team needs to look at the patch. Actually that was already under way in the form of MDEV-34466, which I had to find out about as it was blocking my progress on these fixes (they include a hunk for the latter bug). Btw, they do not really change the innodb_lock_wait_timeout policy, at least never beyond replication.
            It is much fairer, in my opinion, to view this work as removing limitations (which they are) of the engine/server locking protocol started by MDEV-26682 and continued in
            MDEV-33454.

            As to the comparison of this with Kristian's method of resolving MDEV-32020, I can only repeat what has been said multiple times.
            Arguably they are not mutually exclusive.
            Yet MDEV-742 is clearly the preferred choice when the user requires failover to the slave.
            As failover has to be reliable and fast, the method of collecting XA events and applying them deferred simply may not be an option ('cos
            it *hopes* that replaying will succeed and that the time to spend on it is always affordable).
            Neither am I certain that an implementation of collecting XA events for deferred applying is really straightforward.


            knielsen Kristian Nielsen added a comment -

            > Neither am I certain that an implementation of collecting XA events for deferred applying is really straightforward.

            There is already a (prototype) implementation of this, see knielsen_mdev32020

            > Yet MDEV-742 is clearly a preferred choice when the user requires failover to slave.

            Really? The failing over of a prepared XA transaction is a (very) rare operation. The normal apply of an XA transaction will occur thousands or millions of times more often. Applying the XA PREPARE on the slave pessimises the common operation by doubling the work on the slave to process two event groups, two GTIDs, and two commits inside InnoDB.

            I think there's a misconception that the XA PREPAREd transaction will somehow already have been applied on the slave in the case where a failover occurs. That's unlikely to be the case, especially for transactions that take longer to apply, as the commit on the master will normally arrive shortly after the prepare, while the slave can only start applying the XA PREPARE after it has been synced to the binlog on the master.

            For the rare user that really wants to recover an XA PREPAREd (but not committed) master transaction on the slave, it will be necessary to apply the to-be-recovered prepare on the slave. The requirements for this are similar, though it is simplified in the MDEV-32020 proposal since there is no requirement to apply them in order as in the current code. However, there is no need to do so for the vast majority of transactions that are prepared+committed on the master; doing so just introduces a lot of unnecessary overhead and complications.

            Elkin Andrei Elkin added a comment - edited

            >> Yet MDEV-742 is clearly the preferred choice when the user requires failover to the slave.

            > Really? The failing over of a prepared XA transaction is a (very) rare operation.

            Sorry, with all due respect to you, dear Kristian, 'rare operation' is something that nobody but an actual user can claim.
            And we don't know the future either.
            The same goes for 'normally' here:
            > as the commit on the master will normally arrive shortly after the prepare
            Let's admit that these are all your adjectives and assumptions, and they may not come true.
            On the usability matter, please also refer to the mysql@oracle state of XA replication.
            Having had this solution in place since 2013, I've not heard from them of any ideas to cancel it.

            I suggest leaving this matter alone. It's a firm and obvious fact that only the eager replication of
            XA PREPARE provides instant recovery.

            As to whether this approach can take its toll: actually it does not; find the "extra fsync" dismissal here.
            > two GTIDs, and two commits inside InnoDB.
            However, it's not about 'doubling the work' at all. Brandon and I shared consoling benchmark results in MDEV-31949. There's no reason to doubt they could still be improved.

            Elkin Andrei Elkin added a comment - edited

            Howdy Susil!

            Could you please run concurrency tests to prove the fixes?
            They are currently pushed to bb-10.6-andrei, based on a not yet fully completed MDEV-34466 branch. That was necessary to avoid some engine errors in a service that XA replication relies on.
            Read more on what the patch is about in the commit message as well.
            An mtr rpl suite test should be helpful for insights into how/what to test.
            In brief, I have in mind an arbitrary-size worker pool and multiple clients on the master running mixed
            normal and XA transactions that update tables with only non-unique (including NULL-able unique) indexes,
            in ROW format; a sketch of such a setup follows below.
            All the sequential and the parallel optimistic or conservative slaves should complete the work
            with consistent data and GTID state in the end.

            The GTID connection mode (aka CHANGE MASTER TO master_use_gtid) is irrelevant.
            GTID strict mode should be on.
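
            A rough sketch of such a setup (all names, sizes and option values below are only illustrative assumptions, not part of the patch):

              # master
              SET GLOBAL binlog_format = ROW;
              CREATE TABLE t (k INT, u INT, v INT, KEY(k), UNIQUE KEY(u)) ENGINE=InnoDB;  -- u is a NULL-able unique key
              # several clients concurrently run a mix of
              #   BEGIN; UPDATE t ...; COMMIT;
              # and
              #   XA START 'x'; UPDATE t ...; XA END 'x'; XA PREPARE 'x'; XA COMMIT 'x';

              # slave, repeated for the sequential, optimistic and conservative runs
              SET GLOBAL gtid_strict_mode = ON;
              STOP SLAVE;
              SET GLOBAL slave_parallel_threads = 8;           -- arbitrary worker pool size (0 for sequential)
              SET GLOBAL slave_parallel_mode = 'optimistic';   -- or 'conservative'
              START SLAVE;
              # at the end, compare table checksums and the GTID state (@@gtid_slave_pos) with the master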

            knielsen Kristian Nielsen added a comment - Some comments on the patch: https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/KFTWZ2CCNRFDJ77B4G4TGIHXMVMCFVHC/

            marko Marko Mäkelä added a comment -

            I suggest that this be tested also with innodb_snapshot_isolation=ON. Hopefully it will replace some lock waits with other errors (ER_LOCK_WAIT_TIMEOUT, ER_CHECKREAD).
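
            For reference, a minimal sketch of enabling it on the slave (assuming a server version that already has the variable):

              SET GLOBAL innodb_snapshot_isolation = ON;   -- or SET SESSION ... for a single connection
              # re-run the concurrency tests; some waits are then expected to surface as
              # ER_CHECKREAD (record has changed since last read) instead of plain lock waits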


            People

              Assignee: susil.behera Susil Behera
              Reporter: Elkin Andrei Elkin
              Votes: 1
              Watchers: 10

