Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34877

Port "Bug #11745929 Change lock priority so that the transaction holding S-lock gets X-lock first" fix from MySQL to MariaDB

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.5, 10.6, 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4, 11.5(EOL), 11.6(EOL), 11.7(EOL)
    • 11.4.5, 11.7.2

    Description

      MDEV-27025 describes the following case:
      1. trx1 gets an S-lock granted
      2. trx2 wants an X-lock, but has to wait for trx1
      3. trx1 now wants an X-lock, but waits for trx2

      The try to fix MDEV-27025 caused MDEV-27992, what made us to revert MDEV-27025 fix(see also this comment). It seems, MySQL's "Bug #11745929" fix solves MDEV-27025 issue.

      Attachments

        Issue Links

          Activity

            Update:
            I ported test case from MySQL and fixed all bugs found during debugging the test case. Currently, the code passed local mtr testing. I am refactoring the code to correspond to MariaDB coding standards. Then I am going to launch local RQG testing and pass it to standard RQG testing to Matthias after I catch and fix bugs locally.

            vlad.lesin Vladislav Lesin added a comment - Update: I ported test case from MySQL and fixed all bugs found during debugging the test case. Currently, the code passed local mtr testing. I am refactoring the code to correspond to MariaDB coding standards. Then I am going to launch local RQG testing and pass it to standard RQG testing to Matthias after I catch and fix bugs locally.

            vlad.lesin It was great to see the analysis and the solution for the specific issue caused by a lock being appended at the end during implicit to explicit conversion. Usually the idea of implicit locking works great as not many modifications is expected to generate conflict. The general idea I had in mind is (I think I have seen in other DBs) that we either have an implicit lock (some modification and no one else is interested) or a queue of explicit locks but not both. So, it intrigued me that this scenario actually shows the presence of both implicit and explicit locks simultaneously. I checked it out and I see we actually could have such case and we try to check for implicit locks unconditionally. So, I agree with the analysis and the fix for this specific point. May be in future we could consider having this constraint if we see some benefit skipping the implicit lock check (it is indeed expensive for 2ndary index) when we see some explicit lock is already there.

            I wanted to check if this optimization works for Read-Committed isolation level where we could be dealing with LOCK_REC_NOT_GAP with LOCK_X. I was specifically looking into ..

            inline bool lock_rec_mode_suits_for_bypassing(unsigned mode)
            {
              return /* !(mode & LOCK_INSERT_INTENTION) && !(mode & LOCK_GAP) &&
                     (mode & LOCK_MODE_MASK) == LOCK_X; */
                     (mode & (LOCK_INSERT_INTENTION | LOCK_GAP | LOCK_MODE_MASK)) == LOCK_X;
            }
            

            I do see now that this condition would actually pass irrespective of LOCK_REC_NOT_GAP. So, it should work great and no issues here. We could perhaps add a test for RC isolation.3. From [2], we could also infer one interesting point that it is not only X lock but also the GAP lock (Next Key lock also covers the gap before) that is placed ahead in the queue. Now this could actually result an X lock followed by II (Insert intention) lock in a queue where both are in granted state. I don't think this is a real issue as II lock once granted can be ignored but it could possibly hit some assert as it breaks our design constraint that any granted lock in the queue should not conflict with locks ahead in the queue. I have not verified it yet and it is possible that I am mistaken.

            debarun Debarun Banerjee added a comment - vlad.lesin It was great to see the analysis and the solution for the specific issue caused by a lock being appended at the end during implicit to explicit conversion. Usually the idea of implicit locking works great as not many modifications is expected to generate conflict. The general idea I had in mind is (I think I have seen in other DBs) that we either have an implicit lock (some modification and no one else is interested) or a queue of explicit locks but not both. So, it intrigued me that this scenario actually shows the presence of both implicit and explicit locks simultaneously. I checked it out and I see we actually could have such case and we try to check for implicit locks unconditionally. So, I agree with the analysis and the fix for this specific point. May be in future we could consider having this constraint if we see some benefit skipping the implicit lock check (it is indeed expensive for 2ndary index) when we see some explicit lock is already there. I wanted to check if this optimization works for Read-Committed isolation level where we could be dealing with LOCK_REC_NOT_GAP with LOCK_X. I was specifically looking into .. inline bool lock_rec_mode_suits_for_bypassing(unsigned mode) { return /* !(mode & LOCK_INSERT_INTENTION) && !(mode & LOCK_GAP) && (mode & LOCK_MODE_MASK) == LOCK_X; */ (mode & (LOCK_INSERT_INTENTION | LOCK_GAP | LOCK_MODE_MASK)) == LOCK_X; } I do see now that this condition would actually pass irrespective of LOCK_REC_NOT_GAP. So, it should work great and no issues here. We could perhaps add a test for RC isolation.3. From [2] , we could also infer one interesting point that it is not only X lock but also the GAP lock (Next Key lock also covers the gap before) that is placed ahead in the queue. Now this could actually result an X lock followed by II (Insert intention) lock in a queue where both are in granted state. I don't think this is a real issue as II lock once granted can be ignored but it could possibly hit some assert as it breaks our design constraint that any granted lock in the queue should not conflict with locks ahead in the queue. I have not verified it yet and it is possible that I am mistaken.

            Update: I did local RQG testing, caught and fixed a couple of bugs, and the last round of RQG testing did not find any related bugs. The next step is code cleanup to prepare it to code review, code review itself and RQG testing from Matthias. It can take up to 3-4 work days to finish it, roughly, 1 day for code clean-up and code review, 1 day for testing, and 1-2 days for analysing rr traces recorded during RQG testing.

            Monty asked me to provide some high level description. This MDEV will give less rollbacks caused by deadlocks for the cases when transactions concurrently combine locking reads with writes, i.e. UPDATES and INSERTS, on some limited range of records. It can improve parallel replication performance, as less deadlocks cause less transaction retries. At the other hand, the necessity to insert locks to the certain place in lock queue can cause more record lock objects creating instead of reusing lock bitmaps under the circumstances described above, i.e. more memory will be used for such kind of loads.

            vlad.lesin Vladislav Lesin added a comment - Update: I did local RQG testing, caught and fixed a couple of bugs, and the last round of RQG testing did not find any related bugs. The next step is code cleanup to prepare it to code review, code review itself and RQG testing from Matthias. It can take up to 3-4 work days to finish it, roughly, 1 day for code clean-up and code review, 1 day for testing, and 1-2 days for analysing rr traces recorded during RQG testing. Monty asked me to provide some high level description. This MDEV will give less rollbacks caused by deadlocks for the cases when transactions concurrently combine locking reads with writes, i.e. UPDATES and INSERTS, on some limited range of records. It can improve parallel replication performance, as less deadlocks cause less transaction retries. At the other hand, the necessity to insert locks to the certain place in lock queue can cause more record lock objects creating instead of reusing lock bitmaps under the circumstances described above, i.e. more memory will be used for such kind of loads.

            Thanks for the locking optimization and the code looks very good and well organized to me. I have added a few comments in the pull request.

            Important one to evaluate is about bypassing over an waiting S lock or a waiting lock for which the waiting transaction is another entity.

            debarun Debarun Banerjee added a comment - Thanks for the locking optimization and the code looks very good and well organized to me. I have added a few comments in the pull request . Important one to evaluate is about bypassing over an waiting S lock or a waiting lock for which the waiting transaction is another entity.

            MDEV-35930, MDEV-35931, and MDEV-35932 were filed for some related low-level optimizations.

            marko Marko Mäkelä added a comment - MDEV-35930 , MDEV-35931 , and MDEV-35932 were filed for some related low-level optimizations.

            People

              vlad.lesin Vladislav Lesin
              vlad.lesin Vladislav Lesin
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.