  1. MariaDB Server
  2. MDEV-27927

row_sel_try_search_shortcut_for_mysql() does not latch a page, violating read view isolation

Details

    Description

      We have been testing Galera-MariaDB against data consistency models. Together with the Galera contributor, we found REPEATABLE READ anomalies even with Galera disabled, which suggests that the problem is in MariaDB itself. We used the default REPEATABLE READ isolation level in our testing.

      The collected data (or histories of transaction executions):
      https://gist.github.com/sciascid/aae12c130bdcfe1930601f19ce0f29d5/archive/6b214d707fdf0fb72b2505955c0e23d38717db6d.zip

      The specific anomaly is shown at https://github.com/codership/galera/issues/609#issuecomment-1036096287. Also see the attached table.

      Notation: w/r(key, value, client_id, txn_id); operations of a txn (denoted by a column) are listed from top down.

      Anomaly: txn393 read value 3328 on key 0 (red cell, column 1) in between the reads of value 33324 on key 0 (yellow cells).
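The check behind this anomaly can be sketched in a few lines of Python. This is only an illustration, not the tool used to produce the attached histories: the `Op` type, the function name, and the client id `1` are assumptions, and the three-read history is a simplified stand-in for the reported column of txn393. Within one transaction under REPEATABLE READ, a read of a key should return the value that the same transaction last read or wrote for that key.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """One operation in the w/r(key, value, client_id, txn_id) notation."""
    kind: str       # "r" for a read, "w" for a write
    key: int
    value: int
    client_id: int
    txn_id: int


def find_non_repeatable_reads(ops):
    """Return (txn_id, key, previous_value, read_value) for every read that
    disagrees with what the same transaction last read or wrote for that key.
    """
    last_seen = {}   # (txn_id, key) -> value last read or written by that txn
    anomalies = []
    for op in ops:
        slot = (op.txn_id, op.key)
        if op.kind == "r" and slot in last_seen and last_seen[slot] != op.value:
            anomalies.append((op.txn_id, op.key, last_seen[slot], op.value))
        # Writes are recorded too, so reading one's own write is not flagged.
        last_seen[slot] = op.value
    return anomalies


# Simplified stand-in for the reported history; client id 1 is made up.
history = [
    Op("r", 0, 33324, 1, 393),
    Op("r", 0, 3328, 1, 393),
    Op("r", 0, 33324, 1, 393),
]
anomalies = find_non_repeatable_reads(history)
for txn_id, key, previous, current in anomalies:
    print(f"txn{txn_id}: key {key} changed from {previous} to {current}")
```

The sketch only compares a transaction against its own earlier observations, so it detects non-repeatable reads but says nothing about which committed write the unexpected value came from.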

      Setup: 3 server nodes and 3 client nodes. Galera Version: 26.4.9. MariaDB Version: 10.4.22.

          Activity

            marko Marko Mäkelä added a comment - Thank you vlad.lesin, your analysis is plausible and your suggestion is reasonable. Let us remove the ahi_latch argument.
            axel Axel Schwenke added a comment -

            Attached benchmark results: MDEV-27927.pdf

            The good news: there are no regressions in the 10.4 branch (red vs. green) or in the 10.6 branch (pink vs. light blue). I also tested 10.6 with AHI turned off (the default), and it was much better for batched UPDATE workloads.

            vlad.lesin Vladislav Lesin added a comment - Two bugs were found while testing the 10.6 version of the fix with RQG (bb-10.6-MDEV-27927-RR-anomaly branch): MDEV-29635 and MDEV-29622. They are not related to the fix, so I would say the testing looks good.

            vlad.lesin Vladislav Lesin added a comment - Some update. Pushed bb-10.[36]-MDEV-27927-RR-anomaly for final testing and code review.

            marko Marko Mäkelä added a comment - Thank you, the 10.3 version looks good to me. I think that the debug assertion that you would adjust in buf_page_get_known_nowait() can be safely removed altogether, and that function signature can remain unchanged. In MDEV-19514, the assertion was removed when the function was replaced with equivalent logic. The ‘freed’ status of a file page should not change while a thread is holding a buffer-fix or a latch on the page.

            People

              vlad.lesin Vladislav Lesin
              nobiplusplus Si Liu