Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25016

Race condition between lock_sys_t::cancel() and page split or merge

    XMLWordPrintable

Details

    Description

      In MDEV-24789, we are minimizing the use of lock_sys.latch. It turns out that when the acquisition of an exclusive lock_sys.latch in innobase_kill_query() is replaced with an acquisition of a shared lock_sys.latch, a number of tests would occasionally hang:

      • rpl.rpl_parallel_optimistic
      • rpl.rpl_parallel_optimistic_xa_lsu_off
      • rpl.rpl_parallel_optimistic_nobinlog

      It seems that we can work around this bug by making innobase_kill_query() acquire an exclusive lock_sys.latch instead of a shared one. This work-around will obviously hurt performance, and I would think that it is merely reducing the probability of such hangs, instead of fixing them altogether. Until this bug is fixed, we can invoke the work-around whenever thd_need_wait_reports() holds.

      Note: thd_need_wait_reports() holds even when no replication is being used, and only the option log_bin is enabled. That condition seems to be necessary, because without it, the test binlog.rpl_parallel_optimistic would hang (fall back to innodb_lock_wait_timeout).

      Attachments

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.