MariaDB Server / MDEV-35781

MDL Deadlock in self-reference spider table creation requiring connections

Details

    Description

      Creating a self-referencing Spider table can hang on an MDL deadlock when the creation requires a connection to send queries to the data node. The hang lasts until the lock wait timeout lapses. Here are two examples:

      CREATE TABLE t ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"' AS SELECT 1; 
      CREATE TABLE t ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"'; 
      

      Note that the second example requires table discovery, which makes Spider connect to the data node to query the table structure.

      In both examples, running set global lock_wait_timeout= 1; beforehand shows that the hang is indeed a wait for the lock wait timeout.
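
      The reproduction above can be sketched as a plain SQL session. This is only a sketch under assumptions not stated in the report: the definition of server "srv" (socket path, database, user) is a placeholder, and enabling spider_same_server_link is assumed to be needed so that Spider accepts a link back to the same server.

```sql
-- Assumption: Spider is installed and "srv" points back at this same server.
-- The socket path, database and user are placeholders, not from the report.
CREATE SERVER srv FOREIGN DATA WRAPPER mysql
  OPTIONS (SOCKET '/path/to/mysqld.sock', DATABASE 'test', USER 'root');

-- Assumption: allow Spider to link to the server it runs on.
SET spider_same_server_link = 1;

-- Shorten the wait so the statement gives up quickly instead of hanging.
-- New connections (including the one Spider opens) inherit the global value.
SET @old_lock_wait_timeout = @@global.lock_wait_timeout;
SET GLOBAL lock_wait_timeout = 1;

-- Second example from the report: table discovery on a self-referencing table.
-- Per the report, this ends in a lock wait timeout (error 1205).
CREATE TABLE t ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';

-- Restore the previous setting and clean up.
SET GLOBAL lock_wait_timeout = @old_lock_wait_timeout;
DROP SERVER srv;
```

      Saving and restoring the global value keeps the shortened timeout from leaking into later statements on the same server.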

      The Spider self-referencing detection is not called because the deadlock happens before the call to any Spider handler method. To see this, place a set global lock_wait_timeout= 1; before either example and run mtr --rr on the testcase. In the recording, place breakpoints at my_error and my_message on the condition that the error number is 1205 (ER_LOCK_WAIT_TIMEOUT), and continue until one of them is reached. Then place another breakpoint at mysql_parse, as well as rbreak ha_spider::.. Then do reverse-continue. The mysql_parse breakpoint is reached without any ha_spider breakpoint being hit first, so no Spider handler method ran between the start of the statement and the timeout error.

      One solution is to temporarily set lock_wait_timeout to a small value when doing the "pre-query" in the first example and the table discovery query in the second, and to suggest checking for self-referencing in the error message, just like MDEV-29676.
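
      As a hedged illustration of what "temporarily" could mean here, MariaDB's SET STATEMENT ... FOR syntax scopes a variable change to a single statement. The sketch below only shows that scoping on a trivial query; which queries Spider would actually wrap this way is not shown in this report and is not assumed here.

```sql
-- The timeout is 1 only while the wrapped statement runs.
SET STATEMENT lock_wait_timeout = 1 FOR SELECT @@lock_wait_timeout;

-- Afterwards the session value is back to what it was before.
SELECT @@lock_wait_timeout;
```

      Scoping the change to one statement avoids having to save and restore the variable around the query.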

          Activity

            ycp Yuchen Pei added a comment -

            It is not as quick a fix as I expected because of the issues it revealed: MDEV-35783 and MDEV-35794. The set statement lock_wait_timeout should fix the bug in this issue (see e.g. bb-10.5-mdev-35781 1e88a92e767e0edc9e7b5442eced2b4422e63434), but the testcase will not work cleanly without addressing those two other issues. I will get back to critical bugs and return to these issues later.

            People

              ycp Yuchen Pei