Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32451

Hang when locking both spider table and remote table in the same statement

Details

    Description

      A simplified version of a test mentioned in a comment in MDEV-29854[1], tested at 11.0 0199a5d0f634051f23d4c7f7ce7651c7f93c818f. Probably also existing in other versions.

      --echo #
      --echo # MDEV-XXXXX Hang when locking both spider table and remote table in the same statement
      --echo #
      --disable_query_log
      --disable_result_log
      --source ../../t/test_init.inc
      --enable_result_log
      --enable_query_log
      evalp CREATE SERVER srv FOREIGN DATA WRAPPER mysql
      OPTIONS (SOCKET "$MASTER_1_MYSOCK", DATABASE 'test',user 'root');
       
      create table t (c int);
      create table t2 (c int) ENGINE=Spider
      COMMENT='WRAPPER "mysql", srv "srv",TABLE "t"';
      LOCK TABLES t WRITE,t2 WRITE;
      drop table t, t2;
       
      drop server srv;
      --disable_query_log
      --disable_result_log
      --source ../../t/test_deinit.inc
      --enable_result_log
      --enable_query_log
      --echo #
      --echo # end of test mdev_XXXXX
      --echo #
      

      The metadata lock of t is acquired when opening tables. Then in lock_external() it iterates over all tables to lock. After locking t, when trying to lock t2, spider sends a query to the data node to lock t again. Somehow the MDL lock of t is not released yet, so the triggered open_table() hangs at waiting for MDL until lock_wait_timeout (default 1h) has passed.

      Not sure whether it counts as a spider bug or a sql layer bug.

      [1] https://jira.mariadb.org/browse/MDEV-29854?focusedCommentId=238531&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-238531

      For comparison, if we simply lock tables with the same name / alias in one statement, the parser would pick out the issue and throw a 1066 ER_NONUNIQ_TABLE

      Attachments

        Issue Links

          Activity

            ycp Yuchen Pei added a comment - - edited

            One possible solution is the same as MDEV-29676: set lock wait timeout to 1 second when sending queries to lock remote tables.

            ycp Yuchen Pei added a comment - - edited One possible solution is the same as MDEV-29676 : set lock wait timeout to 1 second when sending queries to lock remote tables.
            Roel Roel Van de Paar added a comment - - edited

            I used the original testcase routed via CLI to check for the hangs

            INSTALL PLUGIN Spider SONAME 'ha_spider.so';
            CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET'../socket.sock',DATABASE'',USER'',PASSWORD '');  # Socket location must be valid. Other values not.
            CREATE TABLE t (a INT KEY,b INT UNIQUE) ENGINE=InnoDB;
            CREATE TABLE t2 (id INT,id2 INT,UNIQUE (id,id2),CONSTRAINT t_id_fk FOREIGN KEY(id2,id) REFERENCES t (id,id2)) ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
            LOCK TABLES t WRITE,t2 WRITE;
            

            All versions 10.4-11.3 dbg+opt hung. I confirmed the issue based on this.
            I also tried the MTR testcase on 11.3, but got:

            11.3.0 905c3d61e18ae6222d0d195c43d335046eec65d9

            mysqltest: At line 15: query 'LOCK TABLES t WRITE,t2 WRITE' failed: ER_CONNECT_TO_FOREIGN_DATA_SOURCE (1429): Unable to connect to foreign data source: srv
            

            Same outcome (1429) on 11.0.4 9880006b5ba28d7c7dc1596d958295c67efa5ad0 (Debug)

            Roel Roel Van de Paar added a comment - - edited I used the original testcase routed via CLI to check for the hangs INSTALL PLUGIN Spider SONAME 'ha_spider.so' ; CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET '../socket.sock' , DATABASE '' , USER '' , PASSWORD '' ); # Socket location must be valid. Other values not . CREATE TABLE t (a INT KEY ,b INT UNIQUE ) ENGINE=InnoDB; CREATE TABLE t2 (id INT ,id2 INT , UNIQUE (id,id2), CONSTRAINT t_id_fk FOREIGN KEY (id2,id) REFERENCES t (id,id2)) ENGINE=Spider COMMENT= 'WRAPPER "mysql",srv "srv",TABLE "t"' ; LOCK TABLES t WRITE,t2 WRITE; All versions 10.4-11.3 dbg+opt hung. I confirmed the issue based on this. I also tried the MTR testcase on 11.3, but got: 11.3.0 905c3d61e18ae6222d0d195c43d335046eec65d9 mysqltest: At line 15: query 'LOCK TABLES t WRITE,t2 WRITE' failed: ER_CONNECT_TO_FOREIGN_DATA_SOURCE (1429): Unable to connect to foreign data source: srv Same outcome (1429) on 11.0.4 9880006b5ba28d7c7dc1596d958295c67efa5ad0 (Debug)
            ycp Yuchen Pei added a comment - - edited

            I don't think the scenario is at all common, as it only applies to when the spider and data nodes are on the same server.

            A more realistic scenario, like the following

            create table t2 (c int) ENGINE=Spider
            COMMENT='WRAPPER "mysql", srv "srv",TABLE "t"';
            create table t3 (c int) ENGINE=Spider
            COMMENT='WRAPPER "mysql", srv "srv",TABLE "t"';
            LOCK TABLES t3 write, t2 WRITE;
            

            will not cause a hang.

            Therefore I am lowering the prio.

            ycp Yuchen Pei added a comment - - edited I don't think the scenario is at all common, as it only applies to when the spider and data nodes are on the same server. A more realistic scenario, like the following create table t2 (c int ) ENGINE=Spider COMMENT= 'WRAPPER "mysql", srv "srv",TABLE "t"' ; create table t3 (c int ) ENGINE=Spider COMMENT= 'WRAPPER "mysql", srv "srv",TABLE "t"' ; LOCK TABLES t3 write, t2 WRITE; will not cause a hang. Therefore I am lowering the prio.
            ycp Yuchen Pei added a comment -

            The reason that the example in the previous comment does not cause a hang is because spider is smart enough to deduplicate spider tables with the same connection info in its own lock list. The "hang" will still happen if the connection info to the same server are specified in different ways, e.g.

            evalp CREATE SERVER srv1 FOREIGN DATA WRAPPER mysql
            OPTIONS (SOCKET "$MASTER_1_MYSOCK", DATABASE 'test',user 'root');
            evalp CREATE SERVER srv2 FOREIGN DATA WRAPPER mysql
            OPTIONS (HOST "127.0.0.1", PORT $MASTER_1_MYPORT, DATABASE 'test',user 'root');
            create table t (c int);
            create table t2 (c int) ENGINE=Spider
            COMMENT='WRAPPER "mysql", srv "srv1",TABLE "t"';
            create table t3 (c int) ENGINE=Spider
            COMMENT='WRAPPER "mysql", srv "srv2",TABLE "t"';
            # hangs
            LOCK TABLES t3 WRITE, t2 WRITE;

            However, if not all the tables are spider tables, as in the original example (t a normal table, t2 a spider table pointing to t), then there's not much spider can do. The metadata lock is rather internal at the sql layer, and even if spider could query for the locking status of a spider table, it can not tell why an MDL lock is being held and whether to wait or give up. Unlike MDEV-29676, the query sent to the data node is actually the "main" query (LOCK), rather than an optional "auxiliary" query (get table status and table index), so it does not make as much sense to set lock wait timeout to 1 and give up.

            It would be nice if the sql layer could determine that t and t2 are actually the same table, e.g. by calling a new handler method say get_real_table() to get info of the real table behind an alias / spider table / etc.

            CC serg.

            ycp Yuchen Pei added a comment - The reason that the example in the previous comment does not cause a hang is because spider is smart enough to deduplicate spider tables with the same connection info in its own lock list. The "hang" will still happen if the connection info to the same server are specified in different ways, e.g. evalp CREATE SERVER srv1 FOREIGN DATA WRAPPER mysql OPTIONS (SOCKET "$MASTER_1_MYSOCK" , DATABASE 'test' , user 'root' ); evalp CREATE SERVER srv2 FOREIGN DATA WRAPPER mysql OPTIONS (HOST "127.0.0.1" , PORT $MASTER_1_MYPORT, DATABASE 'test' , user 'root' ); create table t (c int ); create table t2 (c int ) ENGINE=Spider COMMENT= 'WRAPPER "mysql", srv "srv1",TABLE "t"' ; create table t3 (c int ) ENGINE=Spider COMMENT= 'WRAPPER "mysql", srv "srv2",TABLE "t"' ; # hangs LOCK TABLES t3 WRITE, t2 WRITE; However, if not all the tables are spider tables, as in the original example (t a normal table, t2 a spider table pointing to t), then there's not much spider can do. The metadata lock is rather internal at the sql layer, and even if spider could query for the locking status of a spider table, it can not tell why an MDL lock is being held and whether to wait or give up. Unlike MDEV-29676 , the query sent to the data node is actually the "main" query (LOCK), rather than an optional "auxiliary" query (get table status and table index), so it does not make as much sense to set lock wait timeout to 1 and give up. It would be nice if the sql layer could determine that t and t2 are actually the same table, e.g. by calling a new handler method say get_real_table() to get info of the real table behind an alias / spider table / etc. CC serg .

            People

              ycp Yuchen Pei
              ycp Yuchen Pei
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.