Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21404

Concurrent INSERTs to Spider table with InnoDB back end hang or deadlock

Details

    Description

      It seems concurrent INSERTs into a simple Spider table hang with lock wait. Initial setup is simple (server listens to port 3308):

      MariaDB [test]> set global spider_same_server_link=1;                           Query OK, 0 rows affected (0,000 sec)
       
      MariaDB [test]> create server `other` foreign data wrapper mysql options( host '127.0.0.1', database 'test', user 'root', port 3308);
      Query OK, 0 rows affected (0,016 sec)
       
      MariaDB [test]> create table t1(id int auto_increment primary key, c1 int);
      Query OK, 0 rows affected (0,226 sec)
       
      MariaDB [test]> create table t2(id int auto_increment primary key, c1 int);
      Query OK, 0 rows affected (0,279 sec)
       
      MariaDB [test]> create table t_spider(id int auto_increment primary key, c1 int) engine=spider comment='wrapper "mysql", table "t_spider"' partition by range(id) (partition p1 values less than(100000) comment='srv "other", table "t1"' engine=spider, partition p2 values less than maxvalue comment='srv "other", table "t2"' engine=spider);
      Query OK, 0 rows affected (0,084 sec)
       
      MariaDB [test]> select count(*) from t_spider;
      +----------+
      | count(*) |
      +----------+
      |        0 |
      +----------+
      1 row in set (0,006 sec)
      

      We can surely run concurrent INSERTs into any of the back end InnoDB tables:

      openxs@ao756:~/dbs/maria10.4$ bin/mysqlslap -uroot --create-schema=test --no-drop --concurrency=2  --number-of-queries=10000 --iterations=1 --socket=/tmp/mariadb.sock --query="insert into t2(c1) values(rand()*1000)"                         Benchmark
              Average number of seconds to run all queries: 440.052 seconds
              Minimum number of seconds to run all queries: 440.052 seconds
              Maximum number of seconds to run all queries: 440.052 seconds
              Number of clients running queries: 2
              Average number of queries per client: 5000
      

      In the processlist we see:

      ...
      | 3902 | root        | localhost       | test | Query   |    0 | Commit                                             | insert into t2(c1) values(rand()*1000) |    0.000 |
      | 3903 | root        | localhost       | test | Query   |    0 | Commit                                             | insert into t2(c1) values(rand()*1000) |    0.000 |
      ...
      

      But when we try try to run similar concurrent INSERTs into an empty Spider table we get a deadlock:

      openxs@ao756:~/dbs/maria10.4$ bin/mysqlslap -uroot --create-schema=test --no-drop --concurrency=2  --number-of-queries=10000 --iterations=1 --socket=/tmp/mariadb.sock --query="insert into t_spider(c1) values(rand()*1000)"
      bin/mysqlslap: Cannot run query insert into t_spider(c1) values(rand()*1000) ERROR : Deadlock found when trying to get lock; try restarting transaction
      

      or lock waits (and timeouts) for more complex tables:

      ------------
      TRANSACTIONS
      ------------
      Trx id counter 217
      Purge done for trx's n:o < 217 undo n:o < 0 state: running but idle
      History list length 80
      LIST OF TRANSACTIONS FOR EACH SESSION:
      ---TRANSACTION 212, ACTIVE 16 sec starting index read
      mysql tables in use 3, locked 3
      LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
      MySQL thread id 1110, OS thread handle 139926681863936, query id 687 localhost 127.0.0.1 root Sending data
      select `ID`, ... from `test`.`t_something` order by `ID` desc limit 1 for update
      ------- TRX HAS BEEN WAITING 16 SEC FOR THIS LOCK TO BE GRANTED:
      RECORD LOCKS space id 8 page no 3 n bits 96 index PRIMARY of table `test`.`t_something` /* Partition `p_1` */ trx id 212 lock_mode X waiting
      Record lock, heap no 24 PHYSICAL RECORD: n_fields 16; compact format; info bits 0
       0: len 4; hex 00000019; asc     ;;
       1: len 6; hex 0000000000d3; asc       ;;
       2: len 7; hex a80000015a01f1; asc     Z  ;;
       3: len 4; hex 00000000; asc     ;;
       4: len 4; hex 00000001; asc     ;;
       5: len 4; hex 00000000; asc     ;;
       6: len 4; hex 00000000; asc     ;;
       7: len 4; hex 00000000; asc     ;;
       8: len 4; hex 00000000; asc     ;;
       9: len 4; hex 00000000; asc     ;;
       10: len 5; hex 4e4f4b2020; asc NOK  ;;
       11: len 4; hex 00000000; asc     ;;
       12: len 4; hex 00000000; asc     ;;
       13: len 4; hex 00000000; asc     ;;
       14: len 4; hex 00000000; asc     ;;
       15: len 4; hex 5e08dec0; asc ^   ;;
       
      ------------------
      ---TRANSACTION 211, ACTIVE 16 sec
      26 lock struct(s), heap size 1136, 15 row lock(s), undo log entries 1
      MySQL thread id 1109, OS thread handle 139926681257728, query id 668 localhost 127.0.0.1 root
      

      In case of deadlock, the details are the following:

      MariaDB [test]> select version();
      +-----------------+
      | version()       |
      +-----------------+
      | 10.4.12-MariaDB |
      +-----------------+
      1 row in set (0,000 sec)
       
      MariaDB [test]> show engine innodb status\G
      *************************** 1. row ***************************
        Type: InnoDB
        Name:
      Status:
      =====================================
      2019-12-29 19:44:12 0x7f433805c700 INNODB MONITOR OUTPUT
      =====================================
      Per second averages calculated from the last 2 seconds
      -----------------
      BACKGROUND THREAD
      -----------------
      srv_master_thread loops: 413 srv_active, 0 srv_shutdown, 1680 srv_idle
      srv_master_thread log flush and writes: 2093
      ----------
      SEMAPHORES
      ----------
      OS WAIT ARRAY INFO: reservation count 2871
      OS WAIT ARRAY INFO: signal count 1874
      RW-shared spins 0, rounds 712, OS waits 189
      RW-excl spins 0, rounds 3651, OS waits 90
      RW-sx spins 534, rounds 7982, OS waits 81
      Spin rounds per wait: 712.00 RW-shared, 3651.00 RW-excl, 14.95 RW-sx
      ------------------------
      LATEST DETECTED DEADLOCK
      ------------------------
      2019-12-29 19:37:35 0x7f43380a6700
      *** (1) TRANSACTION:
      TRANSACTION 20360, ACTIVE 0 sec starting index read
      mysql tables in use 1, locked 1
      LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
      MySQL thread id 6109, OS thread handle 139926681257728, query id 11070 localhost 127.0.0.1 root Sending data
      select `id`,`c1` from `test`.`t1` order by `id` desc limit 1 for update
      *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
      RECORD LOCKS space id 17 page no 3 n bits 80 index PRIMARY of table `test`.`t1` trx id 20360 lock_mode X waiting
      Record lock, heap no 14 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
       0: len 4; hex 8000000d; asc     ;;
       1: len 6; hex 000000000000; asc       ;;
       2: len 7; hex 80000000000000; asc        ;;
       3: len 4; hex 800003e3; asc     ;;
       
      *** (2) TRANSACTION:
      TRANSACTION 20359, ACTIVE 0 sec inserting
      mysql tables in use 1, locked 1
      5 lock struct(s), heap size 1136, 4 row lock(s)
      MySQL thread id 6107, OS thread handle 139926679742208, query id 11059 localhost 127.0.0.1 root Update
      insert into `test`.`t1`(`id`,`c1`)values(14,432)
      *** (2) HOLDS THE LOCK(S):
      RECORD LOCKS space id 17 page no 3 n bits 80 index PRIMARY of table `test`.`t1` trx id 20359 lock_mode X
      Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
       0: len 8; hex 73757072656d756d; asc supremum;;
       
      Record lock, heap no 14 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
       0: len 4; hex 8000000d; asc     ;;
       1: len 6; hex 000000000000; asc       ;;
       2: len 7; hex 80000000000000; asc        ;;
       3: len 4; hex 800003e3; asc     ;;
       
      *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
      RECORD LOCKS space id 17 page no 3 n bits 80 index PRIMARY of table `test`.`t1` trx id 20359 lock_mode X insert intention waiting
      Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
       0: len 8; hex 73757072656d756d; asc supremum;;
       
      *** WE ROLL BACK TRANSACTION (1)
      ------------
      

      This makes it impossible to use Spider tables in concurrent environments.

      There are reasons to think that older versions (10.2.x) worked differently, so it may be a regression bug.

      Attachments

        Activity

          It looks caused by default value of spider_auto_increment_mode(=0). This is safe but it's not good for concurrent inserts. Please consider to use other value.
          If you used partitioned Spider table by default value, it worked as spider_auto_increment_mode=1. Because in 10.2, partition engine didn't have a patch for working with spider_auto_increment_mode=0.

          Kentoku Kentoku Shiba (Inactive) added a comment - It looks caused by default value of spider_auto_increment_mode(=0). This is safe but it's not good for concurrent inserts. Please consider to use other value. If you used partitioned Spider table by default value, it worked as spider_auto_increment_mode=1. Because in 10.2, partition engine didn't have a patch for working with spider_auto_increment_mode=0.

          Setting spider_auto_increment_mode=2 seemed to help. Can we get the impact on concurrent inserts documented for default value (-1)?

          Same about change in behavior on 10.4 vs 10.2, While 10.4 properly takes the value into account now for partitioned tables, together with default settings for this parameter we have a problem described in this bug report, while there was no problem with 10.2. This 'regression" must be properly higlighted in KB as an incompatible change that may require proper explicit setting.

          valerii Valerii Kravchuk added a comment - Setting spider_auto_increment_mode=2 seemed to help. Can we get the impact on concurrent inserts documented for default value (-1)? Same about change in behavior on 10.4 vs 10.2, While 10.4 properly takes the value into account now for partitioned tables, together with default settings for this parameter we have a problem described in this bug report, while there was no problem with 10.2. This 'regression" must be properly higlighted in KB as an incompatible change that may require proper explicit setting.
          Kentoku Kentoku Shiba (Inactive) added a comment - I added this behavior to the following in KB. https://mariadb.com/kb/en/spider-server-system-variables/ https://mariadb.com/kb/en/spider-faq/

          People

            Kentoku Kentoku Shiba (Inactive)
            valerii Valerii Kravchuk
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.