Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24813

Locking full table scan fails to use table-level locking

Details

    Description

      The following test case shows that table locks are not being passed to the storage engine in any of the cases of locking read.

      --source include/have_innodb.inc
      --source include/have_sequence.inc
      --source include/not_debug.inc
       
      SET @save_freq=@@GLOBAL.innodb_purge_rseg_truncate_frequency;
      SET GLOBAL innodb_purge_rseg_truncate_frequency=1;
       
      CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
      INSERT INTO t(a) SELECT seq FROM seq_0_to_1000000;
       
      --source include/wait_all_purged.inc
      SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;
       
      SET profiling_history_size=100;
      SET profiling = 1;
       
      START TRANSACTION;
      LOCK TABLE t WRITE;
      SELECT COUNT(*) FROM t FOR UPDATE;
      UNLOCK TABLES;
      COMMIT;
      SELECT COUNT(*) FROM t FOR UPDATE;
       
      START TRANSACTION;
      LOCK TABLE t READ;
      SELECT COUNT(*) FROM t LOCK IN SHARE MODE;
      UNLOCK TABLES;
      COMMIT;
      SELECT COUNT(*) FROM t LOCK IN SHARE MODE;
       
      SHOW PROFILES;
      SET profiling = 0;
      DROP TABLE t;
      

      I would have expected that an explicit LOCK TABLE would be passed to the storage engine, like it seems to have been the case during the development of MDEV-14479.

      I would also expect the optimizer to automatically pass information to the storage engine so that the table can be locked upfront, if we know that the entire table will be scanned in a locking operation. This should include the following:

      • CREATE…SELECT
      • INSERT…SELECT
      • SELECT…LOCK IN SHARE MODE
      • SELECT…FOR UPDATE
      • SELECT executed at TRANSACTION ISOLATION LEVEL SERIALIZABLE
      • UPDATE
      • DELETE

      If no WHERE or JOIN condition applies to the table, nor a LIMIT is present, then we should expect the operation to scan the entire table, and the storage engine should be requested to lock the entire table.

      MDEV-14479 fixed the InnoDB record locking in such a way that when the table has already been locked in the corresponding mode, no individual record locks will be allocated and created. If we set a breakpoint in lock_create_low() after ha_innobase::create() has finished in the above test case, we will observe the following:

      10.5 ffc5d064895cadbc42711efd7dbb6ae1b323f050

      Thread 15 "mariadbd" hit Breakpoint 1, lock_rec_create_low (c_lock=0x0, 
          thr=0x0, type_mode=3, page_id={m_id = 21474836484}, 
          page=0x7ffff0b70000 "", heap_no=2, index=0x7fffbc01af40, 
          trx=0x7ffff17260d8, holds_trx_mutex=false)
          at /mariadb/10.5m/storage/innobase/lock/lock0lock.cc:1294
      (gdb) p index.table.locks
      $7 = {count = 1, start = 0x7ffff1726cb0, end = 0x7ffff1726cb0, 
        node = &lock_table_t::locks}
      (gdb) p *index.table.locks.start
      $8 = {trx = 0x7ffff17260d8, trx_locks = {prev = 0x0, next = 0x0}, 
        index = 0x0, hash = 0x0, requested_time = 0, wait_time = 0, un_member = {
          tab_lock = {table = 0x7fffbc019e10, locks = {prev = 0x0, next = 0x0}}, 
          rec_lock = {page_id = {m_id = 140736347610640}, n_bits = 0}}, 
        type_mode = 17}
      

      Note: type_mode == 17 == LOCK_TABLE | LOCK_IX. For the LOCK IN SHARE MODE, it would be 16 == LOCK_TABLE | LOCK_IS. We would expect the table lock to exist in LOCK_X or LOCK_S mode, so that no locks will have to be allocated for each individual visited row.

      If we change the CREATE TABLE to CREATE TEMPORARY TABLE, we can get an accurate estimate of how fast the execution would be without the row-level locking.

      On my system, with locking, each full-table-scan consumes between 0.49 and 0.66 seconds (the first run being the slowest). This can probably be attributed to dynamic memory allocation.

      With a temporary table, each locking full-table scan consumes between 0.28 and 0.30 seconds. The memory operations related to row-level locking are roughly doubling the execution time (and seriously increasing the memory footprint)!

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            I think this should be implemented in steps. How about defining the first step as follows:

            If
            1. this is a DML statement (which does Locking Reads in InnODB)
            2. and the query is doing a full table scan without row filtering
            then:
            3. Use a table-level lock.

            InnoDB already knows about whether #1 is true.
            The only issue is #2.

            #2 can be implemented as follows:
            before (or right after) the handler->rnd_init(scan=false) call, the SQL layer will make an extra call to inform the InnoDB storage engine what condition will be attached to the table.

            (TODO: check if this can be unified with Table Condition Pushdown).

            psergei Sergei Petrunia added a comment - I think this should be implemented in steps. How about defining the first step as follows: If 1. this is a DML statement (which does Locking Reads in InnODB) 2. and the query is doing a full table scan without row filtering then: 3. Use a table-level lock. InnoDB already knows about whether #1 is true. The only issue is #2. #2 can be implemented as follows: before (or right after) the handler->rnd_init(scan=false) call, the SQL layer will make an extra call to inform the InnoDB storage engine what condition will be attached to the table. (TODO: check if this can be unified with Table Condition Pushdown).

            Implementation suggestion is below. Need to think this through.

            == Use Table Condition Pushdown for WHERE ==

            There is already handler::cond_push() call which informs the storage engine
            about the condition. InnoDB could implement it and this way see whether
            a condition is being present.

            Possible issues:
            Issue 1:
            setting optimizer_switch='table_condition_pushdown=off' will cause
            table-level locks to be acquired! Is this ok?
            It is a common practice among support/etc to disable optimizer switch
            flags to see what happens...

            Issue 2:
            InnoDB will take a note about whether there is a pushed condition, but it
            will not check the condition.

            That is, we'll have:

              COND *ha_innobase::cond_push(COND *arg) {
                 ...
                 return arg;
              }
            

            what should be in handler::pushed_cond, then? The code checks this member and seeing NULL interprets it as "nothing is pushed". Is this ok? If it is not, should we put "Item(1)" there?

            == Introduce push_limit ==

            To inform the storage engine about the LIMIT clause, we introduce

            void handler::push_limit(ha_rows soft_limit)
            

            Which serves as a hint to the storage engine that the SQL layer is not going to read more than soft_limit rows in a scan.

            The idea is that this is generally useful for storage engines that do batching - they should not batch much more than soft_limit rows.

            psergei Sergei Petrunia added a comment - Implementation suggestion is below. Need to think this through. == Use Table Condition Pushdown for WHERE == There is already handler::cond_push() call which informs the storage engine about the condition. InnoDB could implement it and this way see whether a condition is being present. Possible issues: Issue 1: setting optimizer_switch='table_condition_pushdown=off' will cause table-level locks to be acquired! Is this ok? It is a common practice among support/etc to disable optimizer switch flags to see what happens... Issue 2: InnoDB will take a note about whether there is a pushed condition, but it will not check the condition. That is, we'll have: COND *ha_innobase::cond_push(COND *arg) { ... return arg; } what should be in handler::pushed_cond, then? The code checks this member and seeing NULL interprets it as "nothing is pushed". Is this ok? If it is not, should we put "Item(1)" there? == Introduce push_limit == To inform the storage engine about the LIMIT clause, we introduce void handler::push_limit(ha_rows soft_limit) Which serves as a hint to the storage engine that the SQL layer is not going to read more than soft_limit rows in a scan. The idea is that this is generally useful for storage engines that do batching - they should not batch much more than soft_limit rows.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            psergei Sergei Petrunia made changes -
            Labels performance optimizer-feature performance
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 118952 ] MariaDB v4 [ 142575 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Roel Roel Van de Paar added a comment - - edited

            Marko informed me that this is the same bug:

            CREATE TABLE IF NOT EXISTS v0 ( v1 VARCHAR ( 82 ) ) ;
            INSERT INTO v0 ( v1 ) VALUES ( 25 ) , ( ( SELECT -1 WHERE ( v1 NOT IN ( v1 NOT IN ( SELECT v1 ) ) AND v1 NOT IN ( 80 ) ) ) ) , ( 89215105.000000 ) , ( 0 ) , ( 2147483647 ) , ( 127 ) , ( 74 ) , ( -128 ) ;
            INSERT INTO v0 ( v1 ) SELECT v1 FROM ( SELECT 'x' FROM v0 AS v10 , v0 AS v9 NATURAL JOIN v0 AS v8 , v0 ) AS v2 , ( SELECT 40561621.000000 FROM v0 AS v7 , v0 AS v6 , v0 AS v5 JOIN v0 ) AS v3 JOIN v0 AS v4 NATURAL JOIN v0 ORDER BY v1 ;
            

            Leads to:

            10.11.0-opt>CREATE TABLE IF NOT EXISTS v0 ( v1 VARCHAR ( 82 ) ) ;
            Query OK, 0 rows affected (0.015 sec)
             
            10.11.0-opt> INSERT INTO v0 ( v1 ) VALUES ( 25 ) , ( ( SELECT -1 WHERE ( v1 NOT IN ( v1 NOT IN ( SELECT v1 ) ) AND v1 NOT IN ( 80 ) ) ) ) , ( 89215105.000000 ) , ( 0 ) , ( 2147483647 ) , ( 127 ) , ( 74 ) , ( -128 ) ;
            Query OK, 8 rows affected (0.002 sec)
            Records: 8  Duplicates: 0  Warnings: 0
             
            10.11.0-opt> INSERT INTO v0 ( v1 ) SELECT v1 FROM ( SELECT 'x' FROM v0 AS v10 , v0 AS v9 NATURAL JOIN v0 AS v8 , v0 ) AS v2 , ( SELECT 40561621.000000 FROM v0 AS v7 , v0 AS v6 , v0 AS v5 JOIN v0 ) AS v3 JOIN v0 AS v4 NATURAL JOIN v0 ORDER BY v1 ;
            ERROR 1206 (HY000): The total number of locks exceeds the lock table size
            

            And

            10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Optimized)

            2022-08-30  8:43:42 0 [Note] /test/MD190822-mariadb-10.11.0-linux-x86_64-opt/bin/mysqld: ready for connections.
            Version: '10.11.0-MariaDB'  socket: '/test/MD190822-mariadb-10.11.0-linux-x86_64-opt/socket.sock'  port: 10942  MariaDB Server
            2022-08-30  8:44:35 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics.
             
            =====================================
            2022-08-30 08:44:35 0x1466fd360700 INNODB MONITOR OUTPUT
            ...
            

            If it would help, I can reduce this testcase further.

            Roel Roel Van de Paar added a comment - - edited Marko informed me that this is the same bug: CREATE TABLE IF NOT EXISTS v0 ( v1 VARCHAR ( 82 ) ) ; INSERT INTO v0 ( v1 ) VALUES ( 25 ) , ( ( SELECT -1 WHERE ( v1 NOT IN ( v1 NOT IN ( SELECT v1 ) ) AND v1 NOT IN ( 80 ) ) ) ) , ( 89215105.000000 ) , ( 0 ) , ( 2147483647 ) , ( 127 ) , ( 74 ) , ( -128 ) ; INSERT INTO v0 ( v1 ) SELECT v1 FROM ( SELECT 'x' FROM v0 AS v10 , v0 AS v9 NATURAL JOIN v0 AS v8 , v0 ) AS v2 , ( SELECT 40561621.000000 FROM v0 AS v7 , v0 AS v6 , v0 AS v5 JOIN v0 ) AS v3 JOIN v0 AS v4 NATURAL JOIN v0 ORDER BY v1 ; Leads to: 10.11.0-opt>CREATE TABLE IF NOT EXISTS v0 ( v1 VARCHAR ( 82 ) ) ; Query OK, 0 rows affected (0.015 sec)   10.11.0-opt> INSERT INTO v0 ( v1 ) VALUES ( 25 ) , ( ( SELECT -1 WHERE ( v1 NOT IN ( v1 NOT IN ( SELECT v1 ) ) AND v1 NOT IN ( 80 ) ) ) ) , ( 89215105.000000 ) , ( 0 ) , ( 2147483647 ) , ( 127 ) , ( 74 ) , ( -128 ) ; Query OK, 8 rows affected (0.002 sec) Records: 8 Duplicates: 0 Warnings: 0   10.11.0-opt> INSERT INTO v0 ( v1 ) SELECT v1 FROM ( SELECT 'x' FROM v0 AS v10 , v0 AS v9 NATURAL JOIN v0 AS v8 , v0 ) AS v2 , ( SELECT 40561621.000000 FROM v0 AS v7 , v0 AS v6 , v0 AS v5 JOIN v0 ) AS v3 JOIN v0 AS v4 NATURAL JOIN v0 ORDER BY v1 ; ERROR 1206 (HY000): The total number of locks exceeds the lock table size And 10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Optimized) 2022-08-30 8:43:42 0 [Note] /test/MD190822-mariadb-10.11.0-linux-x86_64-opt/bin/mysqld: ready for connections. Version: '10.11.0-MariaDB' socket: '/test/MD190822-mariadb-10.11.0-linux-x86_64-opt/socket.sock' port: 10942 MariaDB Server 2022-08-30 8:44:35 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics.   ===================================== 2022-08-30 08:44:35 0x1466fd360700 INNODB MONITOR OUTPUT ... If it would help, I can reduce this testcase further.
            Roel Roel Van de Paar made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            Roel Roel Van de Paar made changes -
            Affects Version/s 10.11 [ 27614 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            May be InnoDB should automatically upgrade row locks to a table lock, when there're too many of them?
            This is a fairly standard behavior of many database engines.

            serg Sergei Golubchik added a comment - May be InnoDB should automatically upgrade row locks to a table lock, when there're too many of them? This is a fairly standard behavior of many database engines.
            marko Marko Mäkelä made changes -
            ycp Yuchen Pei made changes -
            Assignee Sergei Petrunia [ psergey ] Yuchen Pei [ JIRAUSER52627 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            ycp Yuchen Pei made changes -
            Priority Major [ 3 ] Critical [ 2 ]

            Verified also in 11.5

            11.5.0 eeba940311ed17d160023280783fd2bbb64abef3 (Optimized)

            2024-04-02 10:38:02 0 [Note] /test/MD190224-mariadb-11.5.0-linux-x86_64-opt/bin/mariadbd: ready for connections.
            Version: '11.5.0-MariaDB'  socket: '/test/MD190224-mariadb-11.5.0-linux-x86_64-opt/socket.sock'  port: 10082  MariaDB Server
            2024-04-02 10:38:38 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics.
             
            =====================================
            2024-04-02 10:38:38 0x153b737fb6c0 INNODB MONITOR OUTPUT
            ...
            

            Roel Roel Van de Paar added a comment - Verified also in 11.5 11.5.0 eeba940311ed17d160023280783fd2bbb64abef3 (Optimized) 2024-04-02 10:38:02 0 [Note] /test/MD190224-mariadb-11.5.0-linux-x86_64-opt/bin/mariadbd: ready for connections. Version: '11.5.0-MariaDB' socket: '/test/MD190224-mariadb-11.5.0-linux-x86_64-opt/socket.sock' port: 10082 MariaDB Server 2024-04-02 10:38:38 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics.   ===================================== 2024-04-02 10:38:38 0x153b737fb6c0 INNODB MONITOR OUTPUT ...
            Roel Roel Van de Paar made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.4 [ 29301 ]
            Affects Version/s 11.5 [ 29506 ]
            marko Marko Mäkelä made changes -
            ycp Yuchen Pei added a comment - - edited

            I tried to understand how innodb acquire table locks, and given the relevance of MDEV-14479 (marko also asked me to look into that ticket), I extracted the tests from the change (721ec44e2ad), as follows

            --source include/have_innodb.inc
            # Turn on all monitor counters
            set global innodb_monitor_enable = all;
             
            CREATE TABLE fl0 (
              id INT NOT NULL PRIMARY KEY
            ) ENGINE = InnoDB;
             
            CREATE TABLE fl1 (
              id INT NOT NULL PRIMARY KEY,
              fl0_id INT,
              CONSTRAINT `fkl0`
                FOREIGN KEY (fl0_id) REFERENCES fl0 (id)
                ON DELETE CASCADE
                ON UPDATE RESTRICT
            ) ENGINE = InnoDB;
             
            CREATE TABLE fl2 (
              id INT NOT NULL PRIMARY KEY,
              fl1_id INT,
              CONSTRAINT `fkl1`
                FOREIGN KEY (fl1_id) REFERENCES fl1 (id)
                ON DELETE CASCADE
                ON UPDATE SET NULL
            ) ENGINE = InnoDB;
             
            INSERT INTO fl0 VALUES (1000);
            INSERT INTO fl1 VALUES (500, 1000), (1500, 1000);
            INSERT INTO fl2 VALUES (200, 500), (800, 500), (1200, 1500), (1800, 1500);
             
            CREATE TABLE t1(id INT PRIMARY KEY, a INT, b CHAR(1), UNIQUE KEY u(a,b))
            ENGINE=InnoDB;
             
            SET @start = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME
            = 'lock_rec_lock_created');
             
            BEGIN;
            INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d');
            DELETE FROM t1 WHERE a = 9999 AND b='b';
            COMMIT;
             
            # After MDEV-30225 is fixed, the above DELETE creates next-key lock during
            # secondary index unique search. That's why the result of the following must
            # be 1.
            SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME
            = 'lock_rec_lock_created');
            SELECT @end - @start;
             
            --echo # Records must not be S/X-locked if a table is X-locked.
            SET @start = @end;
            # Tables will not be locked if autocommit is not 0.
            # See OPTION_NOT_AUTOCOMMIT in ha_innobase::external_lock().
            SET autocommit = 0;
            BEGIN;
            LOCK TABLE t1 WRITE;
            SELECT * FROM t1;
            SELECT * FROM t1 WHERE a>=10000;
            SELECT * FROM t1 FOR UPDATE;
            SELECT * FROM t1 WHERE a>=10000 FOR UPDATE;
            UPDATE t1 SET b = 'b' WHERE id = 4;
            UPDATE t1 SET b = 'b' WHERE a = 10000;
            REPLACE INTO t1 VALUES (4,3,'a');
            INSERT INTO t1 VALUES (3,1,'e') ON DUPLICATE KEY UPDATE b = 'b';
            INSERT INTO t1 VALUES (5,5,'e');
            DELETE FROM t1 WHERE a = 1 AND b='a';
            DELETE FROM t1;
            COMMIT;
            UNLOCK TABLES;
            SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = 'lock_rec_lock_created');
            SELECT @end - @start;
             
            --echo # Records must not be S-locked if a table is S-locked.
            SET @start = @end;
            BEGIN;
            LOCK TABLE t1 WRITE;
            INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d');
            DELETE FROM t1 WHERE a = 9999 AND b='b';
            COMMIT;
            UNLOCK TABLES;
            BEGIN;
            LOCK TABLE t1 READ;
            SELECT * FROM t1 LOCK IN SHARE MODE;
            SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE;
            COMMIT;
            UNLOCK TABLES;
            SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = 'lock_rec_lock_created');
            SELECT @end - @start;
             
            --echo # Records must not be S-locked for foreign keys enforcement
            SET @start = @end;
            BEGIN;
            LOCK TABLE fl0 READ, fl1 READ, fl2 WRITE;
            INSERT INTO fl2 VALUES (300, 500), (700, 500), (1300, 1500), (1700, 1500);
            SELECT * FROM fl1 LOCK IN SHARE MODE;
            COMMIT;
            UNLOCK TABLES;
            SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = 'lock_rec_lock_created');
            SELECT @end - @start;
             
            --echo # Records must not be X-locked for foreign keys cascade
            SET @start = @end;
            BEGIN;
            LOCK TABLE fl0 READ, fl1 WRITE, fl2 WRITE;
            DELETE FROM fl1 WHERE id = 1500;
            UPDATE fl1 SET id = 2500 WHERE id = 500;
            COMMIT;
            UNLOCK TABLES;
            SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = 'lock_rec_lock_created');
            SELECT @end - @start;
             
            SET autocommit = default;
            DROP TABLE t1;
            DROP TABLE fl2;
            DROP TABLE fl1;
            DROP TABLE fl0;
            SET GLOBAL innodb_monitor_enable=default;
            

            and added a branch to track when it goes into the branch of detecting table locks so that I can put a breakpoint there:

            modified   storage/innobase/lock/lock0lock.cc
            @@ -1956,7 +1956,8 @@ lock_rec_lock(
                      lock_table_has(trx, index->table, LOCK_IX));
             
               if (lock_table_has(trx, index->table,
            -                     static_cast<lock_mode>(LOCK_MODE_MASK & mode)));
            +                     static_cast<lock_mode>(LOCK_MODE_MASK & mode)))
            +	  std::cout << "yo" << std::endl;
               else if (lock_t *lock= lock_sys.get_first(block->page.id()))
               {
                 trx_mutex_enter(trx);
            

            What I got was odd (tested at 10.5 8bc32410160265dcd27d595ed610843d08d32034). Every time it breaks there (at least statements marked with ! below), the trx and the table lock are the same, and a watchpoint on the table lock trx->lock.table_locks[0] shows it was updated at the insert statement into fl1. As you can see, this happens across BEGIN;...COMMIT; blocks.

            INSERT INTO fl0 VALUES (1000);
            INSERT INTO fl1 VALUES (500, 1000), (1500, 1000); # last update of the lock
            # [... 25 lines elided]
            BEGIN;
            LOCK TABLE t1 WRITE;
            SELECT * FROM t1;
            SELECT * FROM t1 WHERE a>=10000;
            SELECT * FROM t1 FOR UPDATE; # !
            SELECT * FROM t1 WHERE a>=10000 FOR UPDATE; # !
            UPDATE t1 SET b = 'b' WHERE id = 4;
            UPDATE t1 SET b = 'b' WHERE a = 10000;
            REPLACE INTO t1 VALUES (4,3,'a'); # !
            INSERT INTO t1 VALUES (3,1,'e') ON DUPLICATE KEY UPDATE b = 'b';
            INSERT INTO t1 VALUES (5,5,'e');
            DELETE FROM t1 WHERE a = 1 AND b='a'; # !
            DELETE FROM t1; # !
            COMMIT;
            # [... 6 lines elided]
            BEGIN;
            LOCK TABLE t1 WRITE;
            INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d');
            DELETE FROM t1 WHERE a = 9999 AND b='b'; # !
            COMMIT;
            UNLOCK TABLES;
            BEGIN;
            LOCK TABLE t1 READ;
            SELECT * FROM t1 LOCK IN SHARE MODE;
            SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE;
            COMMIT;
            

            marko: is this expected? If so why?

            ycp Yuchen Pei added a comment - - edited I tried to understand how innodb acquire table locks, and given the relevance of MDEV-14479 ( marko also asked me to look into that ticket), I extracted the tests from the change (721ec44e2ad), as follows --source include/have_innodb.inc # Turn on all monitor counters set global innodb_monitor_enable = all ;   CREATE TABLE fl0 ( id INT NOT NULL PRIMARY KEY ) ENGINE = InnoDB;   CREATE TABLE fl1 ( id INT NOT NULL PRIMARY KEY , fl0_id INT , CONSTRAINT `fkl0` FOREIGN KEY (fl0_id) REFERENCES fl0 (id) ON DELETE CASCADE ON UPDATE RESTRICT ) ENGINE = InnoDB;   CREATE TABLE fl2 ( id INT NOT NULL PRIMARY KEY , fl1_id INT , CONSTRAINT `fkl1` FOREIGN KEY (fl1_id) REFERENCES fl1 (id) ON DELETE CASCADE ON UPDATE SET NULL ) ENGINE = InnoDB;   INSERT INTO fl0 VALUES (1000); INSERT INTO fl1 VALUES (500, 1000), (1500, 1000); INSERT INTO fl2 VALUES (200, 500), (800, 500), (1200, 1500), (1800, 1500);   CREATE TABLE t1(id INT PRIMARY KEY , a INT , b CHAR (1), UNIQUE KEY u(a,b)) ENGINE=InnoDB;   SET @start = ( SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = 'lock_rec_lock_created' );   BEGIN ; INSERT INTO t1 VALUES (1,1, 'a' ),(2,9999, 'b' ),(3,10000, 'c' ),(4,4, 'd' ); DELETE FROM t1 WHERE a = 9999 AND b= 'b' ; COMMIT ;   # After MDEV-30225 is fixed, the above DELETE creates next - key lock during # secondary index unique search. That 's why the result of the following must # be 1. SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = ' lock_rec_lock_created '); SELECT @end - @start;   --echo # Records must not be S/X-locked if a table is X-locked. SET @start = @end; # Tables will not be locked if autocommit is not 0. # See OPTION_NOT_AUTOCOMMIT in ha_innobase::external_lock(). SET autocommit = 0; BEGIN; LOCK TABLE t1 WRITE; SELECT * FROM t1; SELECT * FROM t1 WHERE a>=10000; SELECT * FROM t1 FOR UPDATE; SELECT * FROM t1 WHERE a>=10000 FOR UPDATE; UPDATE t1 SET b = ' b ' WHERE id = 4; UPDATE t1 SET b = ' b ' WHERE a = 10000; REPLACE INTO t1 VALUES (4,3,' a '); INSERT INTO t1 VALUES (3,1,' e ') ON DUPLICATE KEY UPDATE b = ' b '; INSERT INTO t1 VALUES (5,5,' e '); DELETE FROM t1 WHERE a = 1 AND b=' a '; DELETE FROM t1; COMMIT; UNLOCK TABLES; SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = ' lock_rec_lock_created '); SELECT @end - @start;   --echo # Records must not be S-locked if a table is S-locked. SET @start = @end; BEGIN; LOCK TABLE t1 WRITE; INSERT INTO t1 VALUES(1,1,' a '),(2,9999,' b '),(3,10000,' c '),(4,4,' d '); DELETE FROM t1 WHERE a = 9999 AND b=' b '; COMMIT; UNLOCK TABLES; BEGIN; LOCK TABLE t1 READ; SELECT * FROM t1 LOCK IN SHARE MODE; SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE; COMMIT; UNLOCK TABLES; SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = ' lock_rec_lock_created '); SELECT @end - @start;   --echo # Records must not be S-locked for foreign keys enforcement SET @start = @end; BEGIN; LOCK TABLE fl0 READ, fl1 READ, fl2 WRITE; INSERT INTO fl2 VALUES (300, 500), (700, 500), (1300, 1500), (1700, 1500); SELECT * FROM fl1 LOCK IN SHARE MODE; COMMIT; UNLOCK TABLES; SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = ' lock_rec_lock_created '); SELECT @end - @start;   --echo # Records must not be X-locked for foreign keys cascade SET @start = @end; BEGIN; LOCK TABLE fl0 READ, fl1 WRITE, fl2 WRITE; DELETE FROM fl1 WHERE id = 1500; UPDATE fl1 SET id = 2500 WHERE id = 500; COMMIT; UNLOCK TABLES; SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = ' lock_rec_lock_created'); SELECT @ end - @start;   SET autocommit = default ; DROP TABLE t1; DROP TABLE fl2; DROP TABLE fl1; DROP TABLE fl0; SET GLOBAL innodb_monitor_enable= default ; and added a branch to track when it goes into the branch of detecting table locks so that I can put a breakpoint there: modified storage/innobase/lock/lock0lock.cc @@ -1956,7 +1956,8 @@ lock_rec_lock( lock_table_has(trx, index->table, LOCK_IX)); if (lock_table_has(trx, index->table, - static_cast<lock_mode>(LOCK_MODE_MASK & mode))); + static_cast<lock_mode>(LOCK_MODE_MASK & mode))) + std::cout << "yo" << std::endl; else if (lock_t *lock= lock_sys.get_first(block->page.id())) { trx_mutex_enter(trx); What I got was odd (tested at 10.5 8bc32410160265dcd27d595ed610843d08d32034). Every time it breaks there (at least statements marked with ! below), the trx and the table lock are the same, and a watchpoint on the table lock trx->lock.table_locks [0] shows it was updated at the insert statement into fl1 . As you can see, this happens across BEGIN;...COMMIT; blocks. INSERT INTO fl0 VALUES (1000); INSERT INTO fl1 VALUES (500, 1000), (1500, 1000); # last update of the lock # [... 25 lines elided] BEGIN ; LOCK TABLE t1 WRITE; SELECT * FROM t1; SELECT * FROM t1 WHERE a>=10000; SELECT * FROM t1 FOR UPDATE ; # ! SELECT * FROM t1 WHERE a>=10000 FOR UPDATE ; # ! UPDATE t1 SET b = 'b' WHERE id = 4; UPDATE t1 SET b = 'b' WHERE a = 10000; REPLACE INTO t1 VALUES (4,3, 'a' ); # ! INSERT INTO t1 VALUES (3,1, 'e' ) ON DUPLICATE KEY UPDATE b = 'b' ; INSERT INTO t1 VALUES (5,5, 'e' ); DELETE FROM t1 WHERE a = 1 AND b= 'a' ; # ! DELETE FROM t1; # ! COMMIT ; # [... 6 lines elided] BEGIN ; LOCK TABLE t1 WRITE; INSERT INTO t1 VALUES (1,1, 'a' ),(2,9999, 'b' ),(3,10000, 'c' ),(4,4, 'd' ); DELETE FROM t1 WHERE a = 9999 AND b= 'b' ; # ! COMMIT ; UNLOCK TABLES; BEGIN ; LOCK TABLE t1 READ ; SELECT * FROM t1 LOCK IN SHARE MODE; SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE; COMMIT ; marko : is this expected? If so why?

            ycp, sorry for the delay. Yes, it is expected that the table-wide exclusive lock (LOCK_X) covers all write operations in all indexes of the table, without a need to register LOCK_X on the visited records.

            I just debugged this in 10.6. It turns out that LOCK TABLE t1 WRITE actually does acquire an exclusive lock (LOCK_X) on the entire table. I found this out by setting a breakpoint on lock_table_create in an rr replay of running the test with ./mtr --rr.

            Note: to be 10.6 and debugging friendly, it is better to add STATS_PERSISTENT=0 to each CREATE TABLE statement; in 10.5 this was the default due to MDEV-4750. We do not want ‘noise’ from dict_stats_save() in background tasks. Also, note that in 10.6 there is a function lock_sys_tables() that will acquire locks on InnoDB data dictionary tables. The first interesting lock_table_create call is after the last ha_innobase::create() (for CREATE TABLE t1) completed.

            Right after the CREATE TABLE part, the first two lock_table_create calls are different. First, we have one LOCK_IX acquisition during the INSERT statement in the following:

            BEGIN;
            INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d');
            DELETE FROM t1 WHERE a = 9999 AND b='b';
            COMMIT;
            

            When there are no conflicts, INSERT operations use implicit locking. That is, if the DB_TRX_ID column of a clustered index record identifies an active transaction, that record is exclusively locked by that transaction. So, during the above transaction, the only calls to lock_rec_create_low() will be for the DELETE statement. We must acquire LOCK_X on the record, because we only hold LOCK_IX and not LOCK_X on the table:

            10.6 e459ce8336ba37ea12a405f29e5fae866fd28701

            #0  lock_rec_create_low (c_lock=c_lock@entry=0x0, type_mode=type_mode@entry=3, page_id=page_id@entry={m_id = 34359738372}, page=0x7fd021910000 "", heap_no=heap_no@entry=3, index=index@entry=0x7fcff0063738, 
                trx=0x7fd021c00b80, holds_trx_mutex=false) at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:1225
            #1  0x0000562a4eb4e165 in lock_rec_lock (impl=impl@entry=false, mode=mode@entry=3, block=block@entry=0x7fd021410190, heap_no=3, index=index@entry=0x7fcff0063738, thr=thr@entry=0x7fcff00658e0)
                at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:1716
            #2  0x0000562a4eb4ede4 in lock_sec_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0x7fd021410190, rec=rec@entry=0x7fd02191008d "\200", index=index@entry=0x7fcff0063738, 
                offsets=offsets@entry=0x7fd0221fca10, mode=mode@entry=LOCK_X, gap_mode=0, thr=0x7fcff00658e0) at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:6004
            #3  0x0000562a4ec1c1e8 in sel_set_rec_lock (pcur=pcur@entry=0x7fcff0065108, rec=0x7fd02191008d "\200", index=index@entry=0x7fcff0063738, offsets=0x7fd0221fca10, mode=3, type=0, thr=0x7fcff00658e0, 
                mtr=0x7fd0221fcdf0) at /mariadb/10.6m/storage/innobase/row/row0sel.cc:1369
            #4  0x0000562a4ec1f961 in row_search_mvcc (buf=buf@entry=0x7fcff005eb08 "\377", mode=<optimized out>, prebuilt=0x7fcff0064f38, match_mode=<optimized out>, direction=direction@entry=0)
                at /mariadb/10.6m/storage/innobase/row/row0sel.cc:5237
            

            The next statement that reaches InnoDB and lock_table_create() is the following:

            SET autocommit = 0;
            BEGIN;
            LOCK TABLE t1 WRITE;
            

            This will indeed acquire an exclusive lock on the table. Now, let us set breakpoints on lock_rec_create_low() and trx_t::commit() and see what happens during the rest of this transaction:

            SELECT * FROM t1;
            SELECT * FROM t1 WHERE a>=10000;
            SELECT * FROM t1 FOR UPDATE;
            SELECT * FROM t1 WHERE a>=10000 FOR UPDATE;
            UPDATE t1 SET b = 'b' WHERE id = 4;
            UPDATE t1 SET b = 'b' WHERE a = 10000;
            REPLACE INTO t1 VALUES (4,3,'a');
            INSERT INTO t1 VALUES (3,1,'e') ON DUPLICATE KEY UPDATE b = 'b';
            INSERT INTO t1 VALUES (5,5,'e');
            DELETE FROM t1 WHERE a = 1 AND b='a';
            DELETE FROM t1;
            COMMIT;
            

            It turns out that none of these DML statements invoke lock_rec_create_low(); we hit the trx_t::commit() on the COMMIT statement. The LOCK TABLE t1 WRITE is doing the right thing here, making all DML operations on t1 covered by the exclusive table lock.

            Similarly, in the following section we only acquire LOCK_S (table-wide shared lock) on the table, and do not acquire any LOCK_S on the visited index records:

            LOCK TABLE t1 READ;
            SELECT * FROM t1 LOCK IN SHARE MODE;
            SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE;
            COMMIT;
            

            I think that for now, we can ignore the part that involves the tables whose names start with fl and involve FOREIGN KEY constraints.

            This example shows that it already is possible to acquire shared or exclusive InnoDB tables lock from SQL. I had forgotten these details. To fix this bug we have to figure out how to modify the code (I think mostly outside InnoDB) so that LOCK_X or LOCK_S will be acquired if a locking operation will traverse the entire table.

            marko Marko Mäkelä added a comment - ycp , sorry for the delay. Yes, it is expected that the table-wide exclusive lock ( LOCK_X ) covers all write operations in all indexes of the table, without a need to register LOCK_X on the visited records. I just debugged this in 10.6. It turns out that LOCK TABLE t1 WRITE actually does acquire an exclusive lock ( LOCK_X ) on the entire table. I found this out by setting a breakpoint on lock_table_create in an rr replay of running the test with ./mtr --rr . Note: to be 10.6 and debugging friendly, it is better to add STATS_PERSISTENT=0 to each CREATE TABLE statement; in 10.5 this was the default due to MDEV-4750 . We do not want ‘noise’ from dict_stats_save() in background tasks. Also, note that in 10.6 there is a function lock_sys_tables() that will acquire locks on InnoDB data dictionary tables. The first interesting lock_table_create call is after the last ha_innobase::create() (for CREATE TABLE t1 ) completed. Right after the CREATE TABLE part, the first two lock_table_create calls are different. First, we have one LOCK_IX acquisition during the INSERT statement in the following: BEGIN ; INSERT INTO t1 VALUES (1,1, 'a' ),(2,9999, 'b' ),(3,10000, 'c' ),(4,4, 'd' ); DELETE FROM t1 WHERE a = 9999 AND b= 'b' ; COMMIT ; When there are no conflicts, INSERT operations use implicit locking. That is, if the DB_TRX_ID column of a clustered index record identifies an active transaction, that record is exclusively locked by that transaction. So, during the above transaction, the only calls to lock_rec_create_low() will be for the DELETE statement. We must acquire LOCK_X on the record, because we only hold LOCK_IX and not LOCK_X on the table: 10.6 e459ce8336ba37ea12a405f29e5fae866fd28701 #0 lock_rec_create_low (c_lock=c_lock@entry=0x0, type_mode=type_mode@entry=3, page_id=page_id@entry={m_id = 34359738372}, page=0x7fd021910000 "", heap_no=heap_no@entry=3, index=index@entry=0x7fcff0063738, trx=0x7fd021c00b80, holds_trx_mutex=false) at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:1225 #1 0x0000562a4eb4e165 in lock_rec_lock (impl=impl@entry=false, mode=mode@entry=3, block=block@entry=0x7fd021410190, heap_no=3, index=index@entry=0x7fcff0063738, thr=thr@entry=0x7fcff00658e0) at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:1716 #2 0x0000562a4eb4ede4 in lock_sec_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0x7fd021410190, rec=rec@entry=0x7fd02191008d "\200", index=index@entry=0x7fcff0063738, offsets=offsets@entry=0x7fd0221fca10, mode=mode@entry=LOCK_X, gap_mode=0, thr=0x7fcff00658e0) at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:6004 #3 0x0000562a4ec1c1e8 in sel_set_rec_lock (pcur=pcur@entry=0x7fcff0065108, rec=0x7fd02191008d "\200", index=index@entry=0x7fcff0063738, offsets=0x7fd0221fca10, mode=3, type=0, thr=0x7fcff00658e0, mtr=0x7fd0221fcdf0) at /mariadb/10.6m/storage/innobase/row/row0sel.cc:1369 #4 0x0000562a4ec1f961 in row_search_mvcc (buf=buf@entry=0x7fcff005eb08 "\377", mode=<optimized out>, prebuilt=0x7fcff0064f38, match_mode=<optimized out>, direction=direction@entry=0) at /mariadb/10.6m/storage/innobase/row/row0sel.cc:5237 The next statement that reaches InnoDB and lock_table_create() is the following: SET autocommit = 0; BEGIN ; LOCK TABLE t1 WRITE; This will indeed acquire an exclusive lock on the table. Now, let us set breakpoints on lock_rec_create_low() and trx_t::commit() and see what happens during the rest of this transaction: SELECT * FROM t1; SELECT * FROM t1 WHERE a>=10000; SELECT * FROM t1 FOR UPDATE ; SELECT * FROM t1 WHERE a>=10000 FOR UPDATE ; UPDATE t1 SET b = 'b' WHERE id = 4; UPDATE t1 SET b = 'b' WHERE a = 10000; REPLACE INTO t1 VALUES (4,3, 'a' ); INSERT INTO t1 VALUES (3,1, 'e' ) ON DUPLICATE KEY UPDATE b = 'b' ; INSERT INTO t1 VALUES (5,5, 'e' ); DELETE FROM t1 WHERE a = 1 AND b= 'a' ; DELETE FROM t1; COMMIT ; It turns out that none of these DML statements invoke lock_rec_create_low() ; we hit the trx_t::commit() on the COMMIT statement. The LOCK TABLE t1 WRITE is doing the right thing here, making all DML operations on t1 covered by the exclusive table lock. Similarly, in the following section we only acquire LOCK_S (table-wide shared lock) on the table, and do not acquire any LOCK_S on the visited index records: LOCK TABLE t1 READ ; SELECT * FROM t1 LOCK IN SHARE MODE; SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE; COMMIT ; I think that for now, we can ignore the part that involves the tables whose names start with fl and involve FOREIGN KEY constraints. This example shows that it already is possible to acquire shared or exclusive InnoDB tables lock from SQL. I had forgotten these details. To fix this bug we have to figure out how to modify the code (I think mostly outside InnoDB) so that LOCK_X or LOCK_S will be acquired if a locking operation will traverse the entire table.
            ycp Yuchen Pei added a comment - - edited

            quick recap

            • what we've got so far
              • mdev_14479.test (the testcase in my previous comment):
                • IX lock in INSERT INTO fl0 VALUES (1000)
                • IX lock in INSERT INTO fl1 VALUES (500, 1000), (1500, 1000)
                • IX lock in INSERT INTO fl2 VALUES (200, 500), (800, 500), (1200, 1500), (1800, 1500)
                • IX lock in "INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d')"
                • X lock in "LOCK TABLE t1 WRITE" (because of the autocommit trick)
                • X lock in second "LOCK TABLE t1 WRITE"
              • mdev_24813.test (the test in the description of this ticket with the four selects labeled 0-3):
                • IX lock in "/* 0 */ SELECT COUNT FROM t FOR UPDATE"
                • IX in "/* 1 */ SELECT COUNT FROM t FOR UPDATE"
                • IS in "/* 2 */ SELECT COUNT FROM t LOCK IN SHARE MODE"
                • IS in "/* 3 */ SELECT COUNT FROM t LOCK IN SHARE MODE"
            • what we want: X or S table wide locks in these examples

              BEGIN; SELECT * FROM t1 LOCK IN SHARE MODE; ... -- S lock
              BEGIN; SELECT * FROM t1 FOR UPDATE; ...         -- X lock
              BEGIN; DELETE FROM t1; ...                      -- X lock
              

            • why the autocommit trick works

              // in ha_innobase::external_lock
              			if (sql_command == SQLCOM_LOCK_TABLES
              			    && THDVAR(thd, table_locks)
              			    && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT)
              			    && thd_in_lock_tables(thd)) {
               
              				dberr_t	error = row_lock_table(m_prebuilt);
              

            ycp Yuchen Pei added a comment - - edited quick recap what we've got so far mdev_14479.test (the testcase in my previous comment): IX lock in INSERT INTO fl0 VALUES (1000) IX lock in INSERT INTO fl1 VALUES (500, 1000), (1500, 1000) IX lock in INSERT INTO fl2 VALUES (200, 500), (800, 500), (1200, 1500), (1800, 1500) IX lock in "INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d')" X lock in "LOCK TABLE t1 WRITE" (because of the autocommit trick) X lock in second "LOCK TABLE t1 WRITE" mdev_24813.test (the test in the description of this ticket with the four selects labeled 0-3): IX lock in "/* 0 */ SELECT COUNT FROM t FOR UPDATE" IX in "/* 1 */ SELECT COUNT FROM t FOR UPDATE" IS in "/* 2 */ SELECT COUNT FROM t LOCK IN SHARE MODE" IS in "/* 3 */ SELECT COUNT FROM t LOCK IN SHARE MODE" what we want: X or S table wide locks in these examples BEGIN ; SELECT * FROM t1 LOCK IN SHARE MODE; ... -- S lock BEGIN ; SELECT * FROM t1 FOR UPDATE ; ... -- X lock BEGIN ; DELETE FROM t1; ... -- X lock why the autocommit trick works // in ha_innobase::external_lock if (sql_command == SQLCOM_LOCK_TABLES && THDVAR(thd, table_locks) && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT) && thd_in_lock_tables(thd)) {   dberr_t error = row_lock_table(m_prebuilt);

            ycp, thank you. It seems to me that this logic to acquire InnoDB table locks for LOCK TABLES is flawed. Elsewhere, such as in thd_trx_is_auto_commit() and innobase_register_trx(), we are checking for two flags, corresponding to autocommit=0 or being within an explicitly started transaction:

            diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
            index 418f39885f7..397d95dbbdd 100644
            --- a/storage/innobase/handler/ha_innodb.cc
            +++ b/storage/innobase/handler/ha_innodb.cc
            @@ -16180,7 +16180,8 @@ ha_innobase::external_lock(
             
             			if (sql_command == SQLCOM_LOCK_TABLES
             			    && THDVAR(thd, table_locks)
            -			    && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT)
            +			    && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT
            +						| OPTION_BEGIN)
             			    && thd_in_lock_tables(thd)) {
             
             				dberr_t	error = row_lock_table(m_prebuilt);
            

            In fact, this call is the only one where we are checking for only one of these two flags. Can you check if applying that fix would make LOCK TABLES behave better? This would have to be fixed in a separate bug, to provide a simpler work-around for this bug.

            marko Marko Mäkelä added a comment - ycp , thank you. It seems to me that this logic to acquire InnoDB table locks for LOCK TABLES is flawed. Elsewhere, such as in thd_trx_is_auto_commit() and innobase_register_trx() , we are checking for two flags, corresponding to autocommit=0 or being within an explicitly started transaction: diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc index 418f39885f7..397d95dbbdd 100644 --- a/storage/innobase/handler/ha_innodb.cc +++ b/storage/innobase/handler/ha_innodb.cc @@ -16180,7 +16180,8 @@ ha_innobase::external_lock( if (sql_command == SQLCOM_LOCK_TABLES && THDVAR(thd, table_locks) - && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT) + && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT + | OPTION_BEGIN) && thd_in_lock_tables(thd)) { dberr_t error = row_lock_table(m_prebuilt); In fact, this call is the only one where we are checking for only one of these two flags. Can you check if applying that fix would make LOCK TABLES behave better? This would have to be fixed in a separate bug, to provide a simpler work-around for this bug.
            ycp Yuchen Pei added a comment -

            Just to summarise discussions on the slack thread.

            > Can you check if applying that fix would make LOCK TABLES behave better?

            I tried this. Unfortunately LOCK TABLES resets the flags OPTION_BEGIN in trans_commit_implicit():

            // in mysql_execute_command():
              case SQLCOM_LOCK_TABLES:
                /* We must end the transaction first, regardless of anything */
                res= trans_commit_implicit(thd);
             
            // in trans_commit_implicit():
              thd->variables.option_bits&= ~(OPTION_BEGIN | OPTION_KEEP_LOG);
            

            And the resetting was from very early commits (<= 2010). So LOCK TABLES implicitly commits a current transaction, regardless of the value of autocommit.

            marko remarked "Ideally, LOCK TABLES would start transaction and UNLOCK TABLES would commit it. All this seems rather messy."

            ycp Yuchen Pei added a comment - Just to summarise discussions on the slack thread. > Can you check if applying that fix would make LOCK TABLES behave better? I tried this. Unfortunately LOCK TABLES resets the flags OPTION_BEGIN in trans_commit_implicit() : // in mysql_execute_command(): case SQLCOM_LOCK_TABLES: /* We must end the transaction first, regardless of anything */ res= trans_commit_implicit(thd);   // in trans_commit_implicit(): thd->variables.option_bits&= ~(OPTION_BEGIN | OPTION_KEEP_LOG); And the resetting was from very early commits (<= 2010). So LOCK TABLES implicitly commits a current transaction, regardless of the value of autocommit . marko remarked "Ideally, LOCK TABLES would start transaction and UNLOCK TABLES would commit it. All this seems rather messy."
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.3 [ 28565 ]
            marko Marko Mäkelä made changes -
            ycp Yuchen Pei added a comment - - edited

            marko and psergei: I put together a poc that works for a minimal example for MDEV-24813, as a draft pull request <https://github.com/MariaDB/server/pull/3487>:

            commit 701924a2c12ef20135615340bf4f5dec11bcbfc7 (HEAD -> mdev-24813, upstream/mdev-24813)
            Author: Yuchen Pei <ycp@mariadb.com>
            Date:   Thu Aug 29 15:26:40 2024 +1000
             
                MDEV-24813 [poc] Signal full table lock when cond is NULL
                
                Will work in the following minimal example
                
                --source include/have_innodb.inc
                CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
                insert into t values (42);
                
                BEGIN;
                SELECT * FROM t LOCK IN SHARE MODE; # will acquire S lock
                COMMIT;
                
                DROP TABLE t;
            

            It is of course a quick hack, but is this similar to what you have in mind?

            ycp Yuchen Pei added a comment - - edited marko and psergei : I put together a poc that works for a minimal example for MDEV-24813 , as a draft pull request < https://github.com/MariaDB/server/pull/3487 >: commit 701924a2c12ef20135615340bf4f5dec11bcbfc7 (HEAD -> mdev-24813, upstream/mdev-24813) Author: Yuchen Pei <ycp@mariadb.com> Date: Thu Aug 29 15:26:40 2024 +1000   MDEV-24813 [poc] Signal full table lock when cond is NULL Will work in the following minimal example --source include/have_innodb.inc CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB; insert into t values (42); BEGIN; SELECT * FROM t LOCK IN SHARE MODE; # will acquire S lock COMMIT; DROP TABLE t; It is of course a quick hack, but is this similar to what you have in mind?
            ycp Yuchen Pei added a comment -

            Hi psergei, can you take a look at the draft pull request mentioned in the previous comment, to see whether it matches the cond_push concept of what you were proposing?

            ycp Yuchen Pei added a comment - Hi psergei , can you take a look at the draft pull request mentioned in the previous comment, to see whether it matches the cond_push concept of what you were proposing?
            ycp Yuchen Pei made changes -
            Assignee Yuchen Pei [ JIRAUSER52627 ] Sergei Petrunia [ psergey ]
            Status Confirmed [ 10101 ] In Review [ 10002 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 11.1 [ 28549 ]
            Roel Roel Van de Paar made changes -
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 11.2(EOL) [ 28603 ]

            People

              psergei Sergei Petrunia
              marko Marko Mäkelä
              Votes:
              3 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.