[MDEV-24813] Locking full table scan fails to use table-level locking - Jira

Details

Type: Bug
Status: In Review (View Workflow)
Priority: Critical
Resolution: Unresolved
Affects Version/s: 10.5, 10.11, 11.5(EOL)
Fix Version/s: 10.5, 10.6, 10.11, 11.4
Component/s: Locking, Optimizer, Storage Engine - InnoDB
Labels:
- optimizer-feature
- performance

Description

The following test case shows that table locks are not being passed to the storage engine in any of the cases of locking read.

--source include/have_innodb.inc

--source include/have_sequence.inc

--source include/not_debug.inc

SET @save_freq=@@GLOBAL.innodb_purge_rseg_truncate_frequency;

SET GLOBAL innodb_purge_rseg_truncate_frequency=1;

CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;

INSERT INTO t(a) SELECT seq FROM seq_0_to_1000000;

--source include/wait_all_purged.inc

SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;

SET profiling_history_size=100;

SET profiling = 1;

START TRANSACTION;

LOCK TABLE t WRITE;

SELECT COUNT(*) FROM t FOR UPDATE;

UNLOCK TABLES;

COMMIT;

SELECT COUNT(*) FROM t FOR UPDATE;

START TRANSACTION;

LOCK TABLE t READ;

SELECT COUNT(*) FROM t LOCK IN SHARE MODE;

UNLOCK TABLES;

COMMIT;

SELECT COUNT(*) FROM t LOCK IN SHARE MODE;

SHOW PROFILES;

SET profiling = 0;

DROP TABLE t;

I would have expected that an explicit LOCK TABLE would be passed to the storage engine, like it seems to have been the case during the development of ~~MDEV-14479~~.

I would also expect the optimizer to automatically pass information to the storage engine so that the table can be locked upfront, if we know that the entire table will be scanned in a locking operation. This should include the following:

CREATE…SELECT
INSERT…SELECT
SELECT…LOCK IN SHARE MODE
SELECT…FOR UPDATE
SELECT executed at TRANSACTION ISOLATION LEVEL SERIALIZABLE
UPDATE
DELETE

If no WHERE or JOIN condition applies to the table, nor a LIMIT is present, then we should expect the operation to scan the entire table, and the storage engine should be requested to lock the entire table.

~~MDEV-14479~~ fixed the InnoDB record locking in such a way that when the table has already been locked in the corresponding mode, no individual record locks will be allocated and created. If we set a breakpoint in lock_create_low() after ha_innobase::create() has finished in the above test case, we will observe the following:

10.5 ffc5d064895cadbc42711efd7dbb6ae1b323f050
Thread 15 "mariadbd" hit Breakpoint 1, lock_rec_create_low (c_lock=0x0,
thr=0x0, type_mode=3, page_id={m_id = 21474836484},
page=0x7ffff0b70000 "", heap_no=2, index=0x7fffbc01af40,
trx=0x7ffff17260d8, holds_trx_mutex=false)
at /mariadb/10.5m/storage/innobase/lock/lock0lock.cc:1294
(gdb) p index.table.locks
$7 = {count = 1, start = 0x7ffff1726cb0, end = 0x7ffff1726cb0,
node = &lock_table_t::locks}
(gdb) p *index.table.locks.start
$8 = {trx = 0x7ffff17260d8, trx_locks = {prev = 0x0, next = 0x0},
index = 0x0, hash = 0x0, requested_time = 0, wait_time = 0, un_member = {
tab_lock = {table = 0x7fffbc019e10, locks = {prev = 0x0, next = 0x0}},
rec_lock = {page_id = {m_id = 140736347610640}, n_bits = 0}},
type_mode = 17}

Note: type_mode == 17 == LOCK_TABLE | LOCK_IX. For the LOCK IN SHARE MODE, it would be 16 == LOCK_TABLE | LOCK_IS. We would expect the table lock to exist in LOCK_X or LOCK_S mode, so that no locks will have to be allocated for each individual visited row.

If we change the CREATE TABLE to CREATE TEMPORARY TABLE, we can get an accurate estimate of how fast the execution would be without the row-level locking.

On my system, with locking, each full-table-scan consumes between 0.49 and 0.66 seconds (the first run being the slowest). This can probably be attributed to dynamic memory allocation.

With a temporary table, each locking full-table scan consumes between 0.28 and 0.30 seconds. The memory operations related to row-level locking are roughly doubling the execution time (and seriously increasing the memory footprint)!

Attachments

Issue Links

is duplicated by

MDEV-21974 InnoDB DML under backup locks make buffer pool usage grow permanently

Open

relates to

MDEV-10719 'create temporary table as select' generates unnecessary table locks

Confirmed

MDEV-17512 Deadlock after locking all rows in table

Open

MDEV-27992 DELETE fails to delete record after blocking is released

Closed

MDEV-14479 Do not acquire InnoDB record locks when covering table locks exist

Closed

MDEV-16232 Use fewer mini-transactions

Stalled

MDEV-17918 InnoDB sometimes prints incorrect buffer pool size in the "Over xx percent of the buffer pool is occupied" message

Open

MDEV-20612 Improve InnoDB lock_sys scalability

Closed

MDEV-28800 SIGABRT due to running out of memory for InnoDB locks

Closed

MDEV-28803 ERROR 1206 (HY000): The total number of locks exceeds the lock table size

Confirmed

MDEV-28804 Increased lock objects in 10.6+ leading to significant slowdown when using a limited InnoDB buffer pool, including in release builds

Stalled

MDEV-29565 Inconsistent read and write, which use the same predicate (WHERE clause)

Closed

MDEV-30368 Locking behaviour is missing in documentation

Open

MDEV-34719 Disable purge for LOAD DATA INFILE into empty table

Closed

(9 relates to)

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä created issue - 2021-02-09 06:37

Marko Mäkelä made changes - 2021-02-09 06:37

Field	Original Value	New Value
Link		This issue relates to ~~MDEV-14479~~ [ ~~MDEV-14479~~ ]

Marko Mäkelä made changes - 2021-02-09 06:37

Link

This issue relates to ~~MDEV-20612~~ [ ~~MDEV-20612~~ ]

Marko Mäkelä made changes - 2021-02-09 06:37

Link

This issue relates to MDEV-16232 [ MDEV-16232 ]

Sergei Petrunia added a comment - 2021-02-10 11:45

I think this should be implemented in steps. How about defining the first step as follows:

If
1. this is a DML statement (which does Locking Reads in InnODB)
2. and the query is doing a full table scan without row filtering
then:
3. Use a table-level lock.

InnoDB already knows about whether #1 is true.
The only issue is #2.

#2 can be implemented as follows:
before (or right after) the handler->rnd_init(scan=false) call, the SQL layer will make an extra call to inform the InnoDB storage engine what condition will be attached to the table.

(TODO: check if this can be unified with Table Condition Pushdown).

Sergei Petrunia added a comment - 2021-02-10 11:45 I think this should be implemented in steps. How about defining the first step as follows: If 1. this is a DML statement (which does Locking Reads in InnODB) 2. and the query is doing a full table scan without row filtering then: 3. Use a table-level lock. InnoDB already knows about whether #1 is true. The only issue is #2. #2 can be implemented as follows: before (or right after) the handler->rnd_init(scan=false) call, the SQL layer will make an extra call to inform the InnoDB storage engine what condition will be attached to the table. (TODO: check if this can be unified with Table Condition Pushdown).

Sergei Petrunia added a comment - 2021-02-10 13:34

Implementation suggestion is below. Need to think this through.

== Use Table Condition Pushdown for WHERE ==

There is already handler::cond_push() call which informs the storage engine
about the condition. InnoDB could implement it and this way see whether
a condition is being present.

Possible issues:
Issue 1:
setting optimizer_switch='table_condition_pushdown=off' will cause
table-level locks to be acquired! Is this ok?
It is a common practice among support/etc to disable optimizer switch
flags to see what happens...

Issue 2:
InnoDB will take a note about whether there is a pushed condition, but it
will not check the condition.

That is, we'll have:

  COND *ha_innobase::cond_push(COND *arg) {

...

     return arg;

what should be in handler::pushed_cond, then? The code checks this member and seeing NULL interprets it as "nothing is pushed". Is this ok? If it is not, should we put "Item(1)" there?

== Introduce push_limit ==

To inform the storage engine about the LIMIT clause, we introduce

void handler::push_limit(ha_rows soft_limit)

Which serves as a hint to the storage engine that the SQL layer is not going to read more than soft_limit rows in a scan.

The idea is that this is generally useful for storage engines that do batching - they should not batch much more than soft_limit rows.

Sergei Petrunia added a comment - 2021-02-10 13:34 Implementation suggestion is below. Need to think this through. == Use Table Condition Pushdown for WHERE == There is already handler::cond_push() call which informs the storage engine about the condition. InnoDB could implement it and this way see whether a condition is being present. Possible issues: Issue 1: setting optimizer_switch='table_condition_pushdown=off' will cause table-level locks to be acquired! Is this ok? It is a common practice among support/etc to disable optimizer switch flags to see what happens... Issue 2: InnoDB will take a note about whether there is a pushed condition, but it will not check the condition. That is, we'll have: COND *ha_innobase::cond_push(COND *arg) { ... return arg; } what should be in handler::pushed_cond, then? The code checks this member and seeing NULL interprets it as "nothing is pushed". Is this ok? If it is not, should we put "Item(1)" there? == Introduce push_limit == To inform the storage engine about the LIMIT clause, we introduce void handler::push_limit(ha_rows soft_limit) Which serves as a hint to the storage engine that the SQL layer is not going to read more than soft_limit rows in a scan. The idea is that this is generally useful for storage engines that do batching - they should not batch much more than soft_limit rows.

Marko Mäkelä made changes - 2021-04-13 15:40

Link

This issue relates to MDEV-17512 [ MDEV-17512 ]

Marko Mäkelä made changes - 2021-04-26 13:08

Link

This issue relates to MDEV-17918 [ MDEV-17918 ]

Sergei Petrunia made changes - 2021-08-05 10:43

Labels

performance

optimizer-feature performance

Sergei Golubchik made changes - 2021-12-06 21:34

Workflow

MariaDB v3 [ 118952 ]

MariaDB v4 [ 142575 ]

Marko Mäkelä made changes - 2022-03-03 12:46

Link

This issue relates to ~~MDEV-27992~~ [ ~~MDEV-27992~~ ]

Marko Mäkelä made changes - 2022-06-13 05:49

Link

This issue is duplicated by MDEV-28804 [ MDEV-28804 ]

Roel Van de Paar added a comment - 2022-08-30 06:49 - edited

Marko informed me that this is the same bug:

CREATE TABLE IF NOT EXISTS v0 ( v1 VARCHAR ( 82 ) ) ;

INSERT INTO v0 ( v1 ) VALUES ( 25 ) , ( ( SELECT -1 WHERE ( v1 NOT IN ( v1 NOT IN ( SELECT v1 ) ) AND v1 NOT IN ( 80 ) ) ) ) , ( 89215105.000000 ) , ( 0 ) , ( 2147483647 ) , ( 127 ) , ( 74 ) , ( -128 ) ;

INSERT INTO v0 ( v1 ) SELECT v1 FROM ( SELECT 'x' FROM v0 AS v10 , v0 AS v9 NATURAL JOIN v0 AS v8 , v0 ) AS v2 , ( SELECT 40561621.000000 FROM v0 AS v7 , v0 AS v6 , v0 AS v5 JOIN v0 ) AS v3 JOIN v0 AS v4 NATURAL JOIN v0 ORDER BY v1 ;

Leads to:

10.11.0-opt>CREATE TABLE IF NOT EXISTS v0 ( v1 VARCHAR ( 82 ) ) ;

Query OK, 0 rows affected (0.015 sec)

10.11.0-opt> INSERT INTO v0 ( v1 ) VALUES ( 25 ) , ( ( SELECT -1 WHERE ( v1 NOT IN ( v1 NOT IN ( SELECT v1 ) ) AND v1 NOT IN ( 80 ) ) ) ) , ( 89215105.000000 ) , ( 0 ) , ( 2147483647 ) , ( 127 ) , ( 74 ) , ( -128 ) ;

Query OK, 8 rows affected (0.002 sec)

Records: 8  Duplicates: 0  Warnings: 0

10.11.0-opt> INSERT INTO v0 ( v1 ) SELECT v1 FROM ( SELECT 'x' FROM v0 AS v10 , v0 AS v9 NATURAL JOIN v0 AS v8 , v0 ) AS v2 , ( SELECT 40561621.000000 FROM v0 AS v7 , v0 AS v6 , v0 AS v5 JOIN v0 ) AS v3 JOIN v0 AS v4 NATURAL JOIN v0 ORDER BY v1 ;

ERROR 1206 (HY000): The total number of locks exceeds the lock table size

And

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Optimized)
2022-08-30 8:43:42 0 [Note] /test/MD190822-mariadb-10.11.0-linux-x86_64-opt/bin/mysqld: ready for connections.
Version: '10.11.0-MariaDB' socket: '/test/MD190822-mariadb-10.11.0-linux-x86_64-opt/socket.sock' port: 10942 MariaDB Server
2022-08-30 8:44:35 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics.

=====================================
2022-08-30 08:44:35 0x1466fd360700 INNODB MONITOR OUTPUT
...

If it would help, I can reduce this testcase further.

Roel Van de Paar added a comment - 2022-08-30 06:49 - edited Marko informed me that this is the same bug: CREATE TABLE IF NOT EXISTS v0 ( v1 VARCHAR ( 82 ) ) ; INSERT INTO v0 ( v1 ) VALUES ( 25 ) , ( ( SELECT -1 WHERE ( v1 NOT IN ( v1 NOT IN ( SELECT v1 ) ) AND v1 NOT IN ( 80 ) ) ) ) , ( 89215105.000000 ) , ( 0 ) , ( 2147483647 ) , ( 127 ) , ( 74 ) , ( -128 ) ; INSERT INTO v0 ( v1 ) SELECT v1 FROM ( SELECT 'x' FROM v0 AS v10 , v0 AS v9 NATURAL JOIN v0 AS v8 , v0 ) AS v2 , ( SELECT 40561621.000000 FROM v0 AS v7 , v0 AS v6 , v0 AS v5 JOIN v0 ) AS v3 JOIN v0 AS v4 NATURAL JOIN v0 ORDER BY v1 ; Leads to: 10.11.0-opt>CREATE TABLE IF NOT EXISTS v0 ( v1 VARCHAR ( 82 ) ) ; Query OK, 0 rows affected (0.015 sec) 10.11.0-opt> INSERT INTO v0 ( v1 ) VALUES ( 25 ) , ( ( SELECT -1 WHERE ( v1 NOT IN ( v1 NOT IN ( SELECT v1 ) ) AND v1 NOT IN ( 80 ) ) ) ) , ( 89215105.000000 ) , ( 0 ) , ( 2147483647 ) , ( 127 ) , ( 74 ) , ( -128 ) ; Query OK, 8 rows affected (0.002 sec) Records: 8 Duplicates: 0 Warnings: 0 10.11.0-opt> INSERT INTO v0 ( v1 ) SELECT v1 FROM ( SELECT 'x' FROM v0 AS v10 , v0 AS v9 NATURAL JOIN v0 AS v8 , v0 ) AS v2 , ( SELECT 40561621.000000 FROM v0 AS v7 , v0 AS v6 , v0 AS v5 JOIN v0 ) AS v3 JOIN v0 AS v4 NATURAL JOIN v0 ORDER BY v1 ; ERROR 1206 (HY000): The total number of locks exceeds the lock table size And 10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Optimized) 2022-08-30 8:43:42 0 [Note] /test/MD190822-mariadb-10.11.0-linux-x86_64-opt/bin/mysqld: ready for connections. Version: '10.11.0-MariaDB' socket: '/test/MD190822-mariadb-10.11.0-linux-x86_64-opt/socket.sock' port: 10942 MariaDB Server 2022-08-30 8:44:35 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics. ===================================== 2022-08-30 08:44:35 0x1466fd360700 INNODB MONITOR OUTPUT ... If it would help, I can reduce this testcase further.

Roel Van de Paar made changes - 2022-08-30 06:50

Status

Open [ 1 ]

Confirmed [ 10101 ]

Roel Van de Paar made changes - 2022-08-30 06:50

Affects Version/s

10.11 [ 27614 ]

Marko Mäkelä made changes - 2022-09-20 07:43

Link

This issue relates to ~~MDEV-29565~~ [ ~~MDEV-29565~~ ]

Marko Mäkelä made changes - 2023-01-09 15:59

Link

This issue relates to MDEV-30368 [ MDEV-30368 ]

Marko Mäkelä made changes - 2023-04-11 08:48

Link

This issue relates to MDEV-10719 [ MDEV-10719 ]

Sergei Golubchik added a comment - 2023-08-31 09:25

May be InnoDB should automatically upgrade row locks to a table lock, when there're too many of them?
This is a fairly standard behavior of many database engines.

Sergei Golubchik added a comment - 2023-08-31 09:25 May be InnoDB should automatically upgrade row locks to a table lock, when there're too many of them? This is a fairly standard behavior of many database engines.

Marko Mäkelä made changes - 2023-10-01 11:37

Link

This issue relates to ~~MDEV-28800~~ [ ~~MDEV-28800~~ ]

Yuchen Pei made changes - 2023-12-14 23:19

Assignee

Sergei Petrunia [ psergey ]

Yuchen Pei [ JIRAUSER52627 ]

Marko Mäkelä made changes - 2024-01-26 13:50

Link

This issue relates to MDEV-28804 [ MDEV-28804 ]

Marko Mäkelä made changes - 2024-01-26 13:51

Link

This issue is duplicated by MDEV-28804 [ MDEV-28804 ]

Yuchen Pei made changes - 2024-01-31 06:47

Priority

Major [ 3 ]

Critical [ 2 ]

Roel Van de Paar added a comment - 2024-04-01 23:46

Verified also in 11.5

11.5.0 eeba940311ed17d160023280783fd2bbb64abef3 (Optimized)
2024-04-02 10:38:02 0 [Note] /test/MD190224-mariadb-11.5.0-linux-x86_64-opt/bin/mariadbd: ready for connections.
Version: '11.5.0-MariaDB' socket: '/test/MD190224-mariadb-11.5.0-linux-x86_64-opt/socket.sock' port: 10082 MariaDB Server
2024-04-02 10:38:38 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics.

=====================================
2024-04-02 10:38:38 0x153b737fb6c0 INNODB MONITOR OUTPUT
...

Roel Van de Paar added a comment - 2024-04-01 23:46 Verified also in 11.5 11.5.0 eeba940311ed17d160023280783fd2bbb64abef3 (Optimized) 2024-04-02 10:38:02 0 [Note] /test/MD190224-mariadb-11.5.0-linux-x86_64-opt/bin/mariadbd: ready for connections. Version: '11.5.0-MariaDB' socket: '/test/MD190224-mariadb-11.5.0-linux-x86_64-opt/socket.sock' port: 10082 MariaDB Server 2024-04-02 10:38:38 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics. ===================================== 2024-04-02 10:38:38 0x153b737fb6c0 INNODB MONITOR OUTPUT ...

Roel Van de Paar made changes - 2024-04-01 23:47

Fix Version/s		10.6 [ 24028 ]
Fix Version/s		10.11 [ 27614 ]
Fix Version/s		11.0 [ 28320 ]
Fix Version/s		11.1 [ 28549 ]
Fix Version/s		11.2 [ 28603 ]
Fix Version/s		11.3 [ 28565 ]
Fix Version/s		11.4 [ 29301 ]
Affects Version/s		11.5 [ 29506 ]

Marko Mäkelä made changes - 2024-04-03 09:49

Link

This issue is duplicated by MDEV-21974 [ MDEV-21974 ]

Yuchen Pei added a comment - 2024-04-15 05:04 - edited

I tried to understand how innodb acquire table locks, and given the relevance of ~~MDEV-14479~~ (marko also asked me to look into that ticket), I extracted the tests from the change (721ec44e2ad), as follows

--source include/have_innodb.inc

# Turn on all monitor counters

set global innodb_monitor_enable = all;

CREATE TABLE fl0 (

  id INT NOT NULL PRIMARY KEY

) ENGINE = InnoDB;

CREATE TABLE fl1 (

  id INT NOT NULL PRIMARY KEY,

  fl0_id INT,

  CONSTRAINT `fkl0`

    FOREIGN KEY (fl0_id) REFERENCES fl0 (id)

    ON DELETE CASCADE

    ON UPDATE RESTRICT

) ENGINE = InnoDB;

CREATE TABLE fl2 (

  id INT NOT NULL PRIMARY KEY,

  fl1_id INT,

  CONSTRAINT `fkl1`

    FOREIGN KEY (fl1_id) REFERENCES fl1 (id)

    ON DELETE CASCADE

    ON UPDATE SET NULL

) ENGINE = InnoDB;

INSERT INTO fl0 VALUES (1000);

INSERT INTO fl1 VALUES (500, 1000), (1500, 1000);

INSERT INTO fl2 VALUES (200, 500), (800, 500), (1200, 1500), (1800, 1500);

CREATE TABLE t1(id INT PRIMARY KEY, a INT, b CHAR(1), UNIQUE KEY u(a,b))

ENGINE=InnoDB;

SET @start = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME

= 'lock_rec_lock_created');

BEGIN;

INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d');

DELETE FROM t1 WHERE a = 9999 AND b='b';

COMMIT;

# After MDEV-30225 is fixed, the above DELETE creates next-key lock during

# secondary index unique search. That's why the result of the following must

# be 1.

SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME

= 'lock_rec_lock_created');

SELECT @end - @start;

--echo # Records must not be S/X-locked if a table is X-locked.

SET @start = @end;

# Tables will not be locked if autocommit is not 0.

# See OPTION_NOT_AUTOCOMMIT in ha_innobase::external_lock().

SET autocommit = 0;

BEGIN;

LOCK TABLE t1 WRITE;

SELECT * FROM t1;

SELECT * FROM t1 WHERE a>=10000;

SELECT * FROM t1 FOR UPDATE;

SELECT * FROM t1 WHERE a>=10000 FOR UPDATE;

UPDATE t1 SET b = 'b' WHERE id = 4;

UPDATE t1 SET b = 'b' WHERE a = 10000;

REPLACE INTO t1 VALUES (4,3,'a');

INSERT INTO t1 VALUES (3,1,'e') ON DUPLICATE KEY UPDATE b = 'b';

INSERT INTO t1 VALUES (5,5,'e');

DELETE FROM t1 WHERE a = 1 AND b='a';

DELETE FROM t1;

COMMIT;

UNLOCK TABLES;

SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = 'lock_rec_lock_created');

SELECT @end - @start;

--echo # Records must not be S-locked if a table is S-locked.

SET @start = @end;

BEGIN;

LOCK TABLE t1 WRITE;

INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d');

DELETE FROM t1 WHERE a = 9999 AND b='b';

COMMIT;

UNLOCK TABLES;

BEGIN;

LOCK TABLE t1 READ;

SELECT * FROM t1 LOCK IN SHARE MODE;

SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE;

COMMIT;

UNLOCK TABLES;

SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = 'lock_rec_lock_created');

SELECT @end - @start;

--echo # Records must not be S-locked for foreign keys enforcement

SET @start = @end;

BEGIN;

LOCK TABLE fl0 READ, fl1 READ, fl2 WRITE;

INSERT INTO fl2 VALUES (300, 500), (700, 500), (1300, 1500), (1700, 1500);

SELECT * FROM fl1 LOCK IN SHARE MODE;

COMMIT;

UNLOCK TABLES;

SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = 'lock_rec_lock_created');

SELECT @end - @start;

--echo # Records must not be X-locked for foreign keys cascade

SET @start = @end;

BEGIN;

LOCK TABLE fl0 READ, fl1 WRITE, fl2 WRITE;

DELETE FROM fl1 WHERE id = 1500;

UPDATE fl1 SET id = 2500 WHERE id = 500;

COMMIT;

UNLOCK TABLES;

SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = 'lock_rec_lock_created');

SELECT @end - @start;

SET autocommit = default;

DROP TABLE t1;

DROP TABLE fl2;

DROP TABLE fl1;

DROP TABLE fl0;

SET GLOBAL innodb_monitor_enable=default;

and added a branch to track when it goes into the branch of detecting table locks so that I can put a breakpoint there:

modified   storage/innobase/lock/lock0lock.cc

@@ -1956,7 +1956,8 @@ lock_rec_lock(

          lock_table_has(trx, index->table, LOCK_IX));

   if (lock_table_has(trx, index->table,

-                     static_cast<lock_mode>(LOCK_MODE_MASK & mode)));

+                     static_cast<lock_mode>(LOCK_MODE_MASK & mode)))

+	  std::cout << "yo" << std::endl;

   else if (lock_t *lock= lock_sys.get_first(block->page.id()))

     trx_mutex_enter(trx);

What I got was odd (tested at 10.5 8bc32410160265dcd27d595ed610843d08d32034). Every time it breaks there (at least statements marked with ! below), the trx and the table lock are the same, and a watchpoint on the table lock trx->lock.table_locks[0] shows it was updated at the insert statement into fl1. As you can see, this happens across BEGIN;...COMMIT; blocks.

INSERT INTO fl0 VALUES (1000);

INSERT INTO fl1 VALUES (500, 1000), (1500, 1000); # last update of the lock

# [... 25 lines elided]

BEGIN;

LOCK TABLE t1 WRITE;

SELECT * FROM t1;

SELECT * FROM t1 WHERE a>=10000;

SELECT * FROM t1 FOR UPDATE; # !

SELECT * FROM t1 WHERE a>=10000 FOR UPDATE; # !

UPDATE t1 SET b = 'b' WHERE id = 4;

UPDATE t1 SET b = 'b' WHERE a = 10000;

REPLACE INTO t1 VALUES (4,3,'a'); # !

INSERT INTO t1 VALUES (3,1,'e') ON DUPLICATE KEY UPDATE b = 'b';

INSERT INTO t1 VALUES (5,5,'e');

DELETE FROM t1 WHERE a = 1 AND b='a'; # !

DELETE FROM t1; # !

COMMIT;

# [... 6 lines elided]

BEGIN;

LOCK TABLE t1 WRITE;

INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d');

DELETE FROM t1 WHERE a = 9999 AND b='b'; # !

COMMIT;

UNLOCK TABLES;

BEGIN;

LOCK TABLE t1 READ;

SELECT * FROM t1 LOCK IN SHARE MODE;

SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE;

COMMIT;

marko: is this expected? If so why?

Yuchen Pei added a comment - 2024-04-15 05:04 - edited I tried to understand how innodb acquire table locks, and given the relevance of MDEV-14479 ( marko also asked me to look into that ticket), I extracted the tests from the change (721ec44e2ad), as follows --source include/have_innodb.inc # Turn on all monitor counters set global innodb_monitor_enable = all ; CREATE TABLE fl0 ( id INT NOT NULL PRIMARY KEY ) ENGINE = InnoDB; CREATE TABLE fl1 ( id INT NOT NULL PRIMARY KEY , fl0_id INT , CONSTRAINT `fkl0` FOREIGN KEY (fl0_id) REFERENCES fl0 (id) ON DELETE CASCADE ON UPDATE RESTRICT ) ENGINE = InnoDB; CREATE TABLE fl2 ( id INT NOT NULL PRIMARY KEY , fl1_id INT , CONSTRAINT `fkl1` FOREIGN KEY (fl1_id) REFERENCES fl1 (id) ON DELETE CASCADE ON UPDATE SET NULL ) ENGINE = InnoDB; INSERT INTO fl0 VALUES (1000); INSERT INTO fl1 VALUES (500, 1000), (1500, 1000); INSERT INTO fl2 VALUES (200, 500), (800, 500), (1200, 1500), (1800, 1500); CREATE TABLE t1(id INT PRIMARY KEY , a INT , b CHAR (1), UNIQUE KEY u(a,b)) ENGINE=InnoDB; SET @start = ( SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = 'lock_rec_lock_created' ); BEGIN ; INSERT INTO t1 VALUES (1,1, 'a' ),(2,9999, 'b' ),(3,10000, 'c' ),(4,4, 'd' ); DELETE FROM t1 WHERE a = 9999 AND b= 'b' ; COMMIT ; # After MDEV-30225 is fixed, the above DELETE creates next - key lock during # secondary index unique search. That 's why the result of the following must # be 1. SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = ' lock_rec_lock_created '); SELECT @end - @start; --echo # Records must not be S/X-locked if a table is X-locked. SET @start = @end; # Tables will not be locked if autocommit is not 0. # See OPTION_NOT_AUTOCOMMIT in ha_innobase::external_lock(). SET autocommit = 0; BEGIN; LOCK TABLE t1 WRITE; SELECT * FROM t1; SELECT * FROM t1 WHERE a>=10000; SELECT * FROM t1 FOR UPDATE; SELECT * FROM t1 WHERE a>=10000 FOR UPDATE; UPDATE t1 SET b = ' b ' WHERE id = 4; UPDATE t1 SET b = ' b ' WHERE a = 10000; REPLACE INTO t1 VALUES (4,3,' a '); INSERT INTO t1 VALUES (3,1,' e ') ON DUPLICATE KEY UPDATE b = ' b '; INSERT INTO t1 VALUES (5,5,' e '); DELETE FROM t1 WHERE a = 1 AND b=' a '; DELETE FROM t1; COMMIT; UNLOCK TABLES; SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = ' lock_rec_lock_created '); SELECT @end - @start; --echo # Records must not be S-locked if a table is S-locked. SET @start = @end; BEGIN; LOCK TABLE t1 WRITE; INSERT INTO t1 VALUES(1,1,' a '),(2,9999,' b '),(3,10000,' c '),(4,4,' d '); DELETE FROM t1 WHERE a = 9999 AND b=' b '; COMMIT; UNLOCK TABLES; BEGIN; LOCK TABLE t1 READ; SELECT * FROM t1 LOCK IN SHARE MODE; SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE; COMMIT; UNLOCK TABLES; SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = ' lock_rec_lock_created '); SELECT @end - @start; --echo # Records must not be S-locked for foreign keys enforcement SET @start = @end; BEGIN; LOCK TABLE fl0 READ, fl1 READ, fl2 WRITE; INSERT INTO fl2 VALUES (300, 500), (700, 500), (1300, 1500), (1700, 1500); SELECT * FROM fl1 LOCK IN SHARE MODE; COMMIT; UNLOCK TABLES; SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = ' lock_rec_lock_created '); SELECT @end - @start; --echo # Records must not be X-locked for foreign keys cascade SET @start = @end; BEGIN; LOCK TABLE fl0 READ, fl1 WRITE, fl2 WRITE; DELETE FROM fl1 WHERE id = 1500; UPDATE fl1 SET id = 2500 WHERE id = 500; COMMIT; UNLOCK TABLES; SET @end = (SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = ' lock_rec_lock_created'); SELECT @ end - @start; SET autocommit = default ; DROP TABLE t1; DROP TABLE fl2; DROP TABLE fl1; DROP TABLE fl0; SET GLOBAL innodb_monitor_enable= default ; and added a branch to track when it goes into the branch of detecting table locks so that I can put a breakpoint there: modified storage/innobase/lock/lock0lock.cc @@ -1956,7 +1956,8 @@ lock_rec_lock( lock_table_has(trx, index->table, LOCK_IX)); if (lock_table_has(trx, index->table, - static_cast<lock_mode>(LOCK_MODE_MASK & mode))); + static_cast<lock_mode>(LOCK_MODE_MASK & mode))) + std::cout << "yo" << std::endl; else if (lock_t *lock= lock_sys.get_first(block->page.id())) { trx_mutex_enter(trx); What I got was odd (tested at 10.5 8bc32410160265dcd27d595ed610843d08d32034). Every time it breaks there (at least statements marked with ! below), the trx and the table lock are the same, and a watchpoint on the table lock trx->lock.table_locks [0] shows it was updated at the insert statement into fl1 . As you can see, this happens across BEGIN;...COMMIT; blocks. INSERT INTO fl0 VALUES (1000); INSERT INTO fl1 VALUES (500, 1000), (1500, 1000); # last update of the lock # [... 25 lines elided] BEGIN ; LOCK TABLE t1 WRITE; SELECT * FROM t1; SELECT * FROM t1 WHERE a>=10000; SELECT * FROM t1 FOR UPDATE ; # ! SELECT * FROM t1 WHERE a>=10000 FOR UPDATE ; # ! UPDATE t1 SET b = 'b' WHERE id = 4; UPDATE t1 SET b = 'b' WHERE a = 10000; REPLACE INTO t1 VALUES (4,3, 'a' ); # ! INSERT INTO t1 VALUES (3,1, 'e' ) ON DUPLICATE KEY UPDATE b = 'b' ; INSERT INTO t1 VALUES (5,5, 'e' ); DELETE FROM t1 WHERE a = 1 AND b= 'a' ; # ! DELETE FROM t1; # ! COMMIT ; # [... 6 lines elided] BEGIN ; LOCK TABLE t1 WRITE; INSERT INTO t1 VALUES (1,1, 'a' ),(2,9999, 'b' ),(3,10000, 'c' ),(4,4, 'd' ); DELETE FROM t1 WHERE a = 9999 AND b= 'b' ; # ! COMMIT ; UNLOCK TABLES; BEGIN ; LOCK TABLE t1 READ ; SELECT * FROM t1 LOCK IN SHARE MODE; SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE; COMMIT ; marko : is this expected? If so why?

Marko Mäkelä added a comment - 2024-04-18 04:52

ycp, sorry for the delay. Yes, it is expected that the table-wide exclusive lock (LOCK_X) covers all write operations in all indexes of the table, without a need to register LOCK_X on the visited records.

I just debugged this in 10.6. It turns out that LOCK TABLE t1 WRITE actually does acquire an exclusive lock (LOCK_X) on the entire table. I found this out by setting a breakpoint on lock_table_create in an rr replay of running the test with ./mtr --rr.

Note: to be 10.6 and debugging friendly, it is better to add STATS_PERSISTENT=0 to each CREATE TABLE statement; in 10.5 this was the default due to ~~MDEV-4750~~. We do not want ‘noise’ from dict_stats_save() in background tasks. Also, note that in 10.6 there is a function lock_sys_tables() that will acquire locks on InnoDB data dictionary tables. The first interesting lock_table_create call is after the last ha_innobase::create() (for CREATE TABLE t1) completed.

Right after the CREATE TABLE part, the first two lock_table_create calls are different. First, we have one LOCK_IX acquisition during the INSERT statement in the following:

BEGIN;

INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d');

DELETE FROM t1 WHERE a = 9999 AND b='b';

COMMIT;

When there are no conflicts, INSERT operations use implicit locking. That is, if the DB_TRX_ID column of a clustered index record identifies an active transaction, that record is exclusively locked by that transaction. So, during the above transaction, the only calls to lock_rec_create_low() will be for the DELETE statement. We must acquire LOCK_X on the record, because we only hold LOCK_IX and not LOCK_X on the table:

10.6 e459ce8336ba37ea12a405f29e5fae866fd28701
#0 lock_rec_create_low (c_lock=c_lock@entry=0x0, type_mode=type_mode@entry=3, page_id=page_id@entry={m_id = 34359738372}, page=0x7fd021910000 "", heap_no=heap_no@entry=3, index=index@entry=0x7fcff0063738,
trx=0x7fd021c00b80, holds_trx_mutex=false) at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:1225
#1 0x0000562a4eb4e165 in lock_rec_lock (impl=impl@entry=false, mode=mode@entry=3, block=block@entry=0x7fd021410190, heap_no=3, index=index@entry=0x7fcff0063738, thr=thr@entry=0x7fcff00658e0)
at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:1716
#2 0x0000562a4eb4ede4 in lock_sec_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0x7fd021410190, rec=rec@entry=0x7fd02191008d "\200", index=index@entry=0x7fcff0063738,
offsets=offsets@entry=0x7fd0221fca10, mode=mode@entry=LOCK_X, gap_mode=0, thr=0x7fcff00658e0) at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:6004
#3 0x0000562a4ec1c1e8 in sel_set_rec_lock (pcur=pcur@entry=0x7fcff0065108, rec=0x7fd02191008d "\200", index=index@entry=0x7fcff0063738, offsets=0x7fd0221fca10, mode=3, type=0, thr=0x7fcff00658e0,
mtr=0x7fd0221fcdf0) at /mariadb/10.6m/storage/innobase/row/row0sel.cc:1369
#4 0x0000562a4ec1f961 in row_search_mvcc (buf=buf@entry=0x7fcff005eb08 "\377", mode=<optimized out>, prebuilt=0x7fcff0064f38, match_mode=<optimized out>, direction=direction@entry=0)
at /mariadb/10.6m/storage/innobase/row/row0sel.cc:5237

The next statement that reaches InnoDB and lock_table_create() is the following:

SET autocommit = 0;

BEGIN;

LOCK TABLE t1 WRITE;

This will indeed acquire an exclusive lock on the table. Now, let us set breakpoints on lock_rec_create_low() and trx_t::commit() and see what happens during the rest of this transaction:

SELECT * FROM t1;

SELECT * FROM t1 WHERE a>=10000;

SELECT * FROM t1 FOR UPDATE;

SELECT * FROM t1 WHERE a>=10000 FOR UPDATE;

UPDATE t1 SET b = 'b' WHERE id = 4;

UPDATE t1 SET b = 'b' WHERE a = 10000;

REPLACE INTO t1 VALUES (4,3,'a');

INSERT INTO t1 VALUES (3,1,'e') ON DUPLICATE KEY UPDATE b = 'b';

INSERT INTO t1 VALUES (5,5,'e');

DELETE FROM t1 WHERE a = 1 AND b='a';

DELETE FROM t1;

COMMIT;

It turns out that none of these DML statements invoke lock_rec_create_low(); we hit the trx_t::commit() on the COMMIT statement. The LOCK TABLE t1 WRITE is doing the right thing here, making all DML operations on t1 covered by the exclusive table lock.

Similarly, in the following section we only acquire LOCK_S (table-wide shared lock) on the table, and do not acquire any LOCK_S on the visited index records:

LOCK TABLE t1 READ;

SELECT * FROM t1 LOCK IN SHARE MODE;

SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE;

COMMIT;

I think that for now, we can ignore the part that involves the tables whose names start with fl and involve FOREIGN KEY constraints.

This example shows that it already is possible to acquire shared or exclusive InnoDB tables lock from SQL. I had forgotten these details. To fix this bug we have to figure out how to modify the code (I think mostly outside InnoDB) so that LOCK_X or LOCK_S will be acquired if a locking operation will traverse the entire table.

Marko Mäkelä added a comment - 2024-04-18 04:52 ycp , sorry for the delay. Yes, it is expected that the table-wide exclusive lock ( LOCK_X ) covers all write operations in all indexes of the table, without a need to register LOCK_X on the visited records. I just debugged this in 10.6. It turns out that LOCK TABLE t1 WRITE actually does acquire an exclusive lock ( LOCK_X ) on the entire table. I found this out by setting a breakpoint on lock_table_create in an rr replay of running the test with ./mtr --rr . Note: to be 10.6 and debugging friendly, it is better to add STATS_PERSISTENT=0 to each CREATE TABLE statement; in 10.5 this was the default due to MDEV-4750 . We do not want ‘noise’ from dict_stats_save() in background tasks. Also, note that in 10.6 there is a function lock_sys_tables() that will acquire locks on InnoDB data dictionary tables. The first interesting lock_table_create call is after the last ha_innobase::create() (for CREATE TABLE t1 ) completed. Right after the CREATE TABLE part, the first two lock_table_create calls are different. First, we have one LOCK_IX acquisition during the INSERT statement in the following: BEGIN ; INSERT INTO t1 VALUES (1,1, 'a' ),(2,9999, 'b' ),(3,10000, 'c' ),(4,4, 'd' ); DELETE FROM t1 WHERE a = 9999 AND b= 'b' ; COMMIT ; When there are no conflicts, INSERT operations use implicit locking. That is, if the DB_TRX_ID column of a clustered index record identifies an active transaction, that record is exclusively locked by that transaction. So, during the above transaction, the only calls to lock_rec_create_low() will be for the DELETE statement. We must acquire LOCK_X on the record, because we only hold LOCK_IX and not LOCK_X on the table: 10.6 e459ce8336ba37ea12a405f29e5fae866fd28701 #0 lock_rec_create_low (c_lock=c_lock@entry=0x0, type_mode=type_mode@entry=3, page_id=page_id@entry={m_id = 34359738372}, page=0x7fd021910000 "", heap_no=heap_no@entry=3, index=index@entry=0x7fcff0063738, trx=0x7fd021c00b80, holds_trx_mutex=false) at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:1225 #1 0x0000562a4eb4e165 in lock_rec_lock (impl=impl@entry=false, mode=mode@entry=3, block=block@entry=0x7fd021410190, heap_no=3, index=index@entry=0x7fcff0063738, thr=thr@entry=0x7fcff00658e0) at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:1716 #2 0x0000562a4eb4ede4 in lock_sec_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0x7fd021410190, rec=rec@entry=0x7fd02191008d "\200", index=index@entry=0x7fcff0063738, offsets=offsets@entry=0x7fd0221fca10, mode=mode@entry=LOCK_X, gap_mode=0, thr=0x7fcff00658e0) at /mariadb/10.6m/storage/innobase/lock/lock0lock.cc:6004 #3 0x0000562a4ec1c1e8 in sel_set_rec_lock (pcur=pcur@entry=0x7fcff0065108, rec=0x7fd02191008d "\200", index=index@entry=0x7fcff0063738, offsets=0x7fd0221fca10, mode=3, type=0, thr=0x7fcff00658e0, mtr=0x7fd0221fcdf0) at /mariadb/10.6m/storage/innobase/row/row0sel.cc:1369 #4 0x0000562a4ec1f961 in row_search_mvcc (buf=buf@entry=0x7fcff005eb08 "\377", mode=<optimized out>, prebuilt=0x7fcff0064f38, match_mode=<optimized out>, direction=direction@entry=0) at /mariadb/10.6m/storage/innobase/row/row0sel.cc:5237 The next statement that reaches InnoDB and lock_table_create() is the following: SET autocommit = 0; BEGIN ; LOCK TABLE t1 WRITE; This will indeed acquire an exclusive lock on the table. Now, let us set breakpoints on lock_rec_create_low() and trx_t::commit() and see what happens during the rest of this transaction: SELECT * FROM t1; SELECT * FROM t1 WHERE a>=10000; SELECT * FROM t1 FOR UPDATE ; SELECT * FROM t1 WHERE a>=10000 FOR UPDATE ; UPDATE t1 SET b = 'b' WHERE id = 4; UPDATE t1 SET b = 'b' WHERE a = 10000; REPLACE INTO t1 VALUES (4,3, 'a' ); INSERT INTO t1 VALUES (3,1, 'e' ) ON DUPLICATE KEY UPDATE b = 'b' ; INSERT INTO t1 VALUES (5,5, 'e' ); DELETE FROM t1 WHERE a = 1 AND b= 'a' ; DELETE FROM t1; COMMIT ; It turns out that none of these DML statements invoke lock_rec_create_low() ; we hit the trx_t::commit() on the COMMIT statement. The LOCK TABLE t1 WRITE is doing the right thing here, making all DML operations on t1 covered by the exclusive table lock. Similarly, in the following section we only acquire LOCK_S (table-wide shared lock) on the table, and do not acquire any LOCK_S on the visited index records: LOCK TABLE t1 READ ; SELECT * FROM t1 LOCK IN SHARE MODE; SELECT * FROM t1 WHERE a>=10000 LOCK IN SHARE MODE; COMMIT ; I think that for now, we can ignore the part that involves the tables whose names start with fl and involve FOREIGN KEY constraints. This example shows that it already is possible to acquire shared or exclusive InnoDB tables lock from SQL. I had forgotten these details. To fix this bug we have to figure out how to modify the code (I think mostly outside InnoDB) so that LOCK_X or LOCK_S will be acquired if a locking operation will traverse the entire table.

Yuchen Pei added a comment - 2024-04-19 02:58 - edited

quick recap

what we've got so far
- mdev_14479.test (the testcase in my previous comment):
  - IX lock in INSERT INTO fl0 VALUES (1000)
  - IX lock in INSERT INTO fl1 VALUES (500, 1000), (1500, 1000)
  - IX lock in INSERT INTO fl2 VALUES (200, 500), (800, 500), (1200, 1500), (1800, 1500)
  - IX lock in "INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d')"
  - X lock in "LOCK TABLE t1 WRITE" (because of the autocommit trick)
  - X lock in second "LOCK TABLE t1 WRITE"
- mdev_24813.test (the test in the description of this ticket with the four selects labeled 0-3):
  - IX lock in "/* 0 */ SELECT COUNT FROM t FOR UPDATE"
  - IX in "/* 1 */ SELECT COUNT FROM t FOR UPDATE"
  - IS in "/* 2 */ SELECT COUNT FROM t LOCK IN SHARE MODE"
  - IS in "/* 3 */ SELECT COUNT FROM t LOCK IN SHARE MODE"

what we want: X or S table wide locks in these examples

BEGIN; SELECT * FROM t1 LOCK IN SHARE MODE; ... -- S lock

BEGIN; SELECT * FROM t1 FOR UPDATE; ...         -- X lock

BEGIN; DELETE FROM t1; ...                      -- X lock

why the autocommit trick works

// in ha_innobase::external_lock

			if (sql_command == SQLCOM_LOCK_TABLES

			    && THDVAR(thd, table_locks)

			    && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT)

			    && thd_in_lock_tables(thd)) {

				dberr_t	error = row_lock_table(m_prebuilt);

Yuchen Pei added a comment - 2024-04-19 02:58 - edited quick recap what we've got so far mdev_14479.test (the testcase in my previous comment): IX lock in INSERT INTO fl0 VALUES (1000) IX lock in INSERT INTO fl1 VALUES (500, 1000), (1500, 1000) IX lock in INSERT INTO fl2 VALUES (200, 500), (800, 500), (1200, 1500), (1800, 1500) IX lock in "INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d')" X lock in "LOCK TABLE t1 WRITE" (because of the autocommit trick) X lock in second "LOCK TABLE t1 WRITE" mdev_24813.test (the test in the description of this ticket with the four selects labeled 0-3): IX lock in "/* 0 */ SELECT COUNT FROM t FOR UPDATE" IX in "/* 1 */ SELECT COUNT FROM t FOR UPDATE" IS in "/* 2 */ SELECT COUNT FROM t LOCK IN SHARE MODE" IS in "/* 3 */ SELECT COUNT FROM t LOCK IN SHARE MODE" what we want: X or S table wide locks in these examples BEGIN ; SELECT * FROM t1 LOCK IN SHARE MODE; ... -- S lock BEGIN ; SELECT * FROM t1 FOR UPDATE ; ... -- X lock BEGIN ; DELETE FROM t1; ... -- X lock why the autocommit trick works // in ha_innobase::external_lock if (sql_command == SQLCOM_LOCK_TABLES && THDVAR(thd, table_locks) && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT) && thd_in_lock_tables(thd)) { dberr_t error = row_lock_table(m_prebuilt);

Marko Mäkelä added a comment - 2024-04-19 05:17

ycp, thank you. It seems to me that this logic to acquire InnoDB table locks for LOCK TABLES is flawed. Elsewhere, such as in thd_trx_is_auto_commit() and innobase_register_trx(), we are checking for two flags, corresponding to autocommit=0 or being within an explicitly started transaction:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc

index 418f39885f7..397d95dbbdd 100644

--- a/storage/innobase/handler/ha_innodb.cc

+++ b/storage/innobase/handler/ha_innodb.cc

@@ -16180,7 +16180,8 @@ ha_innobase::external_lock(

 			if (sql_command == SQLCOM_LOCK_TABLES

 			    && THDVAR(thd, table_locks)

-			    && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT)

+			    && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT

+						| OPTION_BEGIN)

 			    && thd_in_lock_tables(thd)) {

 				dberr_t	error = row_lock_table(m_prebuilt);

In fact, this call is the only one where we are checking for only one of these two flags. Can you check if applying that fix would make LOCK TABLES behave better? This would have to be fixed in a separate bug, to provide a simpler work-around for this bug.

Marko Mäkelä added a comment - 2024-04-19 05:17 ycp , thank you. It seems to me that this logic to acquire InnoDB table locks for LOCK TABLES is flawed. Elsewhere, such as in thd_trx_is_auto_commit() and innobase_register_trx() , we are checking for two flags, corresponding to autocommit=0 or being within an explicitly started transaction: diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc index 418f39885f7..397d95dbbdd 100644 --- a/storage/innobase/handler/ha_innodb.cc +++ b/storage/innobase/handler/ha_innodb.cc @@ -16180,7 +16180,8 @@ ha_innobase::external_lock( if (sql_command == SQLCOM_LOCK_TABLES && THDVAR(thd, table_locks) - && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT) + && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT + | OPTION_BEGIN) && thd_in_lock_tables(thd)) { dberr_t error = row_lock_table(m_prebuilt); In fact, this call is the only one where we are checking for only one of these two flags. Can you check if applying that fix would make LOCK TABLES behave better? This would have to be fixed in a separate bug, to provide a simpler work-around for this bug.

Yuchen Pei added a comment - 2024-04-30 00:43

Just to summarise discussions on the slack thread.

> Can you check if applying that fix would make LOCK TABLES behave better?

I tried this. Unfortunately LOCK TABLES resets the flags OPTION_BEGIN in trans_commit_implicit():

// in mysql_execute_command():

  case SQLCOM_LOCK_TABLES:

    /* We must end the transaction first, regardless of anything */

    res= trans_commit_implicit(thd);

// in trans_commit_implicit():

  thd->variables.option_bits&= ~(OPTION_BEGIN | OPTION_KEEP_LOG);

And the resetting was from very early commits (<= 2010). So LOCK TABLES implicitly commits a current transaction, regardless of the value of autocommit.

marko remarked "Ideally, LOCK TABLES would start transaction and UNLOCK TABLES would commit it. All this seems rather messy."

Yuchen Pei added a comment - 2024-04-30 00:43 Just to summarise discussions on the slack thread. > Can you check if applying that fix would make LOCK TABLES behave better? I tried this. Unfortunately LOCK TABLES resets the flags OPTION_BEGIN in trans_commit_implicit() : // in mysql_execute_command(): case SQLCOM_LOCK_TABLES: /* We must end the transaction first, regardless of anything */ res= trans_commit_implicit(thd); // in trans_commit_implicit(): thd->variables.option_bits&= ~(OPTION_BEGIN | OPTION_KEEP_LOG); And the resetting was from very early commits (<= 2010). So LOCK TABLES implicitly commits a current transaction, regardless of the value of autocommit . marko remarked "Ideally, LOCK TABLES would start transaction and UNLOCK TABLES would commit it. All this seems rather messy."

Julien Fritsch made changes - 2024-05-03 14:33

Fix Version/s	11.0 [ 28320 ]
Fix Version/s	11.3 [ 28565 ]

Marko Mäkelä made changes - 2024-08-07 12:08

Link

This issue relates to ~~MDEV-34719~~ [ ~~MDEV-34719~~ ]

Yuchen Pei added a comment - 2024-08-29 05:36 - edited

marko and psergei: I put together a poc that works for a minimal example for MDEV-24813, as a draft pull request <https://github.com/MariaDB/server/pull/3487>:

commit 701924a2c12ef20135615340bf4f5dec11bcbfc7 (HEAD -> mdev-24813, upstream/mdev-24813)

Author: Yuchen Pei <ycp@mariadb.com>

Date:   Thu Aug 29 15:26:40 2024 +1000

    MDEV-24813 [poc] Signal full table lock when cond is NULL

    Will work in the following minimal example

    --source include/have_innodb.inc

    CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;

    insert into t values (42);

    BEGIN;

    SELECT * FROM t LOCK IN SHARE MODE; # will acquire S lock

    COMMIT;

    DROP TABLE t;

It is of course a quick hack, but is this similar to what you have in mind?

Yuchen Pei added a comment - 2024-08-29 05:36 - edited marko and psergei : I put together a poc that works for a minimal example for MDEV-24813 , as a draft pull request < https://github.com/MariaDB/server/pull/3487 >: commit 701924a2c12ef20135615340bf4f5dec11bcbfc7 (HEAD -> mdev-24813, upstream/mdev-24813) Author: Yuchen Pei <ycp@mariadb.com> Date: Thu Aug 29 15:26:40 2024 +1000 MDEV-24813 [poc] Signal full table lock when cond is NULL Will work in the following minimal example --source include/have_innodb.inc CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB; insert into t values (42); BEGIN; SELECT * FROM t LOCK IN SHARE MODE; # will acquire S lock COMMIT; DROP TABLE t; It is of course a quick hack, but is this similar to what you have in mind?

Yuchen Pei added a comment - 2024-08-30 01:36

Hi psergei, can you take a look at the draft pull request mentioned in the previous comment, to see whether it matches the cond_push concept of what you were proposing?

Yuchen Pei added a comment - 2024-08-30 01:36 Hi psergei , can you take a look at the draft pull request mentioned in the previous comment, to see whether it matches the cond_push concept of what you were proposing?

Yuchen Pei made changes - 2024-08-30 01:36

Assignee	Yuchen Pei [ JIRAUSER52627 ]	Sergei Petrunia [ psergey ]
Status	Confirmed [ 10101 ]	In Review [ 10002 ]

Julien Fritsch made changes - 2024-09-10 15:27

Fix Version/s

11.1 [ 28549 ]

Roel Van de Paar made changes - 2024-10-20 21:05

Link

This issue relates to MDEV-28803 [ MDEV-28803 ]

Julien Fritsch made changes - 2024-11-22 17:06

Fix Version/s

11.2(EOL) [ 28603 ]

People

Assignee:: Sergei Petrunia

Reporter:: Marko Mäkelä

Votes:: 3 Vote for this issue

Watchers:: 15 Start watching this issue

Dates

Created:: 2021-02-09 06:37

Updated:: 2024-11-25 22:51

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.