MariaDB Server / MDEV-24813

Locking full table scan fails to use table-level locking

Details

    Description

      The following test case shows that table locks are not being passed to the storage engine in any of the locking-read cases.

      --source include/have_innodb.inc
      --source include/have_sequence.inc
      --source include/not_debug.inc
       
      SET @save_freq=@@GLOBAL.innodb_purge_rseg_truncate_frequency;
      SET GLOBAL innodb_purge_rseg_truncate_frequency=1;
       
      CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
      INSERT INTO t(a) SELECT seq FROM seq_0_to_1000000;
       
      --source include/wait_all_purged.inc
      SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;
       
      SET profiling_history_size=100;
      SET profiling = 1;
       
      START TRANSACTION;
      LOCK TABLE t WRITE;
      SELECT COUNT(*) FROM t FOR UPDATE;
      UNLOCK TABLES;
      COMMIT;
      SELECT COUNT(*) FROM t FOR UPDATE;
       
      START TRANSACTION;
      LOCK TABLE t READ;
      SELECT COUNT(*) FROM t LOCK IN SHARE MODE;
      UNLOCK TABLES;
      COMMIT;
      SELECT COUNT(*) FROM t LOCK IN SHARE MODE;
       
      SHOW PROFILES;
      SET profiling = 0;
      DROP TABLE t;
      

      I would have expected an explicit LOCK TABLE to be passed to the storage engine, as seems to have been the case during the development of MDEV-14479.

      I would also expect the optimizer to automatically pass information to the storage engine so that the table can be locked upfront, if we know that the entire table will be scanned in a locking operation. This should include the following:

      • CREATE…SELECT
      • INSERT…SELECT
      • SELECT…LOCK IN SHARE MODE
      • SELECT…FOR UPDATE
      • SELECT executed at TRANSACTION ISOLATION LEVEL SERIALIZABLE
      • UPDATE
      • DELETE

      If no WHERE or JOIN condition applies to the table and no LIMIT is present, then we should expect the operation to scan the entire table, and the storage engine should be requested to lock the entire table.
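
      For example, under this rule the first two statements below would qualify for an upfront table lock, while the last two would not (this only illustrates the intended heuristic, not current behaviour):

      SELECT * FROM t FOR UPDATE;            -- whole table scanned: a single X table lock would suffice
      DELETE FROM t;                         -- whole table affected: a single X table lock would suffice
       
      SELECT * FROM t LIMIT 10 FOR UPDATE;   -- LIMIT present: row-level locking remains appropriate
      DELETE FROM t WHERE a < 10;            -- WHERE present: row-level locking remains appropriate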

      MDEV-14479 fixed the InnoDB record locking in such a way that when the table has already been locked in the corresponding mode, no individual record locks will be allocated and created. If we set a breakpoint in lock_rec_create_low() after ha_innobase::create() has finished in the above test case, we will observe the following:

      10.5 ffc5d064895cadbc42711efd7dbb6ae1b323f050

      Thread 15 "mariadbd" hit Breakpoint 1, lock_rec_create_low (c_lock=0x0, 
          thr=0x0, type_mode=3, page_id={m_id = 21474836484}, 
          page=0x7ffff0b70000 "", heap_no=2, index=0x7fffbc01af40, 
          trx=0x7ffff17260d8, holds_trx_mutex=false)
          at /mariadb/10.5m/storage/innobase/lock/lock0lock.cc:1294
      (gdb) p index.table.locks
      $7 = {count = 1, start = 0x7ffff1726cb0, end = 0x7ffff1726cb0, 
        node = &lock_table_t::locks}
      (gdb) p *index.table.locks.start
      $8 = {trx = 0x7ffff17260d8, trx_locks = {prev = 0x0, next = 0x0}, 
        index = 0x0, hash = 0x0, requested_time = 0, wait_time = 0, un_member = {
          tab_lock = {table = 0x7fffbc019e10, locks = {prev = 0x0, next = 0x0}}, 
          rec_lock = {page_id = {m_id = 140736347610640}, n_bits = 0}}, 
        type_mode = 17}
      

      Note: type_mode == 17 == LOCK_TABLE | LOCK_IX. For LOCK IN SHARE MODE, it would be 16 == LOCK_TABLE | LOCK_IS. We would expect the table lock to be in LOCK_X or LOCK_S mode, so that no locks have to be allocated for each individual visited row.
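
      As an aside, the table-level lock modes can also be observed without a debugger; a sketch, assuming the innodb_status_output_locks option so that SHOW ENGINE INNODB STATUS lists the locks held by each transaction:

      SET GLOBAL innodb_status_output_locks=ON;
      BEGIN;
      SELECT COUNT(*) FROM t FOR UPDATE;
      SHOW ENGINE INNODB STATUS;  -- the TRANSACTIONS section should report "lock mode IX" on t
      ROLLBACK;
      SET GLOBAL innodb_status_output_locks=OFF;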

      If we change the CREATE TABLE to CREATE TEMPORARY TABLE, we can get an accurate estimate of how fast the execution would be without the row-level locking.

      On my system, with locking, each full table scan takes between 0.49 and 0.66 seconds (the first run being the slowest). The overhead can probably be attributed to dynamic memory allocation.

      With a temporary table, each locking full table scan takes between 0.28 and 0.30 seconds. The memory operations related to row-level locking roughly double the execution time (and significantly increase the memory footprint)!
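
      That is, the baseline can be measured with essentially the same workload, timed with the same profiling statements, only using a temporary table (which is private to the connection, so InnoDB takes no row locks for it):

      CREATE TEMPORARY TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
      INSERT INTO t(a) SELECT seq FROM seq_0_to_1000000;
      SELECT COUNT(*) FROM t FOR UPDATE;
      SELECT COUNT(*) FROM t LOCK IN SHARE MODE;
      DROP TEMPORARY TABLE t;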


          Activity

            ycp Yuchen Pei added a comment - edited

            quick recap

            • what we've got so far
              • mdev_14479.test (the testcase in my previous comment):
                • IX lock in INSERT INTO fl0 VALUES (1000)
                • IX lock in INSERT INTO fl1 VALUES (500, 1000), (1500, 1000)
                • IX lock in INSERT INTO fl2 VALUES (200, 500), (800, 500), (1200, 1500), (1800, 1500)
                • IX lock in "INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d')"
                • X lock in "LOCK TABLE t1 WRITE" (because of the autocommit trick)
                • X lock in second "LOCK TABLE t1 WRITE"
              • mdev_24813.test (the test in the description of this ticket with the four selects labeled 0-3):
                • IX lock in "/* 0 */ SELECT COUNT(*) FROM t FOR UPDATE"
                • IX in "/* 1 */ SELECT COUNT(*) FROM t FOR UPDATE"
                • IS in "/* 2 */ SELECT COUNT(*) FROM t LOCK IN SHARE MODE"
                • IS in "/* 3 */ SELECT COUNT(*) FROM t LOCK IN SHARE MODE"
            • what we want: X or S table-wide locks in these examples

              BEGIN; SELECT * FROM t1 LOCK IN SHARE MODE; ... -- S lock
              BEGIN; SELECT * FROM t1 FOR UPDATE; ...         -- X lock
              BEGIN; DELETE FROM t1; ...                      -- X lock
              

            • why the autocommit trick works

              // in ha_innobase::external_lock()
              if (sql_command == SQLCOM_LOCK_TABLES
                  && THDVAR(thd, table_locks)
                  && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT)
                  && thd_in_lock_tables(thd)) {

                      dberr_t error = row_lock_table(m_prebuilt);
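
            Presumably (this reading of the check above is an assumption), the trick is that disabling autocommit sets OPTION_NOT_AUTOCOMMIT, so LOCK TABLES reaches row_lock_table() and an InnoDB X table lock is taken. A minimal SQL sketch of that setup:

              SET autocommit=0;
              LOCK TABLE t1 WRITE;   -- OPTION_NOT_AUTOCOMMIT is set, so row_lock_table() runs: X table lock
              -- ... work on t1 under the table lock ...
              UNLOCK TABLES;
              SET autocommit=1;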
              


            marko Marko Mäkelä added a comment -

            ycp, thank you. It seems to me that this logic to acquire InnoDB table locks for LOCK TABLES is flawed. Elsewhere, such as in thd_trx_is_auto_commit() and innobase_register_trx(), we are checking for two flags, corresponding to autocommit=0 or being within an explicitly started transaction:

            diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
            index 418f39885f7..397d95dbbdd 100644
            --- a/storage/innobase/handler/ha_innodb.cc
            +++ b/storage/innobase/handler/ha_innodb.cc
            @@ -16180,7 +16180,8 @@ ha_innobase::external_lock(
             
             			if (sql_command == SQLCOM_LOCK_TABLES
             			    && THDVAR(thd, table_locks)
            -			    && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT)
            +			    && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT
            +						| OPTION_BEGIN)
             			    && thd_in_lock_tables(thd)) {
             
             				dberr_t	error = row_lock_table(m_prebuilt);
            

            In fact, this call is the only one where we are checking for only one of these two flags. Can you check if applying that fix would make LOCK TABLES behave better? This would have to be fixed in a separate bug, to provide a simpler work-around for this bug.

            ycp Yuchen Pei added a comment -

            Just to summarise the discussion on the Slack thread.

            > Can you check if applying that fix would make LOCK TABLES behave better?

            I tried this. Unfortunately LOCK TABLES resets the OPTION_BEGIN flag in trans_commit_implicit():

            // in mysql_execute_command():
              case SQLCOM_LOCK_TABLES:
                /* We must end the transaction first, regardless of anything */
                res= trans_commit_implicit(thd);
             
            // in trans_commit_implicit():
              thd->variables.option_bits&= ~(OPTION_BEGIN | OPTION_KEEP_LOG);
            

            And the resetting dates back to very early commits (2010 or earlier). So LOCK TABLES implicitly commits the current transaction, regardless of the value of autocommit.
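
            That implicit commit is also visible from plain SQL; a small sketch, assuming a one-column InnoDB table t1:

              START TRANSACTION;      -- sets OPTION_BEGIN
              INSERT INTO t1 VALUES (1);
              LOCK TABLE t1 WRITE;    -- implicit commit here; OPTION_BEGIN is cleared again
              ROLLBACK;               -- no effect: the INSERT was already committed by LOCK TABLE
              UNLOCK TABLES;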

            marko remarked "Ideally, LOCK TABLES would start transaction and UNLOCK TABLES would commit it. All this seems rather messy."

            ycp Yuchen Pei added a comment - edited

            marko and psergei: I put together a PoC that works for a minimal example of MDEV-24813, as a draft pull request <https://github.com/MariaDB/server/pull/3487>:

            commit 701924a2c12ef20135615340bf4f5dec11bcbfc7 (HEAD -> mdev-24813, upstream/mdev-24813)
            Author: Yuchen Pei <ycp@mariadb.com>
            Date:   Thu Aug 29 15:26:40 2024 +1000
             
                MDEV-24813 [poc] Signal full table lock when cond is NULL
                
                Will work in the following minimal example
                
                --source include/have_innodb.inc
                CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
                insert into t values (42);
                
                BEGIN;
                SELECT * FROM t LOCK IN SHARE MODE; # will acquire S lock
                COMMIT;
                
                DROP TABLE t;
            

            It is of course a quick hack, but is this similar to what you have in mind?

            ycp Yuchen Pei added a comment -

            Hi psergei, can you take a look at the draft pull request mentioned in the previous comment, to see whether it matches the cond_push concept of what you were proposing?


            People

              psergei Sergei Petrunia
              marko Marko Mäkelä