MariaDB Server / MDEV-24813

Locking full table scan fails to use table-level locking

Details

    Description

      The following test case shows that table locks are not being passed to the storage engine in any of the locking-read cases.

      --source include/have_innodb.inc
      --source include/have_sequence.inc
      --source include/not_debug.inc
       
      SET @save_freq=@@GLOBAL.innodb_purge_rseg_truncate_frequency;
      SET GLOBAL innodb_purge_rseg_truncate_frequency=1;
       
      CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
      INSERT INTO t(a) SELECT seq FROM seq_0_to_1000000;
       
      --source include/wait_all_purged.inc
      SET GLOBAL innodb_purge_rseg_truncate_frequency=@save_freq;
       
      SET profiling_history_size=100;
      SET profiling = 1;
       
      START TRANSACTION;
      LOCK TABLE t WRITE;
      SELECT COUNT(*) FROM t FOR UPDATE;
      UNLOCK TABLES;
      COMMIT;
      SELECT COUNT(*) FROM t FOR UPDATE;
       
      START TRANSACTION;
      LOCK TABLE t READ;
      SELECT COUNT(*) FROM t LOCK IN SHARE MODE;
      UNLOCK TABLES;
      COMMIT;
      SELECT COUNT(*) FROM t LOCK IN SHARE MODE;
       
      SHOW PROFILES;
      SET profiling = 0;
      DROP TABLE t;
      

      I would have expected an explicit LOCK TABLE to be passed to the storage engine, as seems to have been the case during the development of MDEV-14479.

      I would also expect the optimizer to automatically pass information to the storage engine so that the table can be locked upfront, if we know that the entire table will be scanned in a locking operation. This should include the following:

      • CREATE…SELECT
      • INSERT…SELECT
      • SELECT…LOCK IN SHARE MODE
      • SELECT…FOR UPDATE
      • SELECT executed at TRANSACTION ISOLATION LEVEL SERIALIZABLE
      • UPDATE
      • DELETE

      If no WHERE or JOIN condition applies to the table and no LIMIT is present, then we should expect the operation to scan the entire table, and the storage engine should be requested to lock the entire table.
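
      For example, under this rule the first two statements below would qualify for an upfront table lock, while the last two would not (this only illustrates the intended heuristic, not current behaviour):

      SELECT * FROM t FOR UPDATE;            -- whole table scanned: a single X table lock would suffice
      DELETE FROM t;                         -- whole table affected: a single X table lock would suffice
       
      SELECT * FROM t LIMIT 10 FOR UPDATE;   -- LIMIT present: row-level locking remains appropriate
      DELETE FROM t WHERE a < 10;            -- WHERE present: row-level locking remains appropriate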

      MDEV-14479 fixed the InnoDB record locking in such a way that when the table has already been locked in the corresponding mode, no individual record locks will be allocated and created. If we set a breakpoint in lock_rec_create_low() after ha_innobase::create() has finished in the above test case, we will observe the following:

      10.5 ffc5d064895cadbc42711efd7dbb6ae1b323f050

      Thread 15 "mariadbd" hit Breakpoint 1, lock_rec_create_low (c_lock=0x0, 
          thr=0x0, type_mode=3, page_id={m_id = 21474836484}, 
          page=0x7ffff0b70000 "", heap_no=2, index=0x7fffbc01af40, 
          trx=0x7ffff17260d8, holds_trx_mutex=false)
          at /mariadb/10.5m/storage/innobase/lock/lock0lock.cc:1294
      (gdb) p index.table.locks
      $7 = {count = 1, start = 0x7ffff1726cb0, end = 0x7ffff1726cb0, 
        node = &lock_table_t::locks}
      (gdb) p *index.table.locks.start
      $8 = {trx = 0x7ffff17260d8, trx_locks = {prev = 0x0, next = 0x0}, 
        index = 0x0, hash = 0x0, requested_time = 0, wait_time = 0, un_member = {
          tab_lock = {table = 0x7fffbc019e10, locks = {prev = 0x0, next = 0x0}}, 
          rec_lock = {page_id = {m_id = 140736347610640}, n_bits = 0}}, 
        type_mode = 17}
      

      Note: type_mode == 17 == LOCK_TABLE | LOCK_IX. For LOCK IN SHARE MODE, it would be 16 == LOCK_TABLE | LOCK_IS. We would expect the table lock to be in LOCK_X or LOCK_S mode, so that no locks have to be allocated for each individual visited row.
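
      As an aside, the table-level lock modes can also be observed without a debugger; a sketch, assuming the innodb_status_output_locks option so that SHOW ENGINE INNODB STATUS lists the locks held by each transaction:

      SET GLOBAL innodb_status_output_locks=ON;
      BEGIN;
      SELECT COUNT(*) FROM t FOR UPDATE;
      SHOW ENGINE INNODB STATUS;  -- the TRANSACTIONS section should report "lock mode IX" on t
      ROLLBACK;
      SET GLOBAL innodb_status_output_locks=OFF;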

      If we change the CREATE TABLE to CREATE TEMPORARY TABLE, we can get an accurate estimate of how fast the execution would be without the row-level locking.

      On my system, with locking, each full table scan takes between 0.49 and 0.66 seconds (the first run being the slowest). The overhead can probably be attributed to dynamic memory allocation.

      With a temporary table, each locking full table scan takes between 0.28 and 0.30 seconds. The memory operations related to row-level locking roughly double the execution time (and significantly increase the memory footprint)!
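
      That is, the baseline can be measured with essentially the same workload, timed with the same profiling statements, only using a temporary table (which is private to the connection, so InnoDB takes no row locks for it):

      CREATE TEMPORARY TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
      INSERT INTO t(a) SELECT seq FROM seq_0_to_1000000;
      SELECT COUNT(*) FROM t FOR UPDATE;
      SELECT COUNT(*) FROM t LOCK IN SHARE MODE;
      DROP TEMPORARY TABLE t;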


          Activity

            ycp Yuchen Pei added a comment - edited

            quick recap

            • what we've got so far
              • mdev_14479.test (the testcase in my previous comment):
                • IX lock in INSERT INTO fl0 VALUES (1000)
                • IX lock in INSERT INTO fl1 VALUES (500, 1000), (1500, 1000)
                • IX lock in INSERT INTO fl2 VALUES (200, 500), (800, 500), (1200, 1500), (1800, 1500)
                • IX lock in "INSERT INTO t1 VALUES(1,1,'a'),(2,9999,'b'),(3,10000,'c'),(4,4,'d')"
                • X lock in "LOCK TABLE t1 WRITE" (because of the autocommit trick)
                • X lock in second "LOCK TABLE t1 WRITE"
              • mdev_24813.test (the test in the description of this ticket with the four selects labeled 0-3):
                • IX lock in "/* 0 */ SELECT COUNT(*) FROM t FOR UPDATE"
                • IX in "/* 1 */ SELECT COUNT(*) FROM t FOR UPDATE"
                • IS in "/* 2 */ SELECT COUNT(*) FROM t LOCK IN SHARE MODE"
                • IS in "/* 3 */ SELECT COUNT(*) FROM t LOCK IN SHARE MODE"
            • what we want: X or S table-wide locks in these examples

              BEGIN; SELECT * FROM t1 LOCK IN SHARE MODE; ... -- S lock
              BEGIN; SELECT * FROM t1 FOR UPDATE; ...         -- X lock
              BEGIN; DELETE FROM t1; ...                      -- X lock
              

            • why the autocommit trick works

              // in ha_innobase::external_lock()
              if (sql_command == SQLCOM_LOCK_TABLES
                  && THDVAR(thd, table_locks)
                  && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT)
                  && thd_in_lock_tables(thd)) {

                      dberr_t error = row_lock_table(m_prebuilt);
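
            Presumably (this reading of the check above is an assumption), the trick is that disabling autocommit sets OPTION_NOT_AUTOCOMMIT, so LOCK TABLES reaches row_lock_table() and an InnoDB X table lock is taken. A minimal SQL sketch of that setup:

              SET autocommit=0;
              LOCK TABLE t1 WRITE;   -- OPTION_NOT_AUTOCOMMIT is set, so row_lock_table() runs: X table lock
              -- ... work on t1 under the table lock ...
              UNLOCK TABLES;
              SET autocommit=1;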
              


            marko Marko Mäkelä added a comment -

            ycp, thank you. It seems to me that this logic to acquire InnoDB table locks for LOCK TABLES is flawed. Elsewhere, such as in thd_trx_is_auto_commit() and innobase_register_trx(), we are checking for two flags, corresponding to autocommit=0 or being within an explicitly started transaction:

            diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
            index 418f39885f7..397d95dbbdd 100644
            --- a/storage/innobase/handler/ha_innodb.cc
            +++ b/storage/innobase/handler/ha_innodb.cc
            @@ -16180,7 +16180,8 @@ ha_innobase::external_lock(
             
             			if (sql_command == SQLCOM_LOCK_TABLES
             			    && THDVAR(thd, table_locks)
            -			    && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT)
            +			    && thd_test_options(thd, OPTION_NOT_AUTOCOMMIT
            +						| OPTION_BEGIN)
             			    && thd_in_lock_tables(thd)) {
             
             				dberr_t	error = row_lock_table(m_prebuilt);
            

            In fact, this call is the only one where we are checking for only one of these two flags. Can you check if applying that fix would make LOCK TABLES behave better? This would have to be fixed in a separate bug, to provide a simpler work-around for this bug.

            ycp Yuchen Pei added a comment -

            Just to summarise the discussion on the Slack thread.

            > Can you check if applying that fix would make LOCK TABLES behave better?

            I tried this. Unfortunately LOCK TABLES resets the OPTION_BEGIN flag in trans_commit_implicit():

            // in mysql_execute_command():
              case SQLCOM_LOCK_TABLES:
                /* We must end the transaction first, regardless of anything */
                res= trans_commit_implicit(thd);
             
            // in trans_commit_implicit():
              thd->variables.option_bits&= ~(OPTION_BEGIN | OPTION_KEEP_LOG);
            

            And the resetting dates back to very early commits (2010 or earlier). So LOCK TABLES implicitly commits the current transaction, regardless of the value of autocommit.
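
            That implicit commit is also visible from plain SQL; a small sketch, assuming a one-column InnoDB table t1:

              START TRANSACTION;      -- sets OPTION_BEGIN
              INSERT INTO t1 VALUES (1);
              LOCK TABLE t1 WRITE;    -- implicit commit here; OPTION_BEGIN is cleared again
              ROLLBACK;               -- no effect: the INSERT was already committed by LOCK TABLE
              UNLOCK TABLES;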

            marko remarked "Ideally, LOCK TABLES would start transaction and UNLOCK TABLES would commit it. All this seems rather messy."

            ycp Yuchen Pei added a comment - edited

            marko and psergei: I put together a PoC that works for a minimal example of MDEV-24813, as a draft pull request <https://github.com/MariaDB/server/pull/3487>:

            commit 701924a2c12ef20135615340bf4f5dec11bcbfc7 (HEAD -> mdev-24813, upstream/mdev-24813)
            Author: Yuchen Pei <ycp@mariadb.com>
            Date:   Thu Aug 29 15:26:40 2024 +1000
             
                MDEV-24813 [poc] Signal full table lock when cond is NULL
                
                Will work in the following minimal example
                
                --source include/have_innodb.inc
                CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
                insert into t values (42);
                
                BEGIN;
                SELECT * FROM t LOCK IN SHARE MODE; # will acquire S lock
                COMMIT;
                
                DROP TABLE t;
            

            It is of course a quick hack, but is this similar to what you have in mind?

            ycp Yuchen Pei added a comment -

            Hi psergei, can you take a look at the draft pull request mentioned in the previous comment, to see whether it matches the cond_push concept of what you were proposing?


            People

              psergei Sergei Petrunia
              marko Marko Mäkelä