MariaDB Server / MDEV-16402

Support Index Condition Pushdown for clustered PK scans

Details

    Description

      The following test will lock 3 records and their preceding gaps inside InnoDB, even though only the first 2 records match the condition id<=20:

      --source include/have_innodb.inc
      CREATE TABLE t1(id INT PRIMARY KEY, c INT NOT NULL) ENGINE=InnoDB;
      INSERT INTO t1 VALUES(10,10),(20,20),(30,30),(40,40);
      SELECT * FROM t1 WHERE id<=20 FOR UPDATE;
      DROP TABLE t1;
      

      InnoDB cannot be blamed for this, because the SQL layer is not invoking ha_innobase::idx_cond_push(), so prebuilt->idx_cond will be NULL and InnoDB cannot know that the id=30 record does not match the WHERE condition.

      To confirm this:

      break ha_innobase::create
      continue
      finish
      break lock_rec_lock
      break ha_innobase::delete_table
      continue
      

      There will be 3 calls to lock_rec_lock() before the DROP TABLE. There should be only 2 calls, with type=LOCK_ORDINARY to lock the keys id=10,id=20 and the gaps preceding them.

      Another reason to enable index condition pushdown (ICP) for table scans (or PRIMARY KEY scans) is that if the query returns BLOBs, it is much more efficient to avoid copying and then discarding the BLOBs of non-matching rows.
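The locking difference described in this report can be sketched with a small simulation (illustrative Python, not InnoDB code; the scan model and lock bookkeeping are simplified assumptions, and gap locks are ignored): without a pushed condition, the engine must lock every record it reads, including the first non-matching one, while with ICP the condition is evaluated inside the engine before any lock is taken.

```python
# Toy model of a locking ascending scan over the PRIMARY KEY values,
# with and without a pushed index condition. Not InnoDB code.

def scan_for_update(keys, cond, icp=False):
    """Return (matching_keys, locked_keys) for a locking ascending scan.

    The scan stops after the first key that fails the condition,
    mirroring how the server stops fetching rows once a row fails
    a range condition on the scanned index.
    """
    matched, locked = [], []
    for key in sorted(keys):
        if icp and not cond(key):
            # ICP: condition is checked inside the engine, before locking.
            break
        locked.append(key)           # engine acquires a record lock
        if cond(key):
            matched.append(key)      # server-level filter passes
        else:
            break                    # server stops the scan; lock already taken
    return matched, locked

keys = [10, 20, 30, 40]
cond = lambda k: k <= 20

# Without ICP: id=30 is locked even though it does not match.
assert scan_for_update(keys, cond, icp=False) == ([10, 20], [10, 20, 30])
# With ICP: only the two matching records are locked.
assert scan_for_update(keys, cond, icp=True) == ([10, 20], [10, 20])
```

Both variants return the same result set; only the set of acquired locks differs, which is exactly the behaviour observed via lock_rec_lock() above.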

        Activity

            For the record, I also tried the following single-key lookups:

            SELECT * FROM t1 WHERE id=20 FOR UPDATE;
            SELECT * FROM t1 WHERE id=21 FOR UPDATE;
            

            The first one would correctly only acquire a LOCK_REC_NOT_GAP lock on the only matching record id=20, and the second one would only acquire a LOCK_GAP on the gap that protects the non-matching key, attached to the successor record id=30.
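For reference, the three record-lock modes mentioned in this thread can be summarized with an illustrative sketch (not InnoDB code; the supremum record and secondary-index cases are ignored):

```python
# Illustrative mapping of InnoDB record-lock modes to unique-key lookups.
LOCK_ORDINARY = 'next-key: record + preceding gap'
LOCK_REC_NOT_GAP = 'record only'
LOCK_GAP = 'preceding gap only'

def unique_lookup_lock(keys, searched):
    """Lock mode for SELECT ... WHERE pk = searched FOR UPDATE (sketch)."""
    if searched in keys:
        # Match: lock just the record, no gap needed for a unique equality.
        return (searched, LOCK_REC_NOT_GAP)
    # No match: lock the gap before the successor record, attached to it,
    # so that no other transaction can insert the searched key.
    successor = min(k for k in keys if k > searched)
    return (successor, LOCK_GAP)

keys = [10, 20, 30, 40]
assert unique_lookup_lock(keys, 20) == (20, LOCK_REC_NOT_GAP)
assert unique_lookup_lock(keys, 21) == (30, LOCK_GAP)
```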

            I believe that ICP mostly helps with range or table scans.

            With the following test (UNIQUE KEY without NOT NULL), index condition pushdown will be used, and only the secondary index records (id=10,DB_ROW_ID=512),(id=20,DB_ROW_ID=513) (and preceding gaps) will be locked, interleaved with requests to lock the clustered index records (but not preceding gaps) (DB_ROW_ID=512),(DB_ROW_ID=513). There is no request to lock id=30, so the ICP is working as it should.

            --source include/have_innodb.inc
            CREATE TABLE t1(id INT UNIQUE, c INT) ENGINE=InnoDB;
            INSERT INTO t1 VALUES(10,10),(20,20),(30,30),(40,40);
            SELECT * FROM t1 WHERE id<=20 FOR UPDATE;
            DROP TABLE t1;
            

            After the last clustered index record has been locked, there is a redundant call to lock (id=20,DB_ROW_ID=513) and the preceding gap (the lock already exists). That is a minor InnoDB bug.

            marko Marko Mäkelä added a comment

            InnoDB cannot be blamed for this, because the SQL layer is not invoking ha_innobase::idx_cond_push(), so prebuilt->idx_cond will be NULL and InnoDB cannot know that the id=30 record does not match the WHERE condition.

            The SQL layer informs InnoDB that the record with id=30 does not match the WHERE condition by calling ::unlock_row(). This is how it is called for the row with id=30:

              Breakpoint 11, ha_innobase::unlock_row (this=0x7fff8803a028) at /home/psergey/dev-git/10.3-r2/storage/innobase/handler/ha_innodb.cc:9030
            (gdb) x/4xc table->field[0].ptr
              0x7fff8800ea59:	30 '\036'	0 '\000'	0 '\000'	0 '\000'
            

            (I am debugging the SELECT * FROM t1 WHERE id<=20 FOR UPDATE; query.)

            psergei Sergei Petrunia added a comment

            It is not difficult to get the SQL layer to call cond_push() for the clustered PK, too.

            However, if InnoDB acknowledges that index condition pushdown has happened, it must make sure that it does not return records that fail the index condition. marko, will this happen automatically, or does one need to make some changes inside storage/innobase for this to happen?

            psergei Sergei Petrunia added a comment

            Yes, unlock_row() would help a little, but it would be even better to not lock the record at all in the first place.
            Yes, InnoDB would filter every record against the index condition if one is available. But I see that it is unnecessarily locking records before evaluating the condition, and then unlocking them afterwards:

            	switch (row_search_idx_cond_check(buf, prebuilt, rec, offsets)) {
            	case ICP_NO_MATCH:
            		if (did_semi_consistent_read) {
            			row_unlock_for_mysql(prebuilt, TRUE);
            		}
            		goto next_rec;
            

            There are 3 calls to row_search_idx_cond_check() on different code paths: looking up a primary key (after using the adaptive hash index), searching in a secondary index, and the one above for validating a primary key record in the general case. This last call should or could be moved before lock acquisition in a follow-up InnoDB fix to improve performance further.

            marko Marko Mäkelä added a comment
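The suggested reordering can be sketched with a toy model (not the actual row0sel.cc logic; it unconditionally undoes the lock on ICP_NO_MATCH, whereas the real code does so only under the did_semi_consistent_read guard): evaluating the pushed condition before acquiring the record lock removes the lock/unlock churn entirely.

```python
# Toy comparison of lock-then-check vs. check-then-lock. Not InnoDB code.

def scan_lock_then_check(rows, cond):
    """Current order: lock each record, then evaluate the pushed condition."""
    ops = []                           # trace of locking operations
    for r in rows:
        ops.append(('lock', r))        # lock acquired first ...
        if not cond(r):
            ops.append(('unlock', r))  # ... then undone on ICP_NO_MATCH
                                       # (simplification: see lead-in)
    return ops

def scan_check_then_lock(rows, cond):
    """Proposed order: evaluate the pushed condition, lock only on a match."""
    ops = []
    for r in rows:
        if cond(r):
            ops.append(('lock', r))
    return ops

rows, cond = [10, 20, 30, 40], lambda r: r <= 20
assert scan_lock_then_check(rows, cond) == [
    ('lock', 10), ('lock', 20),
    ('lock', 30), ('unlock', 30), ('lock', 40), ('unlock', 40)]
assert scan_check_then_lock(rows, cond) == [('lock', 10), ('lock', 20)]
```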
            marko Marko Mäkelä added a comment - edited

            The index condition pushdown was initially enabled on all indexes in MySQL, but it was disabled for the PRIMARY KEY due to a performance regression in sysbench.

            Several of the queries in sysbench have a WHERE condition that the optimizer uses for executing these queries as range scans. The upper and lower limit of the range scan will ensure that the WHERE condition is fulfilled. […] upper and lower limits ensure that the WHERE condition is fulfilled, the pushed index condition will not filter out any records.

            Unfortunately, InnoDB does not currently make use of the range limits, so it is unnecessarily locking (and possibly later unlocking) too many records.
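The point about range limits can be illustrated with a toy model (assumed behaviour, not server code): when the range endpoints already imply the WHERE condition, the pushed condition is evaluated once per visited row but never filters anything, so it is pure overhead.

```python
# Toy range scan over [lo, hi] with an optional pushed condition.
# Not server code; illustrates why a pushed condition that is implied
# by the range limits costs evaluations without filtering any rows.

def range_scan(keys, lo, hi, pushed_cond=None):
    """Return (rows, number_of_pushed_condition_evaluations)."""
    out, evaluated = [], 0
    for k in sorted(keys):
        if k < lo:
            continue                  # positioned below the range start
        if k > hi:
            break                     # range end: the scan stops here anyway
        if pushed_cond is not None:
            evaluated += 1            # pushed condition checked per row
            if not pushed_cond(k):
                continue              # never taken when the range implies it
        out.append(k)
    return out, evaluated

keys = [10, 20, 30, 40]
# Range limits alone already produce the correct result, zero evaluations.
assert range_scan(keys, 10, 20) == ([10, 20], 0)
# Pushing WHERE 10<=id<=20 adds one evaluation per row, filters nothing.
assert range_scan(keys, 10, 20, lambda k: 10 <= k <= 20) == ([10, 20], 2)
```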

            marko Marko Mäkelä added a comment - edited

            In MDEV-28747, oleg.smirnov reported experimenting with this. The MySQL commit message mentions WL#6061, which is hidden from public view.

            The commit also mentions "several of the queries in sysbench". Which queries might those be? Would it be possible to avoid pushing down WHERE conditions that never actually filter anything? Could MDEV-16188 already serve those queries, making the MySQL commit redundant? If not, could removing some InnoDB prefetch logic in MDEV-16232 fix the performance problem?


            The code change that was mentioned in a MDEV-28747 comment would correspond to the following patch for 10.5 or later:

            diff --git a/sql/opt_index_cond_pushdown.cc b/sql/opt_index_cond_pushdown.cc
            index 6a24fa95b68..c35345be5d5 100644
            --- a/sql/opt_index_cond_pushdown.cc
            +++ b/sql/opt_index_cond_pushdown.cc
            @@ -340,7 +340,7 @@ void push_index_cond(JOIN_TAB *tab, uint keyno)
                   tab->join->thd->lex->sql_command != SQLCOM_UPDATE_MULTI &&
                   tab->join->thd->lex->sql_command != SQLCOM_DELETE_MULTI &&
                   tab->type != JT_CONST && tab->type != JT_SYSTEM &&
            -      !tab->table->file->is_clustering_key(keyno)) // 6
            +      true)
               {
                 DBUG_EXECUTE("where",
                              print_where(tab->select_cond, "full cond", QT_ORDINARY););
            

            marko Marko Mäkelä added a comment

            It makes sense to implement MDEV-16232 first. After that, the only remaining value of this optimization might be that we can avoid fetching BLOBs for rows that do not match the condition.

            marko Marko Mäkelä added a comment

            People

              psergei Sergei Petrunia
              marko Marko Mäkelä
              Votes: 1
              Watchers: 9
