[MDEV-6713] A join query is faster with index_condition_pushdown=off Created: 2014-09-09  Updated: 2022-12-27

Status: Open
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 1
Labels: None


 Description   

An example from a customer: a join query is faster when run with index_condition_pushdown=off. The difference is not big, but it is repeatable, so it is worth investigating.

Query and dataset were provided by joffrey.

Details are in the private comments (it's customer data).



 Comments   
Comment by Sergei Petrunia [ 2014-09-09 ]

Tried on current 10.0 tree, release build. Query execution time:

  • index_condition_pushdown=off: 0.31..0.33 sec
  • index_condition_pushdown=on: 0.40..0.42 sec
Comment by Elena Stepanova [ 2014-09-09 ]

Values from the current 10.0 tree revno 4383 (even scarier, but maybe it's a difference in build options; built as cmake && make):
ICP:
1 row in set (1.87 sec)
1 row in set (1.79 sec)
1 row in set (1.70 sec)
1 row in set (1.69 sec)
1 row in set (1.66 sec)
no ICP:
1 row in set (1.11 sec)
1 row in set (1.02 sec)
1 row in set (1.09 sec)
1 row in set (1.08 sec)
1 row in set (1.21 sec)
ICP:
1 row in set (1.71 sec)
1 row in set (1.67 sec)
1 row in set (1.69 sec)
1 row in set (1.77 sec)
1 row in set (1.61 sec)
no ICP:
1 row in set (1.48 sec)
1 row in set (1.01 sec)
1 row in set (1.09 sec)
1 row in set (1.13 sec)
1 row in set (0.96 sec)
Averages:
no ICP: 1.118 sec
ICP: 1.716 sec

Comment by Sergei Petrunia [ 2014-09-17 ]

Investigation:

InnoDB (and XtraDB too) has a "prefetch cache" that is used from
row_search_for_mysql().

When an index scan has read MYSQL_FETCH_CACHE_THRESHOLD=4 rows, then
prefetch cache is turned on (See row_sel_enqueue_cache_row_for_mysql() calls).
InnoDB reads MYSQL_FETCH_CACHE_SIZE=8 rows into the cache, and then subsequent
calls to row_search_for_mysql() read from the cache until it is empty.

The cache stores records in MySQL row format.

When ICP is not used, InnoDB decodes the index tuple directly into the cache
record.

When ICP is used, row_search_for_mysql() needs to check the ICP condition
before putting the row into the cache.
The ICP condition can only be checked when the index fields are in
table->record[0].
The solution is to unpack into table->record[0], do the ICP check, and then
put the record into the cache with memcpy:

static
void
row_sel_enqueue_cache_row_for_mysql(
/*================================*/
	byte*		mysql_rec,	/*!< in/out: MySQL record */
	row_prebuilt_t*	prebuilt)	/*!< in/out: prebuilt struct */
{
	/* For non ICP code path the row should already exist in the
	next fetch cache slot. */

	if (prebuilt->idx_cond != NULL) {
		byte*	dest = row_sel_fetch_last_buf(prebuilt);

		ut_memcpy(dest, mysql_rec, prebuilt->mysql_row_len);
	}

Comment by Sergei Petrunia [ 2014-09-17 ]

This works.

However, suppose the table has a few VARCHAR(255) columns, but the actual values are much shorter (which is typical).

Let's assume the row we are unpacking has 10-byte values.
Then, decoding the InnoDB index tuple needs to write only 10 bytes (+1 byte for the length) per column.
The ut_memcpy() call above, however, has to copy the full 255 bytes.
This seems to be the reason for the slowdown.

Comment by Sergei Petrunia [ 2014-09-17 ]

Using the above as a guideline, I created an artificial dataset that should show the problem... and it didn't. It seems I'm missing something here.

Comment by Sergei Petrunia [ 2014-09-17 ]

Re-ran benchmark on the smaller query.

Top CPU consumers when ICP=ON:

    18.45%   mysqld  mysqld              [.] btr_search_guess_on_hash(dict_index_t*, btr_search_t*, dtuple_t const*, unsigned long, unsigned long, btr_cur_t*, unsigned long, mtr_t*)
    18.43%   mysqld  mysqld              [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**)
     8.08%   mysqld  mysqld              [.] mtr_memo_slot_release_func(mtr_memo_slot_t*)
     7.12%   mysqld  mysqld              [.] buf_page_get_known_nowait(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*)
     5.44%   mysqld  mysqld              [.] buf_page_optimistic_get(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*)
     4.52%   mysqld  mysqld              [.] row_search_for_mysql(unsigned char*, unsigned long, row_prebuilt_t*, unsigned long, unsigned long)

Top CPU consumers when ICP=OFF:

    29.21%   mysqld  mysqld              [.] row_sel_enqueue_cache_row_for_mysql(unsigned char*, row_prebuilt_t*)
    11.75%   mysqld  mysqld              [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**)
     5.37%   mysqld  mysqld              [.] mtr_memo_slot_release_func(mtr_memo_slot_t*)
     3.98%   mysqld  mysqld              [.] buf_page_get_known_nowait(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*)
     3.57%   mysqld  mysqld              [.] buf_page_optimistic_get(unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*)
     2.79%   mysqld  mysqld              [.] row_search_for_mysql(unsigned char*, unsigned long, row_prebuilt_t*, unsigned long, unsigned long)

Comment by Sergei Petrunia [ 2014-09-17 ]

Again, the same picture, but with ICP=ON there is a new biggest consumer: row_sel_enqueue_cache_row_for_mysql.

Comment by Jan Lindström (Inactive) [ 2014-11-13 ]

Is this really a bug? In my understanding there is no certainty that ICP helps in all cases. When ICP is used, row_search_for_mysql() needs to check the ICP condition before putting the row into the cache. The ICP condition can only be checked when the index fields are in table->record[0].
The solution is to unpack into table->record[0], do the ICP check, and then put the record into the cache with memcpy. All this takes time.

Comment by Sergei Golubchik [ 2022-11-08 ]

It seems that when the condition's selectivity is low (most of the rows match), ICP adds more overhead than it's worth.

It is technically possible to estimate the selectivity of the ICP condition. But how to estimate the ICP overhead?

Comment by Michael Widenius [ 2022-12-27 ]

In theory, index_condition_pushdown should almost always be equally fast or faster. The more rows that do not match the condition, the bigger the difference. Even if all rows match the condition, the difference should in theory be practically negligible (more on this below).

The big benefits of index_condition_pushdown:

  • We check the condition earlier, in the engine code, so we don't have to go up to the SQL level and check the row there (a lot of code is executed when going in and out of the engine).
  • We can avoid reading the row if the condition does not match (if it is not a clustered index).

If even a few % of the rows do not match the condition, index_condition_pushdown should perform better than not having it enabled.

The extra overheads of index_condition_pushdown are:

  • An extra call to check the pushed index condition (and the code executed in it).
  • Copying the key to the row buffer.
  • In the case of InnoDB, there is an internal row cache that causes extra overhead (more data to copy). The question is how to avoid this one.

To answer Sergei Golubchik: the above is not hard to estimate.
We already estimate similar things for handler_rowid_filter_checks.
However, a better route would be to minimise the overhead of ICP itself.

Checking the index condition should be negligible, as this is the same check
we would have to do at the SQL level anyway.

Reasons why index_condition_pushdown can be slower:

  • The index_condition_pushdown check code was not optimal in 10.11. It is faster (but not yet perfect) in 11.0.
  • We call thd_kill_level() for every checked row, which adds overhead. This can be avoided for most calls.
  • InnoDB row cache overhead in the case of ICP.
  • For some complex WHERE clauses the ICP condition may be executed twice. This happens for example with WHERE (key=1 and b=c) or (key=1 and d=e), in which case:
    ICP: key=1 OR key=1
    WHERE: (key=1 and b=c) or (key=1 and d=e)
Generated at Thu Feb 08 07:14:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.