Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.1.9
-
None
Description
Based on a user case provided by stephanevaroqui. I am not sure if the table definition could be publicly shared so I'm making up my own.
Consider a query:
explain
|
SELECT
|
t1.pk, t2.order_col, t2.str_cp, t2.pk
|
FROM
|
t1 LEFT JOIN t2 ON (t1.key1 = t2.pk)
|
WHERE
|
t2.key2 = t2.pk AND
|
t2.non_index_col = 'const2' AND
|
t1.validated >= t1.created AND
|
t1.key1 IN (<list of 2600 constants>)
|
ORDER BY
|
t2.order_col ASC;
|
It works fast when extended_keys=off, but becomes much slower (including EXPLAIN) when extended_keys=ON. MySQL is also affected, starting from version 5.6. (they've got Extended Keys in 5.6).
+------+-------------+-------+--------+---------------+-----------------+---------+---------+------+---------------------------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+--------+---------------+-----------------+---------+---------+------+---------------------------------------------------------------------+
|
| 1 | SIMPLE | t1 | range | key1 | key1 | 4 | NULL | 2662 | Using index condition; Using where; Using temporary; Using filesort |
|
| 1 | SIMPLE | t2 | eq_ref | PRIMARY,key2 | PRIMARY | 3 | t1.key1 | 1 | Using where |
|
+------+-------------+-------+--------+---------------+-----------------+---------+---------+------+---------------------------------------------------------------------+
|
Some obvious facts:
- ORDER BY clause seems to be irrelevant
- LEFT JOIN was converted into inner join because t2.non_index_col = 'const2' filters out NULL-complemented rows.
The following are my guesses about what is going on:
The optimizer sees these conditions
t1.key1 IN (<list of 2600 constants>)
t1.key1 = t2.pk
t2.key2 = t2.pk
And when "Extended Keys" feature is on, it also sees that INDEX(t2.key2) is
actually INDEX(t2.key2, t2.pk).
For that index, we can infer:
t2.key2 IN (<list of 2600 constants>)
t2.pk IN (<list of 2600 constants>)
if we name constants c1,c2, ... c2600, then the optimizer will consider all kinds of possible ranges in form t2.key2= c[i] AND t2.pk=c[j]. There are in total 2600*2600=6,760,000 such ranges. I guess that it is the attempt to probe them that makes things slow.
The optimizer fails to take into account the fact that it is known that t2.key2=t2.pk, which means we only need ranges that have i=j, and there are only 2600 such ranges.