Details
Description
With a large number of indexes the optimizer picks the wrong path.
Attachments
Issue Links
- causes
-
MDEV-24117 Memory management problem in statistics state for queries that have a large field IN (values) part
-
- Closed
-
- mentioned in
-
Page Failed to load
Activity
This patch was submitted for review: http://lists.askmonty.org/pipermail/commits/2020-September/014332.html
The main part of the patch is ok, but
The fix revealed another two bugs: ...the other in the function tree_or().
I am not sure about the fix for the one in tree_or (don't quite follow the logic).
It is:
@@ -8869,9 +8872,15 @@ tree_or(RANGE_OPT_PARAM *param,SEL_TREE *tree1,SEL_TREE *tree2)
|
}
|
bool no_imerge_from_ranges= FALSE;
|
+ SEL_TREE *rt1= tree1;
|
+ SEL_TREE *rt2= tree2;
|
/* Build the range part of the tree for the formula (1) */
|
if (sel_trees_can_be_ored(param, tree1, tree2, &ored_keys))
|
{
|
+ if (no_merges1)
|
+ rt1= new SEL_TREE(tree1, TRUE, param);
|
+ if (no_merges2)
|
+ rt2= new SEL_TREE(tree2, TRUE, param);
|
bool must_be_ored= sel_trees_must_be_ored(param, tree1, tree2, ored_keys);
|
no_imerge_from_ranges= must_be_ored;
|
@@ -8929,12 +8938,6 @@ tree_or(RANGE_OPT_PARAM *param,SEL_TREE *tree1,SEL_TREE *tree2)
|
else if (!no_ranges1 && !no_ranges2 && !no_imerge_from_ranges)
|
{
|
/* Build the imerge part of the tree for the formula (1) */
|
- SEL_TREE *rt1= tree1;
|
- SEL_TREE *rt2= tree2;
|
- if (no_merges1)
|
- rt1= new SEL_TREE(tree1, TRUE, param);
|
- if (no_merges2)
|
- rt2= new SEL_TREE(tree2, TRUE, param);
|
if (!rt1 || !rt2 ||
|
result->merges.push_back(imerge_from_ranges) ||
|
imerge_from_ranges->or_sel_tree(param, rt1) || |
Without this fix, there are two differences in .result files:
range.result has:
@@ -1274,7 +1274,7 @@
|
5 <= a AND b = 3 OR
|
3 <= a;
|
id select_type table type possible_keys key key_len ref rows Extra
|
-1 SIMPLE t1 range a a 5 NULL 3 Using where; Using index
|
+1 SIMPLE t1 range a a 5 NULL 4 Using where; Using index
|
SELECT * FROM t1 WHERE |
This one seems ok. The optimizer passes a different set of ranges but it is valid.
The other one is not ok:
CURRENT_TEST: main.range_vs_index_merge_innodb
|
--- /home/psergey/dev-git/10.2/mysql-test/r/range_vs_index_merge_innodb.result 2020-10-01 22:46:56.624684375 +0300
|
+++ /home/psergey/dev-git/10.2/mysql-test/r/range_vs_index_merge_innodb.reject 2020-10-01 22:54:18.154504261 +0300
|
@@ -1796,16 +1796,11 @@
|
WHERE ( state = 'Alabama' OR state >= 'Colorado' ) AND id != 9
|
OR ( capital >= 'Topeka' OR state = 'Kansas' ) AND state != 'Texas';
|
id select_type table type possible_keys key key_len ref rows Extra
|
-1 SIMPLE t1 range state,capital state 71 NULL 8 Using index condition; Using where
|
+1 SIMPLE t1 index_merge state,capital capital,state 67,71 NULL 5 Using sort_union(capital,state); Using where
|
SELECT * FROM t1 FORCE KEY (state,capital)
|
WHERE ( state = 'Alabama' OR state >= 'Colorado' ) AND id != 9
|
OR ( capital >= 'Topeka' OR state = 'Kansas' ) AND state != 'Texas';
|
id state capital
|
-4 Florida Tallahassee
|
-3 Georgia Atlanta
|
-2 Hawaii Honolulu
|
-6 Michigan Lansing
|
-7 Pennsylvania Harrisburg
|
8 Virginia Richmond
|
DROP TABLE t1;
|
# |
Let's denote the patch fragment posted in this comment https://jira.mariadb.org/browse/MDEV-23811?focusedCommentId=167583&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-167583
as "The tree_or() fix".
I'm wondering why can't one have a "more local" version of The tree_or() fix, like so:
@@ -8899,11 +8902,12 @@ tree_or(RANGE_OPT_PARAM *param,SEL_TREE *tree1,SEL_TREE *tree2)
|
{
|
SEL_ARG *key1= tree1->keys[key_no];
|
SEL_ARG *key2= tree2->keys[key_no];
|
- if (!must_be_ored)
|
- {
|
+ if (!must_be_ored || no_merges1)
|
key1->incr_refs();
|
+
|
+ if (!must_be_ored || no_merges2)
|
key2->incr_refs();
|
- }
|
+
|
if ((result->keys[key_no]= key_or(param, key1, key2)))
|
result->keys_map.set_bit(key_no);
|
} |
this one will not copy the whole SEL_TREE objects but rather mark the SEL_ARG objects as shared before passing them to key_or().
But I still don't get why one should do that.
A bit more detail: the query
EXPLAIN
|
SELECT * FROM t1 FORCE KEY (state,capital) |
WHERE (state = 'Alabama' OR state >= 'Colorado') AND id != 9 |
OR ( capital >= 'Topeka' OR state = 'Kansas' ) AND state != 'Texas'; |
calls tree_or() five (5) times.
The action of "The tree_or() fix" is essential in the first tree_or() call.
The first tree_or() call computes OR of SEL_TREEs representing {{state = 'Alabama' OR state >= 'Colorado' }}.
Here, the case is very simple - both SEL_TREEs use ranges on the just one, the same index.
It is not at all clear why one should call "incr_refs()" in this case.
Could it be that there is some code that is invoked after the tree_or() call. And that code works correctly when it is passed a SEL_TREE which is "shared" and works incorrectly when it is passed a SEL_TREE which is not shared?
Exploring this hypothesis further, I find a suspicious place in key_or.
It has this check
if (key2->use_count) |
and this seems wrong to me: key2 here has use_count=0, but it not the root node of the SEL_ARG graph. SEL_ARG::use_count is valid only for the root nodes.
Elsewhere in this function, one can see that the code checks key2_shared which is the value of SEL_ARG::shared of a root node.
So I make a patch which checks key2_shared instead:
@@ -9597,10 +9600,12 @@ key_or(RANGE_OPT_PARAM *param, SEL_ARG *key1,SEL_ARG *key2)
|
|
if (!tmp->next_key_part)
|
{
|
- if (key2->use_count)
|
+ SEL_ARG *key2_next= key2->next;
|
+ //if (key2->use_count)
|
+ if (key2_shared)
|
{
|
SEL_ARG *key2_cpy= new SEL_ARG(*key2);
|
- if (key2_cpy)
|
+ if (!key2_cpy)
|
return 0;
|
key2= key2_cpy;
|
} |
The whole patch for this MDEV thus becomes : https://gist.github.com/spetrunia/3eb23e92f8ab545d3180b496ce6ad24d
and this fixes the query.
igor any thoughts?
Sergei,
I've accepted all your changes for my original fix.
This is a new patch for 10.2.
One more question: https://lists.launchpad.net/maria-developers/msg12398.html
First I tried torun the reported query for 10.4 with optimizer trace enabled after the following changes for default settings:
.
The optimizer trace for the query showed that no range condition was extracted for the index incident_indexi0. I changed the WHERE condition to a much simpler OR formula and still could not get range condition for the index. It's interesting that swapping operands in this formula led to the choice the wanted range scan for this index.
After this I came up with a simple test case demonstrating the problem:
) engine=myisam;
(5,50), (1,10), (3,30), (7,70), (8,80), (4,40), (2,20), (6,60);
The optimizer chose a full table scan:
MariaDB [test]> explain select * from t1 where (a between 2 and 5) or ((a between 3 and 4) and b < 100);
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | t1 | ALL | idx1,idx2 | NULL | NULL | NULL | 64 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
though for the following equivalent query a range scan was chosen:
MariaDB [test]> explain select * from t1 where ((a between 3 and 4) and b < 100) or (a between 2 and 5);
+------+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+
| 1 | SIMPLE | t1 | range | idx1,idx2 | idx1 | 5 | NULL | 4 | Using index condition; Using where |
+------+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+