Details
-
Bug
-
Status: Open (View Workflow)
-
Critical
-
Resolution: Unresolved
-
10.6, 11.6(EOL)
Description
In EXPLAIN output, rows is "#rows to be examined". What if the optimizer does not plan to run the scan until all rows are read?
create table t1 ( |
a int, |
b int, |
key(a) |
);
|
insert into t1 select seq, seq from seq_1_to_10000; |
analyze table t1; |
explain select * from t1 order by a limit 10; |
+------+-------------+-------+-------+---------------+------+---------+------+------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-------+
|
| 1 | SIMPLE | t1 | index | NULL | a | 5 | NULL | 10 | |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-------+
|
rows=10 which is how many we expect to read before producing #LIMIT rows. Ok.
Now, with a quick select:
explain select * from t1 |
where a between 1 and 1000 |
order by a limit 10; |
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
| 1 | SIMPLE | t1 | range | a | a | 5 | NULL | 1000 | Using index condition |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|
rows=1000. It seems, LIMIT is not taken into account here.
This query might be somewhat exceptional as we first decide to use range on index a and then test_if_skip_sort_order() figures that this also produces rows in the desired order.
Another example with type=index:
alter table t1 add key(b); |
explain select * from t1 |
where a between 1 and 1000 order by b limit 20; |
+------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|
| 1 | SIMPLE | t1 | index | a | b | 5 | NULL | 203 | Using where |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|
Here, the number 203 is the # of rows to scan computed by test_if_skip_sort_order().
(We need to get LIMIT 20 rows. range access provides selectivity of WHERE clause: (quick->rows=1000)/(table_rows=10000 )=0.1. To get 20 output rows, we scan 20/0.1=200 rows).
explain
|
select *from t1 |
where a between 1 and 1000 and b between 1 and 5000 |
order by b limit 20; |
+------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|
| 1 | SIMPLE | t1 | range | a,b | b | 5 | NULL | 5000 | Using where |
|
+------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|
Note that we see rows=5000. This is the estimate of #rows in range scan over index b, without taking LIMIT into account.
Internally, the optimizer has a different idea:
"reconsidering_access_paths_for_index_ordering": { |
"clause": "ORDER BY", |
"fanout": 1, |
"read_time": 1232.379546, |
"table": "t1", |
"rows_estimation": 1000, |
"possible_keys": [ |
{
|
"index": "a", |
"can_resolve_order": false, |
"cause": "not usable index for the query" |
},
|
{
|
"index": "b", |
"can_resolve_order": true, |
"updated_limit": 412, |
"range_scan_time": 243.5337535, |
"index_scan_time": 243.5337535, |
"records": 5000, |
"chosen": true |
}
|
]
|
}
|
},
|
note the updated_limit=412 .
This comes from 20 * (table_records / refkey_rows_estimate)
refkey_rows_estimate= 492 which comes from table->cond_selectivity=0.048 which is a product of quick select's selectivities.