Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5, 10.6
Description
This is a known limitation but I'm filing a testcase for the record.
Generate a 1-millon row population with column "Country" having value distribution according to the "world" database:
source mysql-test/include/world_schema.inc
|
source mysql-test/include/world.inc
|
|
create table generated_pop (
|
pk int auto_increment primary key,
|
Country char(52)
|
);
|
set @total_pop= (select Sum(population) from Country) - 1;
|
|
create table tmp1 as
|
SELECT
|
name,
|
SUM(Population) over (order by Name) as cumulative_pop
|
FROM Country;
|
create table ten(a int primary key);
|
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
|
|
create table one_k(a int primary key);
|
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
|
create table rand_numbers as
|
select
|
rand() * @total_pop as col
|
from
|
one_k A, one_k B;
|
|
insert into generated_pop (Country)
|
select
|
(select name from tmp1
|
where cumulative_pop > col
|
order by cumulative_pop limit 1
|
)
|
from rand_numbers;
|
analyze table generated_pop persistent for all;
|
mysql> select count(*) from generated_pop where country='China';
|
+----------+
|
| count(*) |
|
+----------+
|
| 210306 |
|
+----------+
|
1 row in set (3.00 sec)
|
|
mysql> select count(*) from generated_pop where country='Chile';
|
+----------+
|
| count(*) |
|
+----------+
|
| 2479 |
|
+----------+
|
1 row in set (3.49 sec)
|
As expected, China has a much larger population than Chile.
But histogram doesn't allow the optimizer to find this out:
mysql> explain extended select * from generated_pop where country='China';
|
+------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
|
| 1 | SIMPLE | generated_pop | ALL | NULL | NULL | NULL | NULL | 1000000 | 21.09 | Using where |
|
+------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
|
1 row in set, 1 warning (0.01 sec)
|
|
mysql> explain extended select * from generated_pop where country='Chile';
|
+------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
|
| 1 | SIMPLE | generated_pop | ALL | NULL | NULL | NULL | NULL | 1000000 | 21.09 | Using where |
|
+------+-------------+---------------+------+---------------+------+---------+------+---------+----------+-------------+
|
1 row in set, 1 warning (0.00 sec)
|
Attachments
Issue Links
- relates to
-
MDEV-21130 Histograms: use JSON as on-disk format
- Closed
MySQL 8, for comparison:
(not exactly the same data due to different seed for RAND, real numbers are: 210369 in China and 2509 in Chile)
mysql> explain select * from generated_pop where country='China'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: generated_pop
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 917136
filtered: 21.07
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from generated_pop where country='Chile'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: generated_pop
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 917136
filtered: 0.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)