|
EXPLAINs for Q8:
https://gist.githubusercontent.com/spetrunia/c52932bd7c8db0f53dd23942f20b6889/raw/e3a58b04229b47c79104a1ba1f15f1a14b5dc7ee/gistfile1.txt
it is not immediately clear what the problem is
|
|
Query:
select
|
o_year,
|
sum(case
|
when nation = 'EGYPT' then volume
|
else 0
|
end) / sum(volume) as mkt_share
|
from
|
(
|
select
|
year(o_orderdate) as o_year,
|
l_extendedprice * (1 - l_discount) as volume,
|
n2.n_name as nation
|
from
|
part,
|
supplier,
|
lineitem,
|
orders,
|
customer,
|
nation n1,
|
nation n2,
|
region
|
where
|
p_partkey = l_partkey
|
and s_suppkey = l_suppkey
|
and l_orderkey = o_orderkey
|
and o_custkey = c_custkey
|
and c_nationkey = n1.n_nationkey
|
and n1.n_regionkey = r_regionkey
|
and r_name = 'MIDDLE EAST'
|
and s_nationkey = n2.n_nationkey
|
and o_orderdate between date('1995-01-01') and date ('1996-12-31')
|
and p_type = 'PROMO POLISHED BRASS'
|
) as all_nations
|
group by
|
o_year
|
order by
|
o_year
|
|
|
(need to use a debug build.. query execution times for 10.11-clean: 6 seconds vs 10 minutes)
|
|
ANALYZE outputs on 10.11-clean:
https://gist.github.com/spetrunia/d2efe8547f568a994f2ba79b96804cd5
ANALYZE outputs on 10.11-selectivity:
https://gist.github.com/spetrunia/dfa769d40e617493057a43cdb14cd5b2
Note that table "part" is near the top in 10.11-clean and is way down (below the heavy-weight tables) in 10.11-selectivity. The optimizer has the same idea about its selectivity in both cases:
10.11-clean:
"block-nl-join": {
|
"table": {
|
"table_name": "part",
|
"access_type": "ALL",
|
"possible_keys": ["PRIMARY"],
|
"r_loops": 1,
|
"rows": 2000000,
|
"r_rows": 2000000,
|
"r_table_time_ms": 431.838558,
|
"r_other_time_ms": 88.35411719,
|
"filtered": 16.40625,
|
"r_filtered": 0.6798,
|
"attached_condition": "part.p_type = 'PROMO POLISHED BRASS'"
|
},
|
"buffer_type": "flat",
|
"buffer_size": "352",
|
"join_type": "BNL",
|
"r_filtered": 100
|
}
|
and
{
|
"table": {
|
"table_name": "part",
|
"access_type": "eq_ref",
|
"possible_keys": ["PRIMARY"],
|
"key": "PRIMARY",
|
"key_length": "4",
|
"used_key_parts": ["p_partkey"],
|
"ref": ["dbt3.lineitem.l_partkey"],
|
"r_loops": 3641030,
|
"rows": 1,
|
"r_rows": 1,
|
"r_table_time_ms": 9430.665396,
|
"r_other_time_ms": 253.2227552,
|
"filtered": 16.40625,
|
"r_filtered": 0.675742853,
|
"attached_condition": "part.p_type = 'PROMO POLISHED BRASS'"
|
}
|
},
|
|
|
(Just an observation: the selectivity comes from the DOUBLE_PREC_HB histogram.. using JSOH_HB histogram would give a more precise estimate of 0.679799974. But this is not directly related to the question of why 10.11-selectivity produces a worse plan here )
|
|
With JSON_HB histogram, 10.11-selectivity switches using the good join order: (pasting outputs with hot-run query times):
+------+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+---------------------------------------------------+---------+------------+----------+------------+-------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+---------------------------------------------------+---------+------------+----------+------------+-------------------------------------------------+
|
| 1 | SIMPLE | region | ALL | PRIMARY | NULL | NULL | NULL | 5 | 5.00 | 20.00 | 20.00 | Using where; Using temporary; Using filesort |
|
| 1 | SIMPLE | part | ALL | PRIMARY | NULL | NULL | NULL | 2000000 | 2000000.00 | 0.68 | 0.68 | Using where; Using join buffer (flat, BNL join) |
|
| 1 | SIMPLE | lineitem | ref | PRIMARY,i_l_suppkey_partkey,i_l_partkey,i_l_suppkey,i_l_orderkey,i_l_orderkey_quantity | i_l_suppkey_partkey | 5 | dbt3.part.p_partkey | 29 | 30.06 | 100.00 | 100.00 | Using index condition |
|
| 1 | SIMPLE | supplier | eq_ref | PRIMARY,i_s_nationkey | PRIMARY | 4 | dbt3.lineitem.l_suppkey | 1 | 1.00 | 100.00 | 100.00 | Using where |
|
| 1 | SIMPLE | orders | eq_ref | PRIMARY,i_o_orderdate,i_o_custkey | PRIMARY | 4 | dbt3.lineitem.l_orderkey | 1 | 1.00 | 63.45 | 30.34 | Using where |
|
| 1 | SIMPLE | customer | eq_ref | PRIMARY,i_c_nationkey | PRIMARY | 4 | dbt3.orders.o_custkey | 1 | 1.00 | 100.00 | 100.00 | Using where |
|
| 1 | SIMPLE | n1 | eq_ref | PRIMARY,i_n_regionkey | i_n_regionkey | 9 | dbt3.region.r_regionkey,dbt3.customer.c_nationkey | 1 | 0.20 | 100.00 | 100.00 | Using index |
|
| 1 | SIMPLE | n2 | eq_ref | PRIMARY | PRIMARY | 4 | dbt3.supplier.s_nationkey | 1 | 1.00 | 100.00 | 100.00 | |
|
+------+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+---------------------------------------------------+---------+------------+----------+------------+-------------------------------------------------+
|
8 rows in set (5.949 sec)
|
10.11-clean with JSON_HB histogram uses the same join order:
+------+-------------+----------+--------+----------------------------------------------------------------------------------------+-------------+---------+---------------------------+---------+------------+----------+------------+-------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|
+------+-------------+----------+--------+----------------------------------------------------------------------------------------+-------------+---------+---------------------------+---------+------------+----------+------------+-------------------------------------------------+
|
| 1 | SIMPLE | region | ALL | PRIMARY | NULL | NULL | NULL | 5 | 5.00 | 20.00 | 20.00 | Using where; Using temporary; Using filesort |
|
| 1 | SIMPLE | part | ALL | PRIMARY | NULL | NULL | NULL | 2000000 | 2000000.00 | 0.68 | 0.68 | Using where; Using join buffer (flat, BNL join) |
|
| 1 | SIMPLE | lineitem | ref | PRIMARY,i_l_suppkey_partkey,i_l_partkey,i_l_suppkey,i_l_orderkey,i_l_orderkey_quantity | i_l_partkey | 5 | dbt3.part.p_partkey | 29 | 30.06 | 100.00 | 100.00 | Using where |
|
| 1 | SIMPLE | supplier | eq_ref | PRIMARY,i_s_nationkey | PRIMARY | 4 | dbt3.lineitem.l_suppkey | 1 | 1.00 | 100.00 | 100.00 | Using where |
|
| 1 | SIMPLE | orders | eq_ref | PRIMARY,i_o_orderdate,i_o_custkey | PRIMARY | 4 | dbt3.lineitem.l_orderkey | 1 | 1.00 | 49.60 | 30.34 | Using where |
|
| 1 | SIMPLE | customer | eq_ref | PRIMARY,i_c_nationkey | PRIMARY | 4 | dbt3.orders.o_custkey | 1 | 1.00 | 100.00 | 100.00 | Using where |
|
| 1 | SIMPLE | n1 | eq_ref | PRIMARY,i_n_regionkey | PRIMARY | 4 | dbt3.customer.c_nationkey | 1 | 1.00 | 100.00 | 19.84 | Using where |
|
| 1 | SIMPLE | n2 | eq_ref | PRIMARY | PRIMARY | 4 | dbt3.supplier.s_nationkey | 1 | 1.00 | 100.00 | 100.00 | |
|
+------+-------------+----------+--------+----------------------------------------------------------------------------------------+-------------+---------+---------------------------+---------+------------+----------+------------+-------------------------------------------------+
|
8 rows in set (5.832 sec)
|
|
|
The reason for that bad plan with old histograms are the following:
For the table 'part' we have a selectivity for part.p_type of 0.164062
analyze statement show the following for the plan used by 10.11:
"table_name": "part",
"access_type": "ALL",
"possible_keys": ["PRIMARY"],
"r_loops": 1,
"rows": 6000000,
"r_rows": 6000000,
"r_table_time_ms": 633.6312614,
"r_other_time_ms": 160.5703603,
"filtered": 16.40625,
"r_filtered": 0.6712,
Which means there are 40320 rows.
However the optimizer thinks there will be 984375 accepted rows!
And things goes even worse from from there:
"plan_prefix": ["region", "part"],
"best_access_path": {
"table": "lineitem",
"plan_details": {
"record_count": 984375
...
"plan_prefix": ["region", "part", "lineitem"],
"best_access_path": {
"table": "supplier",
"plan_details": {
"record_count": 29530954.69
The optimizer in selectivity tree seams to choose the best plan available based on the data it has. Not much that can be done (with old histograms).
I was trying to repeat the original query on my machine on 10.11 but was not successful in getting the same plan as was described above 
The reason it work in 10.11 was probably that we have some code that cuts number of records if they are too big, which may work with the
above plan but generally is a bad idea and we should not do that.
|
|
axel has done a re-run with histogram_type=JSON_HB. The results are:
| Query |
10.11-clean |
10.11-selectivity |
speed, new/old |
| 1 |
34.99584 |
35.454256 |
1.01 |
| 2 |
0.758981 |
0.706447 |
0.93 |
| 3 |
17.470826 |
17.509154 |
1.00 |
| 4 |
4.272932 |
4.32523 |
1.01 |
| 5 |
15.443718 |
14.481821 |
0.94 |
| 6 |
16.903128 |
16.737965 |
0.99 |
| 7 |
9.580914 |
9.002417 |
0.94 |
| 8 |
4.349653 |
4.164267 |
0.96 |
| 9 |
63.623487 |
62.357763 |
0.98 |
| 10 |
16.630696 |
16.228546 |
0.98 |
| 11 |
1.982979 |
1.97917 |
1.00 |
| 12 |
21.55811 |
22.124506 |
1.03 |
| 13 |
52.068532 |
52.959199 |
1.02 |
| 14 |
5.070112 |
4.911057 |
0.97 |
| 15 |
16.942634 |
16.717818 |
0.99 |
| 16 |
2.463478 |
2.552343 |
1.04 |
| 17 |
0.862639 |
0.853985 |
0.99 |
| 18 |
56.278951 |
54.198422 |
0.96 |
| 19 |
1.329065 |
1.308108 |
0.98 |
| 20 |
2.534968 |
2.540852 |
1.00 |
| 21 |
14.852989 |
15.327279 |
1.03 |
| 22 |
1.473712 |
1.438353 |
0.98 |
As one can see , there is no large difference on any query.
|