|
The problem here is EITS statistics does not calculate statistics for the partitions of the table.
So a temporary solution would be to not read EITS statistics for partitioned tables.
|
|
Patch
http://lists.askmonty.org/pipermail/commits/2018-August/012842.html
|
|
... the patch looks a bit suspicious - why does it change only number of records in the table and not other statistics...
I'm wondering how does partition pruning work with regular optimizer estimates. Can that approach be extended to EITS stats, too...
|
Optimizer statistics for partitioned tables.
Number of rows in the table
Each partition has its own stats.records. Number of records reported
for the table is a sum of numbers of records in all partitions.
In a query: After partition pruning is done, we compute the sum
of rows for used partitions. This is table_records for the table.
records_in_range() estimates.
It walks through the partitions that are left after paritition pruning, and
after having called records_in_range() for a representative set of the
partitions, we compute the return value.
A representative set is defined as "partitions that have the majority of rows of the
used_partitions set". (see ha_partition::min_rows_for_estimate).
rec_per_key estimates.
A partitioned table has a single TABLE object and so it has a single
TABLE::key_info->rec_per_key.
ha_partition::info(HA_STATUS_CONST) has this:
We first scans through all partitions to get the one holding most rows.
|
We will then allow the handler with the most rows to set
|
the rec_per_key and use this as an estimate on the total table.
|
|
|
When one runs ALTER TABLE t1 ANALYZE PARTITION p1, it will collect EITS stats for all partitions. varun, please report this as a separate issue and fix it by not collecting EITS stats when ANALYZE PARTITION syntax is used.
|
|
It should not be hard to come up with a scheme of using global EITS stats in a partitioned table.
- Columns that are used in partitioning expression are hard and/or should not be used because for them, condition selectivity is typically "handled" partition pruning
- * Sometimes it is not, e.g. PARTITION BY HASH(col1) and "WHERE col1<10", but I don't think we should spend time on such cases now.
- For other columns, we can assume independence between the parititioning function and column distribution.
- This means selectivity of conditions on that column can be used.
|
|
Tried a similar example on MySQL 8.0.11:
create table t2 (
|
part_key int,
|
a int,
|
b int
|
) partition by list(part_key) (
|
partition p0 values in (0),
|
partition p1 values in (1),
|
partition p2 values in (2),
|
partition p3 values in (3),
|
partition p4 values in (4)
|
);
|
insert into t2
|
select mod(a,5), a/100, a from one_k;
|
mysql> analyze table t2 update histogram on part_key;
|
+---------+-----------+----------+-----------------------------------------------------+
|
| Table | Op | Msg_type | Msg_text |
|
|
+---------+-----------+----------+-----------------------------------------------------+
|
| test.t2 | histogram | status | Histogram statistics created for column 'part_key'. |
|
+---------+-----------+----------+-----------------------------------------------------+
|
1 row in set (0.08 sec)
|
mysql> explain select * from t2;
|
+----+-------------+-------+----------------+------+---------------+------+---------+------+------+----------+-------+
|
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+----+-------------+-------+----------------+------+---------------+------+---------+------+------+----------+-------+
|
| 1 | SIMPLE | t2 | p0,p1,p2,p3,p4 | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | NULL |
|
+----+-------------+-------+----------------+------+---------------+------+---------+------+------+----------+-------+
|
1 row in set, 1 warning (0.00 sec)
|
mysql> explain select * from t2 partition (p1) ;
|
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
|
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
|
| 1 | SIMPLE | t2 | p1 | ALL | NULL | NULL | NULL | NULL | 200 | 100.00 | NULL |
|
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
|
1 row in set, 1 warning (0.00 sec)
|
mysql> explain select * from t2 partition (p1) where part_key=1;
|
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|
| 1 | SIMPLE | t2 | p1 | ALL | NULL | NULL | NULL | NULL | 200 | 20.00 | Using where |
|
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|
1 row in set, 1 warning (0.00 sec)
|
Oops. part_key=1 for all 200 rows in partition p0, but we still see filtered=20%
without explicit partition selection, it's the same:
mysql> explain select * from t2 where part_key=1;
|
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|
| 1 | SIMPLE | t2 | p1 | ALL | NULL | NULL | NULL | NULL | 200 | 20.00 | Using where |
|
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|
1 row in set, 1 warning (0.00 sec)
|
|
|
MariaDB, the same dataset as above:
analyze table t2 persistent for all;
|
Before Varun's patch:
MariaDB [test]> explain format=json select * from t2 where part_key=1\G
|
*************************** 1. row ***************************
|
EXPLAIN: {
|
"query_block": {
|
"select_id": 1,
|
"table": {
|
"table_name": "t2",
|
"partitions": ["p1"],
|
"access_type": "ALL",
|
"rows": 1000,
|
"filtered": 20,
|
"attached_condition": "t2.part_key = 1"
|
}
|
}
|
}
|
when we arrive in set_statistics_for_table, we have
- table->stats.records=200, this is result of partition pruning
- read_stats->cardinality=1000. This is data from EITS, it covers all partitions.
We take the EITS estimate of 1000 (which is wrong). Filtered is 20% because EITS is able to estimate the selectivity of "part_key=1000".
|
|
Applied Varun's patch and now I get:
set histogram_size=100;
|
analyze table t2 persistent for columns (part_key) indexes ();
|
MariaDB [test]> explain format=json select * from t2 where part_key in (1,2)\G
|
*************************** 1. row ***************************
|
EXPLAIN: {
|
"query_block": {
|
"select_id": 1,
|
"table": {
|
"table_name": "t2",
|
"partitions": ["p1", "p2"],
|
"access_type": "ALL",
|
"rows": 400,
|
"filtered": 39.604,
|
"attached_condition": "t2.part_key in (1,2)"
|
}
|
}
|
}
|
Here, rows=400 comes from partition pruning. But then, condition selectivity is computed again for part_key in (1,2), which causes "filtered": 39.604, which means the selectivity is counted twice.
|
|
My suggestion on how EITS estimates should work with partition pruning:
If the table is partitioned, EITS stats should be used as follows:
1. Dont use #rows estimate from mysql.column_stats. (This is what the submitted patch does).
2. It is ok to use EITS to produce condition selectivities, as long as the column is not
part of partitioning definition. (If it is, we assume that condition on the column were
handled by partition pruning). (This is what still needs to be done)
3. Do not use EITS estimates for index cardinalities. (one can probably come up with
something smarter but this is an acceptable first step). (nothing to do here for now)
|
|
Patch
http://lists.askmonty.org/pipermail/commits/2018-November/013080.html
|
|
Review input provided over email. Changes are needed.
|
|
New Patch
http://lists.askmonty.org/pipermail/commits/2018-November/013110.html
|
|
Review input provided over email. The patch needs to be adjusted.
|
|
Patch
http://lists.askmonty.org/pipermail/commits/2018-December/013176.html
|
|
Ok to push.
|