Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Description
This issue tracks the work on testing the precision of histograms. (Some tests have already been done; the results will be posted here.)
We are going to measure the precision of the selectivity estimate for equality predicates (range predicates do not make much sense for names, I guess).
explain select * from pop1980 where firstname=$CONST
select count(*) from pop1980 where firstname=$CONST
I would like a few constants:
- 3 different names from the top 3.
- 3 different names at the end of the first quartile. To find them, count the total number of different names (17711), then rank all names by their frequency:
select firstname, count(*) as CNT from pop1980 group by firstname order by CNT desc
The end of the first quartile is 17711/4 = 4427 (rounded down), so pick the 4428th, 4429th and 4430th names.
- 3 names at the end of the second quartile.
- 3 names each at the end of the third and fourth quartiles.
Then repeat the above "selectivity test" for each constant.
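The constant selection above can be sketched as a small script. This is only an illustration: the helper names `pick_ranks` and `names_at` are hypothetical, the extra `firstname` sort key is added here just to make ties deterministic, and taking the last three names for the fourth quartile is an assumption (nothing follows rank 17711, and the description does not say which names to use there). The query uses `limit ... offset ...`, which MariaDB, MySQL and PostgreSQL all accept; the sketch runs it through Python's `sqlite3` module so it can be tried without a server.

```python
import sqlite3


def pick_ranks(total_names, picks=3):
    """Return 1-based frequency ranks to sample, following the recipe above."""
    ranks = {"top": list(range(1, picks + 1))}
    for q in (1, 2, 3):
        end = total_names * q // 4  # e.g. 17711 // 4 = 4427
        ranks[f"q{q}"] = [end + i for i in range(1, picks + 1)]
    # Assumption: no names follow the 4th quartile, so take the last ones.
    ranks["q4"] = list(range(total_names - picks + 1, total_names + 1))
    return ranks


def names_at(conn, ranks):
    """Fetch the names at the given consecutive 1-based ranks."""
    sql = (
        "select firstname, count(*) as CNT from pop1980 "
        "group by firstname order by CNT desc, firstname "
        "limit ? offset ?"
    )
    # Rank r corresponds to offset r - 1 in the ordered result.
    rows = conn.execute(sql, (len(ranks), ranks[0] - 1))
    return [row[0] for row in rows]
```

With 17711 distinct names, `pick_ranks(17711)["q1"]` gives `[4428, 4429, 4430]`, matching the ranks named in the description.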
We need to compare:
- MariaDB, analyze with sampling
- MariaDB, analyze with full scan
- MySQL, with 100 buckets
- MySQL, with 1024 buckets
- PostgreSQL.
For MySQL/MariaDB, use EXPLAIN FORMAT=JSON as it prints selectivity with greater precision.
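For one constant, the comparison between the estimate and the real selectivity could be computed as sketched below. Several things here are assumptions rather than part of this task: the helper names are hypothetical, the code assumes the `filtered` field in the EXPLAIN FORMAT=JSON output is a percentage (MySQL prints it as a string, MariaDB as a number, so both are converted with `float`), and the q-error metric (the larger of estimate/actual and actual/estimate) is just one possible way to express precision. PostgreSQL is not covered; its EXPLAIN reports an estimated row count instead, which would have to be divided by the table size.

```python
import json


def find_filtered(node):
    """Depth-first search for the first "filtered" value in EXPLAIN JSON."""
    if isinstance(node, dict):
        if "filtered" in node:
            return float(node["filtered"])
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_filtered(child)
        if found is not None:
            return found
    return None


def selectivity_report(explain_json, actual_count, table_rows):
    """Return (estimated, actual, q_error) for one equality constant."""
    filtered = find_filtered(json.loads(explain_json))
    if filtered is None:
        raise ValueError('no "filtered" field in EXPLAIN output')
    estimated = filtered / 100.0          # percentage -> fraction
    actual = actual_count / table_rows    # from select count(*) ...
    if estimated > 0 and actual > 0:
        q_error = max(estimated / actual, actual / estimated)
    else:
        q_error = float("inf")            # one side is zero
    return estimated, actual, q_error
```

Here `actual_count` is the result of the `select count(*)` query for the constant and `table_rows` is the total row count of pop1980.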
Issue Links
- relates to MDEV-17886 Benchmark speed of EITS ANALYZE TABLE, histogram collection (Closed)