[MDEV-18758] Test histograms precision Created: 2019-02-27  Updated: 2023-10-03

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: histogram-test

Issue Links:
Relates
relates to MDEV-17886 Benchmark speed of EITS ANALYZE TABLE... Closed

 Description   

This is to track the work being done on testing precision of histograms. (Some tests were already done, will post the results here)

We are going to measure the precision of selectivity estimate for equality (range predicates do not make much sense for names I guess).

explain select * from pop1980 where firstname=$CONST
select count(*) from pop1980 where firstname=$CONST

I would like a few constants:

  • 3 different names from top-3
  • 3 different names at the end of the first quartile.

(the first quartile is: Count the number of total different names = 17711
Rank all names by their frequency:

select firstname, count(*) as CNT from pop1980 group by firstname order by CNT desc


end of quartile is 17711/4 = 4427)

pick 4428th, 4429th, 4430th names.

Then 3 names at the end of the second quartile.

and 3rd and 4th.

the repeat the above "selectivity test" for each constant.

We need to compare:

  • MariaDB, analyze with sampling
  • MariaDB, analyze with full scan
  • MySQL, with 100 buckets
  • MySQL, with 1024 buckets
  • PostgreSQL.

For MySQL/MariaDB, use EXPLAIN FORMAT=JSON as it prints selectivity with greater precision.



 Comments   
Comment by Sergei Petrunia [ 2019-02-27 ]

Test #2: FlightStats

Let's study these columns:

  • AirTime (flight time in minutes. Note that for many flights it is 0)
  • FlightDate

For flightDate - Please check if the number of flights done in a week varies
across weeks (I guess it should), and pick a popular, mid-range, and unpopular
week. Then,

EXPLAIN SELECT (*) FROM ontime WHERE FlightDate between $date and DATE_ADD($date, INTERVAL 1 WEEK);

For AirTime:

EXPLAIN SELECT * FROM ontime WHERE AirTime=0  // the special value

Then let's pick some values, like

EXPLAIN SELECT * FROM ontime WHERE AirTime BETWEEN 1 and 15 ; // flights 15 minutes
EXPLAIN SELECT * FROM ontime WHERE AirTime BETWEEN 60 and 90;
EXPLAIN SELECT * FROM ontime WHERE AirTime BETWEEN 300 and 330;
EXPLAIN SELECT * FROM ontime WHERE AirTime> 600 and Airtime < 630;

Please measure these in the same settings as in the previous benchmarks.

Comment by Sergei Petrunia [ 2019-02-27 ]

First results: https://docs.google.com/spreadsheets/d/1c2fyVo5bFtCDPfesaxLVtqx6fSRb0LaiVVjXD3i0jRY/edit?usp=sharing

Comment by Sergei Petrunia [ 2019-02-27 ]

cc cvicentiu

Generated at Thu Feb 08 08:46:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.