[MDEV-32472] Test histogram precision for ANALYZE, 2023 Created: 2023-10-13 Updated: 2024-01-22 |
|
| Status: | In Progress |
| Project: | MariaDB Server |
| Component/s: | None |
| Fix Version/s: | 10.6 |
| Type: | Task | Priority: | Major |
| Reporter: | Sergei Petrunia | Assignee: | Lena Startseva |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | histogram-test | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Description |
|
We need to test precision of histograms (both DOUBLE_PREC_HB and JSON_HB) with different analyze_sample_percentage settings. Goals:
Means:
We reuse the approach from histogram-test [1] script:
Things to check for
[1] https://github.com/spetrunia/histogram-test. |
| Comments |
| Comment by Sergei Petrunia [ 2023-10-19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
(notes from yesterday discussion) Tasks for milestone 1Manually examine the provided datasets and pick a set of table.columnName for use. The columns should be such that the values are not uniformly distributed and one can have a meaningful query with a range predicate. Load the datasets manually. Loading can take a while, so automating it seems premature at this point. We assume one will keep the datadir stashed somewhere between the runs. Create a script that takes $TABLE_NAME $COLUMN_NAME and produces interesting ranges. Basic interesting ranges:
This can be run only once. Now, the following part need to be automated:
This will allow to evaluate estimation precision and catch mis-estimations. | |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2023-10-19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
The first test we need should use 10.6 and the default settings:
and then check the values of analyze_sample_percentage. How does the values of 0 (automatically pick sample size) 10 and 50 compare against the default 100? Note. | |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2023-11-10 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
Ok analyzing the first result: result of testing-edit2.ods
First, need to check the red areas. | |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Lena Startseva [ 2023-11-10 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
Case for red area ("Poisson distribution" lambda = 1): Create and fill a table:
| |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Mason Sharp [ 2023-11-10 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2023-11-13 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
just an observation: unlike in MariaDB, in MySQL performance schema instruments temporary file operations, so one can monitor temp file usage. Using a {{SELECT COUNT(DISTINCT) }} query as an example:
| |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2023-12-04 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
... Stumbled upon a way to monitor tempfile usage...
So, a script to monitor would be:
| |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2023-12-04 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
Running
For
Then for
it shows
| |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Mason Sharp [ 2024-01-05 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
From psergei's comment on 2023-12-04, we are indeed using less disk space with less of a percentage, possibly providing a workaround for customers while waiting for MDEV-6529. | |||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2024-01-09 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
masonmariadb, I'll still need to make them. Lena has posted the details for one of the red areas and the ball is on my side to look at it. |