[MDEV-32472] Test histogram precision for ANALYZE, 2023 Created: 2023-10-13  Updated: 2024-01-22

Status: In Progress
Project: MariaDB Server
Component/s: None
Fix Version/s: 10.6

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Lena Startseva
Resolution: Unresolved Votes: 1
Labels: histogram-test

Attachments: File result of testing-edit2.ods    
Issue Links:
Relates
relates to MDEV-32580 The value of cardinality in mysql.tab... Open
relates to MDEV-6529 optimize VARCHAR temp storage during ... Confirmed

 Description   

We need to test precision of histograms (both DOUBLE_PREC_HB and JSON_HB) with different analyze_sample_percentage settings.

Goals:

  • Catch possible bugs, or just undesirable estimates.
  • Validate analyze_sample_percentage=0 before making it the default.

Means:
Running full benchmarks is not a good way to achieve this:

  • It is difficult to see if better query plans are caused by better/worse selectivity estimates.
  • Benchmark queries do only a few requests to histogram data.

We reuse the approach from the histogram-test script [1]:

  • Get some dataset somewhere
  • Collect histogram
  • Run small test queries and see what estimates the optimizer gets from histogram code for various ranges.
    • ask for common/uncommon values
    • ask for wide/narrow intervals
    • etc

Things to check for

  • Height-balanced histograms should not produce errors that are larger than bucket_size (right?)
    • But for DOUBLE_PREC_HB there is also an error due to imprecise storage of bucket endpoints. Not sure what the bounds on that are.
  • Using sampling instead of the full dataset should not reduce the precision much.
    • what happens to n_distinct estimate?
    • Monitor disk space. Does it grow less with percentage 0? How much less?
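A quick way to sanity-check the "error no larger than bucket_size" expectation without a server is to simulate an equi-height histogram. The Python sketch below is my own toy model, not the server's DOUBLE_PREC_HB code: it builds bucket endpoints over sorted data and compares the estimated fraction of rows below a point against the real one.

```python
import random

def build_equi_height(values, n_buckets):
    # Endpoint of bucket i is the largest value in that bucket of the
    # sorted data (a toy equi-height model, not the server's code).
    s = sorted(values)
    n = len(s)
    return [s[(i + 1) * n // n_buckets - 1] for i in range(n_buckets)]

def estimate_le(endpoints, x):
    # Estimated fraction of rows <= x: the share of buckets that lie
    # entirely at or below x.
    return sum(1 for e in endpoints if e <= x) / len(endpoints)

random.seed(42)
data = [random.randint(1, 1000) for _ in range(100000)]
endpoints = build_equi_height(data, 254)   # histogram_size=254, as in the ticket

x = 500
real = sum(1 for v in data if v <= x) / len(data)
est = estimate_le(endpoints, x)
# The estimate can only miss by the rows of one partially-covered bucket,
# i.e. by about 1/254 of the table.
print(f"real={real:.4f} est={est:.4f} max_expected_error={1/254:.4f}")
```

This only models the bucket-granularity error; the extra error from DOUBLE_PREC_HB's 2-byte endpoint encoding is not simulated here.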

[1] https://github.com/spetrunia/histogram-test.



 Comments   
Comment by Sergei Petrunia [ 2023-10-19 ]

(notes from yesterday's discussion)

Tasks for milestone 1

Manually examine the provided datasets and pick a set of table.columnName for use. The columns should be such that the values are not uniformly distributed and one can have a meaningful query with a range predicate.

Load the datasets manually. Loading can take a while, so automating it seems premature at this point. We assume one will keep the datadir stashed somewhere between the runs.

Create a script that takes $TABLE_NAME $COLUMN_NAME and produces interesting ranges.

Basic interesting ranges:

t.column='most_popular_value_in_the_table';
t.column='least_popular_value_in_the_table';
t.column='min_value_in_the_table';
t.column='max_value_in_the_table';
$left_neighbor_of_popular= (select col from t1 where col < $most_popular_value order by col desc limit 1)
$right_neighbor_of_popular= ...;
t.column=$left_neighbor_of_popular;
t.column=$right_neighbor_of_popular;
t.column between $left_neighbor_of_popular and $right_neighbor_of_popular;

This can be run only once.
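As a sketch of what such a script could emit (the helper name and the $-placeholders are hypothetical; a wrapper would first fetch the actual values from the table and substitute them):

```python
def interesting_ranges(table, column):
    # Map of range_name -> WHERE condition. $-placeholders stand for
    # values a wrapper would fetch from the data first (most/least
    # popular value, neighbors of the most popular value).
    t, c = table, column
    return {
        "most_popular":      f"{t}.{c} = $most_popular_value",
        "least_popular":     f"{t}.{c} = $least_popular_value",
        "min_value":         f"{t}.{c} = (SELECT MIN({c}) FROM {t})",
        "max_value":         f"{t}.{c} = (SELECT MAX({c}) FROM {t})",
        "left_neighbor":     f"{t}.{c} = (SELECT {c} FROM {t} WHERE {c} < $most_popular_value ORDER BY {c} DESC LIMIT 1)",
        "between_neighbors": f"{t}.{c} BETWEEN $left_neighbor AND $right_neighbor",
    }

# Emit one ANALYZE SELECT per interesting range for t1.col1:
for name, cond in interesting_ranges("t1", "col1").items():
    print(f"-- {name}\nANALYZE SELECT * FROM t1 WHERE {cond};")
```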

Now, the following part needs to be automated:

  • Set the required settings: histogram_type, analyze_sample_percentage.
  • Run ANALYZE to collect statistics with them
  • Compute the histogram estimates and real #rows for a set of interesting ranges
  • Produce a table with {range_name, filtered, r_filtered} values.

This will allow us to evaluate estimation precision and catch misestimations.
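A minimal sketch of the last step, assuming the {range_name, filtered, r_filtered} triples have already been extracted from the ANALYZE output; a cell is flagged FAIL when the estimate misses by more than one bucket's share of rows (the sample rows below are made up):

```python
def flag_misestimates(rows, histogram_size=254):
    # rows: list of (range_name, filtered_pct, r_filtered_pct) triples.
    # One equi-height bucket covers 100/histogram_size percent of rows;
    # a larger |filtered - r_filtered| gap is flagged as FAIL.
    bucket_pct = 100.0 / histogram_size
    out = []
    for name, filtered, r_filtered in rows:
        status = "FAIL" if abs(filtered - r_filtered) > bucket_pct else "ok"
        out.append((name, filtered, r_filtered, status))
    return out

report = flag_misestimates([
    ("most_popular",      12.5, 12.3),   # within one bucket -> ok
    ("between_neighbors",  5.0,  0.01),  # big discrepancy   -> FAIL
])
for row in report:
    print("%-20s filtered=%6.2f r_filtered=%6.2f %s" % row)
```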

Comment by Sergei Petrunia [ 2023-10-19 ]

The first test should use 10.6 and the default settings:

histogram_type=DOUBLE_PREC_HB
histogram_size=254

and then check the values of analyze_sample_percentage: how do the values 0 (automatically pick the sample size), 10, and 50 compare against the default of 100?

Note:
It would also be nice to track how much space is used in the temp directory, but at the moment I don't see how to do that.
I've checked - performance_schema doesn't track temporary file writes...

Comment by Sergei Petrunia [ 2023-11-10 ]

OK, analyzing the first result: result of testing-edit2.ods.

  • Cells marked with "FAIL" are those where the difference between filtered and r_filtered exceeded the bucket size.
  • Some cells are marked FAIL; in most of them, the difference is insignificant.
  • There are two red areas, where analyze_sample_percentage < 100 caused a big discrepancy.
  • The two pink areas show places where the difference is higher than in the other FAIL areas...

First, we need to check the red areas.
Also, processing the spreadsheet is time-consuming; we should switch to some other way.

Comment by Lena Startseva [ 2023-11-10 ]

Case for red area ("Poisson distribution" lambda = 1):

Create and fill a table:

create table t1(a int); -- table for the generated values
 
DELIMITER $$
 
-- l: lambda of the Poisson distribution, n: number of rows to generate
CREATE PROCEDURE pd(IN l INT, n INT)
BEGIN
DECLARE k, i INT;
DECLARE p, ver DOUBLE;
 
set i = 1;
WHILE i <= n DO
    -- draw one Poisson(l) variate by inversion: subtract successive
    -- pmf terms p_k = EXP(-l) * l^k / k! from a uniform draw
    set k = 0;
    set p = EXP(-l);
    set ver = RAND();
    WHILE ver >= 0 DO
        set ver = ver - p;
        IF ver >= 0 THEN
            set p = p * l / (k+1);
            set k = k + 1;
        END IF;
    END WHILE;
    insert into t1 values (k+3); -- shift by 3 so the values start at 3
    set i = i + 1;
END WHILE;
 
END;
$$
 
DELIMITER ;
 
call pd(1, 1000000);
 
insert into t1 select seq from seq_1_to_2;
insert into t1 select seq from seq_12_to_100;
 
set analyze_sample_percentage=10;
analyze table t1 persistent for all;
analyze select * from t1 where t1.a = 2;
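For reference, the procedure's inner loop is standard inversion sampling for a Poisson variate. A Python rendering of the same loop (minus the +3 shift applied at insert time) makes it easy to check the distribution offline:

```python
import math
import random

def poisson_inversion(lam):
    # One Poisson(lam) variate by inversion, mirroring the pd() loop:
    # subtract successive pmf terms p_k = exp(-lam) * lam^k / k! from a
    # uniform draw until it goes negative.
    k = 0
    p = math.exp(-lam)
    u = random.random()
    while u >= 0:
        u -= p
        if u >= 0:
            p = p * lam / (k + 1)
            k += 1
    return k

random.seed(1)
n = 100_000
samples = [poisson_inversion(1) for _ in range(n)]
mean = sum(samples) / n
print(f"sample mean = {mean:.3f} (expected ~1.0 for lambda=1)")
```

The procedure then stores k+3, shifting the whole distribution right by 3 before the seq_* rows are mixed in.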

Comment by Mason Sharp [ 2023-11-10 ]
  • We want results of analyze_sample_percentage = 0 and = 100 to be close, to confirm our sampling algorithm is correct
  • If results with analyze_sample_percentage = 100 are wrong, that is a separate issue, and we have bugs. Is this the case?
  • Do we have results for monitoring disk usage for 0 vs 100 for large tables?
Comment by Sergei Petrunia [ 2023-11-13 ]

just an observation: unlike MariaDB, MySQL's performance schema instruments temporary file operations, so one can monitor temp file usage.

Using a {{SELECT COUNT(DISTINCT)}} query as an example:

UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES';
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES';
 
select 
  event_name, source, timer_wait, object_name, object_type,
  object_instance_begin, nesting_event_id, operation, number_of_bytes
from 
  events_waits_history_long 
where 
  event_name='wait/io/file/sql/io_cache';

+---------------------------+--------------------+------------+--------------+-------------+-----------------------+------------------+-----------+-----------------+
| event_name                | source             | timer_wait | object_name  | object_type | object_instance_begin | nesting_event_id | operation | number_of_bytes |
+---------------------------+--------------------+------------+--------------+-------------+-----------------------+------------------+-----------+-----------------+
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   36268436 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   26239308 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25965756 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24482904 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25177904 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   26441136 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25628820 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24544620 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25093392 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   26585696 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24908800 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24778696 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25735016 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   26504520 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25397524 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24916028 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25496492 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   27547576 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24461776 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   23730636 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25158444 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   27067192 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25512616 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24772024 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   29405172 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   27908420 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25396968 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   28189200 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25007212 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
| wait/io/file/sql/io_cache | mf_iocache.cc:1690 |    6350632 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |            2304 |
| wait/io/file/sql/io_cache | uniques.cc:124     |  225208912 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | read      |         2097156 |
| wait/io/file/sql/io_cache | uniques.cc:124     |  200659288 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | read      |         1902844 |
| wait/io/file/sql/io_cache | mf_cache.cc:93     |  398505772 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1535991 | close     |            NULL |
+---------------------------+--------------------+------------+--------------+-------------+-----------------------+------------------+-----------+-----------------+

Comment by Sergei Petrunia [ 2023-12-04 ]

... Stumbled upon a way to monitor tempfile usage...
MariaDB makes these syscalls:

strace -p 572442 -f --trace=file,desc
...
[pid 572504] openat(AT_FDCWD, "/tmp", O_RDWR|O_TRUNC|O_CLOEXEC|O_TMPFILE, 0660) = 55
...
[pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
[pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
[pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
[pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
[pid 572504] close(55)                  = 0
...

So, a script to monitor would be:

#!/usr/bin/env bpftrace
 
BEGIN
{
  printf("Tracing tempfile writes. Ctrl-C to stop\n");
}
 
tracepoint:syscalls:sys_enter_openat / comm == "mysqld" /
{
  if (str(args->filename) == "/tmp") {
    @opening_tmp_file[tid]=1;
  } else {
    @opening_tmp_file[tid]=0;
  }
}
 
tracepoint:syscalls:sys_exit_openat / comm == "mysqld" /
{
  if (@opening_tmp_file[tid]) {
    @tempfiles[args->ret]=1;
  }
}
 
tracepoint:syscalls:sys_enter_write / comm == "mysqld"/
{
  if (@tempfiles[args->fd]) {
    @write_in_progress[tid]=args->fd;
  } else {
    @write_in_progress[tid]=0;
  }
}
 
tracepoint:syscalls:sys_exit_write / comm == "mysqld"/
{
  if (@write_in_progress[tid]) {
    @tempfile_size[@write_in_progress[tid]]+=args->ret;
    @totals[0]= sum(args->ret);
  }
}
 
interval:s:$1 { exit(); }

Comment by Sergei Petrunia [ 2023-12-04 ]

Running

bpftrace  tempfile-usage.bt  60

for

analyze table t1 persistent for all;

it shows:

@tempfile_size[55]: 102600000
@tempfile_size[54]: 102600000
@tempfile_size[53]: 102600000
 
@tempfiles[55]: 1
@tempfiles[54]: 1
@tempfiles[53]: 1
 
@totals[0]: 307800000

Then for

set analyze_sample_percentage=50;
analyze table t1 persistent for all;

it shows

@tempfile_size[54]: 51189192
@tempfile_size[55]: 51189192
@tempfile_size[53]: 51189192
 
@tempfiles[55]: 1
@tempfiles[53]: 1
@tempfiles[54]: 1
 
@totals[0]: 153567576
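The two runs suggest temp I/O scales roughly linearly with the sample percentage; a quick ratio check on the @totals values:

```python
# @totals[0] from the two bpftrace runs above
full = 307_800_000   # analyze_sample_percentage = 100
half = 153_567_576   # analyze_sample_percentage = 50
ratio = half / full
print(f"temp write ratio at 50% sampling: {ratio:.3f}")  # ~0.499
```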

Comment by Mason Sharp [ 2024-01-05 ]

From psergei's comment on 2023-12-04, we are indeed using less disk space with a smaller sample percentage, which may provide a workaround for customers while they wait for MDEV-6529.
psergei, what are your conclusions from the test result spreadsheet regarding analyze_sample_percentage=0?

Comment by Sergei Petrunia [ 2024-01-09 ]

masonmariadb, I still need to form them. Lena has posted the details for one of the red areas, and the ball is in my court to look at it.

Generated at Thu Feb 08 10:31:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.