Details
Type: Task
Status: Stalled
Priority: Major
Resolution: Unresolved
Description
We need to test the precision of histograms (both DOUBLE_PREC_HB and JSON_HB) with different analyze_sample_percentage settings.
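For reference, a minimal sketch of the session variables involved (values are only examples; histogram_size is interpreted as bytes for the *_PREC_HB types and as the number of buckets for JSON_HB):
{code:sql}
-- Settings under test (example values, not proposed defaults):
SET SESSION histogram_type = 'DOUBLE_PREC_HB';   -- or 'JSON_HB'
SET SESSION histogram_size = 254;                 -- histogram size / number of buckets
SET SESSION analyze_sample_percentage = 100;      -- 0 = let the server choose the sample size
{code}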
Goals:
- Catch possible bugs, or just undesirable estimates.
- Validate analyze_sample_percentage=0 before making it the default.
Means:
Running full benchmarks is not a good way to achieve this:
- It is difficult to tell whether changes in query plans are caused by better or worse selectivity estimates.
- Benchmark queries consult histogram data only a few times.
We reuse the approach from the histogram-test script [1]:
- Get some dataset somewhere
- Collect histogram
- Run small test queries and see what estimates the optimizer gets from the histogram code for various ranges (a SQL sketch of such a check follows this list).
- ask for common/uncommon values
- ask for wide/narrow intervals
- etc
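A minimal sketch of one such check, assuming a hypothetical table t1 with a column col1. The estimate is read from the "filtered" column of EXPLAIN EXTENDED, or compared against actually-read rows (r_rows) via ANALYZE FORMAT=JSON:
{code:sql}
-- Collect engine-independent statistics (incl. histogram) for the hypothetical table t1.
SET SESSION analyze_sample_percentage = 0;        -- the setting we want to validate
SET SESSION histogram_type = 'JSON_HB';
ANALYZE TABLE t1 PERSISTENT FOR ALL;

-- Make sure the optimizer actually reads the collected stats and histograms.
SET SESSION use_stat_tables = 'preferably';
SET SESSION optimizer_use_condition_selectivity = 4;

-- Narrow interval: compare the 'filtered' estimate with the real fraction of matching rows.
EXPLAIN EXTENDED SELECT * FROM t1 WHERE col1 BETWEEN 100 AND 110;
SELECT COUNT(*) FROM t1 WHERE col1 BETWEEN 100 AND 110;

-- ANALYZE FORMAT=JSON shows estimated rows vs. actually read rows (r_rows) in one statement.
ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE col1 BETWEEN 100 AND 110;
{code}
The same pattern can be repeated for common vs. uncommon values and wide vs. narrow intervals, and re-run with different analyze_sample_percentage and histogram_type settings.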
Things to check for
- Height-balanced histograms should not produce errors that are larger than bucket_size (right?)
- But for DOUBLE_PREC_HB there is also an error due to imprecise storage of bucket endpoints. It is not clear what the bounds on that are.
- Using sampling instead of the full dataset should not reduce the precision much.
- What happens to the n_distinct estimate?
- Monitor disk space during ANALYZE. Does it grow less with analyze_sample_percentage=0? If so, by how much? (A sketch for inspecting the stored statistics follows this list.)
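A minimal sketch for inspecting what was actually stored, again assuming the hypothetical table t1. avg_frequency in mysql.column_stats is the basis for the n_distinct-style estimate, and cardinality in mysql.table_stats is the value MDEV-32580 is about:
{code:sql}
-- Collected column statistics, including histogram type and size.
SELECT column_name, min_value, max_value, nulls_ratio,
       avg_frequency,                       -- avg rows per distinct value; n_distinct ~ cardinality / avg_frequency
       hist_type, hist_size,
       LENGTH(histogram) AS histogram_bytes
FROM mysql.column_stats
WHERE db_name = DATABASE() AND table_name = 't1';

-- Estimated table cardinality vs. the real row count (cf. MDEV-32580).
SELECT cardinality FROM mysql.table_stats
WHERE db_name = DATABASE() AND table_name = 't1';
SELECT COUNT(*) FROM t1;
{code}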
Issue Links
- relates to MDEV-32580 The value of cardinality in mysql.table_stats may exceed by more than 10% the actual number of records in the table (Open)
- relates to MDEV-6529 optimize VARCHAR temp storage during EITS ANALYZE (Stalled)