Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32472

Test histogram precision for ANALYZE, 2023

Details

    Description

      We need to test precision of histograms (both DOUBLE_PREC_HB and JSON_HB) with different analyze_sample_percentage settings.

      Goals:

      • Catch possible bugs, or just undesirable estimates.
      • Validate analyze_sample_percentage=0 before making it the default.

      Means:
      Running full benchmarks is not a good way to achieve this:

      • It is difficult to see if better query plans are caused by better/worse selectivity estimates.
      • Benchmark queries do only a few requests to histogram data.

      We reuse the approach from histogram-test [1] script:

      • Get some dataset somewhere
      • Collect histogram
      • Run small test queries and see what estimates the optimizer gets from histogram code for various ranges.
        • ask for common/uncommon values
        • ask for wide/narrow intervals
        • etc

      Things to check for

      • Height-balanced histograms should not produce errors that are larger than bucket_size (right?)
        • But for DOUBLE_PREC_HB there is also an error due to imprecise storage of bucket endpoint. Not sure what are the bounds on that.
      • Using sampling instead of the full dataset should not reduce the precision much.
        • what happens to n_distinct estimate?
        • Monitor disk space. Does it increase less with the percentage 0. How much less?

      [1] https://github.com/spetrunia/histogram-test.

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia added a comment - - edited

            just an observation: unlike in MariaDB, in MySQL performance schema instruments temporary file operations, so one can monitor temp file usage.

            Using a {{SELECT COUNT(DISTINCT) }} query as an example:

            UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES';
            UPDATE performance_schema.setup_consumers SET ENABLED = 'YES';
             
            select 
              event_name, source, timer_wait, object_name, object_type,
              object_instance_begin, nesting_event_id, operation, number_of_bytes
            from 
              events_waits_history_long 
            where 
              event_name='wait/io/file/sql/io_cache';
            

            +---------------------------+--------------------+------------+--------------+-------------+-----------------------+------------------+-----------+-----------------+
            | event_name                | source             | timer_wait | object_name  | object_type | object_instance_begin | nesting_event_id | operation | number_of_bytes |
            +---------------------------+--------------------+------------+--------------+-------------+-----------------------+------------------+-----------+-----------------+
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   36268436 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   26239308 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25965756 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24482904 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25177904 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   26441136 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25628820 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24544620 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25093392 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   26585696 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24908800 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24778696 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25735016 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   26504520 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25397524 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24916028 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25496492 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   27547576 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24461776 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   23730636 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25158444 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   27067192 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25512616 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   24772024 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   29405172 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   27908420 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25396968 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   28189200 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |   25007212 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |           65536 |
            | wait/io/file/sql/io_cache | mf_iocache.cc:1690 |    6350632 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | write     |            2304 |
            | wait/io/file/sql/io_cache | uniques.cc:124     |  225208912 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | read      |         2097156 |
            | wait/io/file/sql/io_cache | uniques.cc:124     |  200659288 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1225845 | read      |         1902844 |
            | wait/io/file/sql/io_cache | mf_cache.cc:93     |  398505772 | /tmp/MYfd=45 | FILE        |       140737121288512 |          1535991 | close     |            NULL |
            +---------------------------+--------------------+------------+--------------+-------------+-----------------------+------------------+-----------+-----------------+
            

            psergei Sergei Petrunia added a comment - - edited just an observation: unlike in MariaDB, in MySQL performance schema instruments temporary file operations, so one can monitor temp file usage. Using a {{SELECT COUNT(DISTINCT) }} query as an example: UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES'; UPDATE performance_schema.setup_consumers SET ENABLED = 'YES';   select event_name, source, timer_wait, object_name, object_type, object_instance_begin, nesting_event_id, operation, number_of_bytes from events_waits_history_long where event_name='wait/io/file/sql/io_cache'; +---------------------------+--------------------+------------+--------------+-------------+-----------------------+------------------+-----------+-----------------+ | event_name | source | timer_wait | object_name | object_type | object_instance_begin | nesting_event_id | operation | number_of_bytes | +---------------------------+--------------------+------------+--------------+-------------+-----------------------+------------------+-----------+-----------------+ | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 36268436 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 26239308 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 25965756 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 24482904 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 25177904 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 26441136 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 25628820 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 24544620 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 25093392 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 26585696 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 24908800 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 24778696 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 25735016 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 26504520 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 25397524 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 24916028 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 25496492 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 27547576 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 24461776 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 23730636 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 25158444 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 27067192 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 25512616 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 24772024 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 29405172 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 27908420 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 25396968 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 28189200 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 25007212 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 65536 | | wait/io/file/sql/io_cache | mf_iocache.cc:1690 | 6350632 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | write | 2304 | | wait/io/file/sql/io_cache | uniques.cc:124 | 225208912 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | read | 2097156 | | wait/io/file/sql/io_cache | uniques.cc:124 | 200659288 | /tmp/MYfd=45 | FILE | 140737121288512 | 1225845 | read | 1902844 | | wait/io/file/sql/io_cache | mf_cache.cc:93 | 398505772 | /tmp/MYfd=45 | FILE | 140737121288512 | 1535991 | close | NULL | +---------------------------+--------------------+------------+--------------+-------------+-----------------------+------------------+-----------+-----------------+

            ... Stumbled upon a way to monitor tempfile usage...
            MariaDB makes these syscalls:

            strace -p 572442 -f --trace=file,desc
            ...
            [pid 572504] openat(AT_FDCWD, "/tmp", O_RDWR|O_TRUNC|O_CLOEXEC|O_TMPFILE, 0660) = 55
            ...
            [pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
            [pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
            [pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
            [pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
            [pid 572504] close(55)                  = 0
            ...
            

            So, a script to monitor would be:

            #!/usr/bin/env bpftrace
             
            BEGIN
            {
              printf("Tracing tempfile writes. Ctrl-C to stop\n");
            }
             
            tracepoint:syscalls:sys_enter_openat / comm == "mysqld" /
            {
              if (str(args->filename) == "/tmp") {
                @opening_tmp_file[tid]=1;
              } else {
                @opening_tmp_file[tid]=0;
              }
            }
             
            tracepoint:syscalls:sys_exit_openat / comm == "mysqld" /
            {
              if (@opening_tmp_file[tid]) {
                @tempfiles[args->ret]=1;
              }
            }
             
            tracepoint:syscalls:sys_enter_write / comm == "mysqld"/
            {
              if (@tempfiles[args->fd]) {
                @write_in_progress[tid]=args->fd;
              } else {
                @write_in_progress[tid]=0;
              }
            }
             
            tracepoint:syscalls:sys_exit_write / comm == "mysqld"/
            {
              if (@write_in_progress[tid]) {
                @tempfile_size[@write_in_progress[tid]]+=args->ret;
                @totals[0]= sum(args->ret);
              }
            }
             
            interval:s:$1 { exit(); }
            

            psergei Sergei Petrunia added a comment - ... Stumbled upon a way to monitor tempfile usage... MariaDB makes these syscalls: strace -p 572442 -f --trace=file,desc ... [pid 572504] openat(AT_FDCWD, "/tmp", O_RDWR|O_TRUNC|O_CLOEXEC|O_TMPFILE, 0660) = 55 ... [pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536 [pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536 [pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536 [pid 572504] write(55, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536 [pid 572504] close(55) = 0 ... So, a script to monitor would be: #!/usr/bin/env bpftrace   BEGIN { printf("Tracing tempfile writes. Ctrl-C to stop\n"); }   tracepoint:syscalls:sys_enter_openat / comm == "mysqld" / { if (str(args->filename) == "/tmp") { @opening_tmp_file[tid]=1; } else { @opening_tmp_file[tid]=0; } }   tracepoint:syscalls:sys_exit_openat / comm == "mysqld" / { if (@opening_tmp_file[tid]) { @tempfiles[args->ret]=1; } }   tracepoint:syscalls:sys_enter_write / comm == "mysqld"/ { if (@tempfiles[args->fd]) { @write_in_progress[tid]=args->fd; } else { @write_in_progress[tid]=0; } }   tracepoint:syscalls:sys_exit_write / comm == "mysqld"/ { if (@write_in_progress[tid]) { @tempfile_size[@write_in_progress[tid]]+=args->ret; @totals[0]= sum(args->ret); } }   interval:s:$1 { exit(); }

            Running

            bpftrace  tempfile-usage.bt  60
            

            For
            analyze table t1 persistent for all;
            It shows

            @tempfile_size[55]: 102600000
            @tempfile_size[54]: 102600000
            @tempfile_size[53]: 102600000
             
            @tempfiles[55]: 1
            @tempfiles[54]: 1
            @tempfiles[53]: 1
             
            @totals[0]: 307800000
            

            Then for

            set analyze_sample_percentage=50;
            analyze table t1 persistent for all;
            

            it shows

            @tempfile_size[54]: 51189192
            @tempfile_size[55]: 51189192
            @tempfile_size[53]: 51189192
             
            @tempfiles[55]: 1
            @tempfiles[53]: 1
            @tempfiles[54]: 1
             
            @totals[0]: 153567576
            

            psergei Sergei Petrunia added a comment - Running bpftrace tempfile-usage.bt 60 For analyze table t1 persistent for all; It shows @tempfile_size[55]: 102600000 @tempfile_size[54]: 102600000 @tempfile_size[53]: 102600000   @tempfiles[55]: 1 @tempfiles[54]: 1 @tempfiles[53]: 1   @totals[0]: 307800000 Then for set analyze_sample_percentage=50; analyze table t1 persistent for all; it shows @tempfile_size[54]: 51189192 @tempfile_size[55]: 51189192 @tempfile_size[53]: 51189192   @tempfiles[55]: 1 @tempfiles[53]: 1 @tempfiles[54]: 1   @totals[0]: 153567576

            From psergei's comment on 2023-12-04, we are indeed using less disk space with less of a percentage, possibly providing a workaround for customers while waiting for MDEV-6529.
            What are your conclusions from the test result spreadsheet psergei regarding analyze_sample_percentage=0?

            masonmariadb Mason Sharp (Inactive) added a comment - From psergei 's comment on 2023-12-04, we are indeed using less disk space with less of a percentage, possibly providing a workaround for customers while waiting for MDEV-6529 . What are your conclusions from the test result spreadsheet psergei regarding analyze_sample_percentage=0?

            masonmariadb, I'll still need to make them. Lena has posted the details for one of the red areas and the ball is on my side to look at it.

            psergei Sergei Petrunia added a comment - masonmariadb , I'll still need to make them. Lena has posted the details for one of the red areas and the ball is on my side to look at it.

            People

              lstartseva Lena Startseva
              psergei Sergei Petrunia
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.