Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27566

Benchmark-like test for JSON_HB histograms

Details

    Description

      (Filing this as a follow-up to discussions on Slack)

      Before making JSON_HB histograms enabled by default (MDEV-27062), it would be nice to make sure they will not cause performance regressions at the users' systems.

      Note: we want to check for performance regressions, not histogram accuracy (Elena has run some tests for histogram accuracy already)

      Note: it won't be possible to avoid performance regressions completely. One can construct realistic-looking cases where improved histogram precision will cause a worse query plan due to the "Optimizer was making two wrong choices, we've fixed one" effect.
      (The most likely second wrong choice is not accounting for correlations between conditions).

      The following was suggested: run some test that was not specifically designed with JSON_HB histograms in mind and check the results. The problem is finding a benchmark or workload where JSON_HB vs DOUBLE_PREC_HB difference will be apparent.

      Attachments

        Issue Links

          Activity

            Join Order Benchmark looks like a possible candidate.
            I have a script that runs it and collects the results: https://github.com/spetrunia/join-order-benchmark-tool

            I've ran it for

            • No EITS - without EITS statistics.
            • "eits1" (or just "eits") - with DOUBLE_PREC_HB histograms collected for all tables/columns
            • "eits2" - with JSON_HB histograms collected for all tables/columns
              set @ratio=2.0;

            The results are:

            MariaDB [imdbload]> select 
                ->   (select count(*) from my_eits_vs_no_eits where eits_relative_qtime <= 100/@ratio) as eits_faster,
                ->   (select count(*) from my_eits_vs_no_eits where eits_relative_qtime >= 100*@ratio ) as eits_slower;
            +-------------+-------------+
            | eits_faster | eits_slower |
            +-------------+-------------+
            |          63 |          12 |
            +-------------+-------------+
            1 row in set (0.000 sec)
            

            DOUBLE_PREC_HB histograms improve speed for 63 queries and cause 12 slowdowns (there are 113 queries in total).

            MariaDB [imdbload]> select 
                ->   (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime <= 100/@ratio) as eits_faster,
                ->   (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime >= 100* @ratio ) as eits_slower;
            +-------------+-------------+
            | eits_faster | eits_slower |
            +-------------+-------------+
            |           2 |           1 |
            +-------------+-------------+
            1 row in set (0.000 sec)
            

            Switching to JSON_HB made two queries faster, one query slower.

            psergei Sergei Petrunia added a comment - Join Order Benchmark looks like a possible candidate. I have a script that runs it and collects the results: https://github.com/spetrunia/join-order-benchmark-tool I've ran it for No EITS - without EITS statistics. "eits1" (or just "eits") - with DOUBLE_PREC_HB histograms collected for all tables/columns "eits2" - with JSON_HB histograms collected for all tables/columns set @ratio=2.0; The results are: MariaDB [imdbload]> select -> (select count(*) from my_eits_vs_no_eits where eits_relative_qtime <= 100/@ratio) as eits_faster, -> (select count(*) from my_eits_vs_no_eits where eits_relative_qtime >= 100*@ratio ) as eits_slower; +-------------+-------------+ | eits_faster | eits_slower | +-------------+-------------+ | 63 | 12 | +-------------+-------------+ 1 row in set (0.000 sec) DOUBLE_PREC_HB histograms improve speed for 63 queries and cause 12 slowdowns (there are 113 queries in total). MariaDB [imdbload]> select -> (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime <= 100/@ratio) as eits_faster, -> (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime >= 100* @ratio ) as eits_slower; +-------------+-------------+ | eits_faster | eits_slower | +-------------+-------------+ | 2 | 1 | +-------------+-------------+ 1 row in set (0.000 sec) Switching to JSON_HB made two queries faster, one query slower.

            The queries where using JSON_HB caused a change were:

            MariaDB [imdbload]> select * from my_cmp1 where eits1>0 and eits2>0 and (eits1/eits2>@ratio or eits1/eits2< 1/@ratio);
            +------------+---------+-------+-------+
            | query_name | no_eits | eits1 | eits2 |
            +------------+---------+-------+-------+
            | 24a        |    9078 |   394 |   172 |  #  JSON_HB is faster
            | 29a        |    8864 |  1780 |    37 | #  JSON_HB is faster
            | 8a         |    7720 |  2756 |  7671 | #  JSON_HB is slower (back to no-histogram speed)
            +------------+---------+-------+-------+
            3 rows in set (0.001 sec)
            

            psergei Sergei Petrunia added a comment - The queries where using JSON_HB caused a change were: MariaDB [imdbload]> select * from my_cmp1 where eits1>0 and eits2>0 and (eits1/eits2>@ratio or eits1/eits2< 1/@ratio); +------------+---------+-------+-------+ | query_name | no_eits | eits1 | eits2 | +------------+---------+-------+-------+ | 24a | 9078 | 394 | 172 | # JSON_HB is faster | 29a | 8864 | 1780 | 37 | # JSON_HB is faster | 8a | 7720 | 2756 | 7671 | # JSON_HB is slower (back to no-histogram speed) +------------+---------+-------+-------+ 3 rows in set (0.001 sec)

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.