[MDEV-27566] Benchmark-like test for JSON_HB histograms - Jira

Details

Type: Task
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Optimizer, Tests
Labels:

Description

(Filing this as a follow-up to discussions on Slack)

Before making JSON_HB histograms enabled by default (~~MDEV-27062~~), it would be nice to make sure they will not cause performance regressions at the users' systems.

Note: we want to check for performance regressions, not histogram accuracy (Elena has run some tests for histogram accuracy already)

Note: it won't be possible to avoid performance regressions completely. One can construct realistic-looking cases where improved histogram precision will cause a worse query plan due to the "Optimizer was making two wrong choices, we've fixed one" effect.
(The most likely second wrong choice is not accounting for correlations between conditions).

The following was suggested: run some test that was not specifically designed with JSON_HB histograms in mind and check the results. The problem is finding a benchmark or workload where JSON_HB vs DOUBLE_PREC_HB difference will be apparent.

Attachments

Issue Links

blocks

MDEV-27062 Make histogram_type=JSON_HB the new default

Closed

relates to

MDEV-21130 Histograms: use JSON as on-disk format

Closed

Activity

Ascending order - Click to sort in descending order

Sergei Petrunia added a comment - 2022-01-25 13:54

Join Order Benchmark looks like a possible candidate.
I have a script that runs it and collects the results: https://github.com/spetrunia/join-order-benchmark-tool

I've ran it for

No EITS - without EITS statistics.
"eits1" (or just "eits") - with DOUBLE_PREC_HB histograms collected for all tables/columns
"eits2" - with JSON_HB histograms collected for all tables/columns
set @ratio=2.0;

The results are:

MariaDB [imdbload]> select

    ->   (select count(*) from my_eits_vs_no_eits where eits_relative_qtime <= 100/@ratio) as eits_faster,

    ->   (select count(*) from my_eits_vs_no_eits where eits_relative_qtime >= 100*@ratio ) as eits_slower;

+-------------+-------------+

| eits_faster | eits_slower |

+-------------+-------------+

|          63 |          12 |

+-------------+-------------+

1 row in set (0.000 sec)

DOUBLE_PREC_HB histograms improve speed for 63 queries and cause 12 slowdowns (there are 113 queries in total).

MariaDB [imdbload]> select

    ->   (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime <= 100/@ratio) as eits_faster,

    ->   (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime >= 100* @ratio ) as eits_slower;

+-------------+-------------+

| eits_faster | eits_slower |

+-------------+-------------+

|           2 |           1 |

+-------------+-------------+

1 row in set (0.000 sec)

Switching to JSON_HB made two queries faster, one query slower.

Sergei Petrunia added a comment - 2022-01-25 13:54 Join Order Benchmark looks like a possible candidate. I have a script that runs it and collects the results: https://github.com/spetrunia/join-order-benchmark-tool I've ran it for No EITS - without EITS statistics. "eits1" (or just "eits") - with DOUBLE_PREC_HB histograms collected for all tables/columns "eits2" - with JSON_HB histograms collected for all tables/columns set @ratio=2.0; The results are: MariaDB [imdbload]> select -> (select count(*) from my_eits_vs_no_eits where eits_relative_qtime <= 100/@ratio) as eits_faster, -> (select count(*) from my_eits_vs_no_eits where eits_relative_qtime >= 100*@ratio ) as eits_slower; +-------------+-------------+ | eits_faster | eits_slower | +-------------+-------------+ | 63 | 12 | +-------------+-------------+ 1 row in set (0.000 sec) DOUBLE_PREC_HB histograms improve speed for 63 queries and cause 12 slowdowns (there are 113 queries in total). MariaDB [imdbload]> select -> (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime <= 100/@ratio) as eits_faster, -> (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime >= 100* @ratio ) as eits_slower; +-------------+-------------+ | eits_faster | eits_slower | +-------------+-------------+ | 2 | 1 | +-------------+-------------+ 1 row in set (0.000 sec) Switching to JSON_HB made two queries faster, one query slower.

Sergei Petrunia added a comment - 2022-01-25 13:56

The queries where using JSON_HB caused a change were:

MariaDB [imdbload]> select * from my_cmp1 where eits1>0 and eits2>0 and (eits1/eits2>@ratio or eits1/eits2< 1/@ratio);

+------------+---------+-------+-------+

| query_name | no_eits | eits1 | eits2 |

+------------+---------+-------+-------+

| 24a        |    9078 |   394 |   172 |  #  JSON_HB is faster

| 29a        |    8864 |  1780 |    37 | #  JSON_HB is faster

| 8a         |    7720 |  2756 |  7671 | #  JSON_HB is slower (back to no-histogram speed)

+------------+---------+-------+-------+

3 rows in set (0.001 sec)

Sergei Petrunia added a comment - 2022-01-25 13:56 The queries where using JSON_HB caused a change were: MariaDB [imdbload]> select * from my_cmp1 where eits1>0 and eits2>0 and (eits1/eits2>@ratio or eits1/eits2< 1/@ratio); +------------+---------+-------+-------+ | query_name | no_eits | eits1 | eits2 | +------------+---------+-------+-------+ | 24a | 9078 | 394 | 172 | # JSON_HB is faster | 29a | 8864 | 1780 | 37 | # JSON_HB is faster | 8a | 7720 | 2756 | 7671 | # JSON_HB is slower (back to no-histogram speed) +------------+---------+-------+-------+ 3 rows in set (0.001 sec)

People

Assignee:: Sergei Petrunia

Reporter:: Sergei Petrunia

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2022-01-21 13:29

Updated:: 2024-06-21 09:06

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server