(Filing this as a follow-up to discussions on Slack)
Before making JSON_HB histograms enabled by default (MDEV-27062), it would be nice to make sure they will not cause performance regressions at the users' systems.
Note: we want to check for performance regressions, not histogram accuracy (Elena has run some tests for histogram accuracy already)
Note: it won't be possible to avoid performance regressions completely. One can construct realistic-looking cases where improved histogram precision will cause a worse query plan due to the "Optimizer was making two wrong choices, we've fixed one" effect.
(The most likely second wrong choice is not accounting for correlations between conditions).
The following was suggested: run some test that was not specifically designed with JSON_HB histograms in mind and check the results. The problem is finding a benchmark or workload where JSON_HB vs DOUBLE_PREC_HB difference will be apparent.
Attachments
Issue Links
blocks
MDEV-27062Make histogram_type=JSON_HB the new default
"eits1" (or just "eits") - with DOUBLE_PREC_HB histograms collected for all tables/columns
"eits2" - with JSON_HB histograms collected for all tables/columns
set @ratio=2.0;
The results are:
MariaDB [imdbload]> select
-> (select count(*) from my_eits_vs_no_eits where eits_relative_qtime <= 100/@ratio) as eits_faster,
-> (select count(*) from my_eits_vs_no_eits where eits_relative_qtime >= 100*@ratio ) as eits_slower;
+-------------+-------------+
| eits_faster | eits_slower |
+-------------+-------------+
| 63 | 12 |
+-------------+-------------+
1 row in set (0.000 sec)
DOUBLE_PREC_HB histograms improve speed for 63 queries and cause 12 slowdowns (there are 113 queries in total).
MariaDB [imdbload]> select
-> (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime <= 100/@ratio) as eits_faster,
-> (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime >= 100* @ratio ) as eits_slower;
+-------------+-------------+
| eits_faster | eits_slower |
+-------------+-------------+
| 2 | 1 |
+-------------+-------------+
1 row in set (0.000 sec)
Switching to JSON_HB made two queries faster, one query slower.
Sergei Petrunia
added a comment - Join Order Benchmark looks like a possible candidate.
I have a script that runs it and collects the results: https://github.com/spetrunia/join-order-benchmark-tool
I've ran it for
No EITS - without EITS statistics.
"eits1" (or just "eits") - with DOUBLE_PREC_HB histograms collected for all tables/columns
"eits2" - with JSON_HB histograms collected for all tables/columns
set @ratio=2.0;
The results are:
MariaDB [imdbload]> select
-> (select count(*) from my_eits_vs_no_eits where eits_relative_qtime <= 100/@ratio) as eits_faster,
-> (select count(*) from my_eits_vs_no_eits where eits_relative_qtime >= 100*@ratio ) as eits_slower;
+-------------+-------------+
| eits_faster | eits_slower |
+-------------+-------------+
| 63 | 12 |
+-------------+-------------+
1 row in set (0.000 sec)
DOUBLE_PREC_HB histograms improve speed for 63 queries and cause 12 slowdowns (there are 113 queries in total).
MariaDB [imdbload]> select
-> (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime <= 100/@ratio) as eits_faster,
-> (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime >= 100* @ratio ) as eits_slower;
+-------------+-------------+
| eits_faster | eits_slower |
+-------------+-------------+
| 2 | 1 |
+-------------+-------------+
1 row in set (0.000 sec)
Switching to JSON_HB made two queries faster, one query slower.
Join Order Benchmark looks like a possible candidate.
I have a script that runs it and collects the results: https://github.com/spetrunia/join-order-benchmark-tool
I've ran it for
set @ratio=2.0;
The results are:
MariaDB [imdbload]> select
-> (select count(*) from my_eits_vs_no_eits where eits_relative_qtime <= 100/@ratio) as eits_faster,
-> (select count(*) from my_eits_vs_no_eits where eits_relative_qtime >= 100*@ratio ) as eits_slower;
+-------------+-------------+
| eits_faster | eits_slower |
+-------------+-------------+
| 63 | 12 |
+-------------+-------------+
1 row in set (0.000 sec)
DOUBLE_PREC_HB histograms improve speed for 63 queries and cause 12 slowdowns (there are 113 queries in total).
MariaDB [imdbload]> select
-> (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime <= 100/@ratio) as eits_faster,
-> (select count(*) from my_eits1_vs_eits2 where eits2_relative_qtime >= 100* @ratio ) as eits_slower;
+-------------+-------------+
| eits_faster | eits_slower |
+-------------+-------------+
| 2 | 1 |
+-------------+-------------+
1 row in set (0.000 sec)
Switching to JSON_HB made two queries faster, one query slower.