Sergei Petrunia
added a comment - Summarizing the report data:
DOUBLE_PREC_HB
"filtered": 3.846199989,
"r_filtered": 0,
JSON
"filtered": 93,
"r_filtered": 0,
If I try to remove the COMPRESSED attribute, I get this:
DOUBLE_PREC_HB
"filtered": 92.96875,
"r_filtered": 0,
JSON_HB
"filtered": 93,
"r_filtered": 0,
DOUBLE_PREC_HB becomes just as bad as JSON_HB. It looks like the good estimate in the original report was an accident.
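For reference, the "filtered" value in these outputs is just the histogram selectivity expressed as a percentage. A quick plain-Python sanity check, using the selectivity values quoted in the optimizer traces further down in this issue:

```python
# "filtered" in ANALYZE FORMAT=JSON output is selectivity * 100.
# Selectivity values are taken from the optimizer traces in this issue.
double_prec_compressed = 0.038462   # DOUBLE_PREC_HB, COMPRESSED column
double_prec_plain = 0.9296875       # DOUBLE_PREC_HB, non-COMPRESSED
json_hb = 0.93                      # JSON_HB

print(double_prec_compressed * 100)  # ~3.8462, matches "filtered": 3.846199989
print(double_prec_plain * 100)       # 92.96875
print(json_hb * 100)                 # 93.0
```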
Sergei Petrunia
added a comment - The column is actually too short to be compressed.
Status counters confirm this: no [de]compressions happen during the test run:
MariaDB [test]> show status like '%compressions';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Column_compressions | 0 |
| Column_decompressions | 0 |
+-----------------------+-------+
2 rows in set (0.003 sec)
Sergei Petrunia
added a comment - I compared the histograms produced when the testcase used a COMPRESSED column against those produced when it didn't.
The JSON_HB histograms are the same. The DOUBLE_PREC_HB histograms are different.
Compressed:
(I've done some manual reformatting to make the differences more apparent)
MariaDB [test]> select decode_histogram(hist_type,histogram) from save10;
0.00000,0.00000,0.00000,0.03999,
0.00000,0.00000,0.00000,0.00000,0.03999,
0.00000,0.00000,0.00000,0.00000,0.00000,0.04001,
0.00000,0.00000,0.00000,0.00000,0.03999,
0.00000,0.00000,0.00000,0.00000,0.04001,
0.00000,0.00000,0.00000,0.00000,0.03999,
0.00000,0.00000,0.00000,0.00000,0.03999,
0.00000,0.00000,0.00000,0.00000,0.04001,
0.00000,0.00000,0.00000,0.00000,0.03999,
0.00000,0.00000,0.00000,0.00000,0.04001,
0.00000,0.00000,0.00000,0.00000,0.00000,0.03999,
0.00000,0.00000,0.00000,0.00000,0.03999,
0.00000,0.00000,0.00000,0.00000,0.04001,
0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.04001,0.00000,0.00000,0.00000 |
Non-compressed:
MariaDB [test]> select decode_histogram(hist_type,histogram) from save12;
0.00000,0.00000,0.00000,0.04056,
0.00000,0.00000,0.00000,0.00000,0.03880,
0.00000,0.00000,0.00000,0.00000,0.00000,0.02292,
0.00000,0.00000,0.00000,0.00000,0.05292,
0.00000,0.00000,0.00000,0.00000,0.08112,
0.00000,0.00000,0.00000,0.00000,0.01411,
0.00000,0.00000,0.00000,0.00000,0.05644,
0.00000,0.00000,0.00000,0.00000,0.04585,
0.00000,0.00000,0.00000,0.00000,0.03703,
0.00000,0.00000,0.00000,0.00000,0.02998,
0.00000,0.00000,0.00000,0.00000,0.00000,0.02293,
0.00000,0.00000,0.00000,0.00000,0.07936,
0.00000,0.00000,0.00000,0.00000,0.01587,
0.00000,0.00000,0.00000,0.00000,0.05290,
0.00000,0.00000,0.00000,0.00000,0.06526,0.00000,0.00000,0.00000,0.00000,0.02293,0.00000,0.00000,0.00000,0.00000,0.02116,0.00000,0.00000,0.00000,0.00000,0.07407,0.00000,0.00000,0.00000,0.00000,0.04233,0.00000,0.00000,0.00000,0.00000,0.00000,0.05115,0.00000,0.00000,0.00000,0.00000,0.06525,0.00000,0.00000,0.00000,0.00000,0.02293,0.00000,0.00000,0.00000,0.00000,0.01587,0.00000,0.00000,0.00000,0.00706,0.00000,0.00000,0.00000,0.02116,0.00000,0.00000,0.00000 |
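Assuming DECODE_HISTOGRAM() returns the per-bucket fractions as a comma-separated string (as the output above suggests), the two histograms can be compared programmatically by parsing them into floats; `parse_histogram` here is a hypothetical helper, shown on a short fragment of the compressed histogram:

```python
# Hypothetical helper: split DECODE_HISTOGRAM()'s comma-separated
# output into per-bucket fractions for programmatic comparison.
def parse_histogram(decoded: str) -> list[float]:
    return [float(x) for x in decoded.split(',') if x.strip()]

# First few buckets of the compressed DOUBLE_PREC_HB histogram above:
sample = "0.00000,0.00000,0.00000,0.03999,0.00000,0.00000,0.00000,0.00000,0.03999"
fracs = parse_histogram(sample)
print(round(sum(fracs), 5))  # 0.07998 for this fragment
```

Note that in the compressed case every non-zero fraction is roughly 0.04 (alternating 0.03999/0.04001), i.e. a near-uniform histogram, while the non-compressed fractions vary between ~0.007 and ~0.08.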
Sergei Petrunia
added a comment - The condition is impossible to satisfy but the optimizer doesn't recognize that.
The ranges it is estimating and the estimates:
DOUBLE_PREC_HB, non-COMPRESSED:
"rows_estimation": [
{
"selectivity_for_indexes": [],
"selectivity_for_columns": [
{
"column_name": "a",
"ranges": ["c <= a"],
"selectivity_from_histogram": 0.9296875
}
],
"cond_selectivity": 0.9296875
},
DOUBLE_PREC_HB, COMPRESSED:
"rows_estimation": [
{
"selectivity_for_indexes": [],
"selectivity_for_columns": [
{
"column_name": "a",
"ranges": ["c <= a"],
"selectivity_from_histogram": 0.038462
}
],
"cond_selectivity": 0.038462
},
JSON:
"selectivity_for_columns": [
{
"column_name": "a",
"ranges": ["c <= a"],
"selectivity_from_histogram": 0.93
}
],
"cond_selectivity": 0.93
},
What would be the right selectivity for t.a >= 'c'? The two values that sort below 'c' are aaaaa and bbbbb.
sel = (26 - 2)/26 = 0.92, which agrees with the JSON_HB selectivity.
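The arithmetic above as a one-line check (26 distinct values, 2 of which sort below the constant 'c'):

```python
# Expected selectivity of t.a >= 'c' with 26 distinct values,
# of which only 'aaaaa' and 'bbbbb' sort below 'c'.
n_distinct = 26
below = 2
sel = (n_distinct - below) / n_distinct
print(round(sel, 2))  # 0.92, close to the JSON_HB estimate of 0.93
```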
Julien Fritsch
added a comment - Automated message:
----------------------------
Since this issue has not been updated in 6 weeks, it's time to move it back to Stalled.
JiraAutomate
added a comment - Automated message:
----------------------------
Since this issue has not been updated in 6 weeks, it's time to move it back to Stalled.