Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Duplicate
- N/A
- None
Description
Upon request, I intentionally make tables bigger than they need to be for reproducing the problem, so that both the number of rows and the number of unique values exceed the histogram size.
{code:sql}
--source include/have_sequence.inc

CREATE TABLE t1 (f TIME) ENGINE=MyISAM;
INSERT INTO t1 SELECT IF(seq%2,'00:00:00',SEC_TO_TIME(seq+7200)) FROM seq_1_to_1000;

SET histogram_type= JSON_HB;
ANALYZE TABLE t1 PERSISTENT FOR ALL;
ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

SET histogram_type= DOUBLE_PREC_HB;
ANALYZE TABLE t1 PERSISTENT FOR ALL;
ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE f > '00:01:00';

DROP TABLE t1;
{code}
JSON_HB:
{code:json|title=preview-10.8-MDEV-26519-json-histograms da3231a8}
"table": {
  "table_name": "t1",
  "access_type": "ALL",
  "r_loops": 1,
  "rows": 1000,
  "r_rows": 1000,
  "r_table_time_ms": 0.037146099,
  "r_other_time_ms": 0.077310006,
  "filtered": 99.75002289,
  "r_filtered": 50,
  "attached_condition": "t1.f > '00:01:00'"
}
{code}
DOUBLE_PREC_HB:
{code:json}
"table": {
  "table_name": "t1",
  "access_type": "ALL",
  "r_loops": 1,
  "rows": 1000,
  "r_rows": 1000,
  "r_table_time_ms": 0.033997955,
  "r_other_time_ms": 0.07762081,
  "filtered": 50,
  "r_filtered": 50,
  "attached_condition": "t1.f > '00:01:00'"
}
{code}
So with a JSON_HB histogram the optimizer estimates filtered=99.75 although the actual r_filtered is 50, while DOUBLE_PREC_HB estimates filtered=50, matching the actual value.
Issue Links
- is caused by MDEV-26519 JSON Histograms: improve histogram collection (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Link | | This issue is caused by MDEV-26519 |
Fix Version/s | 10.8.1 [ 26815 ] | |
Fix Version/s | 10.8 [ 26121 ] | |
Resolution | Duplicate [ 3 ] | |
Status | Open [ 1 ] | Closed [ 6 ] |
"histogram_hb": [
{
"start": "00:00:00",
"size": 0.5,
"ndv": 1
},
{
"start": "02:00:02",
"size": 0.004,
"ndv": 4
},
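A quick check of what these buckets imply (a standalone sketch, not server code; the numbers are taken from the histogram fragment above): the first bucket is a singleton holding only '00:00:00' and covering half the rows, and the following buckets start at '02:00:02' and above, so f > '00:01:00' should keep about half the rows.
{code:cpp}
// Hypothetical standalone check (not MariaDB code): the expected selectivity
// of f > '00:01:00' given the buckets shown above.
#include <cstdio>

int main()
{
  // "size" is the fraction of table rows in the bucket, per the JSON above.
  double singleton_size = 0.5;   // bucket {"start": "00:00:00", "ndv": 1}

  // '00:01:00' is above the singleton's only value and below the next
  // bucket's start ('02:00:02'), so exactly that one bucket is excluded.
  double expected_sel = 1.0 - singleton_size;
  printf("expected filtered ~ %.0f%%\n", expected_sel * 100.0);  // 50%, matching r_filtered
  return 0;
}
{code}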
ANALYZE SELECT * FROM t1 WHERE f > '00:01:00';
filtered=99.7
r_filtered=50
in Histogram_json_hb::range_selectivity():
exclusive_endp=true
equal=false
idx=0
We hit the first bucket...
and then we compute sel = position_in_interval() = 0.0049
min = 0.0049 * (bucket_size=0.5) ≈ 0.0025
max = 1.0
and filtered = max - min ≈ 99.75%.
Things go wrong because we treat the singleton bucket as a general bucket (one spanning from 00:00 to 02:00) and call position_in_interval(00:01). That returns some value (which doesn't look perfect either, by the way), but the real issue is that for the singleton bucket, min_endp > bucket_start already tells us sel=1.0: the endpoint is past the bucket's only value, so the whole bucket should be excluded.
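For illustration, a minimal standalone sketch of that left-endpoint handling (the Bucket struct and the position_in_interval() stub here are made-up stand-ins, not the actual MariaDB types or signatures): interpolating inside the singleton bucket gives ~99.75%, while recognising min_endp > bucket_start for a singleton makes the whole bucket's position 1.0 and gives the expected 50%.
{code:cpp}
// Illustrative sketch only; names and signatures are made up for this example.
#include <cstdio>
#include <string>

struct Bucket
{
  std::string start;  // left endpoint (the only value, if the bucket is a singleton)
  double size;        // fraction of table rows in this bucket
  int ndv;            // number of distinct values in the bucket
  bool is_singleton() const { return ndv == 1; }
};

// Stand-in for position_in_interval(): fraction of the bucket's value range
// that lies below the given endpoint. For this repro it comes out to ~0.0049.
static double position_in_interval(const Bucket &, const std::string &)
{
  return 0.0049;
}

int main()
{
  Bucket first = {"00:00:00", 0.5, 1};
  std::string min_endp = "00:01:00";  // condition: f > '00:01:00'
  double max_pos = 1.0;               // no upper bound on f

  // Current behaviour per the analysis above: interpolate inside the
  // singleton bucket as if it were a general bucket.
  double min_pos = position_in_interval(first, min_endp) * first.size;  // ~0.0025
  printf("buggy filtered    = %.2f%%\n", (max_pos - min_pos) * 100.0);  // ~99.75%

  // Singleton-aware handling: min_endp above the bucket's single value
  // means the whole bucket lies below the range, i.e. its position is 1.0.
  // (String comparison works here because TIME values are zero-padded.)
  if (first.is_singleton() && min_endp > first.start)
    min_pos = 1.0 * first.size;                                         // 0.5
  printf("expected filtered = %.2f%%\n", (max_pos - min_pos) * 100.0);  // 50.00%
  return 0;
}
{code}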