[MDEV-26901] Estimation for filtered rows less precise with JSON histogram comparing to DOUBLE_PREC (#4) Created: 2021-10-25  Updated: 2022-01-19  Resolved: 2022-01-11

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: N/A
Fix Version/s: 10.8.1

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-26519 JSON Histograms: improve histogram co... Closed

 Description   

create or replace table t (f int);
insert into t values
  (7),(5),(0),(5),(112),(9),(9),(7),(5),(9),
  (1),(7),(0),(6),(6),(2),(1),(6),(169),(7);
 
select f from t where f in (77, 1, 144, 73, 14, 12);
 
set histogram_type= JSON_HB;
analyze table t persistent for all;
 
explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
 
set histogram_type= DOUBLE_PREC_HB;
analyze table t persistent for all;
 
explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
 
# Cleanup
drop table t;

The result set is 2 rows (out of 20):

preview-10.7-MDEV-26519-json-histograms 7fa0542dc6

select f from t where f in (77, 1, 144, 73, 14, 12);
f
1
1

With DOUBLE_PREC_HB, filtered value is 10.98, which is quite good:

set histogram_type= DOUBLE_PREC_HB;
analyze table t persistent for all;
Table	Op	Msg_type	Msg_text
test.t	analyze	status	Engine-independent statistics collected
test.t	analyze	status	OK
explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "t",
      "access_type": "ALL",
      "rows": 20,
      "filtered": 10.98332787,
      "attached_condition": "t.f in (77,1,144,73,14,12)"
    }
  }
}

With JSON_HB, it's 35, which isn't so good:

set histogram_type= JSON_HB;
analyze table t persistent for all;
Table	Op	Msg_type	Msg_text
test.t	analyze	status	Engine-independent statistics collected
test.t	analyze	status	OK
explain format=json select f from t where f in (77, 1, 144, 73, 14, 12);
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "t",
      "access_type": "ALL",
      "rows": 20,
      "filtered": 35,
      "attached_condition": "t.f in (77,1,144,73,14,12)"
    }
  }
}



 Comments   
Comment by Elena Stepanova [ 2021-10-25 ]

Raised to critical on procedural reasons. Feel free to demote and remove from "must-do" scope.

Comment by Sergei Petrunia [ 2021-11-26 ]

Debugging...

analyze select f from t where f in (77, 1, 144, 73, 14, 12);

Here, only the value "1" matches a histogram bucket:

    {
      "start": "1",
      "size": 0.1,
      "ndv": 1
    },

The rest of the values do not match a bucket and use this estimate:

    /*
      The bucket has a single value and it doesn't match! Return a very
      small value.
    */
    sel= 1.0 / total_rows;

Comment by Sergei Petrunia [ 2021-11-26 ]

that is, it seems to be working as intended...

Comment by Sergei Petrunia [ 2022-01-11 ]

... but it doesn't mean we should keep it this way. Adjusted the code to avoid this problem

Generated at Thu Feb 08 09:48:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.