[MDEV-27495] Less precise filtered estimation with JSON histogram (#9) or wrong value of r_filtered Created: 2022-01-13  Updated: 2022-01-14  Resolved: 2022-01-14

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: N/A
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-26519 JSON Histograms: improve histogram co... Closed

 Description   

In the test case below, the filtered value with JSON histogram is far away from r_filtered, while with DOUBLE_PREC it's pretty close.
However, I'm not sure that r_filtered = 33% is even correct.
The table contains 500 rows, and the only condition in the query returns 18 rows, so I don't see where 33% comes from. So, maybe the JSON estimation is actually good here, while both r_filtered and DOUBLE_PREC estimation are off. In this case please re-categorize as needed.

DROP TABLE IF EXISTS t;
CREATE TABLE t (a int) ENGINE=MyISAM;
 
INSERT INTO t VALUES
  (NULL),(335086),(7),(1208746),(NULL),(596640),(NULL),(6),(63898),(5),
  (632029),(6),(995492),(NULL),(2),(NULL),(NULL),(433258),(3),(NULL),(6),
  (NULL),(6),(3),(NULL),(NULL),(8),(5),(0),(2030240),(534970),(1775698),
  (1564541),(9),(337969),(79757),(NULL),(6),(8),(9),(1887175),(7),(81723),
  (NULL),(NULL),(1409876),(777585),(1194263),(NULL),(6),(0),(7),(710345),
  (NULL),(NULL),(3),(1127285),(NULL),(NULL),(7),(912065),(0),(NULL),
  (1377174),(NULL),(NULL),(7),(1689387),(6),(NULL),(NULL),(NULL),(NULL),
  (8),(1087570),(6),(327746),(4),(NULL),(1163002),(NULL),(4),(0),(1359282),
  (2),(1497104),(NULL),(1838481),(3),(4),(NULL),(6),(8),(8),(1630339),
  (NULL),(NULL),(NULL),(4),(4),(6),(3),(1551892),(1174209),(3),(6),(NULL),
  (2),(NULL),(1058603),(0),(NULL),(2),(NULL),(0),(NULL),(1968636),(NULL),
  (NULL),(NULL),(NULL),(2),(532677),(3),(904069),(NULL),(2),(NULL),
  (1511260),(NULL),(NULL),(2139488),(3),(2),(1),(NULL),(NULL),(NULL),
  (2118975),(2),(1),(1),(6),(4915),(1),(1168900),(217252),(1178665),(4),
  (NULL),(5),(8),(1430716),(1198326),(545325),(6),(7),(952238),(4),(NULL),
  (413073),(158925),(2100036),(NULL),(NULL),(NULL),(3),(1220084),(8),
  (NULL),(1508442),(1639973),(NULL),(8),(1690108),(1135084),(NULL),(6),
  (NULL),(NULL),(1),(4),(1965883),(NULL),(4),(5),(107545),(NULL),(NULL),
  (9),(1667564),(7),(1),(8),(667222),(NULL),(NULL),(1598554),(1),(8),(6),
  (6),(NULL),(0),(2),(NULL),(2),(1710817),(NULL),(411435),(1),(NULL),(NULL),
  (NULL),(408158),(5),(NULL),(3),(NULL),(8),(5),(471728),(NULL),(NULL),
  (802292),(NULL),(NULL),(7),(1561264),(7),(NULL),(4),(NULL),(NULL),
  (241762),(NULL),(816972),(4),(NULL),(NULL),(1751908),(295502),(NULL),
  (NULL),(NULL),(NULL),(1090191),(NULL),(2),(838205),(NULL),(9),(NULL),
  (1857946),(NULL),(1776746),(291701),(311886),(205783),(1337655),(560529),
  (83427),(953942),(0),(1951138),(1474822),(1637220),(3),(9),(0),(323748),
  (4),(6),(10355),(3),(735904),(6),(3),(799343),(NULL),(NULL),(7),(1),
  (475464),(1094517),(8),(NULL),(NULL),(5),(1479999),(NULL),(NULL),(42533),
  (3),(2022048),(3),(NULL),(7),(NULL),(NULL),(2030764),(2),(1335886),(8),
  (2),(8),(1584726),(0),(1467154),(5),(5),(8),(NULL),(3),(9),(1179517),
  (NULL),(NULL),(866779),(NULL),(NULL),(2),(6),(NULL),(NULL),(NULL),
  (571277),(NULL),(NULL),(582025),(1679163),(6),(9),(3),(728826),(695927),
  (1321533),(4),(4),(NULL),(3),(3),(1265631),(1077477),(8),(1013187),
  (NULL),(0),(569377),(1410990),(996999),(240583),(474808),(0),(NULL),
  (NULL),(NULL),(2),(6),(686817),(844104),(67109),(NULL),(9),(NULL),(NULL),
  (NULL),(9),(NULL),(NULL),(3),(144835),(NULL),(6),(NULL),(NULL),(0),
  (NULL),(NULL),(330695),(1056571),(3),(NULL),(884539),(NULL),(5),(NULL),
  (7),(6),(NULL),(3),(NULL),(NULL),(5),(324731),(NULL),(1852834),(NULL),
  (1159922),(9),(NULL),(1),(132907),(558105),(7),(1793262),(NULL),(NULL),
  (6),(509542),(3),(6),(7),(7),(4),(6),(1),(NULL),(5),(1402733),(109052),
  (8),(NULL),(695992),(1210909),(570294),(6),(8),(NULL),(2),(0),(7),
  (724042),(818151),(NULL),(NULL),(1571750),(3),(350355),(NULL),(764215),
  (2114257),(9),(NULL),(1256194),(NULL),(NULL),(363987),(9),(2),(7),(NULL),
  (1826095),(5),(1600127),(1829437),(NULL),(2),(1417085),(6),(1468858),(3),
  (8),(108724),(6),(8),(1801454),(2093482),(902234),(104530),(2145714),
  (NULL),(8),(9),(8),(7),(NULL),(6),(NULL),(1277100),(1),(4),(NULL),(NULL),
  (4),(5),(NULL),(NULL),(NULL),(96272),(NULL),(NULL),(4),(NULL),(810025),
  (NULL),(1056244),(8),(982319),(3);
 
set histogram_type= DOUBLE_PREC_HB; analyze table t persistent for all;
ANALYZE FORMAT=JSON SELECT a FROM t WHERE a = 7 ORDER BY a LIMIT 1;
 
set histogram_type= JSON_HB; analyze table t persistent for all;
ANALYZE FORMAT=JSON SELECT a FROM t WHERE a = 7 ORDER BY a LIMIT 1;
 
# Cleanup
DROP TABLE t;

DOUBLE_PREC_HB preview-10.8-MDEV-26519-json-histograms 9a86900b

       "table": {
          "table_name": "t",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 500,
          "r_rows": 3,
          "r_table_time_ms": 0.004537902,
          "r_other_time_ms": 0.004708198,
          "filtered": 36.87187576,
          "r_filtered": 33.33333333,
          "attached_condition": "t.a = 7"
        }

JSON_HB

        "table": {
          "table_name": "t",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 500,
          "r_rows": 3,
          "r_table_time_ms": 0.002915076,
          "r_other_time_ms": 0.003756541,
          "filtered": 3.599999905,
          "r_filtered": 33.33333333,
          "attached_condition": "t.a = 7"
        }

Also reproducible with InnoDB and Aria.



 Comments   
Comment by Sergei Petrunia [ 2022-01-14 ]

The table contains 500 rows, BUT the query has

 

"WHERE a=7 ORDER BY a LIMIT 1"

The "WHERE a=7" part makes "ORDER BY a" no-op, so it becomes

"WHERE a=7 LIMIT 1"

That is, the execution will stop as soon as one matching row is found.
In this case, r_rows=3 so it has found a match after reading 3 rows.

This is confirmed by r_filtered=33.33333333 - one row out of 3 has matched the condition.

Comment by Sergei Petrunia [ 2022-01-14 ]

Now, running the query without the LIMIT 1:

set histogram_type= JSON_HB; analyze table t persistent for all;
ANALYZE FORMAT=JSON SELECT a FROM t WHERE a = 7;

  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 2.772328112,
    "nested_loop": [
      {
        "table": {
          "table_name": "t",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 500,
          "r_rows": 500,
          "r_table_time_ms": 1.015146412,
          "r_other_time_ms": 1.722158846,
          "filtered": 3.599999905,
          "r_filtered": 3.6,
          "attached_condition": "t.a = 7"
        }
      }
    ]
  }
} |

set histogram_type= DOUBLE_PREC_HB; analyze table t persistent for all;
ANALYZE FORMAT=JSON SELECT a FROM t WHERE a = 7 ORDER BY a;

  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 2.60342905,
    "nested_loop": [
      {
        "table": {
          "table_name": "t",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 500,
          "r_rows": 500,
          "r_table_time_ms": 0.949193061,
          "r_other_time_ms": 1.617091957,
          "filtered": 36.87187576,
          "r_filtered": 3.6,
          "attached_condition": "t.a = 7"
        }
      }
    ]
  }

Comment by Sergei Petrunia [ 2022-01-14 ]

JSON_HB provides a better estimate than DOUBLE_PREC_HB.

Comment by Elena Stepanova [ 2022-01-14 ]

psergei,

Why do we only compare estimation of JSON vs DOUBLE_PREC without LIMIT?

Back to the original case with LIMIT, if as you say r_filtered=33% is correct and refers to 1 row out of 3, then filtered estimation for DOUBLE_PREC is good, but filtered=3% with JSON_HB is completely off and means nothing, what is 3% out of 3 rows?

Upd:

Okay, I guess filtered rows estimation doesn't take into account LIMIT, and it just happens to coincide with r_filtered with LIMIT in case of DOUBLE_PREC.

Comment by Sergei Petrunia [ 2022-01-14 ]

elenst, that's correct.

Generated at Thu Feb 08 09:53:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.