[MCOL-5191]  Join Optimizer: histogram statistics/ needed for cost-based optimization Created: 2022-08-15  Updated: 2023-04-06  Resolved: 2022-12-05

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr, PrimProc
Affects Version/s: None
Fix Version/s: 22.08.8

Type: Task Priority: Major
Reporter: Roman Assignee: Denis Khalikov
Resolution: Fixed Votes: 0
Labels: None

Attachments: Microsoft Word Cost based join optimizer.docx    
Issue Links:
Relates
relates to MCOL-4713 Optimizer statistics Closed
Sprint: 2021-17, 2022-22
Assigned for Review: Roman Roman
Assigned for Testing: Daniel Lee Daniel Lee (Inactive)

 Description   

The previous step delivers a statistics that allows to detect uniqueness of a column to be able to cut lops in a JOIN graph the most effective way.
The current step will deliver a histogram collected for columns of tables. The strained goal is to collect a Number of Dictinct Values statistics also.
In the scope of this project the mentioned statistics collection is a manual operation and run by ANALYZE TABLE command as previously.



 Comments   
Comment by Daniel Lee (Inactive) [ 2022-12-05 ]

Build verified: 23.02

engine: a1d89d8f311d8187d3357536a64d77ef6f9c2b8e
server: bf7f6987c8fb7ceda9ae048ada129d11798d4392
buildNo: 6151

Retested the test case in MCOL-1205

MariaDB [tpch1m]> select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'AMERICA' and o_orderdate >= '1993-01-01' and o_orderdate < date_add( '1993-01-01' , interval 1 year) group by n_name order by revenue desc;
+-----------+-------------+
| n_name    | revenue     |
+-----------+-------------+
| PERU      | 527161.1575 |
| ARGENTINA |  34521.3330 |
+-----------+-------------+
2 rows in set (0.290 sec)
 
MariaDB [tpch1m]> analyze table customer, orders, lineitem,supplier, nation, region
    -> ;
+-----------------+---------+----------+----------+
| Table           | Op      | Msg_type | Msg_text |
+-----------------+---------+----------+----------+
| tpch1m.customer | analyze | status   | OK       |
| tpch1m.orders   | analyze | status   | OK       |
| tpch1m.lineitem | analyze | status   | OK       |
| tpch1m.supplier | analyze | status   | OK       |
| tpch1m.nation   | analyze | status   | OK       |
| tpch1m.region   | analyze | status   | OK       |
+-----------------+---------+----------+----------+
6 rows in set (0.035 sec)
 
MariaDB [tpch1m]> select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'AMERICA' and o_orderdate >= '1993-01-01' and o_orderdate < date_add( '1993-01-01' , interval 1 year) group by n_name order by revenue desc;
+-----------+-------------+
| n_name    | revenue     |
+-----------+-------------+
| PERU      | 527161.1575 |
| ARGENTINA |  34521.3330 |
+-----------+-------------+
2 rows in set (0.036 sec)

Generated at Thu Feb 08 02:56:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.