[MCOL-4713] Optimizer statistics - Jira

Details

Type: New Feature
Status: Closed (View Workflow)
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.1.1
Component/s: None
Labels:
- tech_debt

Sprint:
2021-7, 2021-8, 2021-9

Description

Optimizer statistics

Most engines feeds their query optimizers with different types of statistics, e.g histograms of most common values, fractions of NULL values, Number of Distinct Values, relation cardinality. The statistics allows to make educated guesses regarding cardinality of intermediate and final result of a query, e.g. optimizer finds the optimal join order using this statistics. MCS now has a rudimentary statistics that consist of a relation cardinality only.

The goal

We need to design and build facilities that allows MCS to make educated guesses about cardinality of intermediate and final result of a query and use this knowledge to produce an optimal plan.

What is statistics

The simplest version of statistics must have the list of attributes listed earlier. The improved version should have an extensible format to add attributes(e.g. correlated or dependant histograms on functionaly dependant columns) w/o lots of changes.

What triggers stats collection

There must be a user knob in the form of ANALYZE SQL statement. Bulk insert operation inserting more then N values must also trigger ANALYZE to regresh the stats.

Consumers

ExeMgr or whatever the optimizer goes into is the most important consumer. The fact that there might be more then one EM in the cluster must be taken into account.

Where to put stats and how to save it b/w restarts

EMs in the cluster must have direct access to the stats so stats in-memory representation must reside in the RAM of the primary EM. Other EMs must have a synchronized version of the stats.

Producers

EM must initiate statistics collection. The statistics collection process must not differ much from SELECT * FROM query. The sampling method isn't yet choosen.

Please see the design doc for more details.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

screen.jpg
314 kB
2021-06-25 12:56
Screenshot from 2021-05-25 00-59-29.png
114 kB
2021-05-25 12:43

Issue Links

blocks

MCOL-1205 support queries with circular INNER joins

Closed

MCOL-4530 TPC-H query 19: - semi joins / 'lineitem' and 'part' are not joined.

Closed

includes

MCOL-125 SHOW TABLE STATUS returns invalid data for CS tables

Closed

relates to

MCOL-5191 Join Optimizer: histogram statistics/ needed for cost-based optimization

Closed

Activity

Ascending order - Click to sort in descending order

Denis Khalikov (Inactive) added a comment - 2021-05-25 12:43 - edited

I would like to logically split this task into 3 sub tasks:
1. MCS Plugin part.
1.1. Implement the "Analyze Table execplan" in the MCS plugin part. (Handle analyze table command).
2. ExeManager.
2.1. Implement "Analyze Table Job Step", "Analyze Table Command".
2.2. Parse "Analyze Table execplan", lower it to "Analyze Column Job Steps" and initialize TupleBPS with it.
2.3. Transfer "Analyze Column Job Step" to Primitive Server as "Analyze Column Command ".
2.4. Initialize Batch primitive processor.
2.5. Run Batch Primitive processor.
3. Batch Primitive processor.
3.1. Implement statistic collection ("Analyze Column Command" execution) by Batch Primitive processor.
3.2. Collect statistics for column.
3.3. Send it back to ExeManager.

Denis Khalikov (Inactive) added a comment - 2021-05-25 12:43 - edited I would like to logically split this task into 3 sub tasks: 1. MCS Plugin part. 1.1. Implement the "Analyze Table execplan" in the MCS plugin part. (Handle analyze table command). 2. ExeManager. 2.1. Implement "Analyze Table Job Step", "Analyze Table Command". 2.2. Parse "Analyze Table execplan", lower it to "Analyze Column Job Steps" and initialize TupleBPS with it. 2.3. Transfer "Analyze Column Job Step" to Primitive Server as "Analyze Column Command ". 2.4. Initialize Batch primitive processor. 2.5. Run Batch Primitive processor. 3. Batch Primitive processor. 3.1. Implement statistic collection ("Analyze Column Command" execution) by Batch Primitive processor. 3.2. Collect statistics for column. 3.3. Send it back to ExeManager.

Daniel Lee (Inactive) added a comment - 2021-07-06 16:12

Build verified: 6.1.1 ( #2727)

MariaDB [tpch10]> analyze table customer, orders, lineitem, supplier, nation, region;
----------------------------------------+

Table

Msg_type

Msg_text

----------------------------------------+

tpch10.customer	analyze	status	OK
tpch10.orders	analyze	status	OK
tpch10.lineitem	analyze	status	OK
tpch10.supplier	analyze	status	OK
tpch10.nation	analyze	status	OK
tpch10.region	analyze	status	OK

----------------------------------------+
6 rows in set (0.295 sec)

The analyze command did help dbt3's query #5 to work. Please see ~~MCOL-1205~~.

Daniel Lee (Inactive) added a comment - 2021-07-06 16:12 Build verified: 6.1.1 ( #2727) MariaDB [tpch10] > analyze table customer, orders, lineitem, supplier, nation, region; ---------------- ------- -------- ---------+ Table Op Msg_type Msg_text ---------------- ------- -------- ---------+ tpch10.customer analyze status OK tpch10.orders analyze status OK tpch10.lineitem analyze status OK tpch10.supplier analyze status OK tpch10.nation analyze status OK tpch10.region analyze status OK ---------------- ------- -------- ---------+ 6 rows in set (0.295 sec) The analyze command did help dbt3's query #5 to work. Please see MCOL-1205 .

People

Assignee:: Daniel Lee (Inactive)

Reporter:: Gregory Dorman (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2021-05-08 13:18

Updated:: 2024-10-03 15:51

Resolved:: 2021-07-06 16:12

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB ColumnStore

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration