[MCOL-4717] Conduct experiments measuring the overall impact of using MURMUR inside collation comparators - Jira

Details

Type: Sub-Task
Status: Closed
Priority: Critical
Resolution: Won't Do
Affects Version/s: None
Fix Version/s: 23.10.0
Component/s: PrimProc
Labels:
None

Sprint:
2021-7, 2021-8, 2021-9, 2021-10, 2021-11, 2021-12

Description

The goal is to evaluate a proposal to correct the performance problem.

The method:

Create a test version of the latin1_nopad_bin collation function that uses MURMUR.
Measure the difference in performance (the expectation is 2x vs. the current 6x).
Compare the behavior of aggregate queries that use the updated collation against:
a) the original CS method, as present in 1.2
b) the current method, as in 5.5.2

The last comparison should be run on both the flights (30 million rows) and Quinnstreet (1 billion rows) data sets.

Once we have the facts, we will decide what to do.
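As a rough illustration of the first two steps, the sketch below times a byte-wise NO PAD binary comparison against a MurmurHash3 (x86, 32-bit) pass over the same keys. It is not ColumnStore code: `nopad_bin_compare`, `run_once`, and `Timings` are hypothetical names, the comparator only approximates what latin1_nopad_bin does (unsigned byte order, trailing spaces significant), and a real experiment would run inside PrimProc on the data sets named above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

inline uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// MurmurHash3 x86 32-bit over raw bytes. Blocks are read in host byte order,
// so the values match the reference implementation on little-endian hosts.
uint32_t murmur3_32(const void* data, size_t len, uint32_t seed) {
  assert(data != nullptr || len == 0);
  const auto* p = static_cast<const uint8_t*>(data);
  const uint32_t c1 = 0xcc9e2d51u, c2 = 0x1b873593u;
  uint32_t h = seed;
  const size_t nblocks = len / 4;
  for (size_t i = 0; i < nblocks; ++i) {
    uint32_t k;
    std::memcpy(&k, p + 4 * i, 4);
    k *= c1; k = rotl32(k, 15); k *= c2;
    h ^= k; h = rotl32(h, 13); h = h * 5 + 0xe6546b64u;
  }
  const uint8_t* tail = p + 4 * nblocks;
  uint32_t k = 0;
  switch (len & 3) {  // mix in the 1..3 trailing bytes, if any
    case 3: k ^= static_cast<uint32_t>(tail[2]) << 16; [[fallthrough]];
    case 2: k ^= static_cast<uint32_t>(tail[1]) << 8; [[fallthrough]];
    case 1: k ^= tail[0];
            k *= c1; k = rotl32(k, 15); k *= c2; h ^= k;
  }
  h ^= static_cast<uint32_t>(len);
  h ^= h >> 16; h *= 0x85ebca6bu; h ^= h >> 13; h *= 0xc2b2ae35u; h ^= h >> 16;
  return h;
}

// Assumed stand-in for a NO PAD binary comparator: unsigned byte order,
// with the shorter string sorting first when one is a prefix of the other.
int nopad_bin_compare(const std::string& a, const std::string& b) {
  const size_t n = std::min(a.size(), b.size());
  const int c = std::memcmp(a.data(), b.data(), n);
  if (c != 0) return c < 0 ? -1 : 1;
  return a.size() < b.size() ? -1 : (a.size() > b.size() ? 1 : 0);
}

template <typename F>
double time_ms(F&& f) {
  const auto t0 = std::chrono::steady_clock::now();
  f();
  const auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

struct Timings {
  double compare_ms;
  double hash_ms;
  uint64_t sink;  // accumulates results so the loops are not optimized away
};

// Times one pass of adjacent-key comparisons and one pass of hashing.
Timings run_once(const std::vector<std::string>& keys) {
  Timings t{0.0, 0.0, 0};
  t.compare_ms = time_ms([&] {
    for (size_t i = 1; i < keys.size(); ++i)
      t.sink += static_cast<uint64_t>(nopad_bin_compare(keys[i - 1], keys[i]) + 1);
  });
  t.hash_ms = time_ms([&] {
    for (const auto& k : keys) t.sink += murmur3_32(k.data(), k.size(), 0);
  });
  return t;
}
```

Calling `run_once` on a large vector of representative keys, repeated several times, gives a first-order comparison only; it says nothing about the 2x or 6x figures above, which come from full query runs.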

Attachments

Activity

Roman added a comment - 2023-01-29 10:07

I am curious what exactly we are talking about: collation-aware comparators or hashers?
If we are talking about comparators, MM3 is outside the scope. If we are talking about hashers, it is a different story. Multiple SQL operators use hashing, e.g. JOIN, GROUP BY, and DISTINCT. JFYI, GB and JOIN use MM3 to hash the weight arrays that represent char/varchar/text values and preserve the order relation. Here is where the hashing takes place. JOIN still uses MDB hashing internally. DISTINCT also uses MDB hashing, but I am aware that it will be replaced with the GB code.
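To make the comparator/hasher distinction concrete, here is a minimal sketch of a collation-aware hasher used for a GROUP BY style count. Everything in it is an assumption for illustration: `toy_weights` is a hypothetical ASCII case-insensitive mapping with one weight per byte (not a MariaDB collation), and `std::hash` stands in for MM3. The point is only that hashing the weights, rather than the raw bytes, makes strings that compare equal under the collation land in the same group.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy collation: ASCII case-insensitive, one weight per byte.
std::string toy_weights(const std::string& s) {
  std::string w;
  w.reserve(s.size());
  for (unsigned char c : s) w.push_back(static_cast<char>(std::toupper(c)));
  assert(w.size() == s.size());  // one weight per input byte
  return w;
}

// Hashes the weight array instead of the raw bytes.
// std::hash is a stand-in here; the comment above refers to MM3.
struct CollationHash {
  size_t operator()(const std::string& s) const {
    return std::hash<std::string>{}(toy_weights(s));
  }
};

// Equality must agree with the hasher: equal weights mean equal keys.
struct CollationEq {
  bool operator()(const std::string& a, const std::string& b) const {
    return toy_weights(a) == toy_weights(b);
  }
};

using GroupCounts =
    std::unordered_map<std::string, size_t, CollationHash, CollationEq>;

// Counts rows per group, where groups are defined by the toy collation.
GroupCounts group_by_count(const std::vector<std::string>& rows) {
  GroupCounts groups;
  for (const auto& r : rows) ++groups[r];
  return groups;
}
```

With this setup, `group_by_count({"abc", "ABC", "Abc", "xyz"})` yields two groups. A comparator, by contrast, never needs a hash at all, which is why MM3 would be out of scope there.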

People

Assignee: Leonid Fedorov

Reporter: Gregory Dorman (Inactive)

Votes: 0

Watchers: 4

Dates

Created: 2021-05-11 15:39

Updated: 2024-02-21 13:28

Resolved: 2023-10-27 17:05

MariaDB ColumnStore