Details
Sub-Task
Status: Closed
Critical
Resolution: Won't Do
None
None
2021-7, 2021-8, 2021-9, 2021-10, 2021-11, 2021-12
Description
The goal is to evaluate a proposal to correct the performance problem.
The method:
- Create a test version of the latin1_nopad_bin collation function that uses MURMUR (MurmurHash).
- Measure the difference in performance (the expectation is 2x, versus the current 6x).
- Compare the behavior of aggregate queries that use the updated collation against:
  a) the original CS method, present in 1.2
  b) the current method, as in 5.5.2
The last comparison should be run on both the flights dataset (30 million rows) and the Quinnstreet dataset (1 billion rows).
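A minimal sketch of the first two steps, assuming a C++ setting: in a binary NO PAD latin1 collation the weight of each byte is the byte itself and trailing spaces are significant, so a test hasher can feed the raw bytes straight into MurmurHash3 (here the public-domain 32-bit x86 variant). The names `murmur3_32`, `hash_latin1_nopad_bin` and `time_hasher` are hypothetical and are not ColumnStore's actual functions; `time_hasher` is only a timing harness into which both the current hasher and the MurmurHash3 variant could be plugged.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

static inline uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// MurmurHash3_x86_32 (public-domain algorithm). Blocks are read with memcpy in
// native byte order, so results match the reference code on little-endian hosts.
uint32_t murmur3_32(const unsigned char* data, std::size_t len, uint32_t seed) {
  const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
  uint32_t h1 = seed;
  const std::size_t nblocks = len / 4;
  for (std::size_t i = 0; i < nblocks; ++i) {
    uint32_t k1;
    std::memcpy(&k1, data + i * 4, 4);
    k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2;
    h1 ^= k1; h1 = rotl32(h1, 13); h1 = h1 * 5 + 0xe6546b64;
  }
  const unsigned char* tail = data + nblocks * 4;
  uint32_t k1 = 0;
  switch (len & 3) {
    case 3: k1 ^= uint32_t(tail[2]) << 16; [[fallthrough]];
    case 2: k1 ^= uint32_t(tail[1]) << 8; [[fallthrough]];
    case 1: k1 ^= tail[0];
            k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2; h1 ^= k1;
  }
  h1 ^= uint32_t(len);
  // Final avalanche (fmix32).
  h1 ^= h1 >> 16; h1 *= 0x85ebca6b; h1 ^= h1 >> 13; h1 *= 0xc2b2ae35; h1 ^= h1 >> 16;
  return h1;
}

// Hypothetical test hasher for latin1_nopad_bin: the weight of each byte is the
// byte itself and NO PAD keeps trailing spaces, so no weight lookup is needed.
uint32_t hash_latin1_nopad_bin(const std::string& s, uint32_t seed = 0) {
  return murmur3_32(reinterpret_cast<const unsigned char*>(s.data()), s.size(), seed);
}

// Hypothetical micro-benchmark helper: nanoseconds spent hashing every key once.
// The XOR of all hashes is written to *sink so the loop cannot be optimized away.
template <typename Hasher>
long long time_hasher(const std::vector<std::string>& keys, Hasher h, uint32_t* sink) {
  auto start = std::chrono::steady_clock::now();
  uint32_t acc = 0;
  for (const auto& k : keys) acc ^= h(k);
  auto stop = std::chrono::steady_clock::now();
  *sink = acc;
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Wrap `hash_latin1_nopad_bin` in a lambda when passing it to `time_hasher`, since a plain function pointer does not carry the default seed argument. Timings from a loop like this are only indicative; the aggregate-query comparison on the flights and Quinnstreet data remains the deciding measurement.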
Once we have the facts, we will decide what to do.
I am curious what exactly we are talking about: collation-aware comparators or hashers?
If we are talking about comparators, MM3 is out of scope. If we are talking about hashers, that is a different story. There are multiple SQL operators that use hashing, e.g. JOIN, GROUP BY and DISTINCT. JFYI, GROUP BY and JOIN use MM3 to hash the weight arrays that represent char/varchar/text values while preserving the order relation. Here is where the hashing takes place. JOIN still uses MDB hashing internally. DISTINCT also uses MDB hashing, but I am aware that it will be replaced with the GROUP BY code.
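To illustrate the comparator/hasher distinction drawn in the comment: a hasher is only usable for GROUP BY or JOIN if every pair of strings the comparator treats as equal produces the same hash, which is why hashing the collation weight array (rather than the raw bytes) works. The sketch below substitutes a hypothetical toy collation (ASCII case-insensitive, PAD SPACE) for a real MariaDB collation; `toy_ci_weights`, `toy_ci_compare` and `toy_ci_hash` are illustrative names, not ColumnStore code, and the hash is the 32-bit x86 MurmurHash3 variant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

static inline uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// MurmurHash3_x86_32 (public-domain algorithm), native byte order for blocks.
uint32_t murmur3_32(const unsigned char* data, std::size_t len, uint32_t seed) {
  const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
  uint32_t h1 = seed;
  const std::size_t nblocks = len / 4;
  for (std::size_t i = 0; i < nblocks; ++i) {
    uint32_t k1;
    std::memcpy(&k1, data + i * 4, 4);
    k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2;
    h1 ^= k1; h1 = rotl32(h1, 13); h1 = h1 * 5 + 0xe6546b64;
  }
  const unsigned char* tail = data + nblocks * 4;
  uint32_t k1 = 0;
  switch (len & 3) {
    case 3: k1 ^= uint32_t(tail[2]) << 16; [[fallthrough]];
    case 2: k1 ^= uint32_t(tail[1]) << 8; [[fallthrough]];
    case 1: k1 ^= tail[0];
            k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2; h1 ^= k1;
  }
  h1 ^= uint32_t(len);
  h1 ^= h1 >> 16; h1 *= 0x85ebca6b; h1 ^= h1 >> 13; h1 *= 0xc2b2ae35; h1 ^= h1 >> 16;
  return h1;
}

// Hypothetical toy collation: fold ASCII letters to upper case and, per PAD SPACE
// semantics, give trailing spaces no weight.
std::vector<uint16_t> toy_ci_weights(const std::string& s) {
  std::size_t n = s.size();
  while (n > 0 && s[n - 1] == ' ') --n;
  std::vector<uint16_t> w;
  w.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    if (c >= 'a' && c <= 'z') c = static_cast<unsigned char>(c - 'a' + 'A');
    w.push_back(c);
  }
  return w;
}

// Comparator: lexicographic order of the weight arrays (-1, 0 or 1).
int toy_ci_compare(const std::string& a, const std::string& b) {
  auto wa = toy_ci_weights(a), wb = toy_ci_weights(b);
  if (std::lexicographical_compare(wa.begin(), wa.end(), wb.begin(), wb.end())) return -1;
  if (std::lexicographical_compare(wb.begin(), wb.end(), wa.begin(), wa.end())) return 1;
  return 0;
}

// Hasher: MurmurHash3 over the weight array's bytes. Equal strings under the
// comparator have identical weight arrays, hence identical hashes.
uint32_t toy_ci_hash(const std::string& s, uint32_t seed = 0) {
  auto w = toy_ci_weights(s);
  return murmur3_32(reinterpret_cast<const unsigned char*>(w.data()),
                    w.size() * sizeof(uint16_t), seed);
}
```

Here "abc" and "ABC  " compare equal and hash identically, while the comparator alone needs no hashing at all; that is the sense in which MM3 matters for hashers but not for comparators.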