[MDEV-26975] Join with ORed equi-join conditions is very slow Created: 2021-11-04  Updated: 2021-11-09

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: None

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

We've encountered this when working on a customer case:

SELECT 
  ...
FROM 
  t1,t2,t3,t4, ...
WHERE
  (t1.key1 = t2.col1 OR t1.key2 = t2.key1) AND
  (t3.key1 = t4.col1 OR t3.key2 = t4.key1) AND
  <join conditions between other tables>

(The tables and columns are renamed. keyX is an indexed column, colY is a non-indexed column).
Consider table t1. The join condition has form:

  t1.key1 = t2.col1 OR 
  t1.key2 = t2.key1

Each of the OR-parts can be used to construct ref access. However, when they are OR-ed, no efficient access method can be used.
(The workaround we could use was to rewrite the query as a four-way UNION).

This MDEV is about constructing an efficient access method (Work name "ref_or_ref", mimicking "ref_or_null").

Ideas about implementation

Execution

The execution part is fairly easy. We need to run the ref access code for
multiple indexes and lookup keys, and make sure we don't return the row twice
if we get it from both indexes.

Optimization

This one is harder.
The approaches that were discussed are:

1. Hook into range optimizer
2. Hook into ref optimizer.

For #2, look at merge_key_fields(). Here, the optimizer performs an OR
operation between KEY_FIELD objects (representing t.keyXpartY= ...).
If current code is invoked for

merge_key_fields(KEY_FIELD("t.colX=...), KEY_FIELD("t.colY=..."))

it will remove both.
If we delayed this removal, perhaps we could save the option to do ref-or-ref
access.

(Constructing all possible ref_or_ref accesses looks dangerous as that might
cause a combinatorial of the number of possible options)


Generated at Thu Feb 08 09:49:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.