[MCOL-5983] Bit-wise operations slower than expected - Jira

XML

Word

Printable

Details

Type: Bug
Status: Open (View Workflow)
Priority: Critical
Resolution: Unresolved
Affects Version/s: 23.02.4
Fix Version/s: None
Component/s: None
Labels:
None

Epic Link:
Performance
Sprint:
2025-4

Description

On a single-node MariaDB Columnstore instance, a query combining a LIKE pattern with a bitwise AND filter on a column (sig) is significantly slower than expected, despite each condition being fast when executed individually.

Table

molecole with 167,772,160 rows
sig is a BIGINT (64-bit), used to store Bloom filters
smiles is a TEXT column storing molecular strings
The data is highly duplicated (10 unique records repeated)

Queries and Performance
Fast (~1 sec):

SELECT count(*) FROM molecole

WHERE smiles LIKE '%C(=O)C(O)=C(C4=O)O)O[%' AND sig = 18373419839315247039;

Slow (~5 sec):

SELECT count(*) FROM molecole

WHERE smiles LIKE '%C(=O)C(O)=C(C4=O)O)O[%' AND (sig & @mask) = @mask;

Fast (~1 sec):

SELECT count(*) FROM molecole

WHERE (sig & @mask) = @mask;

Key Observations

(sig & @mask) = @mask alone is fast and behaves as expected.
The combination with LIKE causes the bit operation to seemingly not pre-filter, leading to full evaluation of the LIKE for every record.
This negates the expected performance gain from short-circuiting evaluation or extent elimination.

This leads to the following deduction:

The LIKE pattern, when executed only on the subset of rows that pass the equality filter (sig = ...), takes ~1 second, meaning it is not inherently expensive on the matched rows.
The bitwise operation alone is also ~1 second, proving it is not computationally expensive.
Therefore, the combined query should not take more than ~2 seconds in a naive worst-case additive model, assuming the optimizer uses the bitwise condition to filter early and apply LIKE only to the remaining subset.
However, in practice it takes ~4.87 seconds, which suggests:
The optimizer is not applying the bitwise filter first, or
It is not using it to pre-filter the data before applying the expensive LIKE, thus, it ends up evaluating LIKE on a much larger dataset, defeating the filtering benefit of (sig & @mask) = @mask.

Conclusion
The optimizer appears to treat (sig & @mask) = @mask as a non-SARGable expression when combined with LIKE, preventing extent elimination and even early filtering. This seems to result in an unnecessary full scan + evaluation of the LIKE pattern.

Attaching the SQL code to replicate the environment and execution times

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

SQLCODE.sql
5 kB
2025-04-18 13:57

Issue Links

relates to

MCOL-4935 A prototype planner/optimizer using term rewriting

Open

Activity

People

Assignee:: Leonid Fedorov

Reporter:: Patrizio Tamorri (Inactive)

Votes:: 1 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2025-04-18 13:57

Updated:: 2025-11-05 12:10

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.