Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
Description
When the parallel_ces RBO rule is applied to a query that contains EXISTS/NOT EXISTS over a self-join (same physical table, different aliases), results are inflated by exactly a factor of parallel_factor.
Repro:
SET @@columnstore_unstable_optimizer=ON;
SET @@columnstore_query_accel_parallel_factor=5;
SELECT COUNT(*) FROM lineitem l1
WHERE EXISTS (SELECT * FROM lineitem l2
              WHERE l2.l_orderkey = l1.l_orderkey AND l2.l_suppkey <> l1.l_suppkey)
  AND l1.l_receiptdate > l1.l_commitdate;
-- Expected: 366216, Actual: 1831080 (5×)
- factor=1 → correct, factor=N → N× inflation
- Only ExistsFilter affected (IN/SelectFilter works correctly)
- Only self-joins reproduce (EXISTS on different table is fine)
- Impacts TPC-H Q21 (numwait = parallel_factor × correct value)
Regression: Correct in b5e11f2be (2025-11-06). Wrong values baked into .result by 4d13bd51b (2025-12-09).
Possible root cause: ColumnStatistics stores stats keyed by {schema, table}, without the alias. For self-joins, both aliases share one entry, which carries the alias of whichever table reference was registered first. MariaDB processes subquery tables first, so l2 registers before l1. makeUnionFromTable() then gets a keyColumn with alias="l2" and creates range-partition filters on l2.l_orderkey inside UNION ALL units that scan only l1. The filter is therefore a no-op, every partition returns all rows, and the result is duplicated N×.
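The suspected mechanism can be illustrated with a small toy model in Python. This is not the ColumnStore code: the class and function names below (KeyColumn, ColumnStatistics.register, make_union_from_table, the "tpch" schema name) are hypothetical stand-ins, and treating a filter on a non-matching alias as a no-op is an assumption taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyColumn:
    schema: str
    table: str
    alias: str
    column: str


class ColumnStatistics:
    """Toy model: entries are keyed by (schema, table); the alias is not part of the key."""

    def __init__(self):
        self._by_table = {}

    def register(self, schema, table, alias, column):
        # The first registration wins; later aliases of the same table reuse that entry.
        self._by_table.setdefault((schema, table), KeyColumn(schema, table, alias, column))

    def key_column(self, schema, table):
        return self._by_table[(schema, table)]


def make_union_from_table(rows, scan_alias, key, bounds):
    """Hypothetical stand-in: one scan unit per key range, concatenated like UNION ALL."""
    out = []
    for lo, hi in bounds:
        for row in rows:
            # Assumption: a range filter on a different alias does not bind to this
            # scan, so it is skipped and the unit returns every row.
            if key.alias != scan_alias or lo <= row[key.column] < hi:
                out.append(row)
    return out


rows = [{"l_orderkey": k} for k in range(10)]
bounds = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]  # parallel_factor = 5

stats = ColumnStatistics()
stats.register("tpch", "lineitem", "l2", "l_orderkey")  # subquery table registers first
stats.register("tpch", "lineitem", "l1", "l_orderkey")  # ignored: entry already exists

stale_key = stats.key_column("tpch", "lineitem")  # carries alias "l2"
inflated = make_union_from_table(rows, "l1", stale_key, bounds)

good_key = KeyColumn("tpch", "lineitem", "l1", "l_orderkey")
correct = make_union_from_table(rows, "l1", good_key, bounds)

print(len(inflated), len(correct))  # 50 10
```

In this model the stale alias yields 5 copies of every row, matching the factor=N → N× pattern reported in the ticket.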
Fix: In makeUnionFromTable() (rbo_apply_parallel_ces.cpp), align keyColumn alias with the target table after obtaining it from statistics.
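A minimal Python sketch of the alias-alignment idea follows. It is a toy model, not the actual patch in rbo_apply_parallel_ces.cpp: align_alias, partition_scan, and the KeyColumn fields are hypothetical names, and the no-op behaviour for a mismatched alias is assumed from the root-cause description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KeyColumn:
    table: str
    alias: str
    column: str


def align_alias(key, target_alias):
    """Return a key column whose alias matches the table reference being partitioned."""
    return key if key.alias == target_alias else replace(key, alias=target_alias)


def partition_scan(rows, scan_alias, key, bounds):
    """Hypothetical stand-in for the UNION ALL of range-filtered scan units."""
    out = []
    for lo, hi in bounds:
        for row in rows:
            # Assumption: a filter on a different alias is skipped (no-op).
            if key.alias != scan_alias or lo <= row[key.column] < hi:
                out.append(row)
    return out


rows = [{"l_orderkey": k} for k in range(10)]
bounds = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]

# Key column as it might come back from statistics: it carries the subquery alias.
from_stats = KeyColumn("lineitem", "l2", "l_orderkey")

# Align the alias with the target table before building the range filters.
aligned = align_alias(from_stats, "l1")
result = partition_scan(rows, "l1", aligned, bounds)

print(aligned.alias, len(result))  # l1 10
```

With the alias aligned, each range filter binds to the scanned table, so every row lands in exactly one partition.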
Issue Links
- relates to MCOL-6148 Semi-joins and correlated subqueries in filters are not processed by QA (Closed)