Details
-
Bug
-
Status: In Testing (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
2026-7
Description
Summary
When a query aggregates over a JOIN to a derived table (subquery) and the SELECT list contains two or more arithmetic expressions over an aggregate, each dividing by a COUNT(DISTINCT ...) column projected from that derived table, the second (and later) such expression silently reuses the first expression's divisor. The result is wrong with no warning or error.
The trigger is specifically a distinct-aggregate (COUNT(DISTINCT)) column of the derived table used as an operand. The same query shape with plain COUNT, SUM, MIN, or MAX divisors returns correct results.
Steps to reproduce
CREATE DATABASE IF NOT EXISTS bugdb; |
USE bugdb; |
CREATE TABLE d1 (a INT, b INT) ENGINE=ColumnStore; |
INSERT INTO d1 VALUES (1,2),(1,2),(2,3),(2,3),(2,3); |
|
|
SELECT a, |
COUNT(*)/cntb AS e1, |
COUNT(*)/dcntb AS e2 |
FROM d1 |
JOIN ( SELECT COUNT(b) cntb, COUNT(DISTINCT b) dcntb, a AS aa |
FROM d1 GROUP BY a ) d |
ON d1.a = d.aa |
GROUP BY a, cntb, dcntb |
ORDER BY a; |
Per group, the number of joined rows per a is 2 and 3; the divisors cntb = 2 and 3 and dcntb = 1 and 1. So e1 (rows divided by cntb) should be 1, and e2 (rows divided by dcntb) should be 2 and 3.
| a | e1 (expected) | e2 (expected) | e2 (ColumnStore actual) |
|---|---|---|---|
| 1 | 1.0000 | 2.0000 | 1.0000 (wrong) |
| 2 | 1.0000 | 3.0000 | 1.0000 (wrong) |
e2 is wrongly computed with cntb as its divisor — it reuses e1's divisor instead of its own dcntb. The same query on a row-store engine (e.g. InnoDB) returns the correct e2 = 2.0000 / 3.0000.
Real-world form (original report)
Setup — create the ColumnStore table and bulk-load the data file:
USE cs_lab; |
CREATE TABLE `temp_rilevazione_1520600` ( |
`ri_id` bigint(20) DEFAULT NULL, |
`ri_vi_id` int(11) DEFAULT NULL, |
`ri_mo_id` int(11) DEFAULT NULL, |
`ri_ma_id` int(11) DEFAULT NULL, |
`ri_ce_id` int(11) DEFAULT NULL, |
`ytd_anno` int(11) DEFAULT NULL |
) ENGINE=Columnstore DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci; |
# tab-separated bulk load (237,452 rows)
|
cpimport cs_lab temp_rilevazione_1520600 240229.tsv -s '\t' |
The reporting query computes two ratios against subquery aggregates:
SELECT
|
ri_ma_id AS p4, |
ri_ce_id AS p125, |
ytd_anno AS p13, |
COUNT(ri_id)*100/npres AS m0, -- divisor npres = COUNT(ri_id) |
COUNT(ri_id)/nvi AS m1 -- divisor nvi = COUNT(DISTINCT ri_vi_id) |
FROM temp_rilevazione_1520600 |
JOIN ( SELECT COUNT(ri_id) npres, COUNT(DISTINCT ri_vi_id) nvi, |
ri_ce_id centraleqb, ytd_anno ytd
|
FROM temp_rilevazione_1520600 |
GROUP BY centraleqb, ytd ) DENOM |
ON ri_ce_id = centraleqb AND ytd_anno = ytd |
GROUP BY p4, p125, p13, npres, nvi |
ORDER BY m1 DESC; |
m1 is silently evaluated as COUNT(ri_id)/npres (it reuses m0's divisor npres) instead of COUNT(ri_id)/nvi. With one metric only (just m1) the query is correct; adding m0 is what corrupts m1. Cross-check over the full result: SUM(m1) = 10.00 (buggy ColumnStore) vs 894.07 (correct — matches a row-store engine on the same data).
Trigger conditions (all required)
- Outer aggregation (GROUP BY).
- A JOIN to a derived table / subquery.
- Two or more SELECT arithmetic expressions over an aggregate.
- Their divisors are bare columns projected from the derived table, at least one of which is a COUNT(DISTINCT) (distinct-aggregate) output of the subquery.
Verified NOT to trigger: no join; single expression; or divisors that are plain COUNT / SUM / MIN / MAX (no COUNT(DISTINCT)). The defect is operand-level — each expression keeps its own operator/constants but the distinct-aggregate divisor operand is bound to the wrong column.
Impact
Silent incorrect results for a common analytical pattern (computing multiple ratios/percentages against subquery aggregates via a join). No error is raised, so wrong values can go unnoticed.
Workarounds
Both verified to return correct results:
- Materialize the derived table into a real table and join to it instead of the subquery.
- Wrap each derived divisor in an outer aggregate (constant within the group), e.g. COUNT(ri_id)/MAX(nvi) instead of COUNT(ri_id)/nvi. Use MAX/MIN/AVG (not SUM).