Details
Description
Summary:
MariaDB returns inconsistent results when the same predicate is evaluated in a HAVING clause versus a projection in a derived table. This is caused by the optimizer incorrectly simplifying or caching expressions during range scan generation for PRIMARY KEY columns.
Reproduce Steps:
CREATE TABLE t0(c0 REAL PRIMARY KEY);
|
|
|
INSERT INTO t0 VALUES (-1716357116);
|
INSERT INTO t0 VALUES (695457401);
|
-- Query A cardinality: 2
|
SELECT t0.c0 AS g0 FROM t0 GROUP BY t0.c0 HAVING ((COUNT(g0) != 101057358) AND (g0 < ((+ -375910672) | (g0 IS NULL))));
|
-- Query B cardinality: 1
|
SELECT ref0 FROM (SELECT t0.c0 AS ref0, (((COUNT(t0.c0) != 101057358) AND (t0.c0 < ((+ -375910672) | (t0.c0 IS NULL))))) AS ref1 FROM t0 GROUP BY t0.c0) AS s WHERE ref1;
|
Observed Behavior:
Query A uses access_type: range and incorrectly evaluates the attached_condition, returning 2 rows. The EXPLAIN shows the optimizer simplifies the NULL check on the PK: t0.c0 < <cache>(-375910672 | (/always not null/ 1 is null)).
Query B evaluates the predicate as a functional expression during materialization and correctly returns 1 row.
mysql> -- Query A cardinality: 2
|
mysql> SELECT t0.c0 AS g0 FROM t0 GROUP BY t0.c0 HAVING ((COUNT(g0) != 101057358) AND (g0 < ((+ -375910672) | (g0 IS NULL))));
|
+-------------+
|
| g0 |
|
+-------------+
|
| -1716357116 |
|
| 695457401 |
|
+-------------+
|
2 rows in set (0.00 sec)
|
|
|
mysql> -- Query B cardinality: 1
|
mysql> SELECT ref0 FROM (SELECT t0.c0 AS ref0, (((COUNT(t0.c0) != 101057358) AND (t0.c0 < ((+ -375910672) | (t0.c0 IS NULL))))) AS ref1 FROM t0 GROUP BY t0.c0) AS s WHERE ref1;
|
+-------------+
|
| ref0 |
|
+-------------+
|
| -1716357116 |
|
+-------------+
|
1 row in set (0.00 sec)
|
MySQL v9.6.0 returns 2 rows for both (or 1 depending on data, but always consistent), indicating MariaDB's internal divergence.
Root Cause Analysis:
The Range Optimizer in MariaDB handles bitwise OR operations involving negative constants and IS NULL checks on non-nullable columns differently when they are pushed into the storage engine scan compared to when they are evaluated in the projection layer. The <cache> mechanism in the range scan likely stores an incorrectly casted or truncated value of the complex expression.