Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Duplicate
-
10.11, 11.4, 11.8, 12.3, 11.8.6
-
MariaDB Version: 11.8.6-MariaDB
Client Version: 15.2 for linux-systemd (x86_64)
OS: Ubuntu 20.04.4 LTS
CPU: x86_64 (Intel(R) Xeon(R) Platinum 8358P)
-
Not for Release Notes
Description
Problem Description
A "Wrong Result" bug exists in MariaDB for an `OR` predicate combining two
mixed-type comparisons:
- `VARCHAR = INT`
- `DATE = VARCHAR`
In the baseline state S1, the query returns the expected rows and values.
Then S2 is obtained from S1 by append-only inserts into the left table,
followed by `ANALYZE TABLE`. In S2:
- branch A alone, `r.c4 = m.c4`, returns 0 rows
- branch B alone, `m.c5 = r.c4`, returns 504 rows
- but `(r.c4 = m.c4 OR m.c5 = r.c4)` returns only 126 rows
This is logically impossible. Since branch A is empty, `(A OR B)` must be
exactly equal to `B`.
The result is restored by avoiding the left-side indexes with
`IGNORE INDEX`, which returns the full 504 rows.
This shows that the wrong result is plan-dependent and tied to the optimizer
path chosen after statistics change.
Step-by-step Reproduction
|
|
SET SESSION sql_mode = ''; |
SET SESSION max_recursive_iterations = 10000; |
|
|
DROP DATABASE IF EXISTS repro_mdb_or_mixed_cast; |
CREATE DATABASE repro_mdb_or_mixed_cast; |
USE repro_mdb_or_mixed_cast; |
|
|
DROP TABLE IF EXISTS t_main; |
DROP TABLE IF EXISTS t_ref; |
|
|
CREATE TABLE t_main ( |
id INT NOT NULL PRIMARY KEY, |
c2 VARCHAR(32) NOT NULL, |
c3 VARCHAR(64) NOT NULL, |
c4 INT NULL, |
c5 DATE NOT NULL, |
KEY idx_c4 (c4), |
KEY idx_c5 (c5) |
) ENGINE=InnoDB;
|
|
|
CREATE TABLE t_ref ( |
id INT NOT NULL PRIMARY KEY, |
c4 VARCHAR(64) NOT NULL, |
c15 DOUBLE NULL |
) ENGINE=InnoDB;
|
|
|
-- [Phase S1: baseline state]
|
-- Branch A is intended to be dead: c4 never equals the numeric coercion of t_ref.c4.
|
-- Branch B is intended to be live: c5 = '0000-00-00' should match these strings by implicit DATE coercion.
|
INSERT INTO t_main (id, c2, c3, c4, c5) VALUES |
(1, 'z', 'not-a-date', 14, '0000-00-00'), |
(2, 'y', '0000-00-00', 15, '0000-00-00'), |
(3, 'x', '0', 16, '0000-00-00'), |
(4, 'w', '01e0', 17, '0000-00-00'); |
|
|
INSERT INTO t_ref (id, c4, c15) VALUES |
(9000001, 'not-a-date', 1.0), |
(9000002, '0000-00-00', 0.0), |
(9000003, '0', 1.0), |
(9000004, '01e0', -1.0); |
|
|
ANALYZE TABLE t_main, t_ref; |
|
|
SELECT 'S1_orig_count' AS stage, COUNT(*) AS cnt |
FROM t_main m |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE (r.c4 = m.c4 OR m.c5 = r.c4); |
|
|
SELECT 'S1_orig_vals' AS stage, |
GROUP_CONCAT(
|
DISTINCT (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END) |
ORDER BY (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END) |
) AS vals |
FROM t_main m |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE (r.c4 = m.c4 OR m.c5 = r.c4); |
|
|
EXPLAIN
|
SELECT COUNT(*) |
FROM t_main m |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE (r.c4 = m.c4 OR m.c5 = r.c4); |
|
|
-- [Phase S2: append-only expansion]
|
INSERT INTO t_main (id, c2, c3, c4, c5) |
WITH RECURSIVE seq(n) AS ( |
SELECT 1 |
UNION ALL |
SELECT n + 1 FROM seq WHERE n < 500 |
)
|
SELECT
|
1000 + n,
|
CASE MOD(n, 4) |
WHEN 0 THEN 'zzzz' |
WHEN 1 THEN 'yyyy' |
WHEN 2 THEN 'xxxx' |
ELSE 'wwww' |
END, |
CASE MOD(n, 4) |
WHEN 0 THEN 'not-a-date' |
WHEN 1 THEN '0000-00-00' |
WHEN 2 THEN '0' |
ELSE '01e0' |
END, |
14 + MOD(n, 4),
|
'0000-00-00' |
FROM seq; |
|
|
ANALYZE TABLE t_main; |
|
|
-- Original query
|
SELECT 'S2_orig_count' AS stage, COUNT(*) AS cnt |
FROM t_main m |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE (r.c4 = m.c4 OR m.c5 = r.c4); |
|
|
SELECT 'S2_orig_vals' AS stage, |
GROUP_CONCAT(
|
DISTINCT (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END) |
ORDER BY (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END) |
) AS vals |
FROM t_main m |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE (r.c4 = m.c4 OR m.c5 = r.c4); |
|
|
EXPLAIN
|
SELECT COUNT(*) |
FROM t_main m |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE (r.c4 = m.c4 OR m.c5 = r.c4); |
|
|
-- Branch A only: VARCHAR = INT
|
SELECT 'S2_branch_a_count' AS stage, COUNT(*) AS cnt |
FROM t_main m |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE r.c4 = m.c4; |
|
|
SELECT 'S2_branch_a_vals' AS stage, |
GROUP_CONCAT(
|
DISTINCT (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END) |
ORDER BY (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END) |
) AS vals |
FROM t_main m |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE r.c4 = m.c4; |
|
|
-- Branch B only: DATE = VARCHAR
|
SELECT 'S2_branch_b_count' AS stage, COUNT(*) AS cnt |
FROM t_main m |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE m.c5 = r.c4; |
|
|
SELECT 'S2_branch_b_vals' AS stage, |
GROUP_CONCAT(
|
DISTINCT (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END) |
ORDER BY (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END) |
) AS vals |
FROM t_main m |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE m.c5 = r.c4; |
|
|
-- Original query without left-side indexes
|
SELECT 'S2_no_index_count' AS stage, COUNT(*) AS cnt |
FROM t_main m IGNORE INDEX (idx_c4, idx_c5) |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE (r.c4 = m.c4 OR m.c5 = r.c4); |
|
|
SELECT 'S2_no_index_vals' AS stage, |
GROUP_CONCAT(
|
DISTINCT (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END) |
ORDER BY (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END) |
) AS vals |
FROM t_main m IGNORE INDEX (idx_c4, idx_c5) |
JOIN t_ref r |
ON m.c3 = r.c4 |
WHERE (r.c4 = m.c4 OR m.c5 = r.c4); |
Observed Results
The reproducer produced the following results:
```text
S1_orig_count = 4
S1_orig_vals = -1,0,1
S2_orig_count = 126
S2_orig_vals = 0
S2_branch_a_count = 0
S2_branch_a_vals = NULL
S2_branch_b_count = 504
S2_branch_b_vals = -1,0,1
S2_no_index_count = 504
S2_no_index_vals = -1,0,1
```
The corresponding `EXPLAIN` output shows a plan change:
S1:
```text
-------------------------------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
-------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | m | ALL | idx_c4,idx_c5 | NULL | NULL | NULL | 4 | |
| 1 | SIMPLE | r | ALL | NULL | NULL | NULL | NULL | 4 | Using where; Using join buffer (flat, BNL join) |
-------------------------------------------------------------------------------------------------------+
```
S2:
```text
------------------------------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | r | ALL | NULL | NULL | NULL | NULL | 4 | |
| 1 | SIMPLE | m | ALL | idx_c4,idx_c5 | NULL | NULL | NULL | 504 | Range checked for each record (index map: 0x6) |
------------------------------------------------------------------------------------------------------+
```
Diagnostic Analysis
This is a wrong-result bug because in S2:
- branch A, `r.c4 = m.c4`, returns 0 rows
- branch B, `m.c5 = r.c4`, returns 504 rows
Therefore the disjunction:
```sql
(r.c4 = m.c4 OR m.c5 = r.c4)
```
must be exactly equivalent to branch B alone, and must also return 504 rows.
Instead, MariaDB returns only 126 rows for the `OR` query under the natural
post-`ANALYZE TABLE` plan.
The bug is even more visible in the projected values:
- branch B returns `-1,0,1`
- the original `OR` query returns only `0`
So the optimizer is not just dropping rows; it is selectively dropping the
rows that produce `-1` and `1`, keeping only a subset of branch B.
The `IGNORE INDEX` control restores the full 504 rows and the full value set
`-1,0,1`, confirming that the data itself is correct and the wrong result is
specific to the natural indexed/`Range checked for each record` plan.
Expected Behavior
Since branch A is empty in S2, the original query:
```sql
(r.c4 = m.c4 OR m.c5 = r.c4)
```
must produce exactly the same result set as branch B:
```sql
m.c5 = r.c4
```
So in this reproducer, the correct S2 result is:
- row count: `504`
- value set: `-1,0,1`
regardless of whether the optimizer uses table scan, indexed access, or
`Range checked for each record`.
Attachments
Issue Links
- duplicates
-
MDEV-39553 `DATE = VARCHAR` filter loses existing matches after `ANALYZE TABLE`, keeping only the literal `'0000-00-00'` row
-
- Confirmed
-