Details
-
Bug
-
Status: Confirmed (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.11, 11.4, 11.8, 12.3, 11.8.6
-
MariaDB Version: 11.8.6-MariaDB
Client Version: 15.2 for linux-systemd (x86_64)
OS: Ubuntu 20.04.4 LTS
CPU: x86_64 (Intel(R) Xeon(R) Platinum 8358P)
Description
Problem Description
A "Wrong Result" bug exists in MariaDB for an implicit `DATE = VARCHAR`
comparison in a join filter.
In the baseline state S1, the query returns matches for four distinct
right-side ids:
- `9000002`
- `9000003`
- `9000006`
- `9000007`
Then S2 is obtained from S1 by append-only inserts into the left table,
followed by `ANALYZE TABLE`. In S2, the same query keeps only id `9000006`,
and all other previously matched ids disappear.
Because S2 is derived from S1 by append-only inserts on the left side, every
right-side id matched in S1 must still be matched in S2. The disappearance of
`9000002`, `9000003`, and `9000007` is therefore a monotonicity violation.
The result can be restored by either:
- making the suspect predicate explicit with `r.c5 = CAST(m.c3 AS DATE)`
- avoiding the left-side indexes with `IGNORE INDEX`
This shows that the wrong result is plan-dependent and tied to the implicit
`DATE = VARCHAR` coercion path chosen after statistics change.
Step-by-step Reproduction
SET SESSION sql_mode = ''; |
SET SESSION max_recursive_iterations = 10000; |
|
|
DROP DATABASE IF EXISTS repro_mdb_date_varchar_filter2; |
CREATE DATABASE repro_mdb_date_varchar_filter2; |
USE repro_mdb_date_varchar_filter2; |
|
|
DROP TABLE IF EXISTS t_main; |
DROP TABLE IF EXISTS t_ref; |
|
|
CREATE TABLE t_main ( |
id INT NOT NULL PRIMARY KEY, |
c2 VARCHAR(64) NOT NULL, |
c3 VARCHAR(64) NOT NULL, |
c5 DATE NOT NULL, |
KEY idx_c3 (c3), |
KEY idx_c2 (c2), |
KEY idx_c3_c5 (c3, c5) |
) ENGINE=InnoDB;
|
|
|
CREATE TABLE t_ref ( |
id INT NOT NULL PRIMARY KEY, |
c4 VARCHAR(64) NOT NULL, |
c5 DATE NOT NULL, |
c10 ENUM('value1','value2','value3') NULL |
) ENGINE=InnoDB;
|
|
|
-- [Phase S1: baseline state]
|
-- c2 is exact zero-date text, c3 is invalid date text, c5 is zero-date.
|
INSERT INTO t_main (id, c2, c3, c5) VALUES |
(1, '0000-00-00', 'not-a-date', '0000-00-00'); |
|
|
-- c4 mixes invalid strings and one exact zero-date string.
|
-- c5 is zero-date for all rows.
|
INSERT INTO t_ref (id, c4, c5, c10) VALUES |
(9000002, ' ajkj-i_5a2zstzp', '0000-00-00', 'value1'), |
(9000003, 'ymxlb1f3ew', '0000-00-00', 'value3'), |
(9000006, '0000-00-00', '0000-00-00', 'value1'), |
(9000007, '0 192jhz/iq8gwh/h/', '0000-00-00', 'value3'); |
|
|
ANALYZE TABLE t_main, t_ref; |
|
|
-- Query S1: returns all 4 right-side ids
|
SELECT 'S1_count' AS stage, COUNT(*) AS cnt |
FROM t_main m |
JOIN t_ref r |
ON m.c5 = r.c4 |
WHERE r.c10 LIKE 'va%' |
AND r.c5 = m.c2 |
AND r.c5 = m.c3; |
|
|
SELECT 'S1_ids' AS stage, GROUP_CONCAT(DISTINCT r.id ORDER BY r.id) AS ids |
FROM t_main m |
JOIN t_ref r |
ON m.c5 = r.c4 |
WHERE r.c10 LIKE 'va%' |
AND r.c5 = m.c2 |
AND r.c5 = m.c3; |
|
|
EXPLAIN
|
SELECT COUNT(*) |
FROM t_main m |
JOIN t_ref r |
ON m.c5 = r.c4 |
WHERE r.c10 LIKE 'va%' |
AND r.c5 = m.c2 |
AND r.c5 = m.c3; |
|
|
-- [Phase S2: append-only expansion]
|
INSERT INTO t_main (id, c2, c3, c5) |
WITH RECURSIVE seq(n) AS ( |
SELECT 2 |
UNION ALL |
SELECT n + 1 FROM seq WHERE n < 500 |
)
|
SELECT
|
n,
|
'0000-00-00', |
'not-a-date', |
'0000-00-00' |
FROM seq; |
|
|
ANALYZE TABLE t_main; |
|
|
-- Query S2: keeps only the literal zero-date text row
|
SELECT 'S2_count' AS stage, COUNT(*) AS cnt |
FROM t_main m |
JOIN t_ref r |
ON m.c5 = r.c4 |
WHERE r.c10 LIKE 'va%' |
AND r.c5 = m.c2 |
AND r.c5 = m.c3; |
|
|
SELECT 'S2_ids' AS stage, GROUP_CONCAT(DISTINCT r.id ORDER BY r.id) AS ids |
FROM t_main m |
JOIN t_ref r |
ON m.c5 = r.c4 |
WHERE r.c10 LIKE 'va%' |
AND r.c5 = m.c2 |
AND r.c5 = m.c3; |
|
|
EXPLAIN
|
SELECT COUNT(*) |
FROM t_main m |
JOIN t_ref r |
ON m.c5 = r.c4 |
WHERE r.c10 LIKE 'va%' |
AND r.c5 = m.c2 |
AND r.c5 = m.c3; |
|
|
-- Diagnostic 1: explicit cast on the suspect predicate restores correctness
|
SELECT 'S2_cast_c3_count' AS stage, COUNT(*) AS cnt |
FROM t_main m |
JOIN t_ref r |
ON m.c5 = r.c4 |
WHERE r.c10 LIKE 'va%' |
AND r.c5 = m.c2 |
AND r.c5 = CAST(m.c3 AS DATE); |
|
|
SELECT 'S2_cast_c3_ids' AS stage, GROUP_CONCAT(DISTINCT r.id ORDER BY r.id) AS ids |
FROM t_main m |
JOIN t_ref r |
ON m.c5 = r.c4 |
WHERE r.c10 LIKE 'va%' |
AND r.c5 = m.c2 |
AND r.c5 = CAST(m.c3 AS DATE); |
|
|
-- Diagnostic 2: avoiding the left-side indexes also restores correctness
|
SELECT 'S2_no_index_count' AS stage, COUNT(*) AS cnt |
FROM t_main m IGNORE INDEX (idx_c3, idx_c2, idx_c3_c5) |
JOIN t_ref r |
ON m.c5 = r.c4 |
WHERE r.c10 LIKE 'va%' |
AND r.c5 = m.c2 |
AND r.c5 = m.c3; |
|
|
SELECT 'S2_no_index_ids' AS stage, GROUP_CONCAT(DISTINCT r.id ORDER BY r.id) AS ids |
FROM t_main m IGNORE INDEX (idx_c3, idx_c2, idx_c3_c5) |
JOIN t_ref r |
ON m.c5 = r.c4 |
WHERE r.c10 LIKE 'va%' |
AND r.c5 = m.c2 |
AND r.c5 = m.c3; |
Observed Results
The replay produced the following results:
S1_count = 4
|
S1_ids = 9000002,9000003,9000006,9000007
|
|
|
S2_count = 500
|
S2_ids = 9000006
|
|
|
S2_cast_c3_count = 2000
|
S2_cast_c3_ids = 9000002,9000003,9000006,9000007
|
|
|
S2_no_index_count = 2000
|
S2_no_index_ids = 9000002,9000003,9000006,9000007
|
The corresponding `EXPLAIN` output shows a plan change:
S1:
+----+-------------+-------+------+-------------------------+------+---------+------+------+-------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+----+-------------+-------+------+-------------------------+------+---------+------+------+-------------------------------------------------+
|
| 1 | SIMPLE | m | ALL | idx_c3,idx_c2,idx_c3_c5 | NULL | NULL | NULL | 1 | |
|
| 1 | SIMPLE | r | ALL | NULL | NULL | NULL | NULL | 4 | Using where; Using join buffer (flat, BNL join) |
|
+----+-------------+-------+------+-------------------------+------+---------+------+------+-------------------------------------------------+
|
S2:
+----+-------------+-------+------+-------------------------+------+---------+------+------+------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+----+-------------+-------+------+-------------------------+------+---------+------+------+------------------------------------------------+
|
| 1 | SIMPLE | r | ALL | NULL | NULL | NULL | NULL | 4 | Using where |
|
| 1 | SIMPLE | m | ALL | idx_c3,idx_c2,idx_c3_c5 | NULL | NULL | NULL | 500 | Range checked for each record (index map: 0xE) |
|
+----+-------------+-------+------+-------------------------+------+---------+------+------+------------------------------------------------+
|
Diagnostic Analysis
This is a wrong-result bug because S2 is derived from S1 by append-only
inserts into `t_main`, so every right-side id matched in S1 must still be
matched in S2.
Instead:
- S1 matches ids `9000002,9000003,9000006,9000007`
- S2 keeps only id `9000006`
So previously existing matches disappear after `ANALYZE TABLE`.
The diagnostics isolate the failing predicate to:
```sql
r.c5 = m.c3
```
where:
- `r.c5` is `DATE`
- `m.c3` is `VARCHAR`
Two controls confirm this:
1. Rewriting only that predicate as `r.c5 = CAST(m.c3 AS DATE)` restores all
four right-side ids.
2. Avoiding the left-side indexes also restores all four right-side ids.
So the wrong result is specific to the implicit `DATE = VARCHAR` comparison in
the S2 access path, not to the underlying data.
The failure is selective: the S2 plan still keeps the literal
`'0000-00-00'` text row (`9000006`), but drops rows whose `VARCHAR` text is an
invalid date-like string and should coerce to zero-date under MariaDB's normal
comparison rules.
Expected Behavior
The query must not lose previously matched right-side ids after append-only
inserts and `ANALYZE TABLE`.
In this reproducer, S2 should still contain all ids from S1:
9000002,9000003,9000006,9000007
|
With 500 matching left rows in S2, the correct row count is 2000, regardless
of whether the optimizer uses a table scan, indexed access, or `Range checked
for each record`.
Attachments
Issue Links
- is duplicated by
-
MDEV-39552 Join returns 0 rows after ANALYZE due to inconsistent implicit casting in VARCHAR = BIGINT comparison
-
- Closed
-
-
MDEV-39554 `(VARCHAR = INT OR DATE = VARCHAR)` returns fewer rows than `DATE = VARCHAR` alone after `ANALYZE TABLE`
-
- Closed
-
-
MDEV-39555 `DATE = MEDIUMTEXT` join loses existing matches after `ANALYZE TABLE`,and explicit `CAST(... AS DATE)` does not restore them
-
- Closed
-