Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
12.3.1
-
None
-
None
-
ubuntu22.04
Description
Summary
Index range access on VARCHAR(1) with utf8mb4_0900_ai_ci can return fewer rows than the same predicate evaluated on a full table scan, when the range constant is longer than the column and collation equates a stored character to that constant (e.g. U+00DF "ß" and 'ss').
Affects
- Version observed: MariaDB 12.3.1 (local build; reproducible with any instance showing the behaviour below).
- Component (suggested): Optimizer — range / index access; possibly Character sets (collation interaction).
Expected behaviour
Under utf8mb4_0900_ai_ci, \u00df (ß) compares equal to 'ss'. Therefore for x BETWEEN 'r' AND 'ss', the row x='ß' must qualify.
Whether the optimizer uses the secondary index or a full scan, the result set must be identical.
Actual behaviour
- With default use of KEY( x ): COUNT( * ) = 0 (row missing).
- With IGNORE INDEX ( x ) (full scan): COUNT( * ) = 1 (correct).
EXPLAIN shows type=range on x, so the wrong result is tied to the range path.
Minimal reproducible example
DROP DATABASE IF EXISTS mdev_min_repro; |
CREATE DATABASE mdev_min_repro; |
USE mdev_min_repro; |
|
|
CREATE TABLE t( |
x VARCHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci, |
KEY (x) |
);
|
INSERT INTO t VALUES ('ß'); |
|
|
SELECT COUNT(*) AS cnt_key FROM t WHERE x BETWEEN 'r' AND 'ss'; |
SELECT COUNT(*) AS cnt_ignore FROM t IGNORE INDEX (x) WHERE x BETWEEN 'r' AND 'ss'; |
- Expected: both counts = 1.
- Observed: cnt_key = 0, cnt_ignore = 1.
Suspected cause (brief)
Range construction appears to truncate the upper bound string to fit VARCHAR(1) (e.g. 'ss' → 's'). That prefix boundary is not equivalent to the full constant under utf8mb4_0900_ai_ci (ß vs "ss"), so rows that should match are excluded when the index range is used.