Details
-
Bug
-
Status: Confirmed (View Workflow)
-
Major
-
Resolution: Unresolved
-
12.3, 11.8.5
-
ubuntu 24.04
Description
Problem
When a table has a STORED generated column defined as TRIM(c4), and a prefix index col(1) is created on that column, a range scan query using WHERE g > '-804207268' returns only 1 row instead of the correct 2 rows. The row with g = 'ٛRIRS/*' (where the first character is U+0659, ARABIC ZAH ABOVE) is incorrectly skipped.
The full index on the same column correctly returns 2 rows, confirming this is a prefix-index-specific range scan issue.
Root Cause Analysis
The bug requires all of the following conditions to trigger:
- A STORED generated column (e.g., TRIM(c4))
- A prefix index on the generated column (e.g., g(1))
- The generated column value starts with a Unicode combining character (U+0659, category Mn = Mark, Nonspacing)
- A range scan predicate (e.g., >, <, BETWEEN) on that column
The issue appears to be in how the prefix index range scan handles combining characters during key comparison. The prefix index stores only the first character of the generated column value. When that first character is a Unicode combining mark, the InnoDB range scan incorrectly determines that the prefix is not within the scan range, causing the row to be skipped.
Minimal Reproduction
DROP DATABASE IF EXISTS test_db0_min; |
CREATE DATABASE test_db0_min; |
USE test_db0_min; |
|
|
CREATE TABLE t (id INT, c4 VARCHAR(64) NOT NULL, g VARCHAR(64) AS (TRIM(c4)) STORED); |
INSERT INTO t (id, c4) VALUES |
(1, 'A'), |
(2, '-804207268'), |
(3, 'ٛRIRS/*'); |
|
|
CREATE INDEX idx_full ON t (g); |
CREATE INDEX idx_prefix ON t (g(1)); |
ANALYZE TABLE t; |
|
|
-- Correct result (full index): 2
|
SELECT 'full index (correct)' AS method, COUNT(*) AS cnt FROM t FORCE INDEX(idx_full) WHERE g > '-804207268'; |
|
|
-- Bug (prefix index): should be 2, returns 1
|
SELECT 'prefix index (BUG)' AS method, COUNT(*) AS cnt FROM t FORCE INDEX(idx_prefix) WHERE g > '-804207268'; |
|
|
-- Show the missing row
|
SELECT 'missing row' AS info, id, c4, g, HEX(g) FROM t WHERE id = 3; |
Expected Result
method cnt
|
full index (correct) 2
|
|
|
method cnt
|
prefix index (BUG) 2
|
|
|
info id c4 g HEX(g)
|
missing row 3 ٛRIRS/* ٛRIRS/* D999524952532F2A
|
Actual Result
method cnt
|
full index (correct) 2
|
|
|
method cnt
|
prefix index (BUG) 1 <-- WRONG: should be 2
|
|
|
info id c4 g HEX(g)
|
missing row 3 ٛRIRS/* ٛRIRS/* D999524952532F2A
|
The row with id=3 (g = 'ٛRIRS/*') satisfies the predicate g > '-804207268' but is incorrectly skipped when the prefix index idx_prefix is used.
Workaround
Using a full index instead of a prefix index avoids this issue:
CREATE INDEX idx_full ON t (g); |
Or increasing the prefix length to cover the full value:
CREATE INDEX idx_prefix ON t (g(64)); |
Additional Notes
- The issue is specific to prefix indexes on STORED generated columns where the value begins with a Unicode combining character
- This affects data integrity: any query using the prefix index for range scans may silently return incomplete results
Attachments
Issue Links
- relates to
-
MDEV-20884 Different resultset when using prefix index and utf8mb4
-
- Confirmed
-