[MDEV-39853] Prefix index range scan on STORED generated column incorrectly skips rows containing Unicode combining characters - Jira

XML

Word

Printable

Details

Type: Bug
Status: Confirmed (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 12.3, 11.8.5
Fix Version/s: 11.8, 12.3
Component/s: Character Sets, Server, Storage Engine - InnoDB
Labels:
- prefix-index
- virtual_columns
Environment:
ubuntu 24.04

Description

Problem

When a table has a STORED generated column defined as TRIM(c4), and a prefix index col(1) is created on that column, a range scan query using WHERE g > '-804207268' returns only 1 row instead of the correct 2 rows. The row with g = 'ٛRIRS/*' (where the first character is U+0659, ARABIC ZAH ABOVE) is incorrectly skipped.

The full index on the same column correctly returns 2 rows, confirming this is a prefix-index-specific range scan issue.

Root Cause Analysis

The bug requires all of the following conditions to trigger:

A STORED generated column (e.g., TRIM(c4))
A prefix index on the generated column (e.g., g(1))
The generated column value starts with a Unicode combining character (U+0659, category Mn = Mark, Nonspacing)
A range scan predicate (e.g., >, <, BETWEEN) on that column

The issue appears to be in how the prefix index range scan handles combining characters during key comparison. The prefix index stores only the first character of the generated column value. When that first character is a Unicode combining mark, the InnoDB range scan incorrectly determines that the prefix is not within the scan range, causing the row to be skipped.

Minimal Reproduction

DROP DATABASE IF EXISTS test_db0_min;

CREATE DATABASE test_db0_min;

USE test_db0_min;

CREATE TABLE t (id INT, c4 VARCHAR(64) NOT NULL, g VARCHAR(64) AS (TRIM(c4)) STORED);

INSERT INTO t (id, c4) VALUES

  (1, 'A'),

  (2, '-804207268'),

  (3, 'ٛRIRS/*');

CREATE INDEX idx_full ON t (g);

CREATE INDEX idx_prefix ON t (g(1));

ANALYZE TABLE t;

-- Correct result (full index): 2

SELECT 'full index (correct)' AS method, COUNT(*) AS cnt FROM t FORCE INDEX(idx_full) WHERE g > '-804207268';

-- Bug (prefix index): should be 2, returns 1

SELECT 'prefix index (BUG)' AS method, COUNT(*) AS cnt FROM t FORCE INDEX(idx_prefix) WHERE g > '-804207268';

-- Show the missing row

SELECT 'missing row' AS info, id, c4, g, HEX(g) FROM t WHERE id = 3;

Expected Result

method                cnt

full index (correct)    2

method                cnt

prefix index (BUG)      2

info        id  c4        g         HEX(g)

missing row   3  ٛRIRS/*  ٛRIRS/*  D999524952532F2A

Actual Result

method                cnt

full index (correct)    2

method                cnt

prefix index (BUG)      1   <-- WRONG: should be 2

info        id  c4        g         HEX(g)

missing row   3  ٛRIRS/*  ٛRIRS/*  D999524952532F2A

The row with id=3 (g = 'ٛRIRS/*') satisfies the predicate g > '-804207268' but is incorrectly skipped when the prefix index idx_prefix is used.

Workaround

Using a full index instead of a prefix index avoids this issue:

CREATE INDEX idx_full ON t (g);

Or increasing the prefix length to cover the full value:

CREATE INDEX idx_prefix ON t (g(64));

Additional Notes

The issue is specific to prefix indexes on STORED generated columns where the value begins with a Unicode combining character
This affects data integrity: any query using the prefix index for range scans may silently return incomplete results

Attachments

Issue Links

relates to

MDEV-20884 Different resultset when using prefix index and utf8mb4

Confirmed

Activity

People

Assignee:: Alexander Barkov

Reporter:: Zeyan Li

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2026-06-04 14:42

Updated:: 2026-06-08 14:32

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.