Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-39853

Prefix index range scan on STORED generated column incorrectly skips rows containing Unicode combining characters

    XMLWordPrintable

Details

    Description

      Problem

      When a table has a STORED generated column defined as TRIM(c4), and a prefix index col(1) is created on that column, a range scan query using WHERE g > '-804207268' returns only 1 row instead of the correct 2 rows. The row with g = 'ٛRIRS/*' (where the first character is U+0659, ARABIC ZAH ABOVE) is incorrectly skipped.

      The full index on the same column correctly returns 2 rows, confirming this is a prefix-index-specific range scan issue.

      Root Cause Analysis

      The bug requires all of the following conditions to trigger:

      1. A STORED generated column (e.g., TRIM(c4))
      2. A prefix index on the generated column (e.g., g(1))
      3. The generated column value starts with a Unicode combining character (U+0659, category Mn = Mark, Nonspacing)
      4. A range scan predicate (e.g., >, <, BETWEEN) on that column

      The issue appears to be in how the prefix index range scan handles combining characters during key comparison. The prefix index stores only the first character of the generated column value. When that first character is a Unicode combining mark, the InnoDB range scan incorrectly determines that the prefix is not within the scan range, causing the row to be skipped.

      Minimal Reproduction

      DROP DATABASE IF EXISTS test_db0_min;
      CREATE DATABASE test_db0_min;
      USE test_db0_min;
       
      CREATE TABLE t (id INT, c4 VARCHAR(64) NOT NULL, g VARCHAR(64) AS (TRIM(c4)) STORED);
      INSERT INTO t (id, c4) VALUES
        (1, 'A'),
        (2, '-804207268'),
        (3, 'ٛRIRS/*');
       
      CREATE INDEX idx_full ON t (g);
      CREATE INDEX idx_prefix ON t (g(1));
      ANALYZE TABLE t;
       
      -- Correct result (full index): 2
      SELECT 'full index (correct)' AS method, COUNT(*) AS cnt FROM t FORCE INDEX(idx_full) WHERE g > '-804207268';
       
      -- Bug (prefix index): should be 2, returns 1
      SELECT 'prefix index (BUG)' AS method, COUNT(*) AS cnt FROM t FORCE INDEX(idx_prefix) WHERE g > '-804207268';
       
      -- Show the missing row
      SELECT 'missing row' AS info, id, c4, g, HEX(g) FROM t WHERE id = 3;
      

      Expected Result

      method                cnt
      full index (correct)    2
       
      method                cnt
      prefix index (BUG)      2
       
      info        id  c4        g         HEX(g)
      missing row   3  ٛRIRS/*  ٛRIRS/*  D999524952532F2A
      

      Actual Result

      method                cnt
      full index (correct)    2
       
      method                cnt
      prefix index (BUG)      1   <-- WRONG: should be 2
       
      info        id  c4        g         HEX(g)
      missing row   3  ٛRIRS/*  ٛRIRS/*  D999524952532F2A
      

      The row with id=3 (g = 'ٛRIRS/*') satisfies the predicate g > '-804207268' but is incorrectly skipped when the prefix index idx_prefix is used.

      Workaround

      Using a full index instead of a prefix index avoids this issue:

      CREATE INDEX idx_full ON t (g);
      

      Or increasing the prefix length to cover the full value:

      CREATE INDEX idx_prefix ON t (g(64));
      

      Additional Notes

      • The issue is specific to prefix indexes on STORED generated columns where the value begins with a Unicode combining character
      • This affects data integrity: any query using the prefix index for range scans may silently return incomplete results

      Attachments

        Issue Links

          Activity

            People

              bar Alexander Barkov
              ZyanNo1 Zeyan Li
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.