Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-39193

Wrong results: index range vs full scan on utf8mb4_0900_ai_ci VARCHAR(1) when bound string is longer than column (ß / 'ss')

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 12.3.1
    • None
    • Optimizer
    • None
    • ubuntu22.04

    Description

      Summary

      Index range access on VARCHAR(1) with utf8mb4_0900_ai_ci can return fewer rows than the same predicate evaluated on a full table scan, when the range constant is longer than the column and collation equates a stored character to that constant (e.g. U+00DF "ß" and 'ss').

      Affects

      • Version observed: MariaDB 12.3.1 (local build; reproducible with any instance showing the behaviour below).
      • Component (suggested): Optimizer — range / index access; possibly Character sets (collation interaction).

      Expected behaviour

      Under utf8mb4_0900_ai_ci, \u00df (ß) compares equal to 'ss'. Therefore for x BETWEEN 'r' AND 'ss', the row x='ß' must qualify.

      Whether the optimizer uses the secondary index or a full scan, the result set must be identical.

      Actual behaviour

      • With default use of KEY( x ): COUNT( * ) = 0 (row missing).
      • With IGNORE INDEX ( x ) (full scan): COUNT( * ) = 1 (correct).

      EXPLAIN shows type=range on x, so the wrong result is tied to the range path.

      Minimal reproducible example

      DROP DATABASE IF EXISTS mdev_min_repro;
      CREATE DATABASE mdev_min_repro;
      USE mdev_min_repro;
       
      CREATE TABLE t(
        x VARCHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
        KEY (x)
      );
      INSERT INTO t VALUES ('ß');
       
      SELECT COUNT(*) AS cnt_key FROM t WHERE x BETWEEN 'r' AND 'ss';
      SELECT COUNT(*) AS cnt_ignore FROM t IGNORE INDEX (x) WHERE x BETWEEN 'r' AND 'ss';
      

      • Expected: both counts = 1.
      • Observed: cnt_key = 0, cnt_ignore = 1.

      Suspected cause (brief)

      Range construction appears to truncate the upper bound string to fit VARCHAR(1) (e.g. 'ss''s'). That prefix boundary is not equivalent to the full constant under utf8mb4_0900_ai_ci (ß vs "ss"), so rows that should match are excluded when the index range is used.

      Attachments

        Activity

          People

            Unassigned Unassigned
            mu mu
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.