Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-39555

`DATE = MEDIUMTEXT` join loses existing matches after `ANALYZE TABLE`,and explicit `CAST(... AS DATE)` does not restore them

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Duplicate
    • 10.11, 11.4, 11.8, 12.3, 11.8.6
    • N/A
    • Data types, Optimizer
    • MariaDB Version: 11.8.6-MariaDB
      Client Version: 15.2 for linux-systemd (x86_64)
      OS: Ubuntu 20.04.4 LTS
      CPU: x86_64 (Intel(R) Xeon(R) Platinum 8358P)
    • Not for Release Notes

    Description

      Problem Description
      A "Wrong Result" bug exists in MariaDB for a join between:

      • `t_main.d` of type `DATE`
      • `t_ref.mt` of type `MEDIUMTEXT`

      In the baseline state S1, the query returns 8 rows and 8 distinct right-side
      tags.

      Then S2 is obtained from S1 by append-only inserts into the left table,
      followed by `ANALYZE TABLE`. In S2, the same query returns 500 rows, but all
      rows come from only one right-side tag:

      • `:5-bru_e`

      The other 7 tags that matched in S1 disappear entirely, even though S2 is
      derived from S1 only by appending left-side rows. That is a monotonicity
      violation and therefore a wrong-result bug.

      Two diagnostics further narrow the issue:

      • rewriting the join as `m.d = CAST(r.mt AS DATE)` does *not* restore the
        missing tags
      • avoiding the left-side indexes with `IGNORE INDEX` *does* restore all 8
        tags

      This shows the bug is plan-dependent and is broader than a simple "implicit
      cast omitted" issue. The indexed execution path remains wrong even when the
      text-to-date conversion is made explicit.

      Step-by-step Reproduction

      SET SESSION sql_mode = '';
      SET SESSION max_recursive_iterations = 10000;
       
      DROP DATABASE IF EXISTS repro_mdb_date_mediumtext_join;
      CREATE DATABASE repro_mdb_date_mediumtext_join;
      USE repro_mdb_date_mediumtext_join;
       
      DROP TABLE IF EXISTS t_main;
      DROP TABLE IF EXISTS t_ref;
       
      CREATE TABLE t_main (
          id INT NOT NULL PRIMARY KEY,
          c2 VARCHAR(255) NOT NULL,
          c3 VARCHAR(255) NULL,
          k  INT NULL,
          d  DATE NOT NULL,
          c6 VARCHAR(10) NOT NULL,
          KEY idx_k (k),
          KEY idx_c2 (c2(64)),
          KEY idx_c3 (c3(64)),
          KEY idx_k_d (k, d)
      ) ENGINE=InnoDB;
       
      CREATE TABLE t_ref (
          id  INT NOT NULL PRIMARY KEY,
          tag VARCHAR(64) NOT NULL,
          mt  MEDIUMTEXT NULL
      ) ENGINE=InnoDB;
       
      -- [Phase S1: baseline state]
      INSERT INTO t_main (id, c2, c3, k, d, c6) VALUES
      (1, 'not-a-date', ' 1', 6, '0000-00-00', 'u59-myefohvf1');
       
      INSERT INTO t_ref (id, tag, mt) VALUES
      (9000000, 'bi4n5oob77',          '2:kz-f:8p_ek7du0yf6-24:9h/g70jbm2al7'),
      (9000001, ':5-bru_e',            '0000-00-00'),
      (9000002, 'rr2bu/drxg01vjr',     '-1'),
      (9000003, '2023-01-01',          'aowrto-k5n: ut1wlwcwa'),
      (9000004, 'xjigzyoch7b _',       ':zil dlq5g9'),
      (9000005, 'qcjy__95mec/xbflit',  '0'),
      (9000006, ' 1',                  '_-y91hqjnrylcdavvm1og:cwdw0v0w-yt7kfb5aw'),
      (9000007, 'not-a-date',          '1');
       
      ANALYZE TABLE t_main, t_ref;
       
      SELECT 'S1_count' AS stage, COUNT(*) AS cnt
      FROM t_main m
      JOIN t_ref  r
        ON m.d = r.mt
      WHERE m.k <= r.id;
       
      SELECT 'S1_tags' AS stage,
             GROUP_CONCAT(DISTINCT r.tag ORDER BY r.tag SEPARATOR ' | ') AS tags
      FROM t_main m
      JOIN t_ref  r
        ON m.d = r.mt
      WHERE m.k <= r.id;
       
      EXPLAIN
      SELECT COUNT(*)
      FROM t_main m
      JOIN t_ref  r
        ON m.d = r.mt
      WHERE m.k <= r.id;
       
      -- [Phase S2: append-only expansion]
      INSERT INTO t_main (id, c2, c3, k, d, c6)
      WITH RECURSIVE seq(n) AS (
          SELECT 2
          UNION ALL
          SELECT n + 1 FROM seq WHERE n < 500
      )
      SELECT
          n,
          'not-a-date',
          ' 1',
          CASE MOD(n, 4)
              WHEN 0 THEN 27
              WHEN 1 THEN 7
              WHEN 2 THEN 6
              ELSE 3
          END,
          '0000-00-00',
          'u59-myefohvf1'
      FROM seq;
       
      ANALYZE TABLE t_main;
       
      SELECT 'S2_count' AS stage, COUNT(*) AS cnt
      FROM t_main m
      JOIN t_ref  r
        ON m.d = r.mt
      WHERE m.k <= r.id;
       
      SELECT 'S2_tags' AS stage,
             GROUP_CONCAT(DISTINCT r.tag ORDER BY r.tag SEPARATOR ' | ') AS tags
      FROM t_main m
      JOIN t_ref  r
        ON m.d = r.mt
      WHERE m.k <= r.id;
       
      EXPLAIN
      SELECT COUNT(*)
      FROM t_main m
      JOIN t_ref  r
        ON m.d = r.mt
      WHERE m.k <= r.id;
       
      -- Diagnostic A: explicit cast on the MEDIUMTEXT side
      SELECT 'S2_cast_count' AS stage, COUNT(*) AS cnt
      FROM t_main m
      JOIN t_ref  r
        ON m.d = CAST(r.mt AS DATE)
      WHERE m.k <= r.id;
       
      SELECT 'S2_cast_tags' AS stage,
             GROUP_CONCAT(DISTINCT r.tag ORDER BY r.tag SEPARATOR ' | ') AS tags
      FROM t_main m
      JOIN t_ref  r
        ON m.d = CAST(r.mt AS DATE)
      WHERE m.k <= r.id;
       
      -- Diagnostic B: avoid left-side indexes
      SELECT 'S2_no_index_count' AS stage, COUNT(*) AS cnt
      FROM t_main m IGNORE INDEX (idx_k, idx_c2, idx_c3, idx_k_d)
      JOIN t_ref  r
        ON m.d = r.mt
      WHERE m.k <= r.id;
       
      SELECT 'S2_no_index_tags' AS stage,
             GROUP_CONCAT(DISTINCT r.tag ORDER BY r.tag SEPARATOR ' | ') AS tags
      FROM t_main m IGNORE INDEX (idx_k, idx_c2, idx_c3, idx_k_d)
      JOIN t_ref  r
        ON m.d = r.mt
      WHERE m.k <= r.id;
      

      Observed Results
      The reproducer produced the following results:

      ```text
      S1_count = 8
      S1_tags = 1 | :5-bru_e | 2023-01-01 | bi4n5oob77 |
      not-a-date | qcjy__95mec/xbflit | rr2bu/drxg01vjr |
      xjigzyoch7b _

      S2_count = 500
      S2_tags = :5-bru_e

      S2_cast_count = 500
      S2_cast_tags = :5-bru_e

      S2_no_index_count = 4000
      S2_no_index_tags = 1 | :5-bru_e | 2023-01-01 | bi4n5oob77 |
      not-a-date | qcjy__95mec/xbflit | rr2bu/drxg01vjr |
      xjigzyoch7b _
      ```

      The corresponding `EXPLAIN` output shows a plan change:

      S1:
      ```text
      ----------------------------------------------------------------------------------------------------------+

      id select_type table type possible_keys key key_len ref rows Extra

      ----------------------------------------------------------------------------------------------------------+

      1 SIMPLE m index idx_k,idx_k_d idx_k_d 8 NULL 1 Using index
      1 SIMPLE r ALL PRIMARY NULL NULL NULL 8 Range checked for each record (index map: 0x1)

      ----------------------------------------------------------------------------------------------------------+
      ```

      S2:
      ```text
      -------------------------------------------------------------------------------------------------------+

      id select_type table type possible_keys key key_len ref rows Extra

      -------------------------------------------------------------------------------------------------------+

      1 SIMPLE r ALL PRIMARY NULL NULL NULL 8  
      1 SIMPLE m ALL idx_k,idx_k_d NULL NULL NULL 500 Range checked for each record (index map: 0x12)

      -------------------------------------------------------------------------------------------------------+
      ```

      Diagnostic Analysis
      This is a wrong-result bug because S2 is obtained from S1 only by appending
      rows to `t_main`. Therefore every right-side tag that matched in S1 must still
      match in S2.

      Instead:

      • S1 matches 8 distinct tags
      • S2 keeps only `:5-bru_e`

      So 7 previously matched right-side tags disappear after `ANALYZE TABLE`.

      The bug is not fixed by making the join explicit:

      ```sql
      ON m.d = CAST(r.mt AS DATE)
      ```

      That still returns only `:5-bru_e`.

      However, avoiding the left-side indexes with:

      ```sql
      IGNORE INDEX (idx_k, idx_c2, idx_c3, idx_k_d)
      ```

      restores all 8 tags and increases the row count from 500 to 4000.

      This shows the defect is plan-dependent and tied to the indexed execution path
      chosen after statistics change, not merely to omission of an implicit
      conversion.

      Impact
      MariaDB can silently drop valid join matches for `DATE = MEDIUMTEXT`
      comparisons after `ANALYZE TABLE` and append-only growth, while still
      returning a plausible-looking non-empty result set.

      Because even `CAST(MEDIUMTEXT AS DATE)` remains wrong under the natural S2
      plan, applications cannot reliably avoid the issue just by rewriting the
      predicate with an explicit cast.

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              Dreambreaker yaoruifei
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.