Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-39553

`DATE = VARCHAR` filter loses existing matches after `ANALYZE TABLE`, keeping only the literal `'0000-00-00'` row

    XMLWordPrintable

Details

    • Bug
    • Status: Confirmed (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.11, 11.4, 11.8, 12.3, 11.8.6
    • 10.11, 11.4, 11.8, 12.3
    • Data types, Optimizer
    • MariaDB Version: 11.8.6-MariaDB
      Client Version: 15.2 for linux-systemd (x86_64)
      OS: Ubuntu 20.04.4 LTS
      CPU: x86_64 (Intel(R) Xeon(R) Platinum 8358P)

    Description

      Problem Description
      A "Wrong Result" bug exists in MariaDB for an implicit `DATE = VARCHAR`
      comparison in a join filter.

      In the baseline state S1, the query returns matches for four distinct
      right-side ids:

      • `9000002`
      • `9000003`
      • `9000006`
      • `9000007`

      Then S2 is obtained from S1 by append-only inserts into the left table,
      followed by `ANALYZE TABLE`. In S2, the same query keeps only id `9000006`,
      and all other previously matched ids disappear.

      Because S2 is derived from S1 by append-only inserts on the left side, every
      right-side id matched in S1 must still be matched in S2. The disappearance of
      `9000002`, `9000003`, and `9000007` is therefore a monotonicity violation.

      The result can be restored by either:

      • making the suspect predicate explicit with `r.c5 = CAST(m.c3 AS DATE)`
      • avoiding the left-side indexes with `IGNORE INDEX`

      This shows that the wrong result is plan-dependent and tied to the implicit
      `DATE = VARCHAR` coercion path chosen after statistics change.

      Step-by-step Reproduction

      SET SESSION sql_mode = '';
      SET SESSION max_recursive_iterations = 10000;
       
      DROP DATABASE IF EXISTS repro_mdb_date_varchar_filter2;
      CREATE DATABASE repro_mdb_date_varchar_filter2;
      USE repro_mdb_date_varchar_filter2;
       
      DROP TABLE IF EXISTS t_main;
      DROP TABLE IF EXISTS t_ref;
       
      CREATE TABLE t_main (
          id INT NOT NULL PRIMARY KEY,
          c2 VARCHAR(64) NOT NULL,
          c3 VARCHAR(64) NOT NULL,
          c5 DATE NOT NULL,
          KEY idx_c3 (c3),
          KEY idx_c2 (c2),
          KEY idx_c3_c5 (c3, c5)
      ) ENGINE=InnoDB;
       
      CREATE TABLE t_ref (
          id INT NOT NULL PRIMARY KEY,
          c4 VARCHAR(64) NOT NULL,
          c5 DATE NOT NULL,
          c10 ENUM('value1','value2','value3') NULL
      ) ENGINE=InnoDB;
       
      -- [Phase S1: baseline state]
      -- c2 is exact zero-date text, c3 is invalid date text, c5 is zero-date.
      INSERT INTO t_main (id, c2, c3, c5) VALUES
      (1, '0000-00-00', 'not-a-date', '0000-00-00');
       
      -- c4 mixes invalid strings and one exact zero-date string.
      -- c5 is zero-date for all rows.
      INSERT INTO t_ref (id, c4, c5, c10) VALUES
      (9000002, ' ajkj-i_5a2zstzp',   '0000-00-00', 'value1'),
      (9000003, 'ymxlb1f3ew',         '0000-00-00', 'value3'),
      (9000006, '0000-00-00',         '0000-00-00', 'value1'),
      (9000007, '0 192jhz/iq8gwh/h/', '0000-00-00', 'value3');
       
      ANALYZE TABLE t_main, t_ref;
       
      -- Query S1: returns all 4 right-side ids
      SELECT 'S1_count' AS stage, COUNT(*) AS cnt
      FROM t_main m
      JOIN t_ref  r
        ON m.c5 = r.c4
      WHERE r.c10 LIKE 'va%'
        AND r.c5 = m.c2
        AND r.c5 = m.c3;
       
      SELECT 'S1_ids' AS stage, GROUP_CONCAT(DISTINCT r.id ORDER BY r.id) AS ids
      FROM t_main m
      JOIN t_ref  r
        ON m.c5 = r.c4
      WHERE r.c10 LIKE 'va%'
        AND r.c5 = m.c2
        AND r.c5 = m.c3;
       
      EXPLAIN
      SELECT COUNT(*)
      FROM t_main m
      JOIN t_ref  r
        ON m.c5 = r.c4
      WHERE r.c10 LIKE 'va%'
        AND r.c5 = m.c2
        AND r.c5 = m.c3;
       
      -- [Phase S2: append-only expansion]
      INSERT INTO t_main (id, c2, c3, c5)
      WITH RECURSIVE seq(n) AS (
          SELECT 2
          UNION ALL
          SELECT n + 1 FROM seq WHERE n < 500
      )
      SELECT
          n,
          '0000-00-00',
          'not-a-date',
          '0000-00-00'
      FROM seq;
       
      ANALYZE TABLE t_main;
       
      -- Query S2: keeps only the literal zero-date text row
      SELECT 'S2_count' AS stage, COUNT(*) AS cnt
      FROM t_main m
      JOIN t_ref  r
        ON m.c5 = r.c4
      WHERE r.c10 LIKE 'va%'
        AND r.c5 = m.c2
        AND r.c5 = m.c3;
       
      SELECT 'S2_ids' AS stage, GROUP_CONCAT(DISTINCT r.id ORDER BY r.id) AS ids
      FROM t_main m
      JOIN t_ref  r
        ON m.c5 = r.c4
      WHERE r.c10 LIKE 'va%'
        AND r.c5 = m.c2
        AND r.c5 = m.c3;
       
      EXPLAIN
      SELECT COUNT(*)
      FROM t_main m
      JOIN t_ref  r
        ON m.c5 = r.c4
      WHERE r.c10 LIKE 'va%'
        AND r.c5 = m.c2
        AND r.c5 = m.c3;
       
      -- Diagnostic 1: explicit cast on the suspect predicate restores correctness
      SELECT 'S2_cast_c3_count' AS stage, COUNT(*) AS cnt
      FROM t_main m
      JOIN t_ref  r
        ON m.c5 = r.c4
      WHERE r.c10 LIKE 'va%'
        AND r.c5 = m.c2
        AND r.c5 = CAST(m.c3 AS DATE);
       
      SELECT 'S2_cast_c3_ids' AS stage, GROUP_CONCAT(DISTINCT r.id ORDER BY r.id) AS ids
      FROM t_main m
      JOIN t_ref  r
        ON m.c5 = r.c4
      WHERE r.c10 LIKE 'va%'
        AND r.c5 = m.c2
        AND r.c5 = CAST(m.c3 AS DATE);
       
      -- Diagnostic 2: avoiding the left-side indexes also restores correctness
      SELECT 'S2_no_index_count' AS stage, COUNT(*) AS cnt
      FROM t_main m IGNORE INDEX (idx_c3, idx_c2, idx_c3_c5)
      JOIN t_ref  r
        ON m.c5 = r.c4
      WHERE r.c10 LIKE 'va%'
        AND r.c5 = m.c2
        AND r.c5 = m.c3;
       
      SELECT 'S2_no_index_ids' AS stage, GROUP_CONCAT(DISTINCT r.id ORDER BY r.id) AS ids
      FROM t_main m IGNORE INDEX (idx_c3, idx_c2, idx_c3_c5)
      JOIN t_ref  r
        ON m.c5 = r.c4
      WHERE r.c10 LIKE 'va%'
        AND r.c5 = m.c2
        AND r.c5 = m.c3;
      

      Observed Results
      The replay produced the following results:

      S1_count           = 4
      S1_ids             = 9000002,9000003,9000006,9000007
       
      S2_count           = 500
      S2_ids             = 9000006
       
      S2_cast_c3_count   = 2000
      S2_cast_c3_ids     = 9000002,9000003,9000006,9000007
       
      S2_no_index_count  = 2000
      S2_no_index_ids    = 9000002,9000003,9000006,9000007
      

      The corresponding `EXPLAIN` output shows a plan change:

      S1:

      +----+-------------+-------+------+-------------------------+------+---------+------+------+-------------------------------------------------+
      | id | select_type | table | type | possible_keys           | key  | key_len | ref  | rows | Extra                                           |
      +----+-------------+-------+------+-------------------------+------+---------+------+------+-------------------------------------------------+
      | 1  | SIMPLE      | m     | ALL  | idx_c3,idx_c2,idx_c3_c5 | NULL | NULL    | NULL | 1    |                                                 |
      | 1  | SIMPLE      | r     | ALL  | NULL                    | NULL | NULL    | NULL | 4    | Using where; Using join buffer (flat, BNL join) |
      +----+-------------+-------+------+-------------------------+------+---------+------+------+-------------------------------------------------+
      

      S2:

      +----+-------------+-------+------+-------------------------+------+---------+------+------+------------------------------------------------+
      | id | select_type | table | type | possible_keys           | key  | key_len | ref  | rows | Extra                                          |
      +----+-------------+-------+------+-------------------------+------+---------+------+------+------------------------------------------------+
      | 1  | SIMPLE      | r     | ALL  | NULL                    | NULL | NULL    | NULL | 4    | Using where                                    |
      | 1  | SIMPLE      | m     | ALL  | idx_c3,idx_c2,idx_c3_c5 | NULL | NULL    | NULL | 500  | Range checked for each record (index map: 0xE) |
      +----+-------------+-------+------+-------------------------+------+---------+------+------+------------------------------------------------+
      

      Diagnostic Analysis
      This is a wrong-result bug because S2 is derived from S1 by append-only
      inserts into `t_main`, so every right-side id matched in S1 must still be
      matched in S2.

      Instead:

      • S1 matches ids `9000002,9000003,9000006,9000007`
      • S2 keeps only id `9000006`

      So previously existing matches disappear after `ANALYZE TABLE`.

      The diagnostics isolate the failing predicate to:

      ```sql
      r.c5 = m.c3
      ```

      where:

      • `r.c5` is `DATE`
      • `m.c3` is `VARCHAR`

      Two controls confirm this:

      1. Rewriting only that predicate as `r.c5 = CAST(m.c3 AS DATE)` restores all
      four right-side ids.
      2. Avoiding the left-side indexes also restores all four right-side ids.

      So the wrong result is specific to the implicit `DATE = VARCHAR` comparison in
      the S2 access path, not to the underlying data.

      The failure is selective: the S2 plan still keeps the literal
      `'0000-00-00'` text row (`9000006`), but drops rows whose `VARCHAR` text is an
      invalid date-like string and should coerce to zero-date under MariaDB's normal
      comparison rules.

      Expected Behavior
      The query must not lose previously matched right-side ids after append-only
      inserts and `ANALYZE TABLE`.

      In this reproducer, S2 should still contain all ids from S1:

      9000002,9000003,9000006,9000007
      

      With 500 matching left rows in S2, the correct row count is 2000, regardless
      of whether the optimizer uses a table scan, indexed access, or `Range checked
      for each record`.

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              Dreambreaker yaoruifei
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.