Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-39554

`(VARCHAR = INT OR DATE = VARCHAR)` returns fewer rows than `DATE = VARCHAR` alone after `ANALYZE TABLE`

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Duplicate
    • 10.11, 11.4, 11.8, 12.3, 11.8.6
    • N/A
    • Data types, Optimizer
    • MariaDB Version: 11.8.6-MariaDB
      Client Version: 15.2 for linux-systemd (x86_64)
      OS: Ubuntu 20.04.4 LTS
      CPU: x86_64 (Intel(R) Xeon(R) Platinum 8358P)
    • Not for Release Notes

    Description

      Problem Description
      A "Wrong Result" bug exists in MariaDB for an `OR` predicate combining two
      mixed-type comparisons:

      • `VARCHAR = INT`
      • `DATE = VARCHAR`

      In the baseline state S1, the query returns the expected rows and values.

      Then S2 is obtained from S1 by append-only inserts into the left table,
      followed by `ANALYZE TABLE`. In S2:

      • branch A alone, `r.c4 = m.c4`, returns 0 rows
      • branch B alone, `m.c5 = r.c4`, returns 504 rows
      • but `(r.c4 = m.c4 OR m.c5 = r.c4)` returns only 126 rows

      This is logically impossible. Since branch A is empty, `(A OR B)` must be
      exactly equal to `B`.

      The result is restored by avoiding the left-side indexes with
      `IGNORE INDEX`, which returns the full 504 rows.

      This shows that the wrong result is plan-dependent and tied to the optimizer
      path chosen after statistics change.

      Step-by-step Reproduction

       
      SET SESSION sql_mode = '';
      SET SESSION max_recursive_iterations = 10000;
       
      DROP DATABASE IF EXISTS repro_mdb_or_mixed_cast;
      CREATE DATABASE repro_mdb_or_mixed_cast;
      USE repro_mdb_or_mixed_cast;
       
      DROP TABLE IF EXISTS t_main;
      DROP TABLE IF EXISTS t_ref;
       
      CREATE TABLE t_main (
          id INT NOT NULL PRIMARY KEY,
          c2 VARCHAR(32) NOT NULL,
          c3 VARCHAR(64) NOT NULL,
          c4 INT NULL,
          c5 DATE NOT NULL,
          KEY idx_c4 (c4),
          KEY idx_c5 (c5)
      ) ENGINE=InnoDB;
       
      CREATE TABLE t_ref (
          id  INT NOT NULL PRIMARY KEY,
          c4  VARCHAR(64) NOT NULL,
          c15 DOUBLE NULL
      ) ENGINE=InnoDB;
       
      -- [Phase S1: baseline state]
      -- Branch A is intended to be dead: c4 never equals the numeric coercion of t_ref.c4.
      -- Branch B is intended to be live: c5 = '0000-00-00' should match these strings by implicit DATE coercion.
      INSERT INTO t_main (id, c2, c3, c4, c5) VALUES
      (1, 'z', 'not-a-date', 14, '0000-00-00'),
      (2, 'y', '0000-00-00', 15, '0000-00-00'),
      (3, 'x', '0',          16, '0000-00-00'),
      (4, 'w', '01e0',       17, '0000-00-00');
       
      INSERT INTO t_ref (id, c4, c15) VALUES
      (9000001, 'not-a-date',  1.0),
      (9000002, '0000-00-00',  0.0),
      (9000003, '0',           1.0),
      (9000004, '01e0',       -1.0);
       
      ANALYZE TABLE t_main, t_ref;
       
      SELECT 'S1_orig_count' AS stage, COUNT(*) AS cnt
      FROM t_main m
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE (r.c4 = m.c4 OR m.c5 = r.c4);
       
      SELECT 'S1_orig_vals' AS stage,
             GROUP_CONCAT(
               DISTINCT (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END)
               ORDER BY (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END)
             ) AS vals
      FROM t_main m
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE (r.c4 = m.c4 OR m.c5 = r.c4);
       
      EXPLAIN
      SELECT COUNT(*)
      FROM t_main m
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE (r.c4 = m.c4 OR m.c5 = r.c4);
       
      -- [Phase S2: append-only expansion]
      INSERT INTO t_main (id, c2, c3, c4, c5)
      WITH RECURSIVE seq(n) AS (
          SELECT 1
          UNION ALL
          SELECT n + 1 FROM seq WHERE n < 500
      )
      SELECT
          1000 + n,
          CASE MOD(n, 4)
              WHEN 0 THEN 'zzzz'
              WHEN 1 THEN 'yyyy'
              WHEN 2 THEN 'xxxx'
              ELSE 'wwww'
          END,
          CASE MOD(n, 4)
              WHEN 0 THEN 'not-a-date'
              WHEN 1 THEN '0000-00-00'
              WHEN 2 THEN '0'
              ELSE '01e0'
          END,
          14 + MOD(n, 4),
          '0000-00-00'
      FROM seq;
       
      ANALYZE TABLE t_main;
       
      -- Original query
      SELECT 'S2_orig_count' AS stage, COUNT(*) AS cnt
      FROM t_main m
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE (r.c4 = m.c4 OR m.c5 = r.c4);
       
      SELECT 'S2_orig_vals' AS stage,
             GROUP_CONCAT(
               DISTINCT (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END)
               ORDER BY (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END)
             ) AS vals
      FROM t_main m
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE (r.c4 = m.c4 OR m.c5 = r.c4);
       
      EXPLAIN
      SELECT COUNT(*)
      FROM t_main m
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE (r.c4 = m.c4 OR m.c5 = r.c4);
       
      -- Branch A only: VARCHAR = INT
      SELECT 'S2_branch_a_count' AS stage, COUNT(*) AS cnt
      FROM t_main m
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE r.c4 = m.c4;
       
      SELECT 'S2_branch_a_vals' AS stage,
             GROUP_CONCAT(
               DISTINCT (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END)
               ORDER BY (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END)
             ) AS vals
      FROM t_main m
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE r.c4 = m.c4;
       
      -- Branch B only: DATE = VARCHAR
      SELECT 'S2_branch_b_count' AS stage, COUNT(*) AS cnt
      FROM t_main m
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE m.c5 = r.c4;
       
      SELECT 'S2_branch_b_vals' AS stage,
             GROUP_CONCAT(
               DISTINCT (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END)
               ORDER BY (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END)
             ) AS vals
      FROM t_main m
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE m.c5 = r.c4;
       
      -- Original query without left-side indexes
      SELECT 'S2_no_index_count' AS stage, COUNT(*) AS cnt
      FROM t_main m IGNORE INDEX (idx_c4, idx_c5)
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE (r.c4 = m.c4 OR m.c5 = r.c4);
       
      SELECT 'S2_no_index_vals' AS stage,
             GROUP_CONCAT(
               DISTINCT (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END)
               ORDER BY (CASE WHEN r.c15 >= -3 THEN r.c15 ELSE -3 END)
             ) AS vals
      FROM t_main m IGNORE INDEX (idx_c4, idx_c5)
      JOIN t_ref  r
        ON m.c3 = r.c4
      WHERE (r.c4 = m.c4 OR m.c5 = r.c4);
      

      Observed Results
      The reproducer produced the following results:

      ```text
      S1_orig_count = 4
      S1_orig_vals = -1,0,1

      S2_orig_count = 126
      S2_orig_vals = 0

      S2_branch_a_count = 0
      S2_branch_a_vals = NULL

      S2_branch_b_count = 504
      S2_branch_b_vals = -1,0,1

      S2_no_index_count = 504
      S2_no_index_vals = -1,0,1
      ```

      The corresponding `EXPLAIN` output shows a plan change:

      S1:
      ```text
      -------------------------------------------------------------------------------------------------------+

      id select_type table type possible_keys key key_len ref rows Extra

      -------------------------------------------------------------------------------------------------------+

      1 SIMPLE m ALL idx_c4,idx_c5 NULL NULL NULL 4  
      1 SIMPLE r ALL NULL NULL NULL NULL 4 Using where; Using join buffer (flat, BNL join)

      -------------------------------------------------------------------------------------------------------+
      ```

      S2:
      ```text
      ------------------------------------------------------------------------------------------------------+

      id select_type table type possible_keys key key_len ref rows Extra

      ------------------------------------------------------------------------------------------------------+

      1 SIMPLE r ALL NULL NULL NULL NULL 4  
      1 SIMPLE m ALL idx_c4,idx_c5 NULL NULL NULL 504 Range checked for each record (index map: 0x6)

      ------------------------------------------------------------------------------------------------------+
      ```

      Diagnostic Analysis
      This is a wrong-result bug because in S2:

      • branch A, `r.c4 = m.c4`, returns 0 rows
      • branch B, `m.c5 = r.c4`, returns 504 rows

      Therefore the disjunction:

      ```sql
      (r.c4 = m.c4 OR m.c5 = r.c4)
      ```

      must be exactly equivalent to branch B alone, and must also return 504 rows.

      Instead, MariaDB returns only 126 rows for the `OR` query under the natural
      post-`ANALYZE TABLE` plan.

      The bug is even more visible in the projected values:

      • branch B returns `-1,0,1`
      • the original `OR` query returns only `0`

      So the optimizer is not just dropping rows; it is selectively dropping the
      rows that produce `-1` and `1`, keeping only a subset of branch B.

      The `IGNORE INDEX` control restores the full 504 rows and the full value set
      `-1,0,1`, confirming that the data itself is correct and the wrong result is
      specific to the natural indexed/`Range checked for each record` plan.

      Expected Behavior
      Since branch A is empty in S2, the original query:

      ```sql
      (r.c4 = m.c4 OR m.c5 = r.c4)
      ```

      must produce exactly the same result set as branch B:

      ```sql
      m.c5 = r.c4
      ```

      So in this reproducer, the correct S2 result is:

      • row count: `504`
      • value set: `-1,0,1`

      regardless of whether the optimizer uses table scan, indexed access, or
      `Range checked for each record`.

      Attachments

        Issue Links

          Activity

            People

              shipjain Shipra Jain
              Dreambreaker yaoruifei
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.