Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-39062

Optimizer incorrectly falls back to full table scan instead of PRIMARY KEY range scan

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Cannot Reproduce
    • 11.4
    • 11.4.11
    • Optimizer
    • None
    • Not for Release Notes

    Description

      CREATE TABLE TED (
        ID      VARCHAR(20) NOT NULL,
        DATE    VARCHAR(8)  NOT NULL,
        COMP     VARCHAR(10) NOT NULL DEFAULT 'S000',
        PRIMARY KEY (ID, DATE, COMP)
      ) ENGINE=InnoDB;
       
      CREATE TABLE TEM (
        ID          VARCHAR(20) NOT NULL,
        COMP         VARCHAR(10) NOT NULL DEFAULT 'S000',
        STRT_DATE VARCHAR(8)  NOT NULL,
        END_DATE  VARCHAR(8)  NOT NULL,
        PRIMARY KEY (ID, EFFCT_STRT_DATE)
      ) ENGINE=InnoDB;
       
      EXPLAIN
      SELECT *
      FROM TED
      JOIN TEM
        ON TEM.ID  = TED.ID
       AND TEM.COMP = TED.COMP
       AND TED.DATE BETWEEN TEM.STRT_DATE AND TEM.END_DATE
      WHERE TED.ID   = '10004093'
        AND TED.DATE BETWEEN '20251221' AND '20261220';
      

      In 10.6, the query correctly uses PRIMARY KEY range scan. The optimizer trace would show

      - attaching_conditions_to_tables: Constant condition exists on TED.ID → PRIMARY composite range (10004093,20251221) <= (ID,DATE) <= (10004093,20261220) successfully constructed → range scan
      

      After upgrading to 11.4.9, the same query falls back to a full table scan. The optimizer trace would show

      - attaching_conditions_to_tables: No constant condition on TED.ID → PRIMARY composite range cannot be constructed → Falls back to Full Table Scan
      

      The new cost model introduced in MariaDB 11.0+ (and inherited by 11.4) changed how the optimizer evaluates costs.

      Moving from rule-based to cost-based decisions for things like range vs table scan, index selection for ORDER BY, etc. (https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizer/the-optimizer-cost-model-from-mariadb-11-0).

      However, the new cost model is about how costs are calculated, not about how constants are propagated through equality chains. The problem here is in substitute_best_equal, which decides which table gets the constant - that's a logical optimization step, not a cost decision.

      1. rows_estimation - initial range analysis per table
      2. considered_execution_plans - join order
      3. substitute_best_equal - picks which side of multiple equal(10004093, TED.ID, TEM.ID) gets the constant
      4. attaching_conditions_to_tables - re-analyzes ranges with the substituted conditions

      In 10.6, step 3 propagated the constant 10004093 to both TED and TEM. In step 4, TED had ID = 10004093 AND DATE BETWEEN ..., giving the composite PRIMARY range (10004093, 20251221) <= (ID, DATE) <= (10004093, 20261220) -> range scan

      In 11.4, step 3 assigns the constant only to TEM and leaves TED with just the column reference TED.ID = TEM.ID. In step 4, TED only has DATE BETWEEN ... - without the leading ID constant, PRIMARY can't be used. The only date-leading index (N02) estimates a high number of rows and gets rejected on cost. Result: full table scan.

      The new cost model documentation explicitly states it changed things like:
      1. Cost calculations for disk reads, index lookups, row fetches
      2. Rule-based to cost-based decisions for ORDER BY/GROUP BY index usage
      3. Removal of the old 10% key lookup cap
      4. Removal of InnoDB's 50% range cap

      None of these changes should affect which table receives a constant during equality propagation. The constant should be propagated to both sides.

      Attachments

        Activity

          People

            mariadb-pavithrapandith Pavithra Pandith
            anngtr An Nguyen-Trinh
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.