Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-37571

Investigate later hint resolution

    XMLWordPrintable

Details

    • Task
    • Status: In Progress (View Workflow)
    • Major
    • Resolution: Unresolved
    • 12.3
    • Optimizer
    • None
    • Q4/2025 Server Development

    Description

      Original setting

      Investigate resolving hints later to solve chicken/egg problem where hints are needed for derived tables before all VIEWs are opened. The basic premise is that we open VIEWs (which are treated as derived tables) after lexing the parent select. This select and its query blocks are numbered. It may specify hints that have an effect within VIEWs (like [NO_]MERGE) but we won't know until after we open VIEWs. So we need to reconsider derived merge hints again after opening VIEWs. (In effect, we make a decision regarding the hint treatment during parent query lexing, and then have to revisit it upon opening a VIEW, or at the very least defer such decisions until after we open VIEWs).

      From Slack:

      Parsing
        open tables
          LOC1: (somewhere inside we check for OPTIMIZER_SWITCH_DERIVED_MERGE  and hint with the same meainig) (edited) 
            > here we learn that some tables are views and that they have child select #N , #N+1, #N+2 ...
      Name resolution
        somewhere here we check for DERIVED_MERGE again (?)
        Maybe here we could "reconsider" the decision made at LOC1
       
      Defer resolution until
        JOIN object construction ?
        fix_fields for items ?
      

      Second attempt

      Starting points:

      • At the moment, hints cannot control any part of the query plan that comes from constructs inside VIEWs.
      • The desired workflow is:
        • One gets a query using VIEWs.
        • One looks at EXPLAIN output and wants to change something.
        • One uses select numbers from EXPLAIN in top-level query to tweak the query optimization. Changing VIEW definitions is not required. (And in the main query, we just add a hint at the start. No need to place any QB_NAME somewhere inside the query)

      A long-ish text with examples:

      create table t1 (
        a int, b int, 
        index i1(a), index i2(b)
      );
      insert into t1 select seq, seq from seq_1_to_1000;
      

      First, make sure the hint works:

      explain select * from t1 where a<3 or b<3;
      +------+-------------+-------+-------------+---------------+-------+---------+------+------+--------------------------------------+
      | id   | select_type | table | type        | possible_keys | key   | key_len | ref  | rows | Extra                                |
      +------+-------------+-------+-------------+---------------+-------+---------+------+------+--------------------------------------+
      |    1 | SIMPLE      | t1    | index_merge | i1,i2         | i1,i2 | 5,5     | NULL | 4    | Using sort_union(i1,i2); Using where |
      +------+-------------+-------+-------------+---------------+-------+---------+------+------+--------------------------------------+
      

      explain select /*+ NO_INDEX_MERGE(t1 i1,i2) */ * from t1 where a<3 or b<3;
      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | i1,i2         | NULL | NULL    | NULL | 1000 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
      

      Ok, it does.

      Make sure it works for derived table:

      explain select * 
      from (select * from t1 where a<3 or b<3 limit 100) T;
      +------+-------------+------------+-------------+---------------+-------+---------+------+------+--------------------------------------+
      | id   | select_type | table      | type        | possible_keys | key   | key_len | ref  | rows | Extra                                |
      +------+-------------+------------+-------------+---------------+-------+---------+------+------+--------------------------------------+
      |    1 | PRIMARY     | <derived2> | ALL         | NULL          | NULL  | NULL    | NULL | 4    |                                      |
      |    2 | DERIVED     | t1         | index_merge | i1,i2         | i1,i2 | 5,5     | NULL | 4    | Using sort_union(i1,i2); Using where |
      +------+-------------+------------+-------------+---------------+-------+---------+------+------+--------------------------------------+
      

      explain select /*+ NO_INDEX_MERGE(t1@`select#2` i1,i2) */ * 
      from 
      (select * from t1 where a<3 or b<3 limit 100) T;
      +------+-------------+------------+------+---------------+------+---------+------+------+-------------+
      | id   | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
      +------+-------------+------------+------+---------------+------+---------+------+------+-------------+
      |    1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 100  |             |
      |    2 | DERIVED     | t1         | ALL  | i1,i2         | NULL | NULL    | NULL | 1000 | Using where |
      +------+-------------+------------+------+---------------+------+---------+------+------+-------------+
      

      Now, let's create a VIEW like a derived table:

      create view v1 as 
      select * from t1 where a <3 or b <3 limit 100;
      

      Hint no longer works:

      explain select /*+ NO_INDEX_MERGE(t1@`select#2` i1,i2) */ * 
      from v1;
      ...
      Warning (Code 4220): Query block name `select#2` is not found for NO_INDEX_MERGE hint
      

      So, should we just move the hint resolution to happen after the VIEWs are opened and connected to the select_lex chain?

      Hints use pre-merge numbering

      One effect of hint resolution being done so early is that the hints use pre-merge numbering.
      Here we use t1@`select#2` despite that EXPLAIN shows the table in select#1 :

      explain select /*+ NO_INDEX_MERGE(t1@`select#2` i1,i2) */ * 
      from 
      (select * from t1 where a<3 or b<3 ) T;
      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | i1,i2         | NULL | NULL    | NULL | 1000 | Using where |
      +------+-------------+-------+------+---------------+------+---------+------+------+-------------+
      

      If we use post-merge numbering, hint will fail to resolve:

      explain select /*+ NO_INDEX_MERGE(t1@`select#1` i1,i2) */ * 
      from 
      (select * from t1 where a<3 or b<3 ) T;
      ...
      Warning (Code 4221): Unresolved table name `t1`@`select#1` for NO_INDEX_MERGE hint
      

      This means currently the number is not "like in EXPLAINs".

      MySQL also uses pre-merge numbering: https://dbfiddle.uk/0-gUGRPg

      Merging can create table naming ambiguities

      In SQL, each table in a SELECT must have a unique alias.
      However, derived and semi-join merge can put tables with the same alias into the same select.
      Example with semi-join:

      explain 
      select * from t1 
      where b in ( select b from t1 where a<3 or b < 3);
      +------+-------------+-------+-------------+---------------+-------+---------+-----------+------+-------------------------------------------------------+
      | id   | select_type | table | type        | possible_keys | key   | key_len | ref       | rows | Extra                                                 |
      +------+-------------+-------+-------------+---------------+-------+---------+-----------+------+-------------------------------------------------------+
      |    1 | PRIMARY     | t1    | index_merge | i1,i2         | i1,i2 | 5,5     | NULL      | 4    | Using sort_union(i1,i2); Using where; Start temporary |
      |    1 | PRIMARY     | t1    | ref         | i2            | i2    | 5       | test.t1.b | 1    | End temporary                                         |
      +------+-------------+-------+-------------+---------------+-------+---------+-----------+------+-------------------------------------------------------+
      

      Example with derived table:

      explain 
      select count(*) from t1, (select b from t1 where a<3 or b < 3) T ;
      +------+-------------+-------+-------------+---------------+-------+---------+------+------+-------------------------------------------------+
      | id   | select_type | table | type        | possible_keys | key   | key_len | ref  | rows | Extra                                           |
      +------+-------------+-------+-------------+---------------+-------+---------+------+------+-------------------------------------------------+
      |    1 | SIMPLE      | t1    | index_merge | i1,i2         | i1,i2 | 5,5     | NULL | 4    | Using sort_union(i1,i2); Using where            |
      |    1 | SIMPLE      | t1    | index       | NULL          | i1    | 5       | NULL | 1000 | Using index; Using join buffer (flat, BNL join) |
      +------+-------------+-------+-------------+---------------+-------+---------+------+------+-------------------------------------------------+
      

      What to do

      So should hints use pre-merge or post-merge select numbers?

      Use pre-merge numbers

      + This is compatible with MySQL
      - For a user, it might be hard to find the pre-merge number in large queries/views.
      The rule is something like: For a VIEW, one must find the select_number of the select $SL where the VIEW reference was in the upper query. Find if $SL had any UNION peers. Then, add $SL->select_number to the "local" number of table in the VIEW.

      Solution: Print the pre-merge number in query_block in EXPLAIN FORMAT=JSON.

      Use post-merge numbers

      + Usability: take EXPLAIN, look at select number, use that in hint.
      - This would be incompatible with MySQL
      - Cannot use post-merge numbers for NO_MERGE hint? ( NO_MERGE hint is checked really early now)
      - Merging may create table naming conflicts.

      Solutions:

      • use a different notation for post-merge select numbers (like "sel#N" instead of "select#n").
        (Note that we shouldn't use SEL_N that TiDB uses as their numbers are pre-merge (right? TODO check))
      • use some notation for conflicting table names "tableX#n" ?

      CTEs work ok

      (for historical purposes: initial discussion was here: https://jira.mariadb.org/browse/MDEV-37260?focusedCommentId=311448&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-311448 , the complaint raised there was partially fixed).

      CTEs can be referred by their derived-like select number. (Technically, CTE occurs in query text before the top-level query, but then renumbering is done which puts the top-level query first).

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table one_k(a int);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table t1 ( a int, b int);
      insert into t1 select a, a from ten;
      create table t2 as select * from t1;
      alter table t2 add index(a);
      

      explain  
      with 
        TBL1 as (select * from t2 where a < 2 limit 10)
      select    
        /*+ NO_INDEX(t2@`select#2` a) NO_INDEX(t2@`select#3` a) */  *
      from t1, TBL1 as T_1, TBL1 as T_2;
      

      +------+-------------+------------+------+---------------+------+---------+------+------+-------------------------------------------+
      | id   | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra                                     |
      +------+-------------+------------+------+---------------+------+---------+------+------+-------------------------------------------+
      |    1 | PRIMARY     | t1         | ALL  | NULL          | NULL | NULL    | NULL | 10   |                                           |
      |    1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 10   | Using join buffer (flat, BNL join)        |
      |    1 | PRIMARY     | <derived3> | ALL  | NULL          | NULL | NULL    | NULL | 10   | Using join buffer (incremental, BNL join) |
      |    3 | DERIVED     | t2         | ALL  | NULL          | NULL | NULL    | NULL | 10   | Using where                               |
      |    2 | DERIVED     | t2         | ALL  | NULL          | NULL | NULL    | NULL | 10   | Using where                               |
      +------+-------------+------------+------+---------------+------+---------+------+------+-------------------------------------------+
      

      Attachments

        Issue Links

          Activity

            People

              oleg.smirnov Oleg Smirnov
              Gosselin Dave Gosselin
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.