Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33704

Item_equal still exists at JOIN::make_aggr_tables_info()

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Won't Fix
    • 10.4(EOL)
    • N/A
    • Optimizer
    • None

    Description

      According to [1], no Item_equal should exist at JOIN::make_aggr_tables_info().

      [1] https://mariadb.slack.com/archives/C021E77G7K2/p1710685372525459

      However, this is not the case during the execution select statement in the following test:

      10.4 d912a6369c6f7f8ba233ac88436d59f6e420c368

      CREATE TABLE `t1` (`c` INT(10) UNSIGNED NOT NULL, `b` VARCHAR(255) NOT NULL , PRIMARY KEY (`c`) USING BTREE ) ENGINE=MYISAM;
      CREATE TABLE `t2` (`a` INT(10) UNSIGNED NOT NULL, `c` INT(10) UNSIGNED NOT NULL ) ENGINE=MYISAM;
      INSERT INTO t1 VALUES(1,'oooo');
      INSERT INTO t2 VALUES(1,1);
      SELECT t2.a,t1.c FRoM t2 LEFT join t1 ON (t2.c = t1.c) WHERE t2.a = 1;
      drop table t1, t2;
      

      To see this, put a breakpoint at JOIN::make_aggr_tables_info(), and when it is hit, print the value of join_list->elem(0)->on_expr:

      (rr) p join_list->elem(0)->on_expr
      $4 = (Item_equal *) 0x52b000065fd8
      

      Attachments

        Issue Links

          Activity

            Note that this is a very edge case: all tables in the join are constant:

            MariaDB [test]> explain extended SELECT t2.a,t1.c FRoM t2 LEFT join t1 ON (t2.c = t1.c) WHERE t2.a = 1;
            +------+-------------+-------+--------+---------------+------+---------+------+------+----------+-------+
            | id   | select_type | table | type   | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
            +------+-------------+-------+--------+---------------+------+---------+------+------+----------+-------+
            |    1 | SIMPLE      | t2    | system | NULL          | NULL | NULL    | NULL | 1    |   100.00 |       |
            |    1 | SIMPLE      | t1    | system | PRIMARY       | NULL | NULL    | NULL | 1    |   100.00 |       |
            +------+-------------+-------+--------+---------------+------+---------+------+------+----------+-------+
            2 rows in set, 1 warning (27.379 sec)
            

            MariaDB [test]> show warnings;
            +-------+------+---------------------------------------------------+
            | Level | Code | Message                                           |
            +-------+------+---------------------------------------------------+
            | Note  | 1003 | select 1 AS `a`,1 AS `c` from `test`.`t1` where 1 |
            +-------+------+---------------------------------------------------+
            1 row in set (0.000 sec)
            

            If I add some rows into t2 (leaving t1 still empty):

            MariaDB [test]> insert into t2 values (1,1),(2,2);
            Query OK, 2 rows affected (0.000 sec)
            Records: 2  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> explain extended SELECT t2.a,t1.c FRoM t2 LEFT join t1 ON (t2.c = t1.c) WHERE t2.a = 1;
            

            Then I see the execution enter this code:

                /*
                  Perform the optimization on fields evaluation mentioned above
                  for all on expressions.
                */
                for (tab= first_linear_tab(this, WITH_BUSH_ROOTS, WITHOUT_CONST_TABLES); tab;
                     tab= next_linear_tab(this, tab, WITH_BUSH_ROOTS))
                {
                  if (*tab->on_expr_ref)
                  {
                    *tab->on_expr_ref= substitute_for_best_equal_field(thd, NO_PARTICULAR_TAB,
                                                                       *tab->on_expr_ref,
                                                                       tab->cond_equal,
                                                                       map2table, true);
            

            (gdb) p dbug_print_item(*tab->on_expr_ref)
              $8 = 0x5555577d3c80 <dbug_item_print_buf> "multiple equal(t2.c, t1.c)"
            (gdb) next
            (gdb) next
            (gdb) next
            (gdb) next
            (gdb) p dbug_print_item(*tab->on_expr_ref)
              $9 = 0x5555577d3c80 <dbug_item_print_buf> "t1.c = t2.c"
            

            and then

              Thread 36 "mysqld" hit Breakpoint 2, JOIN::make_aggr_tables_info (this=0x7ffee8018588) at /home/psergey/dev-git2/10.4-look2/sql/sql_select.cc:328
            (gdb) p join_list->elem(0)->on_expr
              $10 = (Item *) 0x7ffee80982e8
            (gdb) set print obj
            (gdb) p join_list->elem(0)->on_expr
              $11 = (Item_func_eq *) 0x7ffee80982e8
            

            psergei Sergei Petrunia added a comment - Note that this is a very edge case: all tables in the join are constant: MariaDB [test]> explain extended SELECT t2.a,t1.c FRoM t2 LEFT join t1 ON (t2.c = t1.c) WHERE t2.a = 1; +------+-------------+-------+--------+---------------+------+---------+------+------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+--------+---------------+------+---------+------+------+----------+-------+ | 1 | SIMPLE | t2 | system | NULL | NULL | NULL | NULL | 1 | 100.00 | | | 1 | SIMPLE | t1 | system | PRIMARY | NULL | NULL | NULL | 1 | 100.00 | | +------+-------------+-------+--------+---------------+------+---------+------+------+----------+-------+ 2 rows in set, 1 warning (27.379 sec) MariaDB [test]> show warnings; +-------+------+---------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------+ | Note | 1003 | select 1 AS `a`,1 AS `c` from `test`.`t1` where 1 | +-------+------+---------------------------------------------------+ 1 row in set (0.000 sec) If I add some rows into t2 (leaving t1 still empty): MariaDB [test]> insert into t2 values (1,1),(2,2); Query OK, 2 rows affected (0.000 sec) Records: 2 Duplicates: 0 Warnings: 0   MariaDB [test]> explain extended SELECT t2.a,t1.c FRoM t2 LEFT join t1 ON (t2.c = t1.c) WHERE t2.a = 1; Then I see the execution enter this code: /* Perform the optimization on fields evaluation mentioned above for all on expressions. */ for (tab= first_linear_tab( this , WITH_BUSH_ROOTS, WITHOUT_CONST_TABLES); tab; tab= next_linear_tab( this , tab, WITH_BUSH_ROOTS)) { if (*tab->on_expr_ref) { *tab->on_expr_ref= substitute_for_best_equal_field(thd, NO_PARTICULAR_TAB, *tab->on_expr_ref, tab->cond_equal, map2table, true ); (gdb) p dbug_print_item(*tab->on_expr_ref) $8 = 0x5555577d3c80 <dbug_item_print_buf> "multiple equal(t2.c, t1.c)" (gdb) next (gdb) next (gdb) next (gdb) next (gdb) p dbug_print_item(*tab->on_expr_ref) $9 = 0x5555577d3c80 <dbug_item_print_buf> "t1.c = t2.c" and then Thread 36 "mysqld" hit Breakpoint 2, JOIN::make_aggr_tables_info (this=0x7ffee8018588) at /home/psergey/dev-git2/10.4-look2/sql/sql_select.cc:328 (gdb) p join_list->elem(0)->on_expr $10 = (Item *) 0x7ffee80982e8 (gdb) set print obj (gdb) p join_list->elem(0)->on_expr $11 = (Item_func_eq *) 0x7ffee80982e8

            ycp, do you have an example where not all tables are marked as constant?

            For the case where all tables are constant, one can argue whether there is a point in trying to use a group_by handler. I think we can bypass it in this case, as the optimizer has already read all constant tables and doesn't need anything from the storage engine. I can review such patch if you make it.

            psergei Sergei Petrunia added a comment - ycp , do you have an example where not all tables are marked as constant? For the case where all tables are constant, one can argue whether there is a point in trying to use a group_by handler. I think we can bypass it in this case, as the optimizer has already read all constant tables and doesn't need anything from the storage engine. I can review such patch if you make it.
            ycp Yuchen Pei added a comment - - edited

            > do you have an example where not all tables are marked as constant?

            No I don't. Is it then not worth finding out what happened in this edge case so that there's still Item_equal left after optimization and rectify it?

            > For the case where all tables are constant, one can argue whether there is a point in trying to use a group_by handler. I think we can bypass it in this case, as the optimizer has already read all constant tables and doesn't need anything from the storage engine. I can review such patch if you make it.

            I'll do that - opened MDEV-33742.

            Meanwhile, it puzzles me how it is determined that the table is const.

            Here's the test with spider tables (omitting the creation of t1 and t2 which is probably irrelevant):

            CREATE TABLE `t1_spider` (`c` INT(10) UNSIGNED NOT NULL, `b` VARCHAR(255) NOT NULL , PRIMARY KEY (`c`) USING BTREE ) COMMENT='wrapper "mysql",srv "srv", table "t1"' ENGINE=SPIDER;
            CREATE TABLE `t2_spider` (`a` INT(10) UNSIGNED NOT NULL, `c` INT(10) UNSIGNED NOT NULL
            , PRIMARY KEY (`a`) USING BTREE
            ) COMMENT='wrapper "mysql",srv "srv",table "t2"' ENGINE=SPIDER;
            INSERT INTO t1_spider VALUES(1,'oooo');
            INSERT INTO t2_spider VALUES(1,1);
            SELECT t2_spider.a,t1_spider.c FRoM t2_spider LEFT join t1_spider ON (t2_spider.c = t1_spider.c) WHERE t2_spider.a = 1;

            I notice that the table t2_spider is marked constant ((0) below) without spider sending any queries to the remote table. Only then the sql layer asks spider to find the value of the field ((1) below). psergei, do you know how it is determined that the table is const below?

            // in make_join_statistics()
            	      if (base_const_ref == base_eq_part &&
                              !has_expensive_keyparts &&
                              !((outer_join & table->map) &&
                                (*s->on_expr_ref)->is_expensive()))
            	      {					// Found everything for ref.
            	        int tmp;
            	        ref_changed = 1;
            	        s->type= JT_CONST;    // (0)
            	        join->const_table_map|=table->map;
            	        set_position(join,const_count++,s,start_keyuse);
                            /* create_ref_for_key will set s->table->const_table */
            	        if (create_ref_for_key(join, s, start_keyuse, FALSE,
            				       found_const_table_map))
                              goto error;
            	        if ((tmp=join_read_const_table(join->thd, s,
                                                           join->positions+const_count-1))) // (1)

            ycp Yuchen Pei added a comment - - edited > do you have an example where not all tables are marked as constant? No I don't. Is it then not worth finding out what happened in this edge case so that there's still Item_equal left after optimization and rectify it? > For the case where all tables are constant, one can argue whether there is a point in trying to use a group_by handler. I think we can bypass it in this case, as the optimizer has already read all constant tables and doesn't need anything from the storage engine. I can review such patch if you make it. I'll do that - opened MDEV-33742 . Meanwhile, it puzzles me how it is determined that the table is const. Here's the test with spider tables (omitting the creation of t1 and t2 which is probably irrelevant): CREATE TABLE `t1_spider` (`c` INT (10) UNSIGNED NOT NULL , `b` VARCHAR (255) NOT NULL , PRIMARY KEY (`c`) USING BTREE ) COMMENT= 'wrapper "mysql",srv "srv", table "t1"' ENGINE=SPIDER; CREATE TABLE `t2_spider` (`a` INT (10) UNSIGNED NOT NULL , `c` INT (10) UNSIGNED NOT NULL , PRIMARY KEY (`a`) USING BTREE ) COMMENT= 'wrapper "mysql",srv "srv",table "t2"' ENGINE=SPIDER; INSERT INTO t1_spider VALUES (1, 'oooo' ); INSERT INTO t2_spider VALUES (1,1); SELECT t2_spider.a,t1_spider.c FRoM t2_spider LEFT join t1_spider ON (t2_spider.c = t1_spider.c) WHERE t2_spider.a = 1; I notice that the table t2_spider is marked constant ((0) below) without spider sending any queries to the remote table. Only then the sql layer asks spider to find the value of the field ((1) below). psergei , do you know how it is determined that the table is const below? // in make_join_statistics() if (base_const_ref == base_eq_part && !has_expensive_keyparts && !((outer_join & table->map) && (*s->on_expr_ref)->is_expensive())) { // Found everything for ref. int tmp; ref_changed = 1; s->type= JT_CONST; // (0) join->const_table_map|=table->map; set_position(join,const_count++,s,start_keyuse); /* create_ref_for_key will set s->table->const_table */ if (create_ref_for_key(join, s, start_keyuse, FALSE, found_const_table_map)) goto error; if ((tmp=join_read_const_table(join->thd, s, join->positions+const_count-1))) // (1)

            No I don't. Is it then not worth finding out what happened in this edge case so that there's still Item_equal left after optimization and rectify it?

            No, it's not worth it. When all tables are constant, the execution plan is essentially "produce nothing". If we are not going to run outer joins, there is no point in coming up with an ON expression.

            (What we should do is to have a better isolation of what is in the query plan vs what was discarded. Unfortunately this is hard to do given the approaches/attitudes adopted here).

            Let me look at other questions posed...

            psergei Sergei Petrunia added a comment - No I don't. Is it then not worth finding out what happened in this edge case so that there's still Item_equal left after optimization and rectify it? No, it's not worth it. When all tables are constant, the execution plan is essentially "produce nothing". If we are not going to run outer joins, there is no point in coming up with an ON expression. (What we should do is to have a better isolation of what is in the query plan vs what was discarded. Unfortunately this is hard to do given the approaches/attitudes adopted here). Let me look at other questions posed...

            I notice that the table t2_spider is marked constant ((0) below) without spider sending any queries to the remote table. Only then the sql layer asks spider to find the value of the field ((1) below). Sergei Petrunia, do you know how it is determined that the table is const below?

            Ok looking at your example. Pretty-printing:

            CREATE TABLE t1_spider (
              c INT(10) UNSIGNED NOT NULL, 
              b VARCHAR(255) NOT NULL , 
              PRIMARY KEY (c) USING BTREE 
            ) COMMENT='wrapper "mysql",srv "srv", table "t1"' ENGINE=SPIDER;
             
            CREATE TABLE t2_spider (
              a INT(10) UNSIGNED NOT NULL, 
              c INT(10) UNSIGNED NOT NULL
            , PRIMARY KEY (a) USING BTREE
            ) COMMENT='wrapper "mysql",srv "srv",table "t2"' ENGINE=SPIDER;
            INSERT INTO t1_spider VALUES(1,'oooo');
            INSERT INTO t2_spider VALUES(1,1);
            

            SELECT 
              t2_spider.a,t1_spider.c 
            FRoM 
              t2_spider LEFT join t1_spider ON (t2_spider.c = t1_spider.c) 
            WHERE 
              t2_spider.a = 1;
            

            The outer table is t2_spider. It has PRIMARY KEY(a), and we have WHERE t2_spider.a=1.
            The optimizer concludes t2 has at most one row with a=1 and reads that row ( with join_read_const_table() call)

            t2_spider.c is now a constant, we know its value.
            Now looking at t1_spider. the ON expression has t1_spider.c =t2.c, and t1_spider has a PRIMARY KEY(c), so there will be at most one row and we read its value. ( with join_read_const_table() call, again)

            Now, all tables in the join are constant. We won't need do actually execute the join.

            psergei Sergei Petrunia added a comment - I notice that the table t2_spider is marked constant ((0) below) without spider sending any queries to the remote table. Only then the sql layer asks spider to find the value of the field ((1) below). Sergei Petrunia, do you know how it is determined that the table is const below? Ok looking at your example. Pretty-printing: CREATE TABLE t1_spider ( c INT (10) UNSIGNED NOT NULL , b VARCHAR (255) NOT NULL , PRIMARY KEY (c) USING BTREE ) COMMENT= 'wrapper "mysql",srv "srv", table "t1"' ENGINE=SPIDER;   CREATE TABLE t2_spider ( a INT (10) UNSIGNED NOT NULL , c INT (10) UNSIGNED NOT NULL , PRIMARY KEY (a) USING BTREE ) COMMENT= 'wrapper "mysql",srv "srv",table "t2"' ENGINE=SPIDER; INSERT INTO t1_spider VALUES (1, 'oooo' ); INSERT INTO t2_spider VALUES (1,1); SELECT t2_spider.a,t1_spider.c FRoM t2_spider LEFT join t1_spider ON (t2_spider.c = t1_spider.c) WHERE t2_spider.a = 1; The outer table is t2_spider. It has PRIMARY KEY(a), and we have WHERE t2_spider.a=1 . The optimizer concludes t2 has at most one row with a=1 and reads that row ( with join_read_const_table() call) t2_spider.c is now a constant, we know its value. Now looking at t1_spider. the ON expression has t1_spider.c =t2.c, and t1_spider has a PRIMARY KEY(c), so there will be at most one row and we read its value. ( with join_read_const_table() call, again) Now, all tables in the join are constant. We won't need do actually execute the join.
            ycp Yuchen Pei added a comment -

            Thanks psergei for confirming there's no need to fix this issue, and for the explanation on how both tables in the example are noted as constant.

            ycp Yuchen Pei added a comment - Thanks psergei for confirming there's no need to fix this issue, and for the explanation on how both tables in the example are noted as constant.

            People

              ycp Yuchen Pei
              ycp Yuchen Pei
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.