Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28622

Item_subselect eliminated flag set but Item still evaluated/used.

Details

    Description

      poc:

      CREATE TABLE v896 ( CONSTRAINT v901 PRIMARY KEY ( v900 , v899 ) , v897 INTEGER , v898 BIGINT , v899 INT , v900 VARCHAR ( 1 ) , CONSTRAINT v902 UNIQUE INDEX v903 ( v897 , v899 ) ) ;
       CREATE VIEW v904 ( v911 ) AS SELECT EXISTS ( SELECT 1 ) FROM ( WITH v906 AS ( SELECT v897 % 27 != 52 FROM ( SELECT 81 , 15 , v897 FROM v896 WHERE v899 = 78 ) AS v905 GROUP BY v897 ) SELECT -1 * -128 FROM ( SELECT DISTINCT v900 , 'x' FROM v896 ) AS v907 NATURAL JOIN v906 AS v908 , v906 AS v909 NATURAL JOIN v906 ) AS v910 NATURAL JOIN v896 ;
       SELECT DISTINCT ( SELECT v911 FROM v904 WHERE ( v911 ) NOT IN ( SELECT 8 < v911 AND v911 = 0 FROM v904 AS v912 NATURAL JOIN v904 WHERE v911 != -2147483648 GROUP BY v911 ) ) * 63 , ( v911 = -128 OR v911 > 'x' ) FROM v904 WHERE v911 = 40 AND ( v911 = 26 OR v911 = -1 OR v911 = -1 ) LIMIT 1 OFFSET 1 ;
       DROP TABLE v896 ;
       INSERT INTO x VALUES ( -32768 ) ;
      

      output:
      SUMMARY: AddressSanitizer: SEGV /sql/item_subselect.cc:2996 in Item_exists_subselect::exists2in_processor(void*)

      The full error log is in the attachment.

      Attachments

        Issue Links

          Activity

            Looking at

            commit 7b43a0d42d0a53197114594694c4704c23318acc
            Author: Rex <rex.johnston@mariadb.com>
            Date:   Tue Jun 27 15:51:10 2023 +1100
             
                MDEV-28622 Item_subselect eliminated flag set but Item still evaluated/used.
                
                Subquery elimination by optimizer not taken into account elsewhere.
            
            

            Let's simplify the testcase, replace one use "v1" with a reguar table t2:

            create table t2 ( i int);
            SELECT 1 FROM t2 WHERE i IN (SELECT i + 0 FROM v1 WHERE i = -1 GROUP BY i);

            The above works with the patch.
            But if I modify the testcase a bit, I still get the same crash:

            create table t2 ( i int);
            SELECT 1 FROM t2 WHERE i IN (SELECT i + 0 FROM v1 WHERE i = -1 GROUP BY i+1);
            

            psergei Sergei Petrunia added a comment - Looking at commit 7b43a0d42d0a53197114594694c4704c23318acc Author: Rex <rex.johnston@mariadb.com> Date: Tue Jun 27 15:51:10 2023 +1100   MDEV-28622 Item_subselect eliminated flag set but Item still evaluated/used. Subquery elimination by optimizer not taken into account elsewhere. Let's simplify the testcase, replace one use "v1" with a reguar table t2: create table t2 ( i int); SELECT 1 FROM t2 WHERE i IN (SELECT i + 0 FROM v1 WHERE i = -1 GROUP BY i); The above works with the patch. But if I modify the testcase a bit, I still get the same crash: create table t2 ( i int); SELECT 1 FROM t2 WHERE i IN (SELECT i + 0 FROM v1 WHERE i = -1 GROUP BY i+1);
            psergei Sergei Petrunia added a comment - - edited

            Trying to describe the problem

            SELECT 1 FROM t2 WHERE i IN (SELECT i + 0 FROM v1 WHERE i = -1 GROUP BY i);
            

            This query creates the following data structures

            "i" in "SELECT i+0" is an Item_direct_view_ref object. Denote it $item_ref1.
            $item_ref1->ref points to an element in the view's TABLE_LIST::field_translation array.

            The same for "GROUP BY i". "i" is another Item_direct_view_ref $item_ref2. It points
            to the same element in TABLE_LIST::field_translation.

            The element in TABLE_LIST::field_translation points to an Item_exists_subselect object.

            remove_redundant_subquery_clauses() finds "GROUP BY i" in the

            i IN (SELECT i = 0 FROM v1 WHERE i = -1 GROUP BY i)
            

            and eliminates it. The elimination walks through $item_ref2->ref and marks the Item_exists_subselect as eliminated.

            Then, semi-join conversion starts. It looks at

            "i IN (SELECT i = 0 ...)"
            

            and tries to create the IN-equality from it. Somewhere here we fail, because we're
            trying to operate on an item_ref1->ref[0] which is the eliminated Item_exists_subselect.

            Rex's fix

            Rex's patch makes remove_redundant_subquery_clauses() not to mark GROUP BY expression
            as eliminated if the said expression is an Item_direct_view_ref.

            But it doesn't work if the said expression is a function which contains
            Item_direct_view_ref objects somewhere inside it...

            Thoughts

            • Should we make mark_as_eliminated() to not follow the pointer in Item_direct_view_ref::ref ?
            • What if TABLE_LIST::field_translation[i] also had a reference counter which counted the number of "live" references to the item it is pointing to. Then, Item_direct_view_ref should only eliminate (*ref) if we're eliminating the last reference. (TODO: is this even possible with current item->walk(&Item::eliminate_subselect_processor) call? It seems it will walk into *ref unconditionally?)
            psergei Sergei Petrunia added a comment - - edited Trying to describe the problem SELECT 1 FROM t2 WHERE i IN (SELECT i + 0 FROM v1 WHERE i = -1 GROUP BY i); This query creates the following data structures "i" in "SELECT i+0" is an Item_direct_view_ref object. Denote it $item_ref1. $item_ref1->ref points to an element in the view's TABLE_LIST::field_translation array. The same for "GROUP BY i". "i" is another Item_direct_view_ref $item_ref2. It points to the same element in TABLE_LIST::field_translation. The element in TABLE_LIST::field_translation points to an Item_exists_subselect object. remove_redundant_subquery_clauses() finds "GROUP BY i" in the i IN (SELECT i = 0 FROM v1 WHERE i = -1 GROUP BY i) and eliminates it. The elimination walks through $item_ref2->ref and marks the Item_exists_subselect as eliminated. Then, semi-join conversion starts. It looks at "i IN (SELECT i = 0 ...)" and tries to create the IN-equality from it. Somewhere here we fail, because we're trying to operate on an item_ref1->ref [0] which is the eliminated Item_exists_subselect. Rex's fix Rex's patch makes remove_redundant_subquery_clauses() not to mark GROUP BY expression as eliminated if the said expression is an Item_direct_view_ref. But it doesn't work if the said expression is a function which contains Item_direct_view_ref objects somewhere inside it... Thoughts Should we make mark_as_eliminated() to not follow the pointer in Item_direct_view_ref::ref ? What if TABLE_LIST::field_translation [i] also had a reference counter which counted the number of "live" references to the item it is pointing to. Then, Item_direct_view_ref should only eliminate (*ref) if we're eliminating the last reference. (TODO: is this even possible with current item->walk(&Item::eliminate_subselect_processor) call? It seems it will walk into *ref unconditionally?)

            Take aways from discussion with Sanja:

            It seems the right way to fix it is to make the

            item->walk(&Item::eliminate_subselect_processor);
            

            call to not walk into Item_direct_view_ref::ref.

            The problem is that currently walk() method walks the arguments before invoking the "processor" for the item itself.

            class Item_direct_view_ref {
             
                bool walk(Item_processor processor, bool walk_subquery, void *arg)
                { 
                  return (*ref)->walk(processor, walk_subquery, arg) ||                                                                          
                         (this->*processor)(arg);
                } 
            

            psergei Sergei Petrunia added a comment - Take aways from discussion with Sanja: It seems the right way to fix it is to make the item->walk(&Item::eliminate_subselect_processor); call to not walk into Item_direct_view_ref::ref . The problem is that currently walk() method walks the arguments before invoking the "processor" for the item itself. class Item_direct_view_ref {   bool walk(Item_processor processor, bool walk_subquery, void *arg) { return (* ref )->walk(processor, walk_subquery, arg) || ( this ->*processor)(arg); }
            psergei Sergei Petrunia added a comment - - edited

            .. as noted by Igor, in MySQL 8, walk method has a feature to call the processor before walking the arguments:

              bool walk(Item_processor processor, enum_walk walk, uchar *arg) override {
                /*
                  Item_ident processors like aggregate_check*() use
                  enum_walk::PREFIX|enum_walk::POSTFIX and depend on the processor being
                  called twice then.
                */
                return ((walk & enum_walk::PREFIX) && (this->*processor)(arg)) ||
                       ((walk & enum_walk::POSTFIX) && (this->*processor)(arg));
              }
            

            and for Item_ref it is:

              bool walk(Item_processor processor, enum_walk walk, uchar *arg) override {
                // Unresolved items may have m_ref_item = nullptr
                return ((walk & enum_walk::PREFIX) && (this->*processor)(arg)) ||
                       (m_ref_item != nullptr ? ref_item()->walk(processor, walk, arg)
                                              : false) ||
                       ((walk & enum_walk::POSTFIX) && (this->*processor)(arg));
              }
            

            I'm not sure why one can't simply have two processor functions - a prefix one and a postfix one.

            psergei Sergei Petrunia added a comment - - edited .. as noted by Igor, in MySQL 8, walk method has a feature to call the processor before walking the arguments: bool walk(Item_processor processor, enum_walk walk, uchar *arg) override { /* Item_ident processors like aggregate_check*() use enum_walk::PREFIX|enum_walk::POSTFIX and depend on the processor being called twice then. */ return ((walk & enum_walk::PREFIX) && (this->*processor)(arg)) || ((walk & enum_walk::POSTFIX) && (this->*processor)(arg)); } and for Item_ref it is: bool walk(Item_processor processor, enum_walk walk, uchar *arg) override { // Unresolved items may have m_ref_item = nullptr return ((walk & enum_walk::PREFIX) && ( this ->*processor)(arg)) || (m_ref_item != nullptr ? ref_item()->walk(processor, walk, arg) : false ) || ((walk & enum_walk::POSTFIX) && ( this ->*processor)(arg)); } I'm not sure why one can't simply have two processor functions - a prefix one and a postfix one.
            alice Alice Sherepa added a comment -

            please check the test from MDEV-32406 after fixing

            alice Alice Sherepa added a comment - please check the test from MDEV-32406 after fixing

            People

              Johnston Rex Johnston
              nobody Shihao Wen
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.