Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32273

Broken queries are passed to the group by handler for execution

Details

    • Bug
    • Status: Open (View Workflow)
    • Critical
    • Resolution: Unresolved
    • 11.0(EOL)
    • 10.11, 11.4, 11.8
    • None
    • None

    Description

      Probably affects other versions 10.4+.

      Consider the following query (see also [1] for a more detailed account):

      select * from t1 left join t2 on t1.a = t2.a right join t3 on t3.a = t1.a;
      

      The optimizer transforms it to

      select t1.a AS a,t2.a AS a,t3.a AS a from t3 left join (t1 left join t2 on(t2.a = t3.a)) on(t1.a = t3.a) where 1;
      

      This query is unexecutable because t3.a is not a valid column in t1 left join t2.

      But the group by handler, if one is created, is expected to execute
      this query:

        do_select(JOIN *join, Procedure *procedure)
        {
        ...
          if (join->pushdown_query)
          {
        ...
            /* The storage engine will take care of the group by query result */
            int res= join->pushdown_query->execute(join);
        
            if (res)
              DBUG_RETURN(res);
        ...
          }
      

      According to psergei, this is a more general problem that the
      optimizer does not in general try to conserve the validity of the
      query, and it could be impossible to try to fix that.

      [1] https://jira.mariadb.org/browse/MDEV-26247?focusedCommentId=270547&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-270547

      The group by handler is created at a later stage when the query has
      already been modified by the optimizer. So the group by handler
      creation function could not simply save a copy of an executable query.

      bool JOIN::make_aggr_tables_info()
      {
      ...
          if (ht && ht->create_group_by)
          {
          ...
            Query query= {&all_fields, select_distinct || group_optimized_away,
                          tables_list, conds,
                          group_list, order ? order : group_list, having,
                          &select_lex->master_unit()->lim};
            group_by_handler *gbh= ht->create_group_by(thd, &query);
       
            if (gbh)
            {
              if (!(pushdown_query= new (thd->mem_root) Pushdown_query(select_lex,
                                                                       gbh)))
                DBUG_RETURN(1);
      

      Another, less problematic case is the introduction of semi join into
      the query, for example

      SELECT a, b FROM t2 WHERE b IN (SELECT b FROM t2 WHERE a > 10);

      is transformed to

      select t1.a AS a,t1.b AS b from t1 semi join (t2) where t1.b = t2.b and t2.a > 10;

      This is less of an issue because it could be detected during handler
      creation stage and the creation function could simply return NULL,
      causing the execution to not go through the handler.

      One possibility of fixing this issue is to move the creation to
      somewhere earlier. But accordingly to psergei, there does not exist
      a point in the middle of JOIN::optimize() where the query is
      guaranteed to be still valid. So if we were to move the creation it
      has to be at the beginning of JOIN::optimize(), in which case the
      storage engine could store a valid query for execution at execute
      stage. This is what psergei suggested. However this would mean all
      the optimisation effort is wasted. Another issue with this idea is
      what is the difference between this version of group by handler and
      the select handler?

      Attachments

        Issue Links

          Activity

            ycp Yuchen Pei added a comment -

            This issue is important because if it is left unaddressed, and there
            remains no contract between the sql layer and storage engine regarding
            the validity of the queries passed to group by handlers, any changes
            in query transformation before the execution stage could break storage
            engines using group by handlers.

            First, what should be the "components" of this bug? psergei said it
            should be considered a more general sql layer bug rather than an
            optimizer bug so I guess we can't put optimizer. Second, who should be
            assigned this bug?

            serg

            ycp Yuchen Pei added a comment - This issue is important because if it is left unaddressed, and there remains no contract between the sql layer and storage engine regarding the validity of the queries passed to group by handlers, any changes in query transformation before the execution stage could break storage engines using group by handlers. First, what should be the "components" of this bug? psergei said it should be considered a more general sql layer bug rather than an optimizer bug so I guess we can't put optimizer. Second, who should be assigned this bug? serg

            People

              ycp Yuchen Pei
              ycp Yuchen Pei
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.