Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22377

Subquery in an UPDATE query uses full scan instead of range

Details

    Description

      Here is the testcase (with totally synthetic data) :

      CREATE TABLE t1 (
        key1 varchar(30) NOT NULL,
        col1 int(11) NOT NULL,
        filler char(100)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
       
      insert into t1 
      select 
        seq, seq, seq
      from seq_1_to_100000;
      

      CREATE TABLE t10 (
        key1 varchar(30) NOT NULL,
        col1 int,
        filler char(100),
        PRIMARY KEY (key1)
      ) ENGINE=InnoDB CHARSET=utf8;
       
      insert into t10
      select 
       seq, seq, seq from seq_1_to_100000;
      

      CREATE TABLE t11 (
        key1 varchar(30) NOT NULL,
        filler char(100),
        PRIMARY KEY (key1)
      ) ENGINE=InnoDB CHARSET=utf8;
       
      insert into t11
      select 
       seq, seq from seq_1_to_100000;
      

      explain
      select * from t1 hist
      where
        hist.col1 NOT IN (SELECT tn.col1 
                          FROM t10 tn JOIN t11 tms ON tms.key1 = tn.key1 
                          WHERE 
                             tn.key1 IN ('1','2','3','4','5','6','7','8','9','10')
                              ) \G
      

      explain update t1 hist
      set filler='aaa'
      WHERE 
        key1 IN ('1','2','3','4','5','6','7','8','9','10') AND 
        hist.col1 NOT IN (SELECT tn.col1
                          FROM t10 tn JOIN t11 tms ON tms.key1 = tn.key1 
                          WHERE tn.key1 IN ('1','2','3','4','5','6','7','8','9','10')
                         ) \G
      

      EXPLAIN output for SELECT:

      *************************** 1. row ***************************
                 id: 1
        select_type: PRIMARY
              table: hist
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 99268
              Extra: Using where
      *************************** 2. row ***************************
                 id: 2
        select_type: MATERIALIZED
              table: tms
               type: range
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 92
                ref: NULL
               rows: 10
              Extra: Using where; Using index
      *************************** 3. row ***************************
                 id: 2
        select_type: MATERIALIZED
              table: tn
               type: eq_ref
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 92
                ref: j5.tms.key1
               rows: 1
              Extra: 
      

      Note that table tms uses range access.

      EXPLAIN output for the UPDATE:

      *************************** 1. row ***************************
                 id: 1
        select_type: PRIMARY
              table: hist
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 99268
              Extra: Using where
      *************************** 2. row ***************************
                 id: 2
        select_type: DEPENDENT SUBQUERY
              table: tms
               type: index
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 92
                ref: NULL
               rows: 85462
              Extra: Using where; Using index
      *************************** 3. row ***************************
                 id: 2
        select_type: DEPENDENT SUBQUERY
              table: tn
               type: eq_ref
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 92
                ref: j5.tms.key1
               rows: 1
              Extra: Using where
      3 rows in set (0.001 sec)
      

      Now, table "tms" uses full index scan even if it could easily use the same range access.

      Note: the issue goes away if the tables have 10K rows instead of 100K.

      Attachments

        Issue Links

          Activity

            igor Igor Babaev (Inactive) added a comment - - edited

            So we have to set m_cond_equal for the new WHERE condition. It can be done with the following change:

            #if 0
                  List_iterator<Item_equal> li(join_arg->cond_equal->current_level);
                  Item_equal *elem;
                  while ((elem= li++))
                  {
                    and_args->push_back(elem, thd->mem_root);
                  }
            #else
                  ((Item_cond_and *) (join_arg->conds))->m_cond_equal=
                                                         *join_arg->cond_equal;   
                  and_args->append((List<Item> *)&join_arg->cond_equal->current_level);
            

            Note that we prefer to append the whole sublist of multiple equalities to the AND list rather than to add multiple equalities one by one. It allows us not to reset the value of join_arg->cond_equal->current_level.

            igor Igor Babaev (Inactive) added a comment - - edited So we have to set m_cond_equal for the new WHERE condition. It can be done with the following change: #if 0 List_iterator<Item_equal> li(join_arg->cond_equal->current_level); Item_equal *elem; while ((elem= li++)) { and_args->push_back(elem, thd->mem_root); } #else ((Item_cond_and *) (join_arg->conds))->m_cond_equal= *join_arg->cond_equal; and_args->append((List<Item> *)&join_arg->cond_equal->current_level); Note that we prefer to append the whole sublist of multiple equalities to the AND list rather than to add multiple equalities one by one. It allows us not to reset the value of join_arg->cond_equal->current_level.

            This bug might affect any query with a subquery that is subject to in-to-exists transformation either because this is an only option to evaluate the predicand containing this subquery (as in the case of UPDATE) or because the setting of the optimizer switch prohibits to use any other ways to do it.
            This bug must be fixed in 10.2.

            igor Igor Babaev (Inactive) added a comment - This bug might affect any query with a subquery that is subject to in-to-exists transformation either because this is an only option to evaluate the predicand containing this subquery (as in the case of UPDATE) or because the setting of the optimizer switch prohibits to use any other ways to do it. This bug must be fixed in 10.2.
            psergei Sergei Petrunia added a comment - igor , made it into a commit: bb-10.2-mdev22377 , http://lists.askmonty.org/pipermail/commits/2022-February/014877.html

            igor, here's a patch that uses SELECT in the testcase: http://lists.askmonty.org/pipermail/commits/2022-February/014878.html . Also I've pushed it to bb-10.2-mdev22377

            psergei Sergei Petrunia added a comment - igor , here's a patch that uses SELECT in the testcase: http://lists.askmonty.org/pipermail/commits/2022-February/014878.html . Also I've pushed it to bb-10.2-mdev22377

            ok to push into 10.2

            igor Igor Babaev (Inactive) added a comment - ok to push into 10.2

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.