Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-317

CHEAP SQ: A query with EXISTS subquery, aggregate function in subquery, ORDER BY and LIMIT takes much longer than on main tree if optimizer_prune_level=0

Details

    • Bug
    • Status: Closed (View Workflow)
    • Minor
    • Resolution: Won't Fix
    • None
    • 5.5.27
    • None
    • None

    Description

      Note: with the provided test case, the problem is only reproducible when optimizer_prune_level=0. If you decide that it's acceptable and isn't worth fixing, I won't object, but I'm filing it so you could check if it's expected.

      The following query

      SELECT a FROM t1, t2, t3 AS t3_alias
      WHERE b <= ( SELECT SUM(a) FROM t1 )
        AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
      ORDER BY a LIMIT 1;

      takes ~18 times longer on MDEV-193 tree comparing to main/5.5 revno 3426 (and to 3402).

      Reproducible with MyISAM and Aria; with InnoDB it's sporadic.
      Reproducible with the default optimizer_switch.

      EXPLAIN on MDEV-193:

       
      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	t1	index	NULL	a	5	NULL	40	100.00	Using index; Using temporary; Using filesort
      1	PRIMARY	t2	range	b	b	5	NULL	96	100.00	Using where; Using index; Using join buffer (flat, BNL join)
      1	PRIMARY	t3_alias	index	NULL	c	5	NULL	40	100.00	Using where; Using index; Using join buffer (incremental, BNL join)
      3	DEPENDENT SUBQUERY	t3	index	c	c	5	NULL	40	100.00	Using where; Using index
      2	SUBQUERY	t1	index	NULL	a	5	NULL	40	100.00	Using index
      Warnings:
      Note	1276	Field or reference 'test.t3_alias.c' of SELECT #3 was resolved in SELECT #1
      Note	1003	select `test`.`t1`.`a` AS `a` from `test`.`t1` join `test`.`t2` join `test`.`t3` `t3_alias` where ((`test`.`t2`.`b` <= (select sum(`test`.`t1`.`a`) from `test`.`t1`)) and <expr_cache><`test`.`t3_alias`.`c`>(exists(select 1 from `test`.`t3` where (`test`.`t3`.`c` > `test`.`t3_alias`.`c`)))) order by `test`.`t1`.`a` limit 1

      EXPLAIN on maria/5.5:

       
      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	t1	index	NULL	a	5	NULL	40	100.00	Using index; Using temporary; Using filesort
      1	PRIMARY	t3_alias	index	NULL	c	5	NULL	40	100.00	Using where; Using index; Using join buffer (flat, BNL join)
      1	PRIMARY	t2	index	b	b	5	NULL	100	100.00	Using where; Using index; Using join buffer (incremental, BNL join)
      3	DEPENDENT SUBQUERY	t3	index	c	c	5	NULL	40	100.00	Using where; Using index
      2	SUBQUERY	t1	index	NULL	a	5	NULL	40	100.00	Using index
      Warnings:
      Note	1276	Field or reference 'test.t3_alias.c' of SELECT #3 was resolved in SELECT #1
      Note	1003	select `test`.`t1`.`a` AS `a` from `test`.`t1` join `test`.`t2` join `test`.`t3` `t3_alias` where ((`test`.`t2`.`b` <= (select sum(`test`.`t1`.`a`) from `test`.`t1`)) and <expr_cache><`test`.`t3_alias`.`c`>(exists(select 1 from `test`.`t3` where (`test`.`t3`.`c` > `test`.`t3_alias`.`c`)))) order by `test`.`t1`.`a` limit 1

      Test case:

      SET optimizer_prune_level = 0;
       
      CREATE TABLE t1 (a INT, KEY(a)) ENGINE=MyISAM;
      INSERT INTO t1 VALUES
        (0),(8),(1),(8),(9),(24),(6),(1),(6),(2),
        (4),(8),(4),(4),(7),(4),(1),(9),(4),(8);
      INSERT INTO t1 SELECT * FROM t1;
       
      CREATE TABLE t2 (b INT, KEY(b)) ENGINE=MyISAM;
      INSERT INTO t2 VALUES
        (4),(8),(0),(0),(0),(7),(7),(5),(3),(188),
        (4),(9),(6),(1),(5),(6),(2),(4),(231),(4),
        (3),(3),(7),(6),(7),(9),(4),(4),(2),(1),(2),
        (194),(2),(3),(8),(4),(9),(4),(5),(5),(9),
        (3),(8),(0),(98),(3),(1),(0),(189),(8),(3),
        (3),(9),(6),(8),(3),(9),(5),(9),(2),(2),(5),
        (8),(6),(9),(0),(3),(6),(5),(8),(2),(120),
        (25),(1),(3),(1),(3),(153),(5),(9),(1),(8),
        (7),(6),(2),(4),(7),(3),(8),(4),(6),(1),(7),
        (1),(0),(2),(7),(2),(1),(5);
       
      CREATE TABLE t3 (c INT, KEY(c)) ENGINE=MyISAM;
      INSERT INTO t3 VALUES
        (7),(0),(9),(3),(4),(2),(5),(3),(1),(3),(6),
        (7),(5),(1),(204),(224),(9),(5),(0),(3);
      INSERT INTO t3 SELECT * FROM t3;
       
      SELECT a FROM t1, t2, t3 AS t3_alias
      WHERE b <= ( SELECT SUM(a) FROM t1 )
        AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
      ORDER BY a LIMIT 1;

      Attachments

        Issue Links

          Activity

            This is not a bug, but rather a deficiency of the current optimizer that is
            completely unrelated to mdev-193. It is a regression only with respect to
            MariaDB 5.5/5.3. Detailed explanation follows.

            1. What happens in mdev-193

            The patch for mdev-193 makes MariaDB 5.5 behave like any MySQL version, or
            MariaDB 5.2, namely - it allows constant subqueries to be evaluated during
            optimization, and the resulting value to be used by the optimizer.

            Specifically in this test case, the subquery (SELECT SUM(a) FROM t1) is
            evaluated by the range optimizer, and the final query plan uses "range"
            access to table "t2".

            If we set "optimizer_prune_level = 0" (that is, not pruning, consider all
            plans), the optimizer determines that the optimal query plan is:

            ------------------------------------------------------------------------------------------------------------------------------------------------

            id select_type table type possible_keys key key_len ref rows filtered Extra

            ------------------------------------------------------------------------------------------------------------------------------------------------

            1 PRIMARY t1 index NULL a 5 NULL 40 100.00 Using index
            1 PRIMARY t2 range b b 5 NULL 96 100.00 Using where; Using index; Using join buffer (flat, BNL join)
            1 PRIMARY t3_alias index NULL c 5 NULL 40 100.00 Using where; Using index; Using join buffer (incremental, BNL join)
            3 DEPENDENT SUBQUERY t3 index c c 5 NULL 40 100.00 Using where; Using index
            2 SUBQUERY t1 index NULL a 5 NULL 40 100.00 Using index

            ------------------------------------------------------------------------------------------------------------------------------------------------
            join->best_read = 32396

            If we set "optimizer_prune_level = 1" (that is prune some plans), the optimizer
            prunes the above plan, and finds a worse plan with respect to its cost model:

            ------------------------------------------------------------------------------------------------------------------------------------------------

            id select_type table type possible_keys key key_len ref rows filtered Extra

            ------------------------------------------------------------------------------------------------------------------------------------------------

            1 PRIMARY t1 index NULL a 5 NULL 40 100.00 Using index
            1 PRIMARY t3_alias index NULL c 5 NULL 40 100.00 Using where; Using index; Using join buffer (flat, BNL join)
            1 PRIMARY t2 range b b 5 NULL 96 100.00 Using where; Using index; Using join buffer (incremental, BNL join)
            3 DEPENDENT SUBQUERY t3 index c c 5 NULL 40 100.00 Using where; Using index
            2 SUBQUERY t1 index NULL a 5 NULL 40 100.00 Using index

            ------------------------------------------------------------------------------------------------------------------------------------------------
            join->best_read = 66869 (~ two times higher cost)

            However, the second plan in reality is much faster. The reason for this is that
            table "t3_alias" appears in second place in the plan instead of in third, as a
            result the correlated EXISTS subquery is executed fewer times (1531 times). The
            first "optimal" plan puts table "t3_alias" in third place in the join, which
            results in the EXISTS subquery being executed 152011 times. At the same time
            the result of the constant subquery is cached, so it doesn't matter much how
            many times it is evaluated.

            2. Exposing the same problem in the main 5.5 branch

            If we substitute manually the aggregate subquery with its constant value, we can
            reproduce the problem in 5.5 via the following test case:

            – better plan
            SET optimizer_prune_level = 1;

            EXPLAIN
            SELECT count FROM t1, t2, t3 AS t3_alias
            WHERE b <= 236
            AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
            ORDER BY a LIMIT 1;

            – worse plan
            SET optimizer_prune_level = 0;

            EXPLAIN
            SELECT count FROM t1, t2, t3 AS t3_alias
            WHERE b <= 236
            AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
            ORDER BY a LIMIT 1;

            timour Timour Katchaounov (Inactive) added a comment - This is not a bug, but rather a deficiency of the current optimizer that is completely unrelated to mdev-193. It is a regression only with respect to MariaDB 5.5/5.3. Detailed explanation follows. 1. What happens in mdev-193 The patch for mdev-193 makes MariaDB 5.5 behave like any MySQL version, or MariaDB 5.2, namely - it allows constant subqueries to be evaluated during optimization, and the resulting value to be used by the optimizer. Specifically in this test case, the subquery (SELECT SUM(a) FROM t1) is evaluated by the range optimizer, and the final query plan uses "range" access to table "t2". If we set "optimizer_prune_level = 0" (that is, not pruning, consider all plans), the optimizer determines that the optimal query plan is: ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- id select_type table type possible_keys key key_len ref rows filtered Extra ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- 1 PRIMARY t1 index NULL a 5 NULL 40 100.00 Using index 1 PRIMARY t2 range b b 5 NULL 96 100.00 Using where; Using index; Using join buffer (flat, BNL join) 1 PRIMARY t3_alias index NULL c 5 NULL 40 100.00 Using where; Using index; Using join buffer (incremental, BNL join) 3 DEPENDENT SUBQUERY t3 index c c 5 NULL 40 100.00 Using where; Using index 2 SUBQUERY t1 index NULL a 5 NULL 40 100.00 Using index ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- join->best_read = 32396 If we set "optimizer_prune_level = 1" (that is prune some plans), the optimizer prunes the above plan, and finds a worse plan with respect to its cost model: ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- id select_type table type possible_keys key key_len ref rows filtered Extra ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- 1 PRIMARY t1 index NULL a 5 NULL 40 100.00 Using index 1 PRIMARY t3_alias index NULL c 5 NULL 40 100.00 Using where; Using index; Using join buffer (flat, BNL join) 1 PRIMARY t2 range b b 5 NULL 96 100.00 Using where; Using index; Using join buffer (incremental, BNL join) 3 DEPENDENT SUBQUERY t3 index c c 5 NULL 40 100.00 Using where; Using index 2 SUBQUERY t1 index NULL a 5 NULL 40 100.00 Using index ----- ------------------ -------- ----- ------------- ---- ------- ---- ---- -------- -------------------------------------------------------------------- join->best_read = 66869 (~ two times higher cost) However, the second plan in reality is much faster. The reason for this is that table "t3_alias" appears in second place in the plan instead of in third, as a result the correlated EXISTS subquery is executed fewer times (1531 times). The first "optimal" plan puts table "t3_alias" in third place in the join, which results in the EXISTS subquery being executed 152011 times. At the same time the result of the constant subquery is cached, so it doesn't matter much how many times it is evaluated. 2. Exposing the same problem in the main 5.5 branch If we substitute manually the aggregate subquery with its constant value, we can reproduce the problem in 5.5 via the following test case: – better plan SET optimizer_prune_level = 1; EXPLAIN SELECT count FROM t1, t2, t3 AS t3_alias WHERE b <= 236 AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c ) ORDER BY a LIMIT 1; – worse plan SET optimizer_prune_level = 0; EXPLAIN SELECT count FROM t1, t2, t3 AS t3_alias WHERE b <= 236 AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c ) ORDER BY a LIMIT 1;

            As explained in the last comment this is not a bug, but a deficiency of the current optimizer.
            This problem was filed as a new task MDEV-338.

            timour Timour Katchaounov (Inactive) added a comment - As explained in the last comment this is not a bug, but a deficiency of the current optimizer. This problem was filed as a new task MDEV-338 .

            People

              timour Timour Katchaounov (Inactive)
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.