[MDEV-20109] Optimizer ignores distinct key created for materialized semi-join subquery when searching for best execution plan Created: 2019-07-20  Updated: 2020-08-25  Resolved: 2019-09-17

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: None
Fix Version/s: 10.3.18, 10.4.8

Type: Bug Priority: Critical
Reporter: Igor Babaev Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by MDEV-20482 MyISAM & Aria very slow when IN predi... Closed
Problem/Incident
causes MDEV-20105 Case for bringing in_subquery_convers... Closed
Relates
relates to MDEV-20646 10.3.18 is slower than 10.3.17 Closed

 Description   

Consider the following database tables:

create table t0 (a int); 
insert into t0 values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
create table t1 (a int);
insert into t1 select (t0.a-1)*10 + (t.a-1) + 1 from t0 t, t0;
create table t2 (a int);
insert into t2 values (12), (88), (47), (33), (28);

The following query uses very inefficient execution plan without look-ups into the materialized subquery:

select * from t1 where a in (select a from t2 group by a); 

MariaDB [test]> explain extended select * from t1 where a in (select a from t2 group by a);
+------+--------------+-------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
| id   | select_type  | table       | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
+------+--------------+-------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL | NULL    | NULL |    5 |   100.00 |                                                 |
|    1 | PRIMARY      | t1          | ALL  | NULL          | NULL | NULL    | NULL |  100 |   100.00 | Using where; Using join buffer (flat, BNL join) |
|    2 | MATERIALIZED | t2          | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 |                                                 |
+------+--------------+-------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
 
MariaDB [test]> show warnings;
+-------+------+------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                          |
+-------+------+------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) where (`test`.`t1`.`a` = `test`.`t2`.`a`) |
+-------+------+------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

The same problem we observe for the mergeable semi-join subquery

MariaDB [test]> explain extended select * from t1 where a in (select a from t2);
+------+--------------+-------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
| id   | select_type  | table       | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
+------+--------------+-------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
|    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL | NULL    | NULL |    5 |   100.00 |                                                 |
|    1 | PRIMARY      | t1          | ALL  | NULL          | NULL | NULL    | NULL |  100 |   100.00 | Using where; Using join buffer (flat, BNL join) |
|    2 | MATERIALIZED | t2          | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 |                                                 |
+------+--------------+-------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+

This becomes a real problem when t1 and t2 are 100 times bigger.



 Comments   
Comment by Varun Gupta (Inactive) [ 2019-07-23 ]

Debugging the case 1, with GROUP BY

select * from t1 where a in (select a from t2 group by a);

In the function remove_redundant_subquery_clauses(select_lex), we remove the GROUP BY clause, here is the code snippet

  /*
    Remove GROUP BY if there are no aggregate functions and no HAVING
    clause
  */
  if (subq_select_lex->group_list.elements &&
      !subq_select_lex->with_sum_func && !subq_select_lex->join->having)
  {
    for (ORDER *ord= subq_select_lex->group_list.first; ord; ord= ord->next)
    {
      (*ord->item)->walk(&Item::eliminate_subselect_processor, FALSE, NULL);
    }
    subq_select_lex->join->group_list= NULL;
    subq_select_lex->group_list.empty();
    DBUG_PRINT("info", ("GROUP BY removed"));
  }

So the first case is also converted to a semi-join. We can look this in the optimizer trace to be sure too

"join_preparation": {
              "select_id": 2,
              "steps": [
                {
                  "transformation": {
                    "select_id": 2,
                    "from": "IN (SELECT)",
                    "to": "materialization",
                    "sjm_scan_allowed": true,
                    "possible": true
                  }
                },
                {
                  "transformation": {
                    "select_id": 2,
                    "from": "IN (SELECT)",
                    "to": "semijoin",
                    "chosen": true
                  }
                },

Comment by Varun Gupta (Inactive) [ 2019-07-23 ]

lets make sure we consider the case of JTBM
the query is:

select * from t1 where a in (select max(a) from t2 group by a);

We do have a keyuse, looking into the optimizer trace:

          {
            "ref_optimizer_key_uses": [
              {
                "table": "<subquery2>",
                "field": "max(a)",
                "equals": "t1.a",
                "null_rejecting": true
              }
            ]
          },

and the choice of scanning the <subquery2> or making a lookup in the <subquery2> is cost based

Comment by Varun Gupta (Inactive) [ 2019-08-01 ]

Ok I did some digging while fixing issue for MDEV-8306:

MariaDB [test]> explain extended select * from t0 where a in (select a from t2);
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref  | rows | filtered | Extra |
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------+
|    1 | PRIMARY      | t0          | ALL    | NULL          | NULL         | NULL    | NULL | 10   |   100.00 |       |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func | 1    |   100.00 |       |
|    2 | MATERIALIZED | t2          | ALL    | NULL          | NULL         | NULL    | NULL | 10   |   100.00 |       |
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------+

The data used here is:

create table t0 (a int);
insert into t0 values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
create table t2 (a int);
insert into t2 values (12), (88), (47), (33), (28);
insert into t2 values (12), (88), (47), (33), (28);

So the lookups are made in the materialized table for semi-joins too and they are not dependent on the fact that a keyuse should be present.
In the function setup_sj_materialization_part2 , we create the ref access that would be used to make lookup into the materialized table
Also in best_extension_by_limited_search, there is a function advance_sj_state which after adding a table, tries to see if a semi-join strategy is possible and then there is a cost based approach to pick the best semi-join strategy.

Comment by Sergei Petrunia [ 2019-08-05 ]

This becomes a real problem when t1 and t2 are 100 times bigger.

Trying to do that:

create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
 
create table one_k(a int);
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
 
create table t1big like t1;
create table t2big like t2;
insert into t1big select A.a +1000*B.a from one_k A, ten B;
insert into t2big select floor(rand()*1000) from one_k limit 500;

 explain select * from t1big where a in (select a from t2big group by a);
+------+--------------+-------------+------+---------------+------+---------+------+-------+-------------------------------------------------+
| id   | select_type  | table       | type | possible_keys | key  | key_len | ref  | rows  | Extra                                           |
+------+--------------+-------------+------+---------------+------+---------+------+-------+-------------------------------------------------+
|    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL | NULL    | NULL | 500   |                                                 |
|    1 | PRIMARY      | t1big       | ALL  | NULL          | NULL | NULL    | NULL | 10000 | Using where; Using join buffer (flat, BNL join) |
|    2 | MATERIALIZED | t2big       | ALL  | NULL          | NULL | NULL    | NULL | 500   |                                                 |
+------+--------------+-------------+------+---------------+------+---------+------+-------+-------------------------------------------------+

Looks fairly bad.
But if one disables the join buffering, the plan is good:

set join_cache_level=0;
explain select * from t1big where a in (select a from t2big group by a);
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+-------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref  | rows | Extra |
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+-------+
|    1 | PRIMARY      | t1big       | ALL    | NULL          | NULL         | NULL    | NULL | 9980 |       |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func | 1    |       |
|    2 | MATERIALIZED | t2big       | ALL    | NULL          | NULL         | NULL    | NULL | 500  |       |
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+-------+

Comment by Sergei Petrunia [ 2019-08-08 ]

Trying on mysql-8.0:

mysql>  explain select * from t1big where a in (select a from t2big group by a);
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type  | table       | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | NULL |   100.00 | NULL                                               |
|  1 | SIMPLE       | t1big       | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 9980 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
|  2 | MATERIALIZED | t2big       | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  500 |   100.00 | NULL                                               |
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
3 rows in set, 1 warning (0.01 sec)

The same plan.

mysql> set optimizer_switch='block_nested_loop=off';
Query OK, 0 rows affected (0.00 sec)
 
mysql> explain select * from t1big where a in (select a from t2big group by a);
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type  | table       | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | NULL |   100.00 | NULL        |
|  1 | SIMPLE       | t1big       | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 9980 |    10.00 | Using where |
|  2 | MATERIALIZED | t2big       | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  500 |   100.00 | NULL        |
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

that is, without join buffering, it still would not use the index of the materialized table

Comment by Sergei Petrunia [ 2019-08-08 ]

Reading MySQL's Optimizer Trace for this:

It starts empty prefix, puts table t2big there first:

  "considered_execution_plans": [
    {
      "plan_prefix": [],
      "table": "`t2big`",
      "best_access_path": { ...},
      "semijoin_strategy_choice": [...],

Then adds t1big:

      "rest_of_plan": [
        {
          "plan_prefix": [ "`t2big`" ],
          "table": "`t1big`",
          "best_access_path": { ... },
          "semijoin_strategy_choice": [

Considers MaterializeScan and DuplicatesWeedout (coosing the former:

          "semijoin_strategy_choice": [
            {
              "strategy": "MaterializeScan",
              ...
            },
            {
              "strategy": "DuplicatesWeedout",
              ...
            }
          ],
          "chosen": true
        }
      ]
    },

Then, it puts t1big first:

    {
      "plan_prefix": [ ],
      "table": "`t1big`",
      "best_access_path": {...},
      "semijoin_strategy_choice": [],
      "pruned_by_heuristic": true
    },
  ]

and prunes away this plan. The join order "t1big, t2big" is never considered.

Comment by Sergei Petrunia [ 2019-08-08 ]

In MariaDB, the trace shows that "t2big,t1big" order is considered:

                "rest_of_plan": [
                  {
                    "plan_prefix": ["t2big"],
                    "table": "t1big",
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "access_type": "scan",
                          "resulting_rows": 9980,
                          "cost": 21,
                          "chosen": true
                        }
                      ]
                    }
                  }
                ]

What happens with semi-join optimization is not clear as semi-joins are not yet printed (and maybe some other necessary details are not printed, either).

Comment by Sergei Petrunia [ 2019-08-09 ]

Debugging how MariaDB optimizer processes this

Join order: t1big, t2big

t1big:
  current_record_count=9980
  current_read_time=2017  (= 21 cost===
So we get:
 + 1996 output size penalty)

t2big:
  current_record_count=4,990,000
  // position->read_time=1
  current_read_time = 
      read_time +            // 2017 
      position->read_time +   // 1
      current_record_count/TIME_FOR_COMPARE  // ~1M
  = 1,000,018

advance_sj_state() for "t1big, t2big"
// FirstMatch finds the plan applicable
// SJ-Materialization:
  mat_info->materialization_cost.total_cost()=26
  mat_info->lookup_cost.total_cost()= 0.05
 
  mat_read_time = 2542
  // ^^ this is 2017 + 26 + 9980 * lookup_cost
 
After that, we have:
  (gdb) print current_record_count
    $54 = 9980
  (gdb) print current_read_time
    $55 = 2542
  (gdb) p join->positions[idx].sj_strategy
    $57 = SJ_OPT_MATERIALIZE

Join order: t2big, t1big

t2big:
  current_record_count=500
  current_read_time=101  (= 1 cost + 100 output size penalty)

t1big:
  current_record_count=4,990,000
  // position->read_time=21
 
  current_read_time = 
      read_time +            // 10
      position->read_time +   // 21
      current_record_count/TIME_FOR_COMPARE  // ~1M
  = 998,122

advance_sj_state() for "t2big, t1big":
 
// SJ-Materialization:
mat_info->materialization_cost.total_cost()= 26
mat_info->scan_cost.total_cost()=25
 
prefix_cost= (sum of the above two)=51
prefix_rec_count=500
 
optimize_wo_join_buffering:
(gdb) p curpos.records_read
  $92 = 9980
(gdb) p curpos.read_time
  $93 = 21
 
(gdb) print prefix_rec_count
  $95 = 4990000
(gdb) print prefix_cost
  $96 = 72

back in best_extension_by_limited_search, we get:

(gdb) p *current_read_time
  $100 = 72
(gdb) p *current_record_count
  $101 = 4990000

Two things are wrong here:

  • subquery's fanout is not removed from current_record_count
  • there is no out_rows/TIME_FOR_COMPARE penalty.

Because of that, the cost comparison is not apples-to-apples.
(also note that the two errors work to cancel each other out here)

mysql> explain select * from t1big where a in (select a from t2big);
+------+--------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id   | select_type  | table       | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
+------+--------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
|    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL | NULL    | NULL | 500  |                                                 |
|    1 | PRIMARY      | t1big       | ALL  | NULL          | NULL | NULL    | NULL | 9980 | Using where; Using join buffer (flat, BNL join) |
|    2 | MATERIALIZED | t2big       | ALL  | NULL          | NULL | NULL    | NULL | 500  |                                                 |
+------+--------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+

mysql> show status  like 'Last_query_cost%';
+-----------------+-----------+
| Variable_name   | Value     |
+-----------------+-----------+
| Last_query_cost | 71.999000 |
+-----------------+-----------+

Comment by Sergei Petrunia [ 2019-08-22 ]

Second variant of the patch (smaller):
http://lists.askmonty.org/pipermail/commits/2019-August/013944.html

Comment by Sergei Petrunia [ 2019-08-23 ]

Checked the patch against related fixes:

Testcase from MDEV-20083 is fixed

https://jira.mariadb.org/browse/MDEV-20083?focusedCommentId=131221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-131221

MDEV-20105, Testcase 1 is fixed

https://jira.mariadb.org/browse/MDEV-20105?focusedCommentId=131291&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-131291 (a4.test) - this is fixed.

MDEV-20105, Test case 2 not fixed

https://jira.mariadb.org/browse/MDEV-20105?focusedCommentId=131292&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-131292 (a5.test)
This is not fixed:

id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       PRIMARY regression_1_478389     ALL     NULL    NULL    NULL    NULL    300000
1       PRIMARY <derived3>      ALL     NULL    NULL    NULL    NULL    5000    Using where; FirstMatch(regression_1_478389); Using join buffer (flat, BNL join)
3       DERIVED NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used

"r_total_time_ms": 353638, (in MariaDB 10.2: "r_total_time_ms": 384.23)

Materialization is not used because constants are strings (as they are quoted)?

MDEV-20105, Test case 3 - not fixed

https://jira.mariadb.org/browse/MDEV-20105?focusedCommentId=131293&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-131293 (a6.test)

id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       PRIMARY a       ALL     PRIMARY NULL    NULL    NULL    300000
1       PRIMARY <derived3>      ALL     NULL    NULL    NULL    NULL    5000    Using where; FirstMatch(a); Using join buffer (flat, BNL join)
1       PRIMARY b       eq_ref  PRIMARY PRIMARY 4       test.a.id       1
1       PRIMARY <derived5>      ALL     NULL    NULL    NULL    NULL    5000    Using where; FirstMatch(b); Using join buffer (flat, BNL join)
5       DERIVED NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used
3       DERIVED NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used
    "r_total_time_ms": 367334,

In 10.2:

id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       SIMPLE  a       ALL     PRIMARY NULL    NULL    NULL    300000  Using where
1       SIMPLE  b       eq_ref  PRIMARY PRIMARY 4       test.a.id       1       Using where
    "r_total_time_ms": 502.61,

Materialization is not used because constants are strings (as they are quoted)?

MDEV-20105, Test case 4 - not fixed

https://jira.mariadb.org/browse/MDEV-20105?focusedCommentId=131294&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-131294 (a7.test)

id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       PRIMARY <derived3>      ALL     NULL    NULL    NULL    NULL    5000    Start temporary
1       PRIMARY a       eq_ref  PRIMARY PRIMARY 4       tvc_0._col_1    1       Using index condition; End temporary
1       PRIMARY b       eq_ref  PRIMARY PRIMARY 4       test.a.id       1
1       PRIMARY <derived5>      ALL     NULL    NULL    NULL    NULL    5000    Using where; FirstMatch(b); Using join buffer (flat, BNL join)
5       DERIVED NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used
3       DERIVED NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used
    "r_total_time_ms": 3683.4,

In 10.2 it was:

1       SIMPLE  a       range   PRIMARY PRIMARY 4       NULL    4996    Using index condition
1       SIMPLE  b       eq_ref  PRIMARY PRIMARY 4       test.a.id       1       Using where
    "r_total_time_ms": 131.5,

Materialization is not used because constants are strings (as they are quoted)?

Comment by Slawomir Pryczek [ 2019-08-26 ]

Hi Sergei, that's great news thanks for the comment. That basically should fix it all at least for us, as we're using only large integer predicate sets, we are using strings too but much lower amounts than 1000 in one IN(...).

I run these not-yet-fixed tests (2,3,4) on 10.4.7 to be sure conversion between int column type in the table and string IN (...) plays no role and yes, times differ insignificiantly with different CREATE TABLE with all columns being varchar (<500ms) just different data type inside IN(...) seems to result in much different performance (for unpatched server), so yes that seems to be the cause...

Is there some way of "forcing" (like FORCE INDEX xxx) of this materialization / unique key?

Thanks,
Slawomir.

Comment by Sergei Petrunia [ 2019-08-29 ]

... Addressed review input, and backported the patch to 10.3. Currently the code is being tested at: https://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.3-mdev20109

Comment by Sergei Petrunia [ 2019-08-29 ]

Hi pslawek83,

> Is there some way of "forcing" (like FORCE INDEX xxx) of this materialization / unique key?

Currently no, there's no way to force it.

Comment by Sergei Petrunia [ 2019-08-30 ]

Ok, the fix is pushed into the MariaDB 10.3 tree:

https://github.com/MariaDB/server/commit/ef76f81c982bdbcfa4797ce26224db9c016ddebd
https://github.com/MariaDB/server/commit/a379f151b18e4bc7e4c26be5f5e1254a59e27bfa

The patches should be applicable to the earlier versions of MariaDB as well. They are not pushed there, because those server versions have been stable for a long time so change in the query optimizer might not be welcome there.
On the other hand, this and related bugs show there is an issue in 10.3 (where in-predicate-to-subquery conversion feature was introduced) , so the patch goes into 10.3 (and not just into 10.4)

Comment by Sergei Petrunia [ 2019-09-03 ]

After this was merged to 10.4, the testcase was failing intermittently. Fixed it by collecting EITS stats for the tables (also fixed similar failure in other parts of the test )

http://lists.askmonty.org/pipermail/commits/2019-September/013964.html

Comment by Sergei Petrunia [ 2019-09-17 ]

Marking this MDEV is closed as the patch has been pushed. The linked issues will be closed after checking the quoted variant.

Generated at Thu Feb 08 08:56:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.