[MDEV-18378] Estimates for a non-merged join having implicit grouping is not showing 1 with optimizer_use_condition_selectivity >1 Created: 2019-01-25  Updated: 2023-04-27

Status: Stalled
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.0, 10.1, 10.2, 10.3, 10.4
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Varun Gupta (Inactive) Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: None


 Description   

CREATE TABLE t1 (a INT, KEY(a)) ENGINE=MyISAM;
INSERT INTO t1 VALUES (8),(0);
CREATE TABLE t2 (b INT, c VARCHAR(1)) ENGINE=MyISAM;
INSERT INTO t2 VALUES (4,'j'),(6,'v');
CREATE TABLE t3 (d VARCHAR(1)) ENGINE=MyISAM;
INSERT INTO t3 VALUES ('b'),('c');
 
MariaDB [test]>  set optimizer_use_condition_selectivity=1;
Query OK, 0 rows affected (0.003 sec)

MariaDB [test]> explain SELECT * FROM t1 WHERE a = (SELECT MAX(b) FROM t2 WHERE c IN (SELECT MAX(d) FROM t3)) OR a = 10;
+------+--------------+-------------+-------+---------------+------+---------+------+------+-------------------------------------------------+
| id   | select_type  | table       | type  | possible_keys | key  | key_len | ref  | rows | Extra                                           |
+------+--------------+-------------+-------+---------------+------+---------+------+------+-------------------------------------------------+
|    1 | PRIMARY      | t1          | index | a             | a    | 5       | NULL |    2 | Using where; Using index                        |
|    2 | SUBQUERY     | <subquery3> | ALL   | distinct_key  | NULL | NULL    | NULL |    1 |                                                 |
|    2 | SUBQUERY     | t2          | ALL   | NULL          | NULL | NULL    | NULL |    2 | Using where; Using join buffer (flat, BNL join) |
|    3 | MATERIALIZED | t3          | ALL   | NULL          | NULL | NULL    | NULL |    2 |                                                 |
+------+--------------+-------------+-------+---------------+------+---------+------+------+-------------------------------------------------+
4 rows in set (0.005 sec)

MariaDB [test]> set optimizer_use_condition_selectivity=4;
Query OK, 0 rows affected (0.000 sec)

MariaDB [test]> explain SELECT * FROM t1 WHERE a = (SELECT MAX(b) FROM t2 WHERE c IN (SELECT MAX(d) FROM t3)) OR a = 10;
+------+--------------+-------------+--------+---------------+--------------+---------+-----------+------+--------------------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref       | rows | Extra                    |
+------+--------------+-------------+--------+---------------+--------------+---------+-----------+------+--------------------------+
|    1 | PRIMARY      | t1          | index  | a             | a            | 5       | NULL      |    2 | Using where; Using index |
|    2 | SUBQUERY     | t2          | ALL    | NULL          | NULL         | NULL    | NULL      |    2 | Using where              |
|    2 | SUBQUERY     | <subquery3> | eq_ref | distinct_key  | distinct_key | 4       | test.t2.c |    1 |                          |
|    3 | MATERIALIZED | t3          | ALL    | NULL          | NULL         | NULL    | NULL      |    2 |                          |
+------+--------------+-------------+--------+---------------+--------------+---------+-----------+------+--------------------------+
4 rows in set (0.003 sec)

The <subquery3> is materialised and has implicit grouping, so we are sure that it would have 1 record, but in the second case the estimate is 2.



 Comments   
Comment by Varun Gupta (Inactive) [ 2019-01-25 ]

Debugging for the case with optimizer_use_condition_selectivity=4

* thread #2, stop reason = step over
    frame #0: 0x0000000100e8eb71 mysqld`matching_candidates_in_table(s=0x000062900010f8c8, with_found_constraint=false, use_cond_selectivity=4) at sql_select.cc:6776
   6773	    TABLE *table= s->table;
   6774	    double sel= table->cond_selectivity;
   6775	    double table_records= (double)table->stat_records();
-> 6776	    dbl_records= table_records * sel;
   6777	    return dbl_records;
   6778	  }
   6779	
Target 0: (mysqld) stopped.
(lldb) p table->alias.ptr()
(const char *) $7 = 0x000062900010bf80 "<subquery3>"
(lldb) p table_records
(double) $8 = 2
(lldb) p table->cond_selectivity;
(double) $9 = 1
(lldb) 

So here we see that <subquery3> has records 2 and selectivity=1 so we return 2 rows from here.

Comment by Varun Gupta (Inactive) [ 2019-01-25 ]

Now for optimizer_use_condition_selectivity=1

Process 657 stopped
* thread #2, stop reason = step over
    frame #0: 0x0000000100e8ebd8 mysqld`matching_candidates_in_table(s=0x00006290001238c8, with_found_constraint=false, use_cond_selectivity=1) at sql_select.cc:6791
   6788	    This heuristic is supposed to force tables used in exprZ to be before
   6789	    this table in join order.
   6790	  */
-> 6791	  if (with_found_constraint)
   6792	    records-= records/4;
   6793	
   6794	    /*
Target 0: (mysqld) stopped.
(lldb) p records
(ha_rows) $12 = 1
(lldb) p s->found_records
(ha_rows) $13 = 1

So here we see that the <subquery3> has an estimate of 1 record.

Comment by Varun Gupta (Inactive) [ 2019-01-25 ]

Patch
http://lists.askmonty.org/pipermail/commits/2019-January/013317.html

Comment by Igor Babaev [ 2019-05-01 ]

Varun,
You covered changed the code only for the case when the subquery is filled at execution. Not every subquery with implicit grouping is filled at execution. So you need to cover the case with the subquery with implicit grouping is not filled at execution.

Comment by Varun Gupta (Inactive) [ 2019-05-13 ]

MariaDB [test]> set optimizer_switch='materialization=off';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> explain SELECT * FROM t1 WHERE a = (SELECT MAX(b) FROM t2 WHERE c IN (SELECT MAX(d) FROM t3));
+------+--------------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id   | select_type        | table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+------+--------------------+-------+------+---------------+------+---------+-------+------+--------------------------+
|    1 | PRIMARY            | t1    | ref  | a             | a    | 5       | const |    0 | Using where; Using index |
|    2 | SUBQUERY           | t2    | ALL  | NULL          | NULL | NULL    | NULL  |    2 | Using where              |
|    3 | DEPENDENT SUBQUERY | t3    | ALL  | NULL          | NULL | NULL    | NULL  |    2 |                          |
+------+--------------------+-------+------+---------------+------+---------+-------+------+--------------------------+
3 rows in set (0.00 sec)

For the query I switch off materialization, to cover the case when the subquery with implicit grouping is not filled at execution. So we use the IN -> EXISTS transformation here and the IN subquery is converted to EXISTS

SELECT * FROM t1
WHERE
a = (SELECT MAX(b) FROM t2 WHERE EXISTS (SELECT 1 FROM t3 having MAX(d) = t2.c))

So for <subquery3> we still would need to estimate all the rows for table t3

Comment by Varun Gupta (Inactive) [ 2019-05-13 ]

Ok after some investigations I see that for tables filled at execution we don't have any selectivities, we have a call to the function get_delayed_estimates(), which gives an estimate of the tables filled at execution.
I think we should use these estimates instead of stat.records

Comment by Varun Gupta (Inactive) [ 2019-05-14 ]

Patch
http://lists.askmonty.org/pipermail/commits/2019-May/013768.html

Generated at Thu Feb 08 08:43:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.