[MDEV-29682] Incorrect #rows computations in DuplicateWeedout strategy choice Created: 2022-10-02  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.7, 10.11
Fix Version/s: 10.11

Type: Bug Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Monty has pointed out that this example performs incorrect computations in DuplicateWeedout

create table t1(a int, b int);
insert into t1 select seq,seq from seq_0_to_3;
create table t2 (p int, a int);
insert into t2 select seq,seq from seq_1_to_10;
insert into t2 select seq,seq from seq_10_to_100;
explain extended select * from t1 where a in (select p from t2);

Reproducible both on 10.7-selectivity and clean 10.11.

The problem occurs when considering this query plan:

+------+-------------+-------+------+---------------+------+---------+------+------+----------+---------------------------------------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                                                           |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+---------------------------------------------------------------------------------+
|    1 | PRIMARY     | t1    | ALL  | NULL          | NULL | NULL    | NULL | 4    |   100.00 |                                                                                 |
|    1 | PRIMARY     | t2    | ALL  | NULL          | NULL | NULL    | NULL | 101  |   100.00 | Using where; Start temporary; End temporary; Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+---------------------------------------------------------------------------------+

Computations in Duplicate_weedout_picker::check_qep go as follows:

(gdb) p first_tab
  $12 = 1
(gdb) p join->positions[first_tab].prefix_record_count
  $13 = 404

(gdb) print sj_outer_fanout
  $15 = 1
(gdb) print sj_inner_fanout
  $16 = 101

Then

      double full_lookup_cost=
               COST_MULT(join->positions[first_tab].prefix_record_count,
                         COST_MULT(sj_outer_fanout,
                                   sj_inner_fanout * one_lookup_cost));

That is, we are multiplying:

  • join->positions[first_tab].prefix_record_count = 404
  • sj_outer_fanout=1
  • sj_inner_fanout=101 – Multiplying by 101 again!
  • one_lookup_cost = 0.05 – seems ok

Generated at Thu Feb 08 10:10:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.