[MDEV-26156] Empty table with EITS statistics causes multiply-by-zero wipe-out in the join optimizer Created: 2021-07-15  Updated: 2021-07-27

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.5
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 1
Labels: None

Issue Links:
Relates

 Description   

create table t_empty (
  pk int primary key,
  col1 int
);
 
create table t1 (a int, b int);
insert into t1 select seq, seq from seq_1_to_1000;
 
create table t2 (
  a int,
  b int,
  key(a)
);
insert into t2 select a, a from t1;

explain
select * 
from 
  t1 left join t_empty T on(T.col1 = t1.a)
  join t2 on t2.a=t1.b;

+------+-------------+-------+------+---------------+------+---------+---------+------+-------------------------------------------------+               
| id   | select_type | table | type | possible_keys | key  | key_len | ref     | rows | Extra                                           |               
+------+-------------+-------+------+---------------+------+---------+---------+------+-------------------------------------------------+               
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL    | 1000 | Using where                                     |               
|    1 | SIMPLE      | T     | ALL  | NULL          | NULL | NULL    | NULL    | 1    | Using where; Using join buffer (flat, BNL join) |               
|    1 | SIMPLE      | t2    | ref  | a             | a    | 5       | j4.t1.b | 1    |                                                 |               
+------+-------------+-------+------+---------------+------+---------+---------+------+-------------------------------------------------+

Good.

analyze table t_empty persistent for all;

select * from mysql.table_stats
where db_name=database() and table_name='t_empty';
+---------+------------+-------------+
| db_name | table_name | cardinality |
+---------+------------+-------------+
| j4      | t_empty    |           0 |
+---------+------------+-------------+

re-run the same query:

explain
select * 
from 
  t1 left join t_empty T on(T.col1 = t1.a)
  join t2 on t2.a=t1.b;
+------+-------------+-------+------+---------------+------+---------+---------+------+-------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref     | rows | Extra                                           |
+------+-------------+-------+------+---------------+------+---------+---------+------+-------------------------------------------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL    | 1000 | Using where                                     |
|    1 | SIMPLE      | T     | ALL  | NULL          | NULL | NULL    | NULL    | 0    | Using where; Using join buffer (flat, BNL join) |
|    1 | SIMPLE      | t2    | ref  | a             | a    | 5       | j4.t1.b | 1    |                                                 |
+------+-------------+-------+------+---------------+------+---------+---------+------+-------------------------------------------------+

note that T.rows=0 now.
Let's check what effect this has on the join optimization:

        "plan_prefix": ["t1"],
        "table": "T",
        "best_access_path": {
          "considered_access_paths": [
            {
              "access_type": "scan",
              "resulting_rows": 0,
              "cost": 1000,
              "chosen": true
            }
          ],
          "chosen_access_method": {
            "type": "scan",
            "records": 0,
            "cost": 1000,
            "uses_join_buffering": false
          }
        },
        "rows_for_plan": 0,
        "cost_for_plan": 1204,
        "rest_of_plan": [
          {
            "plan_prefix": ["t1", "T"],
            "table": "t2",
            "best_access_path": {
              "considered_access_paths": [
                {
                  "access_type": "ref",
                  "index": "a",
                  "used_range_estimates": false,
                  "cause": "not available",
                  "rows": 1,
                  "cost": 0,
                  "chosen": true
                },

look for best_access_path at table t2. it has cost:0. This is because incoming record_count=0. This multiply-by-zero doesn't let the optimizer make a meaningful query plan choice for tables after the empty table.


Generated at Thu Feb 08 09:43:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.