[MDEV-30361] Wrong result with joins in WHERE IN subquery Created: 2023-01-08  Updated: 2023-12-07

Status: Stalled
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Igor Babaev
Resolution: Unresolved Votes: 0
Labels: 11.0-sel

Attachments: File mdev30361-dataset.sql     Text File mdev30361-join-explain-output.txt     File mdev30361-join-query.sql    

 Description   

Possibly the test case can be reduced further, I kept the structures close to the original dbt3, just removed unnecessary columns and rows.

CREATE DATABASE dbt3;
USE dbt3;
 
CREATE TABLE `customer` (
  `c_nationkey` int(11) DEFAULT NULL,
  KEY `i_c_nationkey` (`c_nationkey`)
) ENGINE=MyISAM;
 
INSERT INTO `customer` VALUES (18),(17),(8);
 
CREATE TABLE `lineitem` (
  `l_orderkey` int(11) NOT NULL DEFAULT 0,
  `l_partkey` int(11) DEFAULT NULL,
  `l_linenumber` int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (`l_orderkey`,`l_linenumber`)
) ENGINE=MyISAM;
 
INSERT INTO `lineitem` VALUES (1,1,1),(1,1,4),(3,1,1);
 
CREATE TABLE `nation` (
  `n_nationkey` int(11) NOT NULL,
  `n_regionkey` int(11) DEFAULT NULL,
  PRIMARY KEY (`n_nationkey`),
  KEY `i_n_regionkey` (`n_regionkey`)
) ENGINE=MyISAM;
 
INSERT INTO `nation` VALUES (17,1),(24,1);
 
CREATE TABLE `orders` (
  `o_orderkey` int(11) NOT NULL,
  `o_totalprice` double DEFAULT NULL,
  PRIMARY KEY (`o_orderkey`)
) ENGINE=MyISAM;
 
INSERT INTO `orders` VALUES
(1,125405.67),(2,36349.29),(3,150931.2),(4,28568.05),(5,80601.73);
 
CREATE TABLE `partsupp` (
  `ps_partkey` int(11) NOT NULL DEFAULT 0,
  `ps_suppkey` int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (`ps_partkey`,`ps_suppkey`)
) ENGINE=MyISAM;
 
INSERT INTO `partsupp` VALUES (1,1),(17,1);
 
CREATE TABLE `region` (
  `r_regionkey` int(11) NOT NULL,
  PRIMARY KEY (`r_regionkey`)
) ENGINE=MyISAM;
 
INSERT INTO `region` VALUES (0),(1),(2),(3),(4);
 
CREATE TABLE `supplier` (
  `s_suppkey` int(11) NOT NULL,
  `s_nationkey` int(11) DEFAULT NULL,
  `s_acctbal` double DEFAULT NULL,
  PRIMARY KEY (`s_suppkey`)
) ENGINE=MyISAM;
 
INSERT INTO `supplier` VALUES (1,17,5755.94);
 
SELECT n_nationkey FROM nation LEFT JOIN region ON ( r_regionkey = n_regionkey ) JOIN supplier ON ( s_nationkey = n_nationkey ) JOIN partsupp ON ( s_suppkey = ps_suppkey ) JOIN lineitem ON ( ps_partkey = l_partkey ) JOIN orders ON ( l_orderkey = o_orderkey ) JOIN customer WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187;
 
CREATE TABLE t (a INT);
INSERT INTO t VALUES (1),(17),(2),(17);
 
SELECT * FROM t WHERE a IN (
  SELECT n_nationkey FROM nation LEFT JOIN region ON ( r_regionkey = n_regionkey ) JOIN supplier ON ( s_nationkey = n_nationkey ) JOIN partsupp ON ( s_suppkey = ps_suppkey ) JOIN lineitem ON ( ps_partkey = l_partkey ) JOIN orders ON ( l_orderkey = o_orderkey ) JOIN customer WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187
);
 
# Cleanup
DROP DATABASE dbt3;

SELECT n_nationkey FROM nation LEFT JOIN region ON ( r_regionkey = n_regionkey ) JOIN supplier ON ( s_nationkey = n_nationkey ) JOIN partsupp ON ( s_suppkey = ps_suppkey ) JOIN lineitem ON ( ps_partkey = l_partkey ) JOIN orders ON ( l_orderkey = o_orderkey ) JOIN customer WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187;
n_nationkey
17
17
17
CREATE TABLE t (a INT);
INSERT INTO t VALUES (1),(17),(2),(17);
SELECT * FROM t WHERE a IN (
SELECT n_nationkey FROM nation LEFT JOIN region ON ( r_regionkey = n_regionkey ) JOIN supplier ON ( s_nationkey = n_nationkey ) JOIN partsupp ON ( s_suppkey = ps_suppkey ) JOIN lineitem ON ( ps_partkey = l_partkey ) JOIN orders ON ( l_orderkey = o_orderkey ) JOIN customer WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187
);
a

So, the first SELECT above returns rows with value 17.
The table t contains values 17.
However, when the same SELECT is used as IN subquery, the resulting query doesn't return anything.

Plan:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	supplier	system	PRIMARY	NULL	NULL	NULL	1	100.00	
1	PRIMARY	nation	const	PRIMARY	PRIMARY	4	const	1	100.00	
1	PRIMARY	region	const	PRIMARY	PRIMARY	4	const	1	100.00	Using index
1	PRIMARY	<subquery2>	ALL	distinct_key	NULL	NULL	NULL	1	100.00	Using where
1	PRIMARY	t	ALL	NULL	NULL	NULL	NULL	4	100.00	Using where; Using join buffer (flat, BNL join)
2	MATERIALIZED	lineitem	ALL	PRIMARY	NULL	NULL	NULL	3	100.00	Using where
2	MATERIALIZED	orders	eq_ref	PRIMARY	PRIMARY	4	dbt3.lineitem.l_orderkey	1	100.00	
2	MATERIALIZED	customer	index	i_c_nationkey	i_c_nationkey	5	NULL	3	100.00	Using where; Using index; Using join buffer (flat, BNL join)
2	MATERIALIZED	partsupp	eq_ref	PRIMARY	PRIMARY	8	dbt3.lineitem.l_partkey,const	1	100.00	Using index
Warnings:
Note	1003	select `dbt3`.`t`.`a` AS `a` from `dbt3`.`t` semi join (`dbt3`.`nation` left join `dbt3`.`region` on(1 = 1) join `dbt3`.`partsupp` join `dbt3`.`lineitem` join `dbt3`.`orders` join `dbt3`.`customer`) where `dbt3`.`t`.`a` = 17 and `dbt3`.`orders`.`o_orderkey` = `dbt3`.`lineitem`.`l_orderkey` and `dbt3`.`partsupp`.`ps_partkey` = `dbt3`.`lineitem`.`l_partkey` and `dbt3`.`partsupp`.`ps_suppkey` = 1 and (`dbt3`.`orders`.`o_totalprice` = 151 or `dbt3`.`customer`.`c_nationkey` = 17 and `dbt3`.`t`.`a` = 17)



 Comments   
Comment by Michael Widenius [ 2023-01-09 ]

I played with the query a bit.
If I change the t1 table to have only 3 values 1,17 and 2 then things works. Adding a row with value '17' changes the plan to
use materialization of <subquery2> and in this case the result is 0 rows.

The same happens if the last entry in t is '18.

Adding a key on 't' changes the plan to not use materialization or join_buffer and then things works.
I also tried with disabling join_cache. In this case materialization still happens and the bug is still there.

This indicates something is wrong in materialization

Comment by Sergei Petrunia [ 2023-01-11 ]

Take-aways from discussion with Monty:

The problem can be seen here in EXPLAIN FORMAT=JSON output:

          "table_name": "<subquery2>",
          "access_type": "ALL",
          "possible_keys": ["distinct_key"],
          "rows": 1,
          "filtered": 100,
          "attached_condition": "orders.o_totalprice = 151 or customer.c_nationkey = 17",

Here, the attached_condition refers to columns of tables that are inside the materialization nest.

One could argue that _nationkey is a part of IN-equality and could be accessible (but it's a part of a multiple-equality so we'll need to check which table's _nationkey the data is unpacked to).

But it's clear that o_totalprice should NOT be accessed when materialization is already done...

Comment by Sergei Petrunia [ 2023-01-11 ]

Note that EXPLAIN FORMAT=JSON does show that the bad-condition was checked inside the materialization nest:

      {
        "table": {
          "table_name": "<subquery2>",
          "access_type": "ALL",
          "possible_keys": ["distinct_key"],
          "rows": 1,
          "filtered": 100,
          "attached_condition": "orders.o_totalprice = 151 or customer.c_nationkey = 17",
          "materialized": {
            "unique": 1,
            "query_block": {
              "select_id": 2,
              "nested_loop": [

...

                {
                  "table": {
                    "table_name": "customer",
                    "access_type": "index",
                    "possible_keys": ["i_c_nationkey"],
                    "key": "i_c_nationkey",
                    "key_length": "5",
                    "used_key_parts": ["c_nationkey"],
                    "rows": 3,
                    "filtered": 100,
                    "attached_condition": "orders.o_totalprice = 151 or customer.c_nationkey = 17",
                    "using_index": true
                  }
                }
              ]
            } -- query block #2

Comment by Sergei Petrunia [ 2023-01-11 ]

Query:

SELECT * FROM t WHERE a IN (
  SELECT n_nationkey 
  FROM 
    nation                                            --- const
    LEFT JOIN region ON ( r_regionkey = n_regionkey ) --- const
    JOIN supplier ON ( s_nationkey = n_nationkey )    --- const
    JOIN partsupp ON ( s_suppkey = ps_suppkey ) 
    JOIN lineitem ON ( ps_partkey = l_partkey ) 
    JOIN orders ON ( l_orderkey = o_orderkey ) 
    JOIN customer 
  WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187
  );

Comment by Sergei Petrunia [ 2023-01-11 ]

Where does

"orders.o_totalprice = 151 or customer.c_nationkey = 17"

come from?

Query's WHERE after rewrites:

"resulting_condition": 
  "(
     orders.o_totalprice = 151 or 
     (
       supplier.s_acctbal > 187 and 
       multiple equal(t.a, nation.n_nationkey, supplier.s_nationkey, customer.c_nationkey)
     )
   ) 
   and 
     multiple equal(t.a, nation.n_nationkey, supplier.s_nationkey) and 
     multiple equal(lineitem.l_orderkey, orders.o_orderkey) and 
     multiple equal(partsupp.ps_partkey, lineitem.l_partkey) and 
     multiple equal(supplier.s_suppkey, partsupp.ps_suppkey)
"

The original

  o_totalprice = 151 OR (c_nationkey = s_nationkey AND s_acctbal > 187)

became:

    orders.o_totalprice = 151 or 
    (
      supplier.s_acctbal > 187 and 
      multiple equal(t.a, nation.n_nationkey, supplier.s_nationkey, customer.c_nationkey)
    )

That is, "customer.c_nationkey = supplier.s_nationkey" became

    multiple equal(t.a, nation.n_nationkey, supplier.s_nationkey, customer.c_nationkey)

The two new members of the multiple equality are "inherited" from this top-level multiple-equality:

     multiple equal(t.a, nation.n_nationkey, supplier.s_nationkey) 

Comment by Sergei Petrunia [ 2023-01-11 ]

But why are pair-wise equalities generated from inherited members?

I think they should not be.
Consider this example (further TRIVIAL-EXAMPLE):

create table t (
  c1 int,
  c2 int,
  c3 int,
  c4 int,
  c10 int
);

explain format=json
select * from t 
where
  c1=c2 and c2=c3 and (c10 < 3333 or c3=c4);

shows:

  "attached_condition": 
   "t.c2 = t.c1 and t.c3 = t.c1 and (t.c10 < 3333 or t.c4 = t.c1)"

Note that the right side of the (... or ...) only includes "c4=c1" and doesn't include "c1=c2" or "c2=c3"....

Comment by Sergei Petrunia [ 2023-01-11 ]

eliminate_item_equal() does not remove the "inherited" members of the multiple equality, because the Item_equal object has upper_levels=NULL.

In substitute_for_best_equal_field() right before the call to eliminate_item_equal() we have:

(gdb) printf "%s\n", dbug_print_item(item_equal)
  multiple equal(17, customer.c_nationkey, t.a)
 
(gdb) p item_equal->upper_levels
  $70 = (COND_EQUAL *) 0x0

for comparison, in the TRIVIAL-EXAMPLE:

(gdb) p dbug_print_item(item_equal)
  $111 = 0x555557c202c0 <dbug_item_print_buf> "multiple equal(t.c1, t.c2, t.c3, t.c4)"
(gdb) p item_equal->upper_levels
  $113 = (COND_EQUAL *) 0x7fff980182b0
 
(gdb) p dbug_print_item(item_equal->upper_levels->current_level.elem(0))
  $127 = 0x555557c202c0 <dbug_item_print_buf> "multiple equal(t.c1, t.c2, t.c3)"

Comment by Sergei Petrunia [ 2023-01-11 ]

One can observe multiple equality to generate "redundant" conditions even without any semi-joins. I will post a query and full explain output below.

The important part:

        "table": {
          "table_name": "t",
          "access_type": "ALL",
          "rows": 4,
          "filtered": 100,
          "attached_condition": "t.a = 17 and (orders.o_totalprice = 151 or customer.c_nationkey = 17 and t.a = 17)"

Note the

  t.a=17 AND 
  ( .... OR (... AND t.a=17))

The first conjunct of the top-level AND (line #1) guarantees that t.a=17 is true. Why is t.a=17 also generated inside the OR on line #2?

igor, is this a bug of in multiple equality code or its intended property? If it's a bug, should we fix it in stable releases (or should we make SJ-Materialization code work with this behavior?)

Comment by Igor Babaev [ 2023-01-12 ]

psergei yes, there is a bug in the code of remove Item_cond::remove_eq_conds().
After this patch

@@ -16838,6 +16838,11 @@ Item_cond::remove_eq_conds(THD *thd, Item::cond_result *cond_value,
   {                                                // Remove list
     item= ((Item_cond*) cond)->argument_list()->head();
     ((Item_cond*) cond)->argument_list()->empty();
+    if (and_level &&
+        item->type() == Item::FUNC_ITEM &&
+        ((Item_cond*) item)->functype() == Item_func::MULT_EQUAL_FUNC)
+      ((Item_equal *) item)->upper_levels=
+        ((Item_cond_and *) this)->m_cond_equal.upper_levels;
     return item;
   }
   *cond_value= Item::COND_OK;

we have

 
MariaDB [test]> explain format=json SELECT *  FROM    t    JOIN nation  ON (t.a = nation.n_nationkey)    LEFT JOIN region ON ( r_regionkey = n_regionkey )   JOIN supplier ON ( s_nationkey = n_nationkey )   JOIN partsupp ON ( s_suppkey = ps_suppkey )    JOIN lineitem ON ( ps_partkey = l_partkey )    JOIN orders ON ( l_orderkey = o_orderkey )    JOIN customer  WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "supplier",
      "access_type": "system",
      "possible_keys": ["PRIMARY"],
      "rows": 1,
      "filtered": 100
    },
    "table": {
      "table_name": "nation",
      "access_type": "const",
      "possible_keys": ["PRIMARY"],
      "key": "PRIMARY",
      "key_length": "4",
      "used_key_parts": ["n_nationkey"],
      "ref": ["const"],
      "rows": 1,
      "filtered": 100
    },
    "table": {
      "table_name": "region",
      "access_type": "const",
      "possible_keys": ["PRIMARY"],
      "key": "PRIMARY",
      "key_length": "4",
      "used_key_parts": ["r_regionkey"],
      "ref": ["const"],
      "rows": 1,
      "filtered": 100,
      "using_index": true
    },
    "table": {
      "table_name": "lineitem",
      "access_type": "ALL",
      "possible_keys": ["PRIMARY"],
      "rows": 3,
      "filtered": 100,
      "attached_condition": "lineitem.l_partkey is not null"
    },
    "table": {
      "table_name": "partsupp",
      "access_type": "eq_ref",
      "possible_keys": ["PRIMARY"],
      "key": "PRIMARY",
      "key_length": "8",
      "used_key_parts": ["ps_partkey", "ps_suppkey"],
      "ref": ["test.lineitem.l_partkey", "const"],
      "rows": 1,
      "filtered": 100,
      "using_index": true
    },
    "table": {
      "table_name": "orders",
      "access_type": "eq_ref",
      "possible_keys": ["PRIMARY"],
      "key": "PRIMARY",
      "key_length": "4",
      "used_key_parts": ["o_orderkey"],
      "ref": ["test.lineitem.l_orderkey"],
      "rows": 1,
      "filtered": 100
    },
    "table": {
      "table_name": "customer",
      "access_type": "index",
      "possible_keys": ["i_c_nationkey"],
      "key": "i_c_nationkey",
      "key_length": "5",
      "used_key_parts": ["c_nationkey"],
      "rows": 3,
      "filtered": 100,
      "attached_condition": "orders.o_totalprice = 151 or customer.c_nationkey = 17",
      "using_index": true
    },
    "table": {
      "table_name": "t",
      "access_type": "ALL",
      "rows": 4,
      "filtered": 100,
      "attached_condition": "t.a = 17"
    }
  }
} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Comment by Sergei Petrunia [ 2023-01-14 ]

The original testcase is fixed by the above patch.

Comment by Sergei Petrunia [ 2023-01-18 ]

igor, so could you please take over and get the fix for the multiple equality code pushed?

Generated at Thu Feb 08 10:15:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.