[MDEV-16765] missing rows with condition on subselect Created: 2018-07-16  Updated: 2019-03-07  Resolved: 2018-08-20

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.3.8, 10.2, 10.3
Fix Version/s: 10.2.18

Type: Bug Priority: Major
Reporter: Benjamin Setzer Assignee: Galina Shalygina (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

debian/jessie


Attachments: File test_dump.sql    
Issue Links:
Relates
relates to MDEV-18383 Change of behaviour of OR in IF-condi... Closed

 Description   

I got some strange behaviour with subselect and missing rows.

following select produces 2 rows, and is_blacklisted equals 0 in both:

SELECT anon_1.shop_id AS shop_id,
       anon_1.is_blacklisted AS is_blacklisted
FROM
  (SELECT anon_2.shop_id AS shop_id,
          CASE
              WHEN (
                        (
                             (
                                 anon_2.shop_white_flag IS NOT NULL
                                 AND (anon_2.white_flags IS NULL
                                      OR anon_2.white_flags NOT LIKE concat(concat('%|', anon_2.shop_white_flag), '|%')))
                             )
                    ) THEN 1
              ELSE 0
          END AS is_blacklisted
   FROM
     (SELECT
             customer_data.shop_id AS shop_id,
             syndication_white_flags_1.code AS shop_white_flag,
             group_concat(DISTINCT concat('|', syndication_white_flags.code, '|')) AS white_flags
      FROM customer_data
      INNER JOIN syndication_partners
      LEFT OUTER JOIN syndication_partner_white_flags ON syndication_partner_white_flags.partner_id = syndication_partners.id
      LEFT OUTER JOIN syndication_white_flags ON syndication_white_flags.id = syndication_partner_white_flags.flag_id
      LEFT OUTER JOIN syndication_shop_white_flags ON syndication_shop_white_flags.customer_data_id = customer_data.id
      LEFT OUTER JOIN syndication_white_flags AS syndication_white_flags_1 ON syndication_white_flags_1.id = syndication_shop_white_flags.white_flag_id
      WHERE
        syndication_partners.id = 427
      GROUP BY customer_data.shop_id,
               syndication_white_flags_1.code
      ) AS anon_2
    ) AS anon_1
;

Now I add a condition on is_blacklisted and I only get 1 row. (I tried several versions, "IS false", "= 0", "IS NOT true", all with the same behaviour)

SELECT anon_1.shop_id AS shop_id,
       anon_1.is_blacklisted AS is_blacklisted
FROM
  (SELECT anon_2.shop_id AS shop_id,
          CASE
              WHEN (
                        (
                             (
                                 anon_2.shop_white_flag IS NOT NULL
                                 AND (anon_2.white_flags IS NULL
                                      OR anon_2.white_flags NOT LIKE concat(concat('%|', anon_2.shop_white_flag), '|%')))
                             )
                    ) THEN 1
              ELSE 0
          END AS is_blacklisted
   FROM
     (SELECT
             customer_data.shop_id AS shop_id,
             syndication_white_flags_1.code AS shop_white_flag,
             group_concat(DISTINCT concat('|', syndication_white_flags.code, '|')) AS white_flags
      FROM customer_data
      INNER JOIN syndication_partners
      LEFT OUTER JOIN syndication_partner_white_flags ON syndication_partner_white_flags.partner_id = syndication_partners.id
      LEFT OUTER JOIN syndication_white_flags ON syndication_white_flags.id = syndication_partner_white_flags.flag_id
      LEFT OUTER JOIN syndication_shop_white_flags ON syndication_shop_white_flags.customer_data_id = customer_data.id
      LEFT OUTER JOIN syndication_white_flags AS syndication_white_flags_1 ON syndication_white_flags_1.id = syndication_shop_white_flags.white_flag_id
      WHERE
        syndication_partners.id = 427
      GROUP BY customer_data.shop_id,
               syndication_white_flags_1.code
      ) AS anon_2
    ) AS anon_1
WHERE anon_1.is_blacklisted IS false
;

When I select into a temporary table, I get 2 rows again.

CREATE TEMPORARY TABLE foo SELECT anon_1.shop_id AS shop_id,
       anon_1.is_blacklisted AS is_blacklisted
FROM
  (SELECT anon_2.shop_id AS shop_id,
          CASE
              WHEN (
                        (
                             (
                                 anon_2.shop_white_flag IS NOT NULL
                                 AND (anon_2.white_flags IS NULL
                                      OR anon_2.white_flags NOT LIKE concat(concat('%|', anon_2.shop_white_flag), '|%')))
                             )
                    ) THEN 1
              ELSE 0
          END AS is_blacklisted
   FROM
     (SELECT
             customer_data.shop_id AS shop_id,
             syndication_white_flags_1.code AS shop_white_flag,
             group_concat(DISTINCT concat('|', syndication_white_flags.code, '|')) AS white_flags
      FROM customer_data
      INNER JOIN syndication_partners
      LEFT OUTER JOIN syndication_partner_white_flags ON syndication_partner_white_flags.partner_id = syndication_partners.id
      LEFT OUTER JOIN syndication_white_flags ON syndication_white_flags.id = syndication_partner_white_flags.flag_id
      LEFT OUTER JOIN syndication_shop_white_flags ON syndication_shop_white_flags.customer_data_id = customer_data.id
      LEFT OUTER JOIN syndication_white_flags AS syndication_white_flags_1 ON syndication_white_flags_1.id = syndication_shop_white_flags.white_flag_id
      WHERE
        syndication_partners.id = 427
      GROUP BY customer_data.shop_id,
               syndication_white_flags_1.code
      ) AS anon_2
    ) AS anon_1
WHERE anon_1.is_blacklisted IS false
;

If I remove the syndication_white_flags_1.code in the group_by, it works too.
This bug didn't exist in 10.0.35.
We skipped the versions in between, so I can't say if this bug already existed in 10.1 or 10.2

Possibly related to: https://bugs.mysql.com/bug.php?id=88300
I don't think it's the same bug, but it's the only thing I found related to subqueries and incorrect returnsets



 Comments   
Comment by Alice Sherepa [ 2018-07-17 ]

Thanks for the report!
Reproducible on 10.2, 10.3

CREATE TABLE t1 (id int, v1 varchar(25)) ;
INSERT INTO t1 VALUES (5,'test');
 
SELECT tmp1.ib
FROM (SELECT CASE WHEN (tmp2.wf IS NULL OR tmp2.wf NOT LIKE '%test%') THEN 1 ELSE 0 END AS ib
   FROM (SELECT 'test' AS wf FROM t1 GROUP BY t1.v1) AS tmp2) AS tmp1;
 
SELECT tmp1.ib
FROM (SELECT CASE WHEN (tmp2.wf IS NULL OR tmp2.wf NOT LIKE '%test%') THEN 1 ELSE 0 END AS ib
   FROM (SELECT 'test' AS wf FROM t1 GROUP BY t1.v1) AS tmp2) AS tmp1
WHERE tmp1.ib=0 ;

MariaDB [test]> SELECT tmp1.ib
    -> FROM (SELECT CASE WHEN (tmp2.wf IS NULL OR tmp2.wf NOT LIKE '%test%') THEN 1 ELSE 0 END AS ib
    ->    FROM (SELECT 'test' AS wf FROM t1 GROUP BY t1.v1) AS tmp2) AS tmp1;
+----+
| ib |
+----+
|  0 |
+----+
1 row in set (0.00 sec)
 
MariaDB [test]> SELECT tmp1.ib
    -> FROM (SELECT CASE WHEN (tmp2.wf IS NULL OR tmp2.wf NOT LIKE '%test%') THEN 1 ELSE 0 END AS ib
    ->    FROM (SELECT 'test' AS wf FROM t1 GROUP BY t1.v1) AS tmp2) AS tmp1
    -> WHERE tmp1.ib=0 ;
Empty set (0.00 sec)
 
MariaDB [test]> explain extended 
    -> SELECT tmp1.ib
    -> FROM (SELECT CASE WHEN (tmp2.wf IS NULL OR tmp2.wf NOT LIKE '%test%') THEN 1 ELSE 0 END AS ib
    ->    FROM (SELECT 'test' AS wf FROM t1 GROUP BY t1.v1) AS tmp2) AS tmp1
    -> WHERE tmp1.ib=0 ;
+------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
| id   | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                        |
+------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
|    1 | PRIMARY     | <derived3> | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where                                  |
|    3 | DERIVED     | t1         | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where; Using temporary; Using filesort |
+------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
 
Note (Code 1003): select case when (<cache>(`tmp2`.`wf` is null) or `tmp2`.`wf`  not like '%test%') then 1 else 0 end AS `ib` from (select 'test' AS `wf` from `test`.`t1` where case when (<cache>(`tmp2`.`wf` is null) or `tmp2`.`wf`  not like '%test%') then 1 else 0 end = 0 group by `test`.`t1`.`v1`) `tmp2` where case when (<cache>(`tmp2`.`wf` is null) or `tmp2`.`wf`  not like '%test%') then 1 else 0 end = 0
 
MariaDB [test]> explain extended  SELECT tmp1.ib FROM (SELECT CASE WHEN (tmp2.wf IS NULL OR tmp2.wf NOT LIKE '%test%') THEN 1 ELSE 0 END AS ib    FROM (SELECT 'test' AS wf FROM t1 GROUP BY t1.v1) AS tmp2) AS tmp1;
+------+-------------+------------+------+---------------+------+---------+------+------+----------+---------------------------------+
| id   | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                           |
+------+-------------+------------+------+---------------+------+---------+------+------+----------+---------------------------------+
|    1 | PRIMARY     | <derived3> | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 |                                 |
|    3 | DERIVED     | t1         | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using temporary; Using filesort |
+------+-------------+------------+------+---------------+------+---------+------+------+----------+---------------------------------+
2 rows in set, 1 warning (0.01 sec)
 
Note (Code 1003): select case when (`tmp2`.`wf` is null or `tmp2`.`wf`  not like '%test%') then 1 else 0 end AS `ib` from (select 'test' AS `wf` from `test`.`t1` group by `test`.`t1`.`v1`) `tmp2`

Comment by Igor Babaev [ 2018-07-19 ]

With

set optimizer_switch='condition_pushdown_for_derived=off';

I have:

MariaDB [test]> SELECT tmp1.ib FROM (SELECT CASE WHEN (tmp2.wf IS NULL OR tmp2.wf NOT LIKE '%test%') THEN 1 ELSE 0 END AS ib    FROM (SELECT 'test' AS wf FROM t1 GROUP BY t1.v1) AS tmp2) AS tmp1 WHERE tmp1.ib=0;
+----+
| ib |
+----+
|  0 |
+----+
1 row in set (0.00 sec)

Comment by Galina Shalygina (Inactive) [ 2018-07-29 ]

CREATE TABLE t1 (id int, v1 varchar(25)) ;
INSERT INTO t1 VALUES (5,'test');

MariaDB [test]> SELECT tmp1.ib
    -> FROM (SELECT CASE WHEN (tmp2.v1 IS NULL OR tmp2.v1 NOT LIKE '%test%') THEN 1 ELSE 0 END AS ib
    ->    FROM (SELECT v1 FROM t1 GROUP BY t1.id) AS tmp2) AS tmp1
    -> WHERE tmp1.ib=0;
Empty set (3.304 sec)
 
MariaDB [test]> set optimizer_switch='condition_pushdown_for_derived=off';
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [test]>  SELECT tmp1.ib FROM (SELECT CASE WHEN (tmp2.v1 IS NULL OR tmp2.v1 NOT LIKE '%test%') THEN 1 ELSE 0 END AS ib    FROM (SELECT v1 FROM t1 GROUP BY t1.id) AS tmp2) AS tmp1 WHERE tmp1.ib=0;
+------+
| ib   |
+------+
|    0 |
+------+
1 row in set (4.119 sec)
 
MariaDB [test]> set optimizer_switch='condition_pushdown_for_derived=on';
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [test]> EXPLAIN FORMAT=json
    -> SELECT tmp1.ib
    -> FROM (SELECT CASE WHEN (tmp2.v1 IS NULL OR tmp2.v1 NOT LIKE '%test%') THEN 1 ELSE 0 END AS ib
    ->    FROM (SELECT v1 FROM t1 GROUP BY t1.id) AS tmp2) AS tmp1
    -> WHERE tmp1.ib=0;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "<derived3>",
      "access_type": "ALL",
      "rows": 3,
      "filtered": 100,
      "attached_condition": "case when (tmp2.v1 is null or tmp2.v1  not like '%test%') then 1 else 0 end = 0",
      "materialized": {
        "query_block": {
          "select_id": 3,
          "filesort": {
            "sort_key": "t1.`id`",
            "temporary_table": {
              "table": {
                "table_name": "t1",
                "access_type": "ALL",
                "rows": 3,
                "filtered": 100,
                "attached_condition": "case when (tmp2.v1 is null or tmp2.v1  not like '%test%') then 1 else 0 end = 0"
              }
            }
          }
        }
      }
    }
  }
} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (2.705 sec)

Running the example above it can be seen that bug was caused by the pushdown into the materialized derived table/view optimization. From the explain of the query in json format it can be seen that "case when (tmp2.v1 is null or tmp2.v1 not like '%test%') then 1 else 0 end = 0" condition was pushed into the WHERE clause of the derived table tmp1. This condition depends only on constants and fields of the derived table so it can be pushed down into the HAVING clause of the derived table. Condition can be pushed into the WHERE clause of the derived table if it depends only on the fields of the derived tables that are used in the GROUP BY clause of the derived table definition. "tmp2.wf" is not a field used in the GROUP BY clause so this condition can't be pushed down into the WHERE clause.

Also it should be mentioned that for the condition that was pushed down fields remain the same while they should be transformed to the appropriate values.
Here "tmp2.v1" should be transformed into "t1.v1".

Looking on the pushdown_cond_for_derived() method and check_cond_extraction_for_grouping_fields() call where it is checked if the condition should be pushed into the WHERE clause it can be seen how the condition is traversed. For the condition excl_dep_on_grouping_fields() method is recursively called.

Looking on the CASE definition it can be seen that it is OR condition with two elements.

(gdb) p dbug_print_item(this)
$14 = 0x5555570a5be0 <dbug_item_print_buf> "tmp2.v1 is null or tmp2.v1  not like '%test%'"

Going down trough the condition it can be mentioned that this condition is processed as Item_func. It tries to take its arg_count that is 0 and returns true (that condition depends on the GROUP BY fields only). That's why this condition is pushed into the WHERE clause. As it doesn't depend on GROUP BY fields but is processed as the condition that depends only on them it is not transformed in the right way. That causes wrong result.

(gdb) p arg_count
$15 = 0

To fix this problem Item_cond::excl_dep_on_grouping_fields() should be changed. It should be changed in the way that Item_cond type conditions in the CASE definition are processed in the right way.

Comment by Igor Babaev [ 2018-07-31 ]

Ok to push into 10.2

Comment by Galina Shalygina (Inactive) [ 2018-08-20 ]

Pushed in 10.2

Generated at Thu Feb 08 08:31:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.