[MDEV-25843] Wrong Query Result with LATERAL DERIVED (split_materialized=on) Created: 2021-06-02  Updated: 2021-06-04  Resolved: 2021-06-04

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.5.10
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Sascha Beining Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: optimizer
Environment:

Debian Linux


Issue Links:
Duplicate
duplicates MDEV-25714 Join using derived with aggregation r... Closed

 Description   

Only tested with MariaDB 10.5.10

Example schema and data:

CREATE TABLE product (
  `product_id` varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
  `product_type_id` decimal(6,0) unsigned NOT NULL,
  PRIMARY KEY (`product_id`)
);
 
CREATE TABLE stock (
  `product_id` varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
  `value` int(11) NOT NULL,
  `date` date NOT NULL,
  KEY `product_id` (`product_id`),
  CONSTRAINT `stock_product` FOREIGN KEY (`product_id`) REFERENCES `product` (`product_id`)
);
 
INSERT INTO
	`product` (`product_id`, `product_type_id`)
VALUES
	('one_product', 1),
	('another_product', 1);
	
INSERT INTO
	`stock` (`product_id`, `value`, `date`)
VALUES
	('one_product', 1, '2021-05-01'),
	('one_product', 2, '2021-05-02'),
	('one_product', 3, '2021-05-03'),
	('another_product', 2, '2021-05-01'),
	('another_product', 3, '2021-05-02'),
	('another_product', 4, '2021-05-03');

Example queries:

# Wrong result with split_materialized=on
set optimizer_switch='split_materialized=on';
SELECT
	product_id AS id,
	stock_data.stock_value
FROM
	product
JOIN (
	SELECT
		product_id
	FROM
		product p
	WHERE
		product_type_id = 1
	GROUP BY product_id
) product_data USING (product_id)
JOIN (
	SELECT
		product_id,
		SUM(stock.value) AS stock_value
	FROM
		stock
	JOIN
		product USING (product_id)
	WHERE
		stock.date BETWEEN '2021-05-01' AND '2021-05-03' AND
		product_type_id = 1
	GROUP BY
		product_id
) stock_data USING (product_id);

Result:

id stock_value
one_product 30

# Different but wrong result with a distinct alias in the second sub select
set optimizer_switch='split_materialized=on';
SELECT
	product_id AS id,
	stock_data.stock_value
FROM
	product
JOIN (
	SELECT
		product_id
	FROM
		product p
	WHERE
		product_type_id = 1
	GROUP BY product_id
) product_data USING (product_id)
JOIN (
	SELECT
		p2.product_id,
		SUM(stock.value) AS stock_value
	FROM
		stock
	JOIN
		product p2 USING (product_id)
	WHERE
		stock.date BETWEEN '2021-05-01' AND '2021-05-03' AND
		p2.product_type_id = 1
	GROUP BY
		p2.product_id
) stock_data USING (product_id);

Result:

id stock_value
another_product 15
one_product 15

Expected result (with set optimizer_switch='split_materialized=off';)

id stock_value
another_product 9
one_product 6


 Comments   
Comment by Alice Sherepa [ 2021-06-04 ]

Thank you for the report!
this if the same bug as MDEV-25714, fixed in 663bc849b5a26a5325adf009a8e commit by Igor Babaev

commit 663bc849b5a26a5325adf009a8e8fa9155c6b833 (origin/bb-10.3-igor)
Author: Igor Babaev <igor@askmonty.org>
Date:   Wed May 26 23:41:59 2021 -0700
 
    MDEV-25714 Join using derived with aggregation returns incorrect results
    
    If a join query uses a derived table (view / CTE) with GROUP BY clause then
    the execution plan for such join may employ split optimization. When this
    optimization is employed the derived table is not materialized. Rather only
    some partitions of the derived table are subject to grouping. Split
    optimization can be applied only if:
    - there are some indexes over the tables used in the join specifying the
      derived table whose prefixes partially cover the field items used in the
      GROUP BY list (such indexes are called splitting indexes)
    - the WHERE condition of the join query contains conjunctive equalities
      between columns of the derived table that comprise major parts of
      splitting indexes and columns of the other join tables.
    
    When the optimizer evaluates extending of a partial join by the rows of the
    derived table it always considers a possibility of using split optimization.
    Different splitting indexes can be used depending on the extended partial
    join. At some rare conditions, for example, when there is a non-splitting
    covering index for a table joined in the join specifying the derived table
    usage of a splitting index to produce rows needed for grouping may be still
    less beneficial than usage of such covering index without any splitting
    technique. The function JOIN_TAB::choose_best_splitting() must take this
    into account.
    

Generated at Thu Feb 08 09:40:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.