[MDEV-25843] Wrong Query Result with LATERAL DERIVED (split_materialized=on) - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.5.10
Fix Version/s: N/A
Component/s: Optimizer
Labels:
- optimizer
Environment:
Debian Linux

Description

Only tested with MariaDB 10.5.10

Example schema and data:

CREATE TABLE product (

  `product_id` varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',

  `product_type_id` decimal(6,0) unsigned NOT NULL,

  PRIMARY KEY (`product_id`)

);

CREATE TABLE stock (

  `product_id` varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',

  `value` int(11) NOT NULL,

  `date` date NOT NULL,

  KEY `product_id` (`product_id`),

  CONSTRAINT `stock_product` FOREIGN KEY (`product_id`) REFERENCES `product` (`product_id`)

);

INSERT INTO

	`product` (`product_id`, `product_type_id`)

VALUES

	('one_product', 1),

	('another_product', 1);

INSERT INTO

	`stock` (`product_id`, `value`, `date`)

VALUES

	('one_product', 1, '2021-05-01'),

	('one_product', 2, '2021-05-02'),

	('one_product', 3, '2021-05-03'),

	('another_product', 2, '2021-05-01'),

	('another_product', 3, '2021-05-02'),

	('another_product', 4, '2021-05-03');

Example queries:

# Wrong result with split_materialized=on

set optimizer_switch='split_materialized=on';

SELECT

	product_id AS id,

	stock_data.stock_value

FROM

	product

JOIN (

	SELECT

		product_id

	FROM

		product p

	WHERE

		product_type_id = 1

	GROUP BY product_id

) product_data USING (product_id)

JOIN (

	SELECT

		product_id,

		SUM(stock.value) AS stock_value

	FROM

		stock

	JOIN

		product USING (product_id)

	WHERE

		stock.date BETWEEN '2021-05-01' AND '2021-05-03' AND

		product_type_id = 1

	GROUP BY

		product_id

) stock_data USING (product_id);

Result:

id	stock_value
one_product	30

# Different but wrong result with a distinct alias in the second sub select

set optimizer_switch='split_materialized=on';

SELECT

	product_id AS id,

	stock_data.stock_value

FROM

	product

JOIN (

	SELECT

		product_id

	FROM

		product p

	WHERE

		product_type_id = 1

	GROUP BY product_id

) product_data USING (product_id)

JOIN (

	SELECT

		p2.product_id,

		SUM(stock.value) AS stock_value

	FROM

		stock

	JOIN

		product p2 USING (product_id)

	WHERE

		stock.date BETWEEN '2021-05-01' AND '2021-05-03' AND

		p2.product_type_id = 1

	GROUP BY

		p2.product_id

) stock_data USING (product_id);

Result:

id	stock_value
another_product	15
one_product	15

Expected result (with `set optimizer_switch='split_materialized=off';`)

id	stock_value
another_product	9
one_product	6

Attachments

Issue Links

duplicates

MDEV-25714 Join using derived with aggregation returns incorrect results

Closed

Activity

Alice Sherepa added a comment - 2021-06-04 09:52

Thank you for the report!
this if the same bug as ~~MDEV-25714~~, fixed in 663bc849b5a26a5325adf009a8e commit by Igor Babaev

commit 663bc849b5a26a5325adf009a8e8fa9155c6b833 (origin/bb-10.3-igor)

Author: Igor Babaev <igor@askmonty.org>

Date:   Wed May 26 23:41:59 2021 -0700

    MDEV-25714 Join using derived with aggregation returns incorrect results

    If a join query uses a derived table (view / CTE) with GROUP BY clause then

    the execution plan for such join may employ split optimization. When this

    optimization is employed the derived table is not materialized. Rather only

    some partitions of the derived table are subject to grouping. Split

    optimization can be applied only if:

    - there are some indexes over the tables used in the join specifying the

      derived table whose prefixes partially cover the field items used in the

      GROUP BY list (such indexes are called splitting indexes)

    - the WHERE condition of the join query contains conjunctive equalities

      between columns of the derived table that comprise major parts of

      splitting indexes and columns of the other join tables.

    When the optimizer evaluates extending of a partial join by the rows of the

    derived table it always considers a possibility of using split optimization.

    Different splitting indexes can be used depending on the extended partial

    join. At some rare conditions, for example, when there is a non-splitting

    covering index for a table joined in the join specifying the derived table

    usage of a splitting index to produce rows needed for grouping may be still

    less beneficial than usage of such covering index without any splitting

    technique. The function JOIN_TAB::choose_best_splitting() must take this

    into account.

Alice Sherepa added a comment - 2021-06-04 09:52 Thank you for the report! this if the same bug as MDEV-25714 , fixed in 663bc849b5a26a5325adf009a8e commit by Igor Babaev commit 663bc849b5a26a5325adf009a8e8fa9155c6b833 (origin/bb-10.3-igor) Author: Igor Babaev <igor@askmonty.org> Date: Wed May 26 23:41:59 2021 -0700 MDEV-25714 Join using derived with aggregation returns incorrect results If a join query uses a derived table (view / CTE) with GROUP BY clause then the execution plan for such join may employ split optimization. When this optimization is employed the derived table is not materialized. Rather only some partitions of the derived table are subject to grouping. Split optimization can be applied only if: - there are some indexes over the tables used in the join specifying the derived table whose prefixes partially cover the field items used in the GROUP BY list (such indexes are called splitting indexes) - the WHERE condition of the join query contains conjunctive equalities between columns of the derived table that comprise major parts of splitting indexes and columns of the other join tables. When the optimizer evaluates extending of a partial join by the rows of the derived table it always considers a possibility of using split optimization. Different splitting indexes can be used depending on the extended partial join. At some rare conditions, for example, when there is a non-splitting covering index for a table joined in the join specifying the derived table usage of a splitting index to produce rows needed for grouping may be still less beneficial than usage of such covering index without any splitting technique. The function JOIN_TAB::choose_best_splitting() must take this into account.

People

Assignee:: Unassigned

Reporter:: Sascha Beining

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2021-06-02 07:44

Updated:: 2021-06-04 09:52

Resolved:: 2021-06-04 09:52

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Example schema and data:

Example queries:

Result:

Result:

Expected result (with set optimizer_switch='split_materialized=off';)

Attachments

Issue Links

Activity

People

Dates

Git Integration

Expected result (with `set optimizer_switch='split_materialized=off';`)