Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25843

Wrong Query Result with LATERAL DERIVED (split_materialized=on)

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Duplicate
    • 10.5.10
    • N/A
    • Optimizer
    • Debian Linux

    Description

      Only tested with MariaDB 10.5.10

      Example schema and data:

      CREATE TABLE product (
        `product_id` varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
        `product_type_id` decimal(6,0) unsigned NOT NULL,
        PRIMARY KEY (`product_id`)
      );
       
      CREATE TABLE stock (
        `product_id` varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
        `value` int(11) NOT NULL,
        `date` date NOT NULL,
        KEY `product_id` (`product_id`),
        CONSTRAINT `stock_product` FOREIGN KEY (`product_id`) REFERENCES `product` (`product_id`)
      );
       
      INSERT INTO
      	`product` (`product_id`, `product_type_id`)
      VALUES
      	('one_product', 1),
      	('another_product', 1);
      	
      INSERT INTO
      	`stock` (`product_id`, `value`, `date`)
      VALUES
      	('one_product', 1, '2021-05-01'),
      	('one_product', 2, '2021-05-02'),
      	('one_product', 3, '2021-05-03'),
      	('another_product', 2, '2021-05-01'),
      	('another_product', 3, '2021-05-02'),
      	('another_product', 4, '2021-05-03');
      

      Example queries:

      # Wrong result with split_materialized=on
      set optimizer_switch='split_materialized=on';
      SELECT
      	product_id AS id,
      	stock_data.stock_value
      FROM
      	product
      JOIN (
      	SELECT
      		product_id
      	FROM
      		product p
      	WHERE
      		product_type_id = 1
      	GROUP BY product_id
      ) product_data USING (product_id)
      JOIN (
      	SELECT
      		product_id,
      		SUM(stock.value) AS stock_value
      	FROM
      		stock
      	JOIN
      		product USING (product_id)
      	WHERE
      		stock.date BETWEEN '2021-05-01' AND '2021-05-03' AND
      		product_type_id = 1
      	GROUP BY
      		product_id
      ) stock_data USING (product_id);
      

      Result:

      id stock_value
      one_product 30

      # Different but wrong result with a distinct alias in the second sub select
      set optimizer_switch='split_materialized=on';
      SELECT
      	product_id AS id,
      	stock_data.stock_value
      FROM
      	product
      JOIN (
      	SELECT
      		product_id
      	FROM
      		product p
      	WHERE
      		product_type_id = 1
      	GROUP BY product_id
      ) product_data USING (product_id)
      JOIN (
      	SELECT
      		p2.product_id,
      		SUM(stock.value) AS stock_value
      	FROM
      		stock
      	JOIN
      		product p2 USING (product_id)
      	WHERE
      		stock.date BETWEEN '2021-05-01' AND '2021-05-03' AND
      		p2.product_type_id = 1
      	GROUP BY
      		p2.product_id
      ) stock_data USING (product_id);
      

      Result:

      id stock_value
      another_product 15
      one_product 15

      Expected result (with set optimizer_switch='split_materialized=off';)

      id stock_value
      another_product 9
      one_product 6

      Attachments

        Issue Links

          Activity

            alice Alice Sherepa added a comment -

            Thank you for the report!
            this if the same bug as MDEV-25714, fixed in 663bc849b5a26a5325adf009a8e commit by Igor Babaev

            commit 663bc849b5a26a5325adf009a8e8fa9155c6b833 (origin/bb-10.3-igor)
            Author: Igor Babaev <igor@askmonty.org>
            Date:   Wed May 26 23:41:59 2021 -0700
             
                MDEV-25714 Join using derived with aggregation returns incorrect results
                
                If a join query uses a derived table (view / CTE) with GROUP BY clause then
                the execution plan for such join may employ split optimization. When this
                optimization is employed the derived table is not materialized. Rather only
                some partitions of the derived table are subject to grouping. Split
                optimization can be applied only if:
                - there are some indexes over the tables used in the join specifying the
                  derived table whose prefixes partially cover the field items used in the
                  GROUP BY list (such indexes are called splitting indexes)
                - the WHERE condition of the join query contains conjunctive equalities
                  between columns of the derived table that comprise major parts of
                  splitting indexes and columns of the other join tables.
                
                When the optimizer evaluates extending of a partial join by the rows of the
                derived table it always considers a possibility of using split optimization.
                Different splitting indexes can be used depending on the extended partial
                join. At some rare conditions, for example, when there is a non-splitting
                covering index for a table joined in the join specifying the derived table
                usage of a splitting index to produce rows needed for grouping may be still
                less beneficial than usage of such covering index without any splitting
                technique. The function JOIN_TAB::choose_best_splitting() must take this
                into account.
                
            

            alice Alice Sherepa added a comment - Thank you for the report! this if the same bug as MDEV-25714 , fixed in 663bc849b5a26a5325adf009a8e commit by Igor Babaev commit 663bc849b5a26a5325adf009a8e8fa9155c6b833 (origin/bb-10.3-igor) Author: Igor Babaev <igor@askmonty.org> Date: Wed May 26 23:41:59 2021 -0700   MDEV-25714 Join using derived with aggregation returns incorrect results If a join query uses a derived table (view / CTE) with GROUP BY clause then the execution plan for such join may employ split optimization. When this optimization is employed the derived table is not materialized. Rather only some partitions of the derived table are subject to grouping. Split optimization can be applied only if: - there are some indexes over the tables used in the join specifying the derived table whose prefixes partially cover the field items used in the GROUP BY list (such indexes are called splitting indexes) - the WHERE condition of the join query contains conjunctive equalities between columns of the derived table that comprise major parts of splitting indexes and columns of the other join tables. When the optimizer evaluates extending of a partial join by the rows of the derived table it always considers a possibility of using split optimization. Different splitting indexes can be used depending on the extended partial join. At some rare conditions, for example, when there is a non-splitting covering index for a table joined in the join specifying the derived table usage of a splitting index to produce rows needed for grouping may be still less beneficial than usage of such covering index without any splitting technique. The function JOIN_TAB::choose_best_splitting() must take this into account.

            People

              Unassigned Unassigned
              sbeining Sascha Beining
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.