Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Duplicate
-
10.5.10
-
Debian Linux
Description
Only tested with MariaDB 10.5.10
Example schema and data:
CREATE TABLE product ( |
`product_id` varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '', |
`product_type_id` decimal(6,0) unsigned NOT NULL, |
PRIMARY KEY (`product_id`) |
);
|
|
CREATE TABLE stock ( |
`product_id` varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '', |
`value` int(11) NOT NULL, |
`date` date NOT NULL, |
KEY `product_id` (`product_id`), |
CONSTRAINT `stock_product` FOREIGN KEY (`product_id`) REFERENCES `product` (`product_id`) |
);
|
|
INSERT INTO |
`product` (`product_id`, `product_type_id`)
|
VALUES
|
('one_product', 1), |
('another_product', 1); |
|
INSERT INTO |
`stock` (`product_id`, `value`, `date`) |
VALUES
|
('one_product', 1, '2021-05-01'), |
('one_product', 2, '2021-05-02'), |
('one_product', 3, '2021-05-03'), |
('another_product', 2, '2021-05-01'), |
('another_product', 3, '2021-05-02'), |
('another_product', 4, '2021-05-03'); |
Example queries:
# Wrong result with split_materialized=on |
set optimizer_switch='split_materialized=on'; |
SELECT
|
product_id AS id, |
stock_data.stock_value
|
FROM
|
product
|
JOIN ( |
SELECT |
product_id
|
FROM |
product p
|
WHERE |
product_type_id = 1
|
GROUP BY product_id |
) product_data USING (product_id)
|
JOIN ( |
SELECT |
product_id,
|
SUM(stock.value) AS stock_value |
FROM |
stock
|
JOIN |
product USING (product_id)
|
WHERE |
stock.date BETWEEN '2021-05-01' AND '2021-05-03' AND |
product_type_id = 1
|
GROUP BY |
product_id
|
) stock_data USING (product_id);
|
Result:
id | stock_value |
---|---|
one_product | 30 |
# Different but wrong result with a distinct alias in the second sub select |
set optimizer_switch='split_materialized=on'; |
SELECT
|
product_id AS id, |
stock_data.stock_value
|
FROM
|
product
|
JOIN ( |
SELECT |
product_id
|
FROM |
product p
|
WHERE |
product_type_id = 1
|
GROUP BY product_id |
) product_data USING (product_id)
|
JOIN ( |
SELECT |
p2.product_id,
|
SUM(stock.value) AS stock_value |
FROM |
stock
|
JOIN |
product p2 USING (product_id)
|
WHERE |
stock.date BETWEEN '2021-05-01' AND '2021-05-03' AND |
p2.product_type_id = 1
|
GROUP BY |
p2.product_id
|
) stock_data USING (product_id);
|
Result:
id | stock_value |
---|---|
another_product | 15 |
one_product | 15 |
Expected result (with set optimizer_switch='split_materialized=off';)
id | stock_value |
---|---|
another_product | 9 |
one_product | 6 |
Attachments
Issue Links
- duplicates
-
MDEV-25714 Join using derived with aggregation returns incorrect results
-
- Closed
-
Thank you for the report!
this if the same bug as
MDEV-25714, fixed in 663bc849b5a26a5325adf009a8e commit by Igor Babaevcommit 663bc849b5a26a5325adf009a8e8fa9155c6b833 (origin/bb-10.3-igor)
Author: Igor Babaev <igor@askmonty.org>
Date: Wed May 26 23:41:59 2021 -0700
MDEV-25714 Join using derived with aggregation returns incorrect results
If a join query uses a derived table (view / CTE) with GROUP BY clause then
the execution plan for such join may employ split optimization. When this
optimization is employed the derived table is not materialized. Rather only
some partitions of the derived table are subject to grouping. Split
optimization can be applied only if:
- there are some indexes over the tables used in the join specifying the
derived table whose prefixes partially cover the field items used in the
GROUP BY list (such indexes are called splitting indexes)
- the WHERE condition of the join query contains conjunctive equalities
between columns of the derived table that comprise major parts of
splitting indexes and columns of the other join tables.
When the optimizer evaluates extending of a partial join by the rows of the
derived table it always considers a possibility of using split optimization.
Different splitting indexes can be used depending on the extended partial
join. At some rare conditions, for example, when there is a non-splitting
covering index for a table joined in the join specifying the derived table
usage of a splitting index to produce rows needed for grouping may be still
less beneficial than usage of such covering index without any splitting
technique. The function JOIN_TAB::choose_best_splitting() must take this
into account.