[MDEV-32349] Window Lag function returning different result under from MySQL-8.0 Created: 2023-10-04  Updated: 2023-10-31

Status: Open
Project: MariaDB Server
Component/s: Optimizer - Window functions
Affects Version/s: 10.4.31
Fix Version/s: 10.4, 10.6

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Vicențiu Ciorbaru
Resolution: Unresolved Votes: 0
Labels: None


 Description   

CREATE TABLE aggregates
(
  `date` date,
  `ticker` varchar(3),
  l float,
  is_fs tinyint(1)
);
 
INSERT INTO aggregates
VALUES
	('2016-08-02', 'GFF', 16.75, NULL),
	('2016-08-24', 'GFF', 16.75, NULL),
    ('2016-09-16', 'GFF', 16.75, NULL),
    ('2016-01-06', 'GFF', 16.75, NULL),
    ('2016-10-10', 'GFF', 16.75, NULL);

 
SELECT 
 
(1=1) 
AND 
(
    (LAG(l) OVER (PARTITION BY ticker ORDER BY date) > 1 AND 1=1) 
OR (LAG(l) OVER (PARTITION BY ticker ORDER BY date) > 1 AND 1=1)
)
 
AS is_fs FROM `aggregates` WHERE YEAR(date) = 2016 AND ticker = 'GFF' ORDER BY is_fs DESC

result:

NULL
NULL
NULL
NULL
NULL

https://www.db-fiddle.com/f/iEoGYnnKA3pmkB9piu8pWk/0

MySQL-8.0 result

1
1
1
1
NULL



 Comments   
Comment by Vicențiu Ciorbaru [ 2023-10-09 ]

The problem is caused by the 3-level Window Function dependency tree.

Simplified query:

 
-- Setup
CREATE TABLE aggregates
(
`date` date,
`ticker` varchar(3),
l float,
is_fs tinyint(1)
);
INSERT INTO aggregates
VALUES
('2016-08-02', 'GFF', 16.75, NULL),
('2016-08-24', 'GFF', 16.75, NULL);
 
-- Problematic Query
SELECT
(
  (SUM(1) OVER () > 0 AND true) OR
  (AVG(1) OVER () > 0 AND true)
) and true
FROM `aggregates`;

The execution of this query generates a temporary table with the following fields corresponding to the following Item subtrees (in this order)

  • Field 0 - Item_cond_or(ref_to_field_3, ref_to_field2)
  • Field 1 - Item_cond_and(ref_to_field_2, true)
  • Field 2 - Item_window_func(avg...)
  • Field 3 - Item_cond_and(ref_to_field_4, true)
  • Field 4 - Item_window_func(sum...)
  • Field 5 - Item_cond_and(ref_to_field0, true)

The problem with this field order is that the final step in window function computation does not consider field dependencies properly. To be able to fill the whole table, one must start by computing window functions first (which is done correctly), then the items depending directly on window functions, then the items depending indirectly and so on.

We currently only properly handle first and second level items containing window functions. 3 levels and more are not evaluated in proper order, hence this bug.

This translates into a need to do a topological sort of temp-table fields before saving the actual results.

Comment by Vicențiu Ciorbaru [ 2023-10-09 ]

Discussion with psergei:

Items can be split into 3 categories:
1. Those that do not use window functions
2. Those that ARE window functions
3. Those that use window function values.

With this in mind, there should be no need for topological sorting, as all items of type 3 can be computed regardless of ordering. The current code however creates separate columns in the tmp table depending on the expression tree with type 3 items referencing window functions directly, but also with type 3 items referencing window functions via one item indirection. This needs to be investigated further to see why we have an extra layer of Item_refs placed in the tmp table.

Generated at Thu Feb 08 10:30:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.