[MDEV-12839] incorrect lead window function value with null Created: 2017-05-18  Updated: 2018-06-14  Resolved: 2017-05-19

Status: Closed
Project: MariaDB Server
Component/s: Optimizer - Window functions
Affects Version/s: 10.2.5
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: David Thompson (Inactive) Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: need_feedback


 Description   

when you have a null in the value being windowed over this is being put at the end but the lead value is the first value in the window rather than null. I believe it should be null. If you don't have any nulls it behaves this way. This would also be a regression for columnstore 1.1.

Steps to repro:

create table t1(dt datetime);
insert into t1 values ('2017-05-17'), ('2017-05-18'), (null);
 
> select dt,lead(dt) over (order by dt) next from t1 where dt is not null;
+---------------------+---------------------+
| dt                  | next                |
+---------------------+---------------------+
| 2017-05-17 00:00:00 | 2017-05-18 00:00:00 |
| 2017-05-18 00:00:00 | NULL                |
+---------------------+---------------------+
2 rows in set (0.00 sec)
 
MariaDB [genesis]> select dt,lead(dt) over (order by dt) next from t1;
+---------------------+---------------------+
| dt                  | next                |
+---------------------+---------------------+
| 2017-05-17 00:00:00 | 2017-05-18 00:00:00 |
| 2017-05-18 00:00:00 | NULL                |
| NULL                | 2017-05-17 00:00:00 |
+---------------------+---------------------+
3 rows in set (0.00 sec)
 



 Comments   
Comment by Elena Stepanova [ 2017-05-18 ]

It is put at the end because you don't specify any order, it could be different in other circumstances, e.g.:

MariaDB [test]> insert into t1 values (null),('2017-05-17'), ('2017-05-18');
Query OK, 3 rows affected (0.04 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select dt,lead(dt) over (order by dt) next from t1;
+---------------------+---------------------+
| dt                  | next                |
+---------------------+---------------------+
| NULL                | 2017-05-17 00:00:00 |
| 2017-05-17 00:00:00 | 2017-05-18 00:00:00 |
| 2017-05-18 00:00:00 | NULL                |
+---------------------+---------------------+
3 rows in set (0.00 sec)

or

MariaDB [test]> insert into t1 values ('2017-05-17'), ('2017-05-18'), (null);
Query OK, 3 rows affected (0.05 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select dt,lead(dt) over (order by dt) next from t1 order by dt;
+---------------------+---------------------+
| dt                  | next                |
+---------------------+---------------------+
| NULL                | 2017-05-17 00:00:00 |
| 2017-05-17 00:00:00 | 2017-05-18 00:00:00 |
| 2017-05-18 00:00:00 | NULL                |
+---------------------+---------------------+
3 rows in set (0.00 sec)

Given that, do you still think it works incorrectly?

Comment by David Thompson (Inactive) [ 2017-05-18 ]

Hmmn thinking about it, i think the behavior would be explained by the window function order being nulls first? My understanding is that the window function order by is distinct from the select order by as well so it shouldn't matter how you order the select.

However i thought the default 'mysql way' is nulls last. I couldn't find anything in the kb article. So if the defined behavior of windows functions in 10.2 is that nulls come first then this is not a bug and can be closed.

Comment by Elena Stepanova [ 2017-05-18 ]

I"m not sure I've understood correctly this part "My understanding is that the window function order by is distinct from the select order by as well" ; but if I'm guessing right and you mean that ORDER BY in the window definition also defines the order of the resultset – it doesn't, they are independent.

MariaDB [test]> create table tt (i int);
Query OK, 0 rows affected (0.22 sec)
 
MariaDB [test]> insert into tt values (2),(5),(1);
Query OK, 3 rows affected (0.03 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select i, lead(i) over (order by i) w from tt;
+------+------+
| i    | w    |
+------+------+
|    2 |    5 |
|    5 | NULL |
|    1 |    2 |
+------+------+
3 rows in set (0.01 sec)
 
MariaDB [test]> select i, lead(i) over (order by i) w from tt order by i;
+------+------+
| i    | w    |
+------+------+
|    1 |    2 |
|    2 |    5 |
|    5 | NULL |
+------+------+
3 rows in set (0.00 sec)
 
MariaDB [test]> select i, lead(i) over (order by i desc) w from tt;
+------+------+
| i    | w    |
+------+------+
|    2 |    1 |
|    5 |    2 |
|    1 | NULL |
+------+------+
3 rows in set (0.00 sec)
 
MariaDB [test]> select i, lead(i) over (order by i desc) w from tt order by i;
+------+------+
| i    | w    |
+------+------+
|    1 | NULL |
|    2 |    1 |
|    5 |    2 |
+------+------+
3 rows in set (0.00 sec)

etc.

For NULLs, the default "mysql way" is nulls first.
https://dev.mysql.com/doc/refman/5.7/en/working-with-null.html

When doing an ORDER BY, NULL values are presented first if you do ORDER BY ... ASC and last if you do ORDER BY ... DESC.

MySQL [test]> create table tx (i int);
Query OK, 0 rows affected (0.36 sec)
 
MySQL [test]> insert into tx values (2),(null),(1);
Query OK, 3 rows affected (0.04 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
MySQL [test]> select * from tx order by i;
+------+
| i    |
+------+
| NULL |
|    1 |
|    2 |
+------+
3 rows in set (0.00 sec)
 
MySQL [test]> select @@version;
+--------------+
| @@version    |
+--------------+
| 5.7.17-debug |
+--------------+
1 row in set (0.00 sec)

Comment by David Thompson (Inactive) [ 2017-05-19 ]

Yes, correct i was thrown by the separate table order so this is behaving as intended.

Comment by Corbin Bohne [ 2018-06-14 ]

Agree with all here that this is expected behavior.

Generated at Thu Feb 08 08:00:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.