[MDEV-10172] UNION query returns incorrect rows outside conditional evaluation Created: 2016-06-02  Updated: 2016-12-20  Resolved: 2016-12-20

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.1.67, 5.2.14, 5.3.12, 10.1.13, 5.5, 10.0, 10.1, 10.2
Fix Version/s: 5.5.54, 10.0.29, 10.1.21, 10.2.3

Type: Bug Priority: Major
Reporter: Andy Cooper Assignee: Oleksandr Byelkin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-7805 Wrong result (extra rows) with UNION ... Closed
Sprint: 5.5.50, 5.5.54

 Description   

Please consider the following scenario:

drop table if exists union_bug;
create table union_bug (d datetime not null primary key);
insert into union_bug(d) values ('2016-06-01'),('2016-06-02'),('2016-06-03'),('2016-06-04');
select * from
(
	select * from union_bug where d between '2016-06-02' and '2016-06-05'
	union
	(select * from union_bug where d < '2016-06-05' order by d desc limit 1)
) onlyJun2toJun4
order by d

The above query should return 3 rows dated 02 through 04 June 2016. This can be born out by running the individual select statements.

However in practice this query returns 4 rows incorrectly dated 01 through 04 June 2016. It should NEVER NEVER NEVER return a row for 01 June, but it does.

I suspect this may be somehow related to the duplicate row from the lower query. If I change the 'union to a 'union all' then the query correctly returns the expected 4 rows, including the one duplicate row.



 Comments   
Comment by Elena Stepanova [ 2016-06-02 ]

AndyCooper, thanks for the report and test case.

It is likely to be a duplicate of MDEV-7805, but since there is a good simple test case, I'll keep it open and assign to sanja so that when he comes up with a fix for MDEV-7805, he would make sure this case is covered (fixed) as well.

Comment by Andy Cooper [ 2016-06-03 ]

In my opinion, the fact that this bug returns incorrect data (which is out of the bounds of either of the queries that comprise the UNION) makes it a very bad bug - I'm really surprised this has been open for over a year!

Comment by Sergei Petrunia [ 2016-06-05 ]

The bug seems to be that the LIMIT clause in the subquery is ignored.

Comment by Sergei Petrunia [ 2016-06-05 ]

EXPLAIN of the query:

+------+--------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| id   | select_type  | table      | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+------+--------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
|    1 | PRIMARY      | <derived2> | ALL   | NULL          | NULL    | NULL    | NULL |    4 | Using filesort           |
|    2 | DERIVED      | union_bug  | range | PRIMARY       | PRIMARY | 5       | NULL |    3 | Using where; Using index |
|    3 | UNION        | union_bug  | range | PRIMARY       | PRIMARY | 5       | NULL |    4 | Using where; Using index |
| NULL | UNION RESULT | <union2,3> | ALL   | NULL          | NULL    | NULL    | NULL | NULL |                          |
+------+--------------+------------+-------+---------------+---------+---------+------+------+--------------------------+

it looks like the ORDER BY ... LIMIT part is to be satisfied by using range access and a suitable index.

If I change the query so that the index is not usable for resolving ORDER BY, I get correct result:

select * from ( select * from union_bug where d between '2016-06-02' and '2016-06-05' union (select * from union_bug where d < '2016-06-05' order by ifnull(d,null) desc limit 1) ) onlyJun2toJun4 order by d;
+---------------------+
| d                   |
+---------------------+
| 2016-06-02 00:00:00 |
| 2016-06-03 00:00:00 |
| 2016-06-04 00:00:00 |
+---------------------+

EXPLAIN of the above query:

+------+--------------+------------+-------+---------------+---------+---------+------+------+------------------------------------------+
| id   | select_type  | table      | type  | possible_keys | key     | key_len | ref  | rows | Extra                                    |
+------+--------------+------------+-------+---------------+---------+---------+------+------+------------------------------------------+
|    1 | PRIMARY      | <derived2> | ALL   | NULL          | NULL    | NULL    | NULL |    4 | Using filesort                           |
|    2 | DERIVED      | union_bug  | range | PRIMARY       | PRIMARY | 5       | NULL |    3 | Using where; Using index                 |
|    3 | UNION        | union_bug  | index | PRIMARY       | PRIMARY | 5       | NULL |    4 | Using where; Using index; Using filesort |
| NULL | UNION RESULT | <union2,3> | ALL   | NULL          | NULL    | NULL    | NULL | NULL |                                          |
+------+--------------+------------+-------+---------------+---------+---------+------+------+------------------------------------------+

Comment by Sergei Petrunia [ 2016-06-05 ]

ANALYZE confirms filesort uses limit:

                        ...
                        "r_rows": 1,
                        "filesort": {
                          "r_loops": 1,
                          "r_total_time_ms": 0.3831,
                          "r_limit": 1,
                          "r_used_priority_queue": true,
                          "r_output_rows": 2,
                          "table": {
                            "table_name": "union_bug",
                            "access_type": "index",
                            "possible_keys": ["PRIMARY"], 
                             ...

Comment by Sergei Petrunia [ 2016-06-05 ]

Note that if one puts the ORDER BY .. LIMIT as the first union member, the result is correct:

select * from ((select * from union_bug where d < '2016-06-05' order by d desc limit 1) union (select * from union_bug where d between '2016-06-02' and '2016-06-05')) onlyJun2toJun4 order by d;
+---------------------+
| d                   |
+---------------------+
| 2016-06-02 00:00:00 |
| 2016-06-03 00:00:00 |
| 2016-06-04 00:00:00 |
+---------------------+

Comment by Sergei Petrunia [ 2016-06-05 ]

Ok, it fails as follows:

We arrive at the point where we've read the first record for select with id=3 (the one with ORDER BY):

  #0  end_send (join=0x7fff9000a170, join_tab=0x7fff9004d608, end_of_records=false) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:19439
  #1  0x0000555555abb0c4 in evaluate_join_record (join=0x7fff9000a170, join_tab=0x7fff9004d2c0, error=0) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:18528
  #2  0x0000555555aba9b9 in sub_select (join=0x7fff9000a170, join_tab=0x7fff9004d2c0, end_of_records=false) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:18303
  #3  0x0000555555aba219 in do_select (join=0x7fff9000a170, fields=0x7fff900077d8, table=0x0, procedure=0x0) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:17958
  #4  0x0000555555a96432 in JOIN::exec_inner (this=0x7fff9000a170) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:3228
  #5  0x0000555555a93717 in JOIN::exec (this=0x7fff9000a170) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:2518
  #6  0x0000555555b27d2a in st_select_lex_unit::exec (this=0x7fff90005e78) at /home/psergey/dev-git/10.1-dbg5/sql/sql_union.cc:832
  #7  0x0000555555a2851f in mysql_derived_fill (thd=0x555558000c10, lex=0x555558004560, derived=0x7fff900089e8) at /home/psergey/dev-git/10.1-dbg5/sql/sql_derived.cc:918
  #8  0x0000555555a26f3a in mysql_handle_single_derived (lex=0x555558004560, derived=0x7fff900089e8, phases=96) at /home/psergey/dev-git/10.1-dbg5/sql/sql_derived.cc:195
  #9  0x0000555555aab67b in st_join_table::preread_init (this=0x7fff9004ecd8) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:11695
  #10 0x0000555555ac2014 in create_sort_index (thd=0x555558000c10, join=0x7fff90009230, order=0x7fff900090e0, filesort_limit=18446744073709551615, select_limit=18446744073709551615, is_order_by=true) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:21431
  #11 0x0000555555a96210 in JOIN::exec_inner (this=0x7fff90009230) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:3198
  #12 0x0000555555a93717 in JOIN::exec (this=0x7fff90009230) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:2518
  #13 0x0000555555a96c8a in mysql_select (thd=0x555558000c10, rref_pointer_array=0x555558004fa0, tables=0x7fff900089e8, wild_num=1, fields=..., conds=0x0, og_num=1, order=0x7fff900090e0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fff90009210, unit=0x555558004628, select_lex=0x555558004d28) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:3451
  #14 0x0000555555a8c7fd in handle_select (thd=0x555558000c10, lex=0x555558004560, result=0x7fff90009210, setup_tables_done_option=0) at /home/psergey/dev-git/10.1-dbg5/sql/sql_select.cc:384

Then,

  • we enter select_union::send_data
  • it calls table->file->ha_write_tmp_row, which returns HA_ERR_FOUND_DUPP_KEY
  • this causes select_union::send_data to return immediately, without incrementing join->send_records. That way, LIMIT is not honored.

It seems to have been done intentionally, see comment:

      /* result < 0 if row was not accepted and should not be counted */

Comment by Sergei Petrunia [ 2016-06-06 ]

AndyCooper, agree this should be fixed. I've investigated the problem. Couldn't figure out the fix so far, I (or somebody else) will continue to work on this.

Comment by Oleksandr Byelkin [ 2016-12-20 ]

revision-id: 536752151c17d23a9cf98c0b9512aca872766663 (mariadb-5.5.53-30-g536752151c1)
parent(s): c4d9dc705b781bb155aab8f04cece2b87116d3c1
committer: Oleksandr Byelkin
timestamp: 2016-12-20 10:56:49 +0100
message:

MDEV-10172: UNION query returns incorrect rows outside conditional evaluation

count duplicate of UNION SELECT separately to awoid influence on lokal LIMIT clause.

Generated at Thu Feb 08 07:40:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.