[MDEV-30540] Wrong result with IN list length reaching IN_PREDICATE_CONVERSION_THRESHOLD Created: 2023-02-01  Updated: 2023-04-07  Resolved: 2023-02-10

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: N/A
Fix Version/s: 11.0.0

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Attachments: File fix-unique-attempt.diff    
Issue Links:
Problem/Incident
causes MDEV-31022 SIGSEGV in maria_create from create_i... Closed
is caused by MDEV-26974 Improve selectivity and related costs... Closed
Relates

 Description   

I could only reproduce it on bb-11.0 so far.

--source include/have_sequence.inc
 
CREATE TABLE t (a INT PRIMARY KEY);
INSERT INTO t SELECT seq FROM seq_1_to_30;
 
ANALYZE TABLE t PERSISTENT FOR ALL;
 
SET IN_PREDICATE_CONVERSION_THRESHOLD=4;
SELECT a FROM t WHERE a IN ( 1, 1, 2, 194 );
SET IN_PREDICATE_CONVERSION_THRESHOLD=100;
SELECT a FROM t WHERE a IN ( 1, 1, 2, 194 );
 
drop table t;

bb-11.0

SET IN_PREDICATE_CONVERSION_THRESHOLD=4;
SELECT a FROM t WHERE a IN ( 1, 1, 2, 194 );
a
1
1
2
SET IN_PREDICATE_CONVERSION_THRESHOLD=100;
SELECT a FROM t WHERE a IN ( 1, 1, 2, 194 );
a
1
2

The second one is of course the correct result.

Plan with the wrong result:

explain extended SELECT a FROM t WHERE a IN ( 1, 1, 2, 194 );
id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	<derived3>	ALL	NULL	NULL	NULL	NULL	4	100.00	
1	PRIMARY	t	eq_ref	PRIMARY	PRIMARY	4	tvc_0._col_1	1	100.00	Using index
3	DERIVED	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	No tables used
Warnings:
Note	1003	/* select#1 */ select `test`.`t`.`a` AS `a` from (values (1),(1),(2),(194)) `tvc_0` join `test`.`t` where `test`.`t`.`a` = `tvc_0`.`_col_1`

Reproducible with MyISAM, InnoDB, Aria.



 Comments   
Comment by Sergei Petrunia [ 2023-02-03 ]

The scenario which causes the bug is as follows:

  • IN-predicate is converted into a derived table, (SELECT ... FROM temp_table)
  • Derived table is created with a unique key (Good so far).
  • derived_with_keys optimization starts. It adds another "potential" index (This is a known deficiency, it could actually re-use the key that already exists).
  • join optimization decides to use a full scan on the derived table.
  • derived_with_keys sees that indexes it has added are not necessary.
  • It removes all indexes, including the unique one (This is where the bug is)
  • Now, when the query is run, there is no unique key on the temp_table, duplicates are not removed and so duplicate rows are produced in the output.
Comment by Sergei Petrunia [ 2023-02-03 ]

Stack trace of the location where all indexes are removed:

#0  TABLE::use_index (this=0x7fff9cb411c0, key_to_save=-1) at /home/psergey/dev-git/11.0-before-rebase/sql/table.
cc:8469
#1  0x0000555555fe1cd7 in JOIN::drop_unused_derived_keys (this=0x7fff9c018500) at /home/psergey/dev-git/11.0-befo
re-rebase/sql/sql_select.cc:13872
#2  0x0000555555fbe31f in JOIN::optimize_stage2 (this=0x7fff9c018500) at /home/psergey/dev-git/11.0-before-rebase
/sql/sql_select.cc:2632
#3  0x0000555555fbe0dc in JOIN::optimize_inner (this=0x7fff9c018500) at /home/psergey/dev-git/11.0-before-rebase/
sql/sql_select.cc:2597
#4  0x0000555555fbb7af in JOIN::optimize (this=0x7fff9c018500) at /home/psergey/dev-git/11.0-before-rebase/sql/sq
l_select.cc:1899

Comment by Sergei Petrunia [ 2023-02-06 ]

Developed a fix for this, but I get a failure in another test. It shows that the fix exposes another bug which is not easy to fix.

Pushed a commit which temporarily disables the Derived tables and union can now create distinct keys feature of bb-11.0 tree. It makes this bug go away.

commit 60f08a7d49c4ff1838319cdd84be064ab0a9b4c8
Author: Sergei Petrunia <sergey@mariadb.com>
Date:   Sat Feb 4 13:26:30 2023 +0300
 
    MDEV-30540: Temporarily revert : Derived tables and union can now create distinct keys
    
    Revert this as it was causing the issue:
    
    commit 5e5a8eda1641eda1d915a7eb5736e494d2179795
    Author: Monty <monty@mariadb.org>
    Date:   Wed May 4 17:26:43 2022 +0300
    
        Derived tables and union can now create distinct keys
    
        The idea is that instead of marking all select_lex's with DISTINCT, we
        only mark those that really need distinct result.
    

Comment by Sergei Petrunia [ 2023-02-09 ]

The patch which makes derived_with_keys optimization not to remove unique keys: fix-unique-attempt.diff

It causes this test failure:

mysql-test/main/derived.result    2023-02-03 15:02:03.784632772 +0300
mysql-test/main/derived.reject    2023-02-09 10:50:42.571006411 +0300
@@ -355,43 +355,41 @@
 create table t1(a int);
 create table t2(a int);
 create table t3(a int);
 insert into t1 values(1),(1);
 insert into t2 values(2),(2);
 insert into t3 values(3),(3);
 select * from t1 union distinct select * from t2 union all select * from t3;
 a
 1
 2
 3
 3
 select * from (select * from t1 union distinct select * from t2 union all select * from t3) X;
 a
 1
 2
 3
-3
 set @save2_derived_optimizer_switch_bug=@@optimizer_switch;
 set @@optimizer_switch=default;
 select * from (select * from t1 union distinct select * from t2 union all select * from t3) X;
 a
 1
 2
 3
-3

Comment by Michael Widenius [ 2023-02-10 ]

Worked with Petruna to fix this one.

The problem was the mysql_derived_prepare() did not correctly set
'distinct' when creating a temporary derived table.

Fixed by separating checking for distinct for queries with and without UNION.

Comment by Michael Widenius [ 2023-03-01 ]

The patch was not pushed at once as buildbot discovered a crash on Windows after the patch was pushed.
The crash was caused by 2 other independent bugs (one windows only and one general).
These are now fixed.

Generated at Thu Feb 08 10:17:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.