[MDEV-23878] Invalid result when in_predicate_conversion_threshold is reached using IN clause Created: 2020-10-02  Updated: 2020-10-18

Status: Confirmed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.4.14, 10.4, 10.5
Fix Version/s: 10.4, 10.5

Type: Bug Priority: Major
Reporter: carlos tutte Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 1
Labels: None

Attachments: File mdev23878.test     File subscriber_data.dev_config.sql    

 Description   

When the number of conditions inside the "IN" clause matches the value of in_predicate_conversion_threshold , query behavior changes and returned result is incorrect for query.

Attaching file for case reproduction:

CREATE DATABASE subscriber_data;
mysql subscriber_data < subscriber_data.sql

Execute:

pager;
set in_predicate_conversion_threshold = 51;
pager md5sum;
 
SELECT object, ind, data FROM `subscriber_data`.`dev_config` AS live JOIN (SELECT object, ind, MAX(collection_time) AS collection_time FROM `subscriber_data`.`dev_config` WHERE collection_time <= '2020-09-24' AND ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51) GROUP BY object, ind) AS latest USING (object, ind, collection_time) WHERE ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51);

Result is
13 rows in set (0.039 sec)

Otherwise setting set in_predicate_conversion_threshold = 1000; (value bigger than number of clauses inside "IN" ):

pager;
set in_predicate_conversion_threshold = 1000;
pager md5sum;
 
SELECT object, ind, data FROM `subscriber_data`.`dev_config` AS live JOIN (SELECT object, ind, MAX(collection_time) AS collection_time FROM `subscriber_data`.`dev_config` WHERE collection_time <= '2020-09-24' AND ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51) GROUP BY object, ind) AS latest USING (object, ind, collection_time) WHERE ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51);

Returns
700 rows in set (0.038 sec)

The returned result when bug triggers (in_predicate_conversion_threshold = 51) is:
-------------------------------------------

object ind data

-------------------------------------------

1 51 51
2 51 0
3 51 us-east-1d
4 51 ebd81398742e75f2fafb5511e06720c3
5 51 0
6 51 0
7 51 0
8 51 9b5d0e93ea552fb8a86420e87410ae21
10 51 NULL
11 51 0
12 51 0
13 51 0
14 51 0

-------------------------------------------
13 rows in set (0.046 sec)

But if I reorder the values inside the query IN clause and set "1" at the end instead of 51, result is:
-------------------------------------------

object ind data

-------------------------------------------

1 1 1
2 1 0
3 1 test_by_vcug_device
4 1 6dc8e300391a4730609c89d9bf5e7564
5 1 0
6 1 NULL
7 1 0
8 1 8d4ffb6e30134a25b5297975eb3c0155
10 1 NULL
11 1 0
12 1 0
13 1 0
14 1 0

-------------------------------------------
13 rows in set (0.040 sec)

It can be seen that on the group by "ind" , the result "ind" value displayed is only the one last used in the "IN" conditions, and there seems to is a "UNION ALL" missing to merge all results when in_predicate_conversion_threshold kicks in.



 Comments   
Comment by Elena Stepanova [ 2020-10-18 ]

Thanks for the report and test case, reproducible as described.
Attached mdev23878.test is the same test case, only in MTR friendly form, and it has SELECT COUNT instead of a select list, for easier result interpretation.

set in_predicate_conversion_threshold = 51;
SELECT count(*) FROM `subscriber_data`.`dev_config` AS live JOIN (SELECT object, ind, MAX(collection_time) AS collection_time FROM `subscriber_data`.`dev_config` WHERE collection_time <= '2020-09-24' AND ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51) GROUP BY object, ind) AS latest USING (object, ind, collection_time) WHERE ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51);
count(*)
13
set in_predicate_conversion_threshold = 1000;
SELECT count(*) FROM `subscriber_data`.`dev_config` AS live JOIN (SELECT object, ind, MAX(collection_time) AS collection_time FROM `subscriber_data`.`dev_config` WHERE collection_time <= '2020-09-24' AND ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51) GROUP BY object, ind) AS latest USING (object, ind, collection_time) WHERE ind IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51);
count(*)
700

The test case is for debugging purposes only, don't put it into the regression suite like this!

With otherwise default optimizer settings, reproducible on 10.4 and 10.5. Possibly with non-default settings and/or adjustments to the data and queries it could be reproduced on earlier versions.

Generated at Thu Feb 08 09:25:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.