[MCOL-1864] Very long IN clause causing wrong results Created: 2018-11-08  Updated: 2022-11-05  Resolved: 2022-11-05

Status: Closed
Project: MariaDB ColumnStore
Component/s: N/A
Affects Version/s: 1.1.6
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Dev OneOone Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: Compatibility
Environment:

Ubuntu 18.04.1 LTS
10.2.17-MariaDB



 Description   

A very long IN clause is causing wrong results. My testcase works fine with INNODB engine though.

Steps to reproduce (attachment would be too long):

CREATE TABLE IF NOT EXISTS numbers (seq INT) ENGINE=COLUMNSTORE;
INSERT INTO numbers SELECT seq FROM seq_1_to_2015667;
SELECT count(*),max(seq) FROM numbers; #shows 2015667
SELECT * FROM numbers WHERE seq IN (SELECT seq FROM seq_1_to_2015667); 

Resultset counts 2015667 records.
Now the weird part: Generate a comma separated list of all numbers, e.g.

SELECT GROUP_CONCAT(seq) FROM seq_1_to_2015667;

and send them explicitly in the query.

SELECT * FROM numbers WHERE seq IN (1,2,3,4,...many numbers...,2015666,2015667);

Resultset counts 49587 records.
On INNODB, this is working fine. The second resultset counts 2015667 records, as expected. I've made the following changes to the config:

[mysqld]
tmp_table_size=4294967295
max_heap_table_size=4294966272
skip-log-bin
disable-log-bin
max_allowed_packet=1073741824
max_length_for_sort_data=8388608
max_long_data_size=4294967295
max_sort_length=8388608

I'm suspecting some undocumented limit or a bug.



 Comments   
Comment by Dev OneOone [ 2018-11-12 ]

In addition: the columnstore query takes about 5 times as long as querying the innodb table.

Comment by Roman [ 2018-12-08 ]

Many thanks for the clue. We will look into it.

Comment by Patrick LeBlanc (Inactive) [ 2018-12-10 ]

IIRC there is an implicit limit on the in-clause length leading up to execution. Either in the component that sends the maria execution plan to exemgr, or in the conversion to a CS execution plan.

Comment by patrice [ 2018-12-12 ]

there is an issue for this : https://jira.mariadb.org/browse/MCOL-557 . we ran into this, and it is not recommended to have more than 50k in the in clause.

Comment by Dev OneOone [ 2018-12-17 ]

@all thank you for taking notice
@patrice thanks for the reference, I didn't know about it

Does anyone know if this is a hard limit or is it somehow configurable? Where's the exact threshold?
I tried the workaround and it works. Don't get me wrong, but it looks like ugly unnecessary overhead to me. IMHO the query should either return no results or throw an error. A wrong resultset is just highly confusing.

Comment by Nivesh [ 2019-01-24 ]

Hi

I have a workaround for this issue however is it very dependant on the amount of memory is available on the UM.

Running a trim on the in list fixes this issue.

Comment by Todd Stoffel (Inactive) [ 2022-11-05 ]

Item is out of date. Closing due to inactivity. If you feel this was done in error please open a new ticket.

Generated at Thu Feb 08 02:31:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.