When specifying 100,000 values ββin SELECT WHERE IN(1,2,10,), only 34,464 rows are returned. If the table does not specify Engine = Columnstore, it returns 100,000 rows correctly.
Is there a limit to the number of IN() in ColumnStore?γ
I can testify to this IN clause unreliable behavior. Many of our queries using col IN (val1, val2, ...) were presenting incorrect results when compared to the equivalent query using the col=val1 OR col=val2 OR col=val3 expression. The later comes up with the correct results.
We have solved this kind of issue in our ColumnStore client server app by replacing every x IN (a,b,c...) by x=a OR x=b OR x=c .... in our query builder. The IN clause in ColumnStore is not reliable as of now!
Our AND x IN (vals...) clause - implemented with (x=vals OR ...) - builder, TypeScript:
return `'${escape(element)}'`; // String elements escaped and between ''
} else if (element != null) {
return element; // Others stay raw
}
return -1; // Invalid element
});
let orEquals = "";
for (const e of treated) {
orEquals += `${safeColName}=${e} OR `;
}
orEquals = orEquals.substring(0, orEquals.length - 4); // removes last ' OR '
return `AND (${orEquals}) `;
}
return '';
}
EugΓͺnio Pacceli Reis da Fonseca
added a comment - - edited I can testify to this IN clause unreliable behavior. Many of our queries using col IN (val1, val2, ...) were presenting incorrect results when compared to the equivalent query using the col=val1 OR col=val2 OR col=val3 expression. The later comes up with the correct results.
We have solved this kind of issue in our ColumnStore client server app by replacing every x IN (a,b,c...) by x=a OR x=b OR x=c .... in our query builder. The IN clause in ColumnStore is not reliable as of now!
Our AND x IN (vals...) clause - implemented with (x=vals OR ...) - builder, TypeScript:
andInIfNotEmpty(columnName: string, arr: Array<any>): string {
const safeColName = escape(columnName);
if (arr != null && arr.length > 0) {
const treated = arr.map(element => {
if (typeof element == 'string') {
return `'${escape(element)}'`; // String elements escaped and between ''
} else if (element != null) {
return element; // Others stay raw
}
return -1; // Invalid element
});
let orEquals = "";
for (const e of treated) {
orEquals += `${safeColName}=${e} OR `;
}
orEquals = orEquals.substring(0, orEquals.length - 4); // removes last ' OR '
return `AND (${orEquals}) `;
}
return '';
}
eugenio.pacceli Greetings! Internally MCS turns col IN (val1, val2..., valX) into a filter expression that equals to that you use in a workaround so I presume the incorrect filtering happens b/c of the truncation of the filter expression sent from MDB side into ExeMgr. I wonder if you tried binary search to get the threshold after which MCS delivers wrong results?
Roman
added a comment - eugenio.pacceli Greetings! Internally MCS turns col IN (val1, val2..., valX) into a filter expression that equals to that you use in a workaround so I presume the incorrect filtering happens b/c of the truncation of the filter expression sent from MDB side into ExeMgr. I wonder if you tried binary search to get the threshold after which MCS delivers wrong results?
alexey vorovich (Inactive)
added a comment - There are 2 issues here
-support of long list(unlimited lists ??/) . Agree with David on the correct approach with tmp table. Also see for comparison http://www.dba-oracle.com/t_maximum_number_of_sql_in_list_values.htm
-do we produce an error msg? allen.herrera if not, lets open a ticket for that
I tracked down the problem to the count of the number of items being stored in a 16 bit unsigned. This maxes out at 65535 count. The code doesn't break (segv or error of some sort) because the filters themselves are being stored in a bytestream which can handle arbitrary sized streams (within memory constraints). But the count rolls over so 65536 becomes 0, and all those filters are ignored.
Thinking this could be an easy fix, I tried to change it to a 32 bit int, but soon discovered that, in some code paths (meaning for some queries), this number is tied to the number of RIDs in a block, which is stored in a 16 bit unsigned. Rather than chase the rabbit down this possibly very deep hole, I backed out and added an edit to error out if the number of items is > 65535.
In truth, we shouldn't need to support any more than this. Really, any count > than 1000 is inefficient. The query should be re-worked to use a temp table to hold the values and a join or subselect used rather than "IN" (or "NOT IN", or "BETWEEN", etc.)
David Hall (Inactive)
added a comment - - edited I tracked down the problem to the count of the number of items being stored in a 16 bit unsigned. This maxes out at 65535 count. The code doesn't break (segv or error of some sort) because the filters themselves are being stored in a bytestream which can handle arbitrary sized streams (within memory constraints). But the count rolls over so 65536 becomes 0, and all those filters are ignored.
Thinking this could be an easy fix, I tried to change it to a 32 bit int, but soon discovered that, in some code paths (meaning for some queries), this number is tied to the number of RIDs in a block, which is stored in a 16 bit unsigned. Rather than chase the rabbit down this possibly very deep hole, I backed out and added an edit to error out if the number of items is > 65535.
In truth, we shouldn't need to support any more than this. Really, any count > than 1000 is inefficient. The query should be re-worked to use a temp table to hold the values and a join or subselect used rather than "IN" (or "NOT IN", or "BETWEEN", etc.)
I can testify to this IN clause unreliable behavior. Many of our queries using col IN (val1, val2, ...) were presenting incorrect results when compared to the equivalent query using the col=val1 OR col=val2 OR col=val3 expression. The later comes up with the correct results.
We have solved this kind of issue in our ColumnStore client server app by replacing every x IN (a,b,c...) by x=a OR x=b OR x=c .... in our query builder. The IN clause in ColumnStore is not reliable as of now!
Our AND x IN (vals...) clause - implemented with (x=vals OR ...) - builder, TypeScript:
andInIfNotEmpty(columnName: string, arr: Array<any>): string {
const safeColName = escape(columnName);
if (arr != null && arr.length > 0) {
const treated = arr.map(element => {
if (typeof element == 'string') {
return `'${escape(element)}'`; // String elements escaped and between ''
} else if (element != null) {
return element; // Others stay raw
}
return -1; // Invalid element
});
let orEquals = "";
for (const e of treated) {
orEquals += `${safeColName}=${e} OR `;
}
orEquals = orEquals.substring(0, orEquals.length - 4); // removes last ' OR '
return `AND (${orEquals}) `;
}
return '';
}