Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-4800

IN clause wrong results -add error msg

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 5.5.2, 5.6.5, 6.2.1
    • 22.08.1
    • None
    • None
    • 2021-17

    Description

      When specifying 100,000 values ​​in SELECT WHERE IN(1,2,10,), only 34,464 rows are returned. If the table does not specify Engine = Columnstore, it returns 100,000 rows correctly.

      Is there a limit to the number of IN() in ColumnStore?γ€€

      (MAX 65,535?) MariaDB Version is 10.5.11.

      Attachments

        Issue Links

          Activity

            I can testify to this IN clause unreliable behavior. Many of our queries using col IN (val1, val2, ...) were presenting incorrect results when compared to the equivalent query using the col=val1 OR col=val2 OR col=val3 expression. The later comes up with the correct results.

            We have solved this kind of issue in our ColumnStore client server app by replacing every x IN (a,b,c...) by x=a OR x=b OR x=c .... in our query builder. The IN clause in ColumnStore is not reliable as of now!

            Our AND x IN (vals...) clause - implemented with (x=vals OR ...) - builder, TypeScript:

                andInIfNotEmpty(columnName: string, arr: Array<any>): string {
                    const safeColName = escape(columnName);
                    if (arr != null && arr.length > 0) {
                        const treated = arr.map(element => {
                            if (typeof element == 'string') {
                                return `'${escape(element)}'`; // String elements escaped and between ''
                            } else if (element != null) {
                                return element; // Others stay raw
                            }
                            return -1; // Invalid element
                        });
             
                        let orEquals = "";
             
                        for (const e of treated) {
                            orEquals += `${safeColName}=${e} OR `;
                        }
             
                        orEquals = orEquals.substring(0, orEquals.length - 4); // removes last ' OR '
             
                        return `AND (${orEquals}) `;
                    }
                    return '';
                }
            

            eugenio.pacceli EugΓͺnio Pacceli Reis da Fonseca added a comment - - edited I can testify to this IN clause unreliable behavior. Many of our queries using col IN (val1, val2, ...) were presenting incorrect results when compared to the equivalent query using the col=val1 OR col=val2 OR col=val3 expression. The later comes up with the correct results. We have solved this kind of issue in our ColumnStore client server app by replacing every x IN (a,b,c...) by x=a OR x=b OR x=c .... in our query builder. The IN clause in ColumnStore is not reliable as of now! Our AND x IN (vals...) clause - implemented with (x=vals OR ...) - builder, TypeScript: andInIfNotEmpty(columnName: string, arr: Array<any>): string { const safeColName = escape(columnName); if (arr != null && arr.length > 0) { const treated = arr.map(element => { if (typeof element == 'string') { return `'${escape(element)}'`; // String elements escaped and between '' } else if (element != null) { return element; // Others stay raw } return -1; // Invalid element });   let orEquals = "";   for (const e of treated) { orEquals += `${safeColName}=${e} OR `; }   orEquals = orEquals.substring(0, orEquals.length - 4); // removes last ' OR '   return `AND (${orEquals}) `; } return ''; }
            drrtuy Roman added a comment -

            eugenio.pacceli Greetings! Internally MCS turns col IN (val1, val2..., valX) into a filter expression that equals to that you use in a workaround so I presume the incorrect filtering happens b/c of the truncation of the filter expression sent from MDB side into ExeMgr. I wonder if you tried binary search to get the threshold after which MCS delivers wrong results?

            drrtuy Roman added a comment - eugenio.pacceli Greetings! Internally MCS turns col IN (val1, val2..., valX) into a filter expression that equals to that you use in a workaround so I presume the incorrect filtering happens b/c of the truncation of the filter expression sent from MDB side into ExeMgr. I wonder if you tried binary search to get the threshold after which MCS delivers wrong results?

            There are 2 issues here
            -support of long list(unlimited lists ??/) . Agree with David on the correct approach with tmp table. Also see for comparison http://www.dba-oracle.com/t_maximum_number_of_sql_in_list_values.htm
            -do we produce an error msg? allen.herrera if not, lets open a ticket for that

            alexey.vorovich alexey vorovich (Inactive) added a comment - There are 2 issues here -support of long list(unlimited lists ??/) . Agree with David on the correct approach with tmp table. Also see for comparison http://www.dba-oracle.com/t_maximum_number_of_sql_in_list_values.htm -do we produce an error msg? allen.herrera if not, lets open a ticket for that

            @dlee Please comment on this. I think we can close it but I'm not sure.

            David.Hall David Hall (Inactive) added a comment - @dlee Please comment on this. I think we can close it but I'm not sure.

            David.Hall , is tis the right ticket ? There is no PR here

            alexey.vorovich alexey vorovich (Inactive) added a comment - David.Hall , is tis the right ticket ? There is no PR here
            David.Hall David Hall (Inactive) added a comment - - edited

            I tracked down the problem to the count of the number of items being stored in a 16 bit unsigned. This maxes out at 65535 count. The code doesn't break (segv or error of some sort) because the filters themselves are being stored in a bytestream which can handle arbitrary sized streams (within memory constraints). But the count rolls over so 65536 becomes 0, and all those filters are ignored.

            Thinking this could be an easy fix, I tried to change it to a 32 bit int, but soon discovered that, in some code paths (meaning for some queries), this number is tied to the number of RIDs in a block, which is stored in a 16 bit unsigned. Rather than chase the rabbit down this possibly very deep hole, I backed out and added an edit to error out if the number of items is > 65535.

            In truth, we shouldn't need to support any more than this. Really, any count > than 1000 is inefficient. The query should be re-worked to use a temp table to hold the values and a join or subselect used rather than "IN" (or "NOT IN", or "BETWEEN", etc.)

            David.Hall David Hall (Inactive) added a comment - - edited I tracked down the problem to the count of the number of items being stored in a 16 bit unsigned. This maxes out at 65535 count. The code doesn't break (segv or error of some sort) because the filters themselves are being stored in a bytestream which can handle arbitrary sized streams (within memory constraints). But the count rolls over so 65536 becomes 0, and all those filters are ignored. Thinking this could be an easy fix, I tried to change it to a 32 bit int, but soon discovered that, in some code paths (meaning for some queries), this number is tied to the number of RIDs in a block, which is stored in a 16 bit unsigned. Rather than chase the rabbit down this possibly very deep hole, I backed out and added an edit to error out if the number of items is > 65535. In truth, we shouldn't need to support any more than this. Really, any count > than 1000 is inefficient. The query should be re-worked to use a temp table to hold the values and a join or subselect used rather than "IN" (or "NOT IN", or "BETWEEN", etc.)

            Build verified: 22.08-1 (#5243)

            Verified reported test scenario:

            [root@s0 data]# mariadb mytest < sample.sql|wc -l
            ERROR 1815 (HY000) at line 1: Internal error: MCS-1001: Function 'in with argument count > 65535' isn't supported.
            0

            dleeyh Daniel Lee (Inactive) added a comment - Build verified: 22.08-1 (#5243) Verified reported test scenario: [root@s0 data] # mariadb mytest < sample.sql|wc -l ERROR 1815 (HY000) at line 1: Internal error: MCS-1001: Function 'in with argument count > 65535' isn't supported. 0

            People

              David.Hall David Hall (Inactive)
              toddstoffel Todd Stoffel (Inactive)
              Roman Roman
              Daniel Lee Daniel Lee (Inactive)
              Votes:
              2 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.