Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21445

Strange/inconsistent behavior of IN condition when mixing numbers and strings

Details

    Description

      Hi,

      today we encountered one strange behavior in an older project. Below are simple SQL statements that quickly show where the problem is.

      When combining a number and a number as a string in an IN WHERE condition, it sometimes returns extra unwanted records.

      As you can see, condition id = 1234 OR id = "97716021308405775" returns expected 2 records, but equivalent id IN(1234,"97716021308405775") returns 4 records. This unexpected behavior is independent on storage engine - it works the same with InnoDB, MYISAM or TokuDB.

      We'll fix our ORM layer to use the IN condition correctly (with either numbers or strings only), but maybe it's a bug in MariaDB that should be fixed. Maybe it's just old, familiar, unexpected behavior that can't be corrected?

      DROP TABLE IF EXISTS test;
      CREATE TABLE `test` (
      `id` bigint(20) unsigned NOT NULL,
      PRIMARY KEY (`id`)
      ) ENGINE=InnoDB;

      INSERT IGNORE INTO `test` VALUES(1234),(97716021308405775),(97716021308405770),(97716021308405780);

      – OK
      SELECT * FROM `test` WHERE `id` IN(97716021308405775);
      SELECT * FROM `test` WHERE `id` IN("97716021308405775");
      SELECT * FROM `test` WHERE `id` IN("1234",97716021308405775);
      SELECT * FROM `test` WHERE `id` IN("1234","97716021308405775");
      SELECT * FROM `test` WHERE `id` = 1234 OR id = "97716021308405775";

      – NOT OK (returns 4 records instead of 2)
      SELECT * FROM `test` WHERE `id` IN(1234,"97716021308405775");
      SELECT * FROM `test` WHERE `id` IN("97716021308405775",1234);

      Attachments

        Issue Links

          Activity

            Thanks for the report. Reproducible as described on 5.5-10.5 and MySQL 5.6, 8.0.

            elenst Elena Stepanova added a comment - Thanks for the report. Reproducible as described on 5.5-10.5 and MySQL 5.6, 8.0.
            bar Alexander Barkov added a comment - - edited

            The problem happens because convert_const_to_int() is applied inconsistently:

            • When this query is performed:

              SELECT * FROM t1 WHERE id = 1234 OR id = "97716021308405775";
              

              the string constant "97716021308405775" gets converted to a longlong constant 97716021308405775 using convert_const_to_int(). The further comparision is done in longlong format.

            • When this query is performed:

              SELECT * FROM test WHERE id IN("1234","97716021308405775");
              

              all string constants get converted to longlong constants using convert_const_to_int() and comparison is further done in longlong format.

            • When this query is performed:

              SELECT * FROM test WHERE id IN(1234,"97716021308405775");
              

              conversion of the string constant "97716021308405775" to longlong does not happen because pairs (id,1234) and (id,"97716021308405775") are compared using different data types. The latter is performed using DOUBLE format, which leads to precision loss and therefore extra rows are returned.

            bar Alexander Barkov added a comment - - edited The problem happens because convert_const_to_int() is applied inconsistently: When this query is performed: SELECT * FROM t1 WHERE id = 1234 OR id = "97716021308405775" ; the string constant "97716021308405775" gets converted to a longlong constant 97716021308405775 using convert_const_to_int() . The further comparision is done in longlong format. When this query is performed: SELECT * FROM test WHERE id IN ( "1234" , "97716021308405775" ); all string constants get converted to longlong constants using convert_const_to_int() and comparison is further done in longlong format. When this query is performed: SELECT * FROM test WHERE id IN (1234, "97716021308405775" ); conversion of the string constant "97716021308405775" to longlong does not happen because pairs (id,1234) and (id,"97716021308405775") are compared using different data types. The latter is performed using DOUBLE format, which leads to precision loss and therefore extra rows are returned.

            The same problem happen with a simple CASE:

            CREATE OR REPLACE TABLE t1 (id bigint(20) unsigned NOT NULL PRIMARY KEY);
            INSERT INTO t1 VALUES(1234),(97716021308405775),(97716021308405770),(97716021308405780);
            SELECT
              id,
              CASE id
                WHEN '97716021308405770' THEN '70'
                WHEN '97716021308405775' THEN '75'
                WHEN '97716021308405780' THEN '80'
              END AS c1
            FROM t1;
            

            +-------------------+------+
            | id                | c1   |
            +-------------------+------+
            |              1234 | NULL |
            | 97716021308405770 | 70   |
            | 97716021308405775 | 70   |
            | 97716021308405780 | 70   |
            +-------------------+------+
            

            bar Alexander Barkov added a comment - The same problem happen with a simple CASE: CREATE OR REPLACE TABLE t1 (id bigint (20) unsigned NOT NULL PRIMARY KEY ); INSERT INTO t1 VALUES (1234),(97716021308405775),(97716021308405770),(97716021308405780); SELECT id, CASE id WHEN '97716021308405770' THEN '70' WHEN '97716021308405775' THEN '75' WHEN '97716021308405780' THEN '80' END AS c1 FROM t1; +-------------------+------+ | id | c1 | +-------------------+------+ | 1234 | NULL | | 97716021308405770 | 70 | | 97716021308405775 | 70 | | 97716021308405780 | 70 | +-------------------+------+

            BETWEEN works fine:

            CREATE OR REPLACE TABLE t1 (id bigint(20) unsigned NOT NULL PRIMARY KEY);
            INSERT INTO t1 VALUES(1234),(97716021308405775),(97716021308405770),(97716021308405780);
            SELECT id, id BETWEEN "97716021308405775" AND "97716021308405775" AS c1 FROM t1;
            

            +-------------------+----+
            | id                | c1 |
            +-------------------+----+
            |              1234 |  0 |
            | 97716021308405770 |  0 |
            | 97716021308405775 |  1 |
            | 97716021308405780 |  0 |
            +-------------------+----+
            

            Searched CASE works fine:

            CREATE OR REPLACE TABLE t1 (id bigint(20) unsigned NOT NULL PRIMARY KEY);
            INSERT INTO t1 VALUES(1234),(97716021308405775),(97716021308405770),(97716021308405780);
            SELECT
              id,
              CASE
                WHEN id='97716021308405770' THEN '70'
                WHEN id='97716021308405775' THEN '75'
                WHEN id='97716021308405780' THEN '80'
              END AS c1
            FROM t1;
            

            +-------------------+------+
            | id                | c1   |
            +-------------------+------+
            |              1234 | NULL |
            | 97716021308405770 | 70   |
            | 97716021308405775 | 75   |
            | 97716021308405780 | 80   |
            +-------------------+------+
            

            bar Alexander Barkov added a comment - BETWEEN works fine: CREATE OR REPLACE TABLE t1 (id bigint (20) unsigned NOT NULL PRIMARY KEY ); INSERT INTO t1 VALUES (1234),(97716021308405775),(97716021308405770),(97716021308405780); SELECT id, id BETWEEN "97716021308405775" AND "97716021308405775" AS c1 FROM t1; +-------------------+----+ | id | c1 | +-------------------+----+ | 1234 | 0 | | 97716021308405770 | 0 | | 97716021308405775 | 1 | | 97716021308405780 | 0 | +-------------------+----+ Searched CASE works fine: CREATE OR REPLACE TABLE t1 (id bigint (20) unsigned NOT NULL PRIMARY KEY ); INSERT INTO t1 VALUES (1234),(97716021308405775),(97716021308405770),(97716021308405780); SELECT id, CASE WHEN id= '97716021308405770' THEN '70' WHEN id= '97716021308405775' THEN '75' WHEN id= '97716021308405780' THEN '80' END AS c1 FROM t1; +-------------------+------+ | id | c1 | +-------------------+------+ | 1234 | NULL | | 97716021308405770 | 70 | | 97716021308405775 | 75 | | 97716021308405780 | 80 | +-------------------+------+
            bar Alexander Barkov added a comment - - edited

            A possible solution would be to compare pairs (int,string) as DECIMAL rather than DOUBLE.

            @serg suggests trying this starting from 10.2.

            We should suppress warnings when decimal2string() runs outside of the DECIMAL range, e.g. :

            SELECT '1e100'=1;
            

            as the comparison to INT will work correctly anyway.

            bar Alexander Barkov added a comment - - edited A possible solution would be to compare pairs (int,string) as DECIMAL rather than DOUBLE. @serg suggests trying this starting from 10.2. We should suppress warnings when decimal2string() runs outside of the DECIMAL range, e.g. : SELECT '1e100' =1; as the comparison to INT will work correctly anyway.

            The problem is also repeatable with engines MyISAM and HEAP.

            CREATE OR REPLACE TABLE t1 (
              id bigint(20) unsigned NOT NULL,
              PRIMARY KEY (id)
            ) ENGINE=MyISAM;
             
            INSERT IGNORE INTO t1 VALUES(1234),(97716021308405775),(97716021308405770),(97716021308405780);
             
            SELECT * FROM t1 WHERE id IN(97716021308405775);
            SELECT * FROM t1 WHERE id IN('97716021308405775');
            SELECT * FROM t1 WHERE id IN('1234',97716021308405775);
            SELECT * FROM t1 WHERE id IN('1234','97716021308405775');
            SELECT * FROM t1 WHERE id = 1234 OR id = '97716021308405775';
             
            -- NOT OK (returns 4 records instead of 2)
            SELECT * FROM t1 WHERE id IN(1234,'97716021308405775');
            SELECT * FROM t1 WHERE id IN('97716021308405775',1234);
            

            bar Alexander Barkov added a comment - The problem is also repeatable with engines MyISAM and HEAP. CREATE OR REPLACE TABLE t1 ( id bigint (20) unsigned NOT NULL , PRIMARY KEY (id) ) ENGINE=MyISAM;   INSERT IGNORE INTO t1 VALUES (1234),(97716021308405775),(97716021308405770),(97716021308405780);   SELECT * FROM t1 WHERE id IN (97716021308405775); SELECT * FROM t1 WHERE id IN ( '97716021308405775' ); SELECT * FROM t1 WHERE id IN ( '1234' ,97716021308405775); SELECT * FROM t1 WHERE id IN ( '1234' , '97716021308405775' ); SELECT * FROM t1 WHERE id = 1234 OR id = '97716021308405775' ;   -- NOT OK (returns 4 records instead of 2) SELECT * FROM t1 WHERE id IN (1234, '97716021308405775' ); SELECT * FROM t1 WHERE id IN ( '97716021308405775' ,1234);
            bar Alexander Barkov added a comment - serg , please review a patch for 10.3: https://github.com/MariaDB/server/commit/d19f380d62c3145258c229d102b829c2f1c77223 Thanks.
            elenst Elena Stepanova added a comment - sanja : Probably this is the review: https://lists.launchpad.net/maria-developers/msg12411.html

            People

              bar Alexander Barkov
              jan.reges Ján Regeš
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.