[MDEV-21445] Strange/inconsistent behavior of IN condition when mixing numbers and strings Created: 2020-01-08  Updated: 2022-08-18  Resolved: 2022-07-06

Status: Closed
Project: MariaDB Server
Component/s: Data types
Affects Version/s: 5.5, 10.1, 10.2.16, 10.3.21, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.3.36, 10.4.26, 10.5.17, 10.6.9, 10.7.5, 10.8.4, 10.9.2

Type: Bug Priority: Critical
Reporter: Ján Regeš Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: in, upstream, where
Environment:

Tested on Debian with 10.3.21 and Gentoo with 10.2.16.


Attachments: Text File MDEV-21445-REVIEW.txt     File MDEV-21445.diff    
Issue Links:
Blocks
is blocked by MDEV-22976 CAST(JSON_EXTRACT() AS DECIMAL) does ... Closed
is blocked by MDEV-29041 Redundant truncation warning on CAST(... Closed
Duplicate
is duplicated by MDEV-25492 BETWEEN clause returns incorrect resu... Closed
Relates
relates to MDEV-25871 compare error bigint to varchar Closed
relates to MDEV-29259 Comparison semantic of int = string c... Confirmed

 Description   

Hi,

today we encountered one strange behavior in an older project. Below are simple SQL statements that quickly show where the problem is.

When combining a number and a number as a string in an IN WHERE condition, it sometimes returns extra unwanted records.

As you can see, condition id = 1234 OR id = "97716021308405775" returns expected 2 records, but equivalent id IN(1234,"97716021308405775") returns 4 records. This unexpected behavior is independent on storage engine - it works the same with InnoDB, MYISAM or TokuDB.

We'll fix our ORM layer to use the IN condition correctly (with either numbers or strings only), but maybe it's a bug in MariaDB that should be fixed. Maybe it's just old, familiar, unexpected behavior that can't be corrected?

DROP TABLE IF EXISTS test;
CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT IGNORE INTO `test` VALUES(1234),(97716021308405775),(97716021308405770),(97716021308405780);

– OK
SELECT * FROM `test` WHERE `id` IN(97716021308405775);
SELECT * FROM `test` WHERE `id` IN("97716021308405775");
SELECT * FROM `test` WHERE `id` IN("1234",97716021308405775);
SELECT * FROM `test` WHERE `id` IN("1234","97716021308405775");
SELECT * FROM `test` WHERE `id` = 1234 OR id = "97716021308405775";

– NOT OK (returns 4 records instead of 2)
SELECT * FROM `test` WHERE `id` IN(1234,"97716021308405775");
SELECT * FROM `test` WHERE `id` IN("97716021308405775",1234);



 Comments   
Comment by Elena Stepanova [ 2020-01-13 ]

Thanks for the report. Reproducible as described on 5.5-10.5 and MySQL 5.6, 8.0.

Comment by Alexander Barkov [ 2020-01-14 ]

The problem happens because convert_const_to_int() is applied inconsistently:

  • When this query is performed:

    SELECT * FROM t1 WHERE id = 1234 OR id = "97716021308405775";
    

    the string constant "97716021308405775" gets converted to a longlong constant 97716021308405775 using convert_const_to_int(). The further comparision is done in longlong format.

  • When this query is performed:

    SELECT * FROM test WHERE id IN("1234","97716021308405775");
    

    all string constants get converted to longlong constants using convert_const_to_int() and comparison is further done in longlong format.

  • When this query is performed:

    SELECT * FROM test WHERE id IN(1234,"97716021308405775");
    

    conversion of the string constant "97716021308405775" to longlong does not happen because pairs (id,1234) and (id,"97716021308405775") are compared using different data types. The latter is performed using DOUBLE format, which leads to precision loss and therefore extra rows are returned.

Comment by Alexander Barkov [ 2020-06-18 ]

The same problem happen with a simple CASE:

CREATE OR REPLACE TABLE t1 (id bigint(20) unsigned NOT NULL PRIMARY KEY);
INSERT INTO t1 VALUES(1234),(97716021308405775),(97716021308405770),(97716021308405780);
SELECT
  id,
  CASE id
    WHEN '97716021308405770' THEN '70'
    WHEN '97716021308405775' THEN '75'
    WHEN '97716021308405780' THEN '80'
  END AS c1
FROM t1;

+-------------------+------+
| id                | c1   |
+-------------------+------+
|              1234 | NULL |
| 97716021308405770 | 70   |
| 97716021308405775 | 70   |
| 97716021308405780 | 70   |
+-------------------+------+

Comment by Alexander Barkov [ 2020-06-18 ]

BETWEEN works fine:

CREATE OR REPLACE TABLE t1 (id bigint(20) unsigned NOT NULL PRIMARY KEY);
INSERT INTO t1 VALUES(1234),(97716021308405775),(97716021308405770),(97716021308405780);
SELECT id, id BETWEEN "97716021308405775" AND "97716021308405775" AS c1 FROM t1;

+-------------------+----+
| id                | c1 |
+-------------------+----+
|              1234 |  0 |
| 97716021308405770 |  0 |
| 97716021308405775 |  1 |
| 97716021308405780 |  0 |
+-------------------+----+

Searched CASE works fine:

CREATE OR REPLACE TABLE t1 (id bigint(20) unsigned NOT NULL PRIMARY KEY);
INSERT INTO t1 VALUES(1234),(97716021308405775),(97716021308405770),(97716021308405780);
SELECT
  id,
  CASE
    WHEN id='97716021308405770' THEN '70'
    WHEN id='97716021308405775' THEN '75'
    WHEN id='97716021308405780' THEN '80'
  END AS c1
FROM t1;

+-------------------+------+
| id                | c1   |
+-------------------+------+
|              1234 | NULL |
| 97716021308405770 | 70   |
| 97716021308405775 | 75   |
| 97716021308405780 | 80   |
+-------------------+------+

Comment by Alexander Barkov [ 2020-08-10 ]

A possible solution would be to compare pairs (int,string) as DECIMAL rather than DOUBLE.

@serg suggests trying this starting from 10.2.

We should suppress warnings when decimal2string() runs outside of the DECIMAL range, e.g. :

SELECT '1e100'=1;

as the comparison to INT will work correctly anyway.

Comment by Alexander Barkov [ 2020-08-10 ]

The problem is also repeatable with engines MyISAM and HEAP.

CREATE OR REPLACE TABLE t1 (
  id bigint(20) unsigned NOT NULL,
  PRIMARY KEY (id)
) ENGINE=MyISAM;
 
INSERT IGNORE INTO t1 VALUES(1234),(97716021308405775),(97716021308405770),(97716021308405780);
 
SELECT * FROM t1 WHERE id IN(97716021308405775);
SELECT * FROM t1 WHERE id IN('97716021308405775');
SELECT * FROM t1 WHERE id IN('1234',97716021308405775);
SELECT * FROM t1 WHERE id IN('1234','97716021308405775');
SELECT * FROM t1 WHERE id = 1234 OR id = '97716021308405775';
 
-- NOT OK (returns 4 records instead of 2)
SELECT * FROM t1 WHERE id IN(1234,'97716021308405775');
SELECT * FROM t1 WHERE id IN('97716021308405775',1234);

Comment by Alexander Barkov [ 2020-08-25 ]

serg, please review a patch for 10.3:

https://github.com/MariaDB/server/commit/d19f380d62c3145258c229d102b829c2f1c77223

Thanks.

Comment by Elena Stepanova [ 2021-09-21 ]

sanja:

Probably this is the review: https://lists.launchpad.net/maria-developers/msg12411.html

Generated at Thu Feb 08 09:07:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.