Tested on Debian with 10.3.21 and Gentoo with 10.2.16.
Description
Hi,
today we encountered one strange behavior in an older project. Below are simple SQL statements that quickly show where the problem is.
When combining a number and a number as a string in an IN WHERE condition, it sometimes returns extra unwanted records.
As you can see, condition id = 1234 OR id = "97716021308405775" returns expected 2 records, but equivalent id IN(1234,"97716021308405775") returns 4 records. This unexpected behavior is independent on storage engine - it works the same with InnoDB, MYISAM or TokuDB.
We'll fix our ORM layer to use the IN condition correctly (with either numbers or strings only), but maybe it's a bug in MariaDB that should be fixed. Maybe it's just old, familiar, unexpected behavior that can't be corrected?
DROP TABLE IF EXISTS test;
CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
INSERT IGNORE INTO `test` VALUES(1234),(97716021308405775),(97716021308405770),(97716021308405780);
– OK
SELECT * FROM `test` WHERE `id` IN(97716021308405775);
SELECT * FROM `test` WHERE `id` IN("97716021308405775");
SELECT * FROM `test` WHERE `id` IN("1234",97716021308405775);
SELECT * FROM `test` WHERE `id` IN("1234","97716021308405775");
SELECT * FROM `test` WHERE `id` = 1234 OR id = "97716021308405775";
– NOT OK (returns 4 records instead of 2)
SELECT * FROM `test` WHERE `id` IN(1234,"97716021308405775");
SELECT * FROM `test` WHERE `id` IN("97716021308405775",1234);
The problem happens because convert_const_to_int() is applied inconsistently:
When this query is performed:
SELECT * FROM t1 WHERE id = 1234 OR id = "97716021308405775";
the string constant "97716021308405775" gets converted to a longlong constant 97716021308405775 using convert_const_to_int(). The further comparision is done in longlong format.
When this query is performed:
SELECT * FROM test WHERE id IN("1234","97716021308405775");
all string constants get converted to longlong constants using convert_const_to_int() and comparison is further done in longlong format.
When this query is performed:
SELECT * FROM test WHERE id IN(1234,"97716021308405775");
conversion of the string constant "97716021308405775" to longlong does not happen because pairs (id,1234) and (id,"97716021308405775") are compared using different data types. The latter is performed using DOUBLE format, which leads to precision loss and therefore extra rows are returned.
Alexander Barkov
added a comment - - edited The problem happens because convert_const_to_int() is applied inconsistently:
When this query is performed:
SELECT * FROM t1 WHERE id = 1234 OR id = "97716021308405775" ;
the string constant "97716021308405775" gets converted to a longlong constant 97716021308405775 using convert_const_to_int() . The further comparision is done in longlong format.
When this query is performed:
SELECT * FROM test WHERE id IN ( "1234" , "97716021308405775" );
all string constants get converted to longlong constants using convert_const_to_int() and comparison is further done in longlong format.
When this query is performed:
SELECT * FROM test WHERE id IN (1234, "97716021308405775" );
conversion of the string constant "97716021308405775" to longlong does not happen because pairs (id,1234) and (id,"97716021308405775") are compared using different data types. The latter is performed using DOUBLE format, which leads to precision loss and therefore extra rows are returned.
Alexander Barkov
added a comment - The same problem happen with a simple CASE:
CREATE OR REPLACE TABLE t1 (id bigint (20) unsigned NOT NULL PRIMARY KEY );
INSERT INTO t1 VALUES (1234),(97716021308405775),(97716021308405770),(97716021308405780);
SELECT
id,
CASE id
WHEN '97716021308405770' THEN '70'
WHEN '97716021308405775' THEN '75'
WHEN '97716021308405780' THEN '80'
END AS c1
FROM t1;
+-------------------+------+
| id | c1 |
+-------------------+------+
| 1234 | NULL |
| 97716021308405770 | 70 |
| 97716021308405775 | 70 |
| 97716021308405780 | 70 |
+-------------------+------+
Alexander Barkov
added a comment - BETWEEN works fine:
CREATE OR REPLACE TABLE t1 (id bigint (20) unsigned NOT NULL PRIMARY KEY );
INSERT INTO t1 VALUES (1234),(97716021308405775),(97716021308405770),(97716021308405780);
SELECT id, id BETWEEN "97716021308405775" AND "97716021308405775" AS c1 FROM t1;
+-------------------+----+
| id | c1 |
+-------------------+----+
| 1234 | 0 |
| 97716021308405770 | 0 |
| 97716021308405775 | 1 |
| 97716021308405780 | 0 |
+-------------------+----+
Searched CASE works fine:
CREATE OR REPLACE TABLE t1 (id bigint (20) unsigned NOT NULL PRIMARY KEY );
INSERT INTO t1 VALUES (1234),(97716021308405775),(97716021308405770),(97716021308405780);
SELECT
id,
CASE
WHEN id= '97716021308405770' THEN '70'
WHEN id= '97716021308405775' THEN '75'
WHEN id= '97716021308405780' THEN '80'
END AS c1
FROM t1;
+-------------------+------+
| id | c1 |
+-------------------+------+
| 1234 | NULL |
| 97716021308405770 | 70 |
| 97716021308405775 | 75 |
| 97716021308405780 | 80 |
+-------------------+------+
A possible solution would be to compare pairs (int,string) as DECIMAL rather than DOUBLE.
@serg suggests trying this starting from 10.2.
We should suppress warnings when decimal2string() runs outside of the DECIMAL range, e.g. :
SELECT'1e100'=1;
as the comparison to INT will work correctly anyway.
Alexander Barkov
added a comment - - edited A possible solution would be to compare pairs (int,string) as DECIMAL rather than DOUBLE.
@serg suggests trying this starting from 10.2.
We should suppress warnings when decimal2string() runs outside of the DECIMAL range, e.g. :
SELECT '1e100' =1;
as the comparison to INT will work correctly anyway.
SELECT * FROM t1 WHERE id IN('97716021308405775');
SELECT * FROM t1 WHERE id IN('1234',97716021308405775);
SELECT * FROM t1 WHERE id IN('1234','97716021308405775');
SELECT * FROM t1 WHERE id = 1234 OR id = '97716021308405775';
-- NOT OK (returns 4 records instead of 2)
SELECT * FROM t1 WHERE id IN(1234,'97716021308405775');
SELECT * FROM t1 WHERE id IN('97716021308405775',1234);
Alexander Barkov
added a comment - The problem is also repeatable with engines MyISAM and HEAP.
CREATE OR REPLACE TABLE t1 (
id bigint (20) unsigned NOT NULL ,
PRIMARY KEY (id)
) ENGINE=MyISAM;
INSERT IGNORE INTO t1 VALUES (1234),(97716021308405775),(97716021308405770),(97716021308405780);
SELECT * FROM t1 WHERE id IN (97716021308405775);
SELECT * FROM t1 WHERE id IN ( '97716021308405775' );
SELECT * FROM t1 WHERE id IN ( '1234' ,97716021308405775);
SELECT * FROM t1 WHERE id IN ( '1234' , '97716021308405775' );
SELECT * FROM t1 WHERE id = 1234 OR id = '97716021308405775' ;
-- NOT OK (returns 4 records instead of 2)
SELECT * FROM t1 WHERE id IN (1234, '97716021308405775' );
SELECT * FROM t1 WHERE id IN ( '97716021308405775' ,1234);
Alexander Barkov
added a comment - serg , please review a patch for 10.3:
https://github.com/MariaDB/server/commit/d19f380d62c3145258c229d102b829c2f1c77223
Thanks.
Thanks for the report. Reproducible as described on 5.5-10.5 and MySQL 5.6, 8.0.