[MDEV-6978] Bad results with join comparing case insensitive VARCHAR/ENUM/SET expression to a _bin ENUM column Created: 2014-10-29  Updated: 2021-07-12  Resolved: 2021-07-12

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.2.29, 10.1.44, 10.3.22, 10.4.12, 10.5.1
Fix Version/s: 10.0.15

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-26129 Bad results with join comparing case ... Closed

 Description   

DROP TABLE IF EXISTS t1,t2;
CREATE TABLE t1 (a ENUM('a') CHARACTER SET latin1 PRIMARY KEY);
INSERT INTO t1 VALUES ('a');
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (a ENUM('a','A') CHARACTER SET latin1 COLLATE latin1_bin);
INSERT INTO t2 VALUES ('a'),('A');
SELECT t1.* FROM t1 JOIN t2 ON t1.a COLLATE latin1_swedish_ci=t2.a;
SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.a COLLATE latin1_swedish_ci=t2.a;

This script correctly returns 2 rows for the both SELECT queries.

+---+
| a |
+---+
| a |
| a |
+---+
2 rows in set (0.00 sec)

If I now add a primary key on t2 and rerun the queries again:

ALTER TABLE t2 ADD PRIMARY KEY(a);
SELECT t1.* FROM t1 JOIN t2 ON t1.a COLLATE latin1_swedish_ci=t2.a;
SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.a COLLATE latin1_swedish_ci=t2.a;

both queries return only one rows:

+---+
| a |
+---+
| a |
+---+
1 row in set (0.00 sec)

This is wrong. Two rows should always be returned.

The same problem is repeatable if I change data type for t1.a from ENUM to VARCHAR(10) or SET('a').



 Comments   
Comment by Igor Babaev [ 2020-03-01 ]

I've reopened this bug because I found a variant of the test case for MDEV-6978 that does not work properly.
Let's change the definition of table t2 for

CREATE TABLE t2 (c1 ENUM('a','A','b','B','c','C','d','D','e','E') CHARACTER SET latin1 COLLATE latin1_bin); 

Now let's add one row

INSERT INTO t2 VALUES ('a'),('A');

After this let's execute the query from your test case

SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.c1 COLLATE latin1_swedish_ci=t2.c1;

We get the result

MariaDB [test]> SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.c1 COLLATE latin1_swedish_ci=t2.c1;
+----+
| c1 |
+----+
| a  |
| a  |
+----+

Now let's add more rows:

INSERT INTO t2 VALUES ('b'),('B'),('c'),('C'),('d'),('D'),('e'),('E');

The query returns the same result set as expected.
Now let's add the primary key as in your test case

ALTER TABLE t2 ADD PRIMARY KEY(c1);

Now the query returns a wrong result set

MariaDB [test]> SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.c1 COLLATE latin n1_swedish_ci=t2.c1;
+----+
| c1 |
+----+
| a  |
+----+

Note that now the query uses a plan different from that used in your test case after addition of the primary key. This plan employs a range index scan rather then a full index scan as in your test case.

MariaDB [test]> EXPLAIN SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.c1 COLLATE latin1_swedish_ci=t2.c1;
+------+-------------+-------+--------+---------------+---------+---------+------+------+--------------------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref  | rows | Extra                    |
+------+-------------+-------+--------+---------------+---------+---------+------+------+--------------------------+
|    1 | SIMPLE      | t1    | system | NULL          | NULL    | NULL    | NULL |    1 |                          |
|    1 | SIMPLE      | t2    | range  | PRIMARY       | PRIMARY | 1       | NULL |    1 | Using where; Using index |
+------+-------------+-------+--------+---------------+---------+---------+------+------+--------------------------+

I reproduced the problem in the current 10.4 and 10.1. All other versions most probably are also affected.

Comment by Sergei Golubchik [ 2021-07-12 ]

closed again. new bug is moved to MDEV-26129

Generated at Thu Feb 08 07:16:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.