Details
-
Bug
-
Status: Open (View Workflow)
-
Minor
-
Resolution: Unresolved
-
5.5.32, 5.1.67, 5.2.14, 5.3.12
-
None
-
None
Description
This script:
DROP TABLE IF EXISTS t1; |
CREATE TABLE t1 (s1 VARCHAR (60) CHARACTER SET UTF8 COLLATE UTF8_UNICODE_CI) ENGINE = MyISAM; |
CREATE FULLTEXT INDEX i ON t1 (s1);INSERT INTO t1 VALUES ('a'),('b'),('c'),('d'),('ÓÓÓÓ'),('OOOO'),(NULL),('ÓÓÓÓ ÓÓÓÓ'),('OOOOOOOO'); |
SELECT * FROM t1 WHERE MATCH(s1) AGAINST ('OOOO' COLLATE UTF8_POLISH_CI); |
returns wrong results:
+-------------------+
|
| s1 |
|
+-------------------+
|
| ÓÓÓÓ |
|
| OOOO |
|
| ÓÓÓÓ ÓÓÓÓ |
|
+-------------------+
|
The problem is that in the collation utf8_polish_ci
the letter "O WITH ACUTE" is not equal to "O":
MariaDB [test]> SELECT 'O'='Ó' COLLATE utf8_polish_ci; |
+---------------------------------+ |
| 'O'='Ó' COLLATE utf8_polish_ci | |
+---------------------------------+ |
| 0 |
|
+---------------------------------+ |
Therefore, returning the records containing accented letter
in the above query is wrong.
Further investigation with EXPLAIN:
MySQL [test]> EXPLAIN SELECT * FROM t1 WHERE MATCH(s1) AGAINST ('OOOO' COLLATE UTF8_POLISH_CI);
|
+----+-------------+-------+----------+---------------+------+---------+------+------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+----+-------------+-------+----------+---------------+------+---------+------+------+-------------+
|
| 1 | SIMPLE | t1 | fulltext | i | i | 0 | NULL | 1 | Using where |
|
+----+-------------+-------+----------+---------------+------+---------+------+------+-------------+
|
tells that MATCH erroneously chooses the non-suitable index "i"
whose collation is NOT equal to the collation of the operation.
If I further drop the index and use BOOLEAN MODE:
ALTER TABLE t1 DROP INDEX i; |
SELECT * FROM t1 WHERE MATCH(s1) AGAINST ('OOOO' COLLATE UTF8_POLISH_CI IN BOOLEAN MODE); |
it still returns wrong results:
+-------------------+
|
| s1 |
|
+-------------------+
|
| ÓÓÓÓ |
|
| OOOO |
|
| ÓÓÓÓ ÓÓÓÓ |
|
+-------------------+
|
It seems MATCH always uses the collation of the field and does not follow the
standard argument collation aggregation rules.