|
This script:
DROP TABLE IF EXISTS t1;
|
CREATE TABLE t1 (s1 VARCHAR (60) CHARACTER SET UTF8 COLLATE UTF8_UNICODE_CI) ENGINE = MyISAM;
|
CREATE FULLTEXT INDEX i ON t1 (s1);INSERT INTO t1 VALUES ('a'),('b'),('c'),('d'),('ÓÓÓÓ'),('OOOO'),(NULL),('ÓÓÓÓ ÓÓÓÓ'),('OOOOOOOO');
|
SELECT * FROM t1 WHERE MATCH(s1) AGAINST ('OOOO' COLLATE UTF8_POLISH_CI);
|
returns wrong results:
+-------------------+
|
| s1 |
|
+-------------------+
|
| ÓÓÓÓ |
|
| OOOO |
|
| ÓÓÓÓ ÓÓÓÓ |
|
+-------------------+
|
The problem is that in the collation utf8_polish_ci
the letter "O WITH ACUTE" is not equal to "O":
MariaDB [test]> SELECT 'O'='Ó' COLLATE utf8_polish_ci;
|
+---------------------------------+
|
| 'O'='Ó' COLLATE utf8_polish_ci |
|
+---------------------------------+
|
| 0 |
|
+---------------------------------+
|
Therefore, returning the records containing accented letter
in the above query is wrong.
Further investigation with EXPLAIN:
MySQL [test]> EXPLAIN SELECT * FROM t1 WHERE MATCH(s1) AGAINST ('OOOO' COLLATE UTF8_POLISH_CI);
|
+----+-------------+-------+----------+---------------+------+---------+------+------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+----+-------------+-------+----------+---------------+------+---------+------+------+-------------+
|
| 1 | SIMPLE | t1 | fulltext | i | i | 0 | NULL | 1 | Using where |
|
+----+-------------+-------+----------+---------------+------+---------+------+------+-------------+
|
tells that MATCH erroneously chooses the non-suitable index "i"
whose collation is NOT equal to the collation of the operation.
If I further drop the index and use BOOLEAN MODE:
ALTER TABLE t1 DROP INDEX i;
|
SELECT * FROM t1 WHERE MATCH(s1) AGAINST ('OOOO' COLLATE UTF8_POLISH_CI IN BOOLEAN MODE);
|
it still returns wrong results:
+-------------------+
|
| s1 |
|
+-------------------+
|
| ÓÓÓÓ |
|
| OOOO |
|
| ÓÓÓÓ ÓÓÓÓ |
|
+-------------------+
|
It seems MATCH always uses the collation of the field and does not follow the
standard argument collation aggregation rules.
|