|
Correction: it happens when the table has character set utf8, but does not happen, for example, with latin1:
MariaDB [test]> set names 'utf8mb4';
|
Query OK, 0 rows affected (0.00 sec)
|
|
MariaDB [test]> drop table if exists t1;
|
Query OK, 0 rows affected (0.51 sec)
|
|
MariaDB [test]> create table t1(c1 char(20)) character set utf8;
|
Query OK, 0 rows affected (0.01 sec)
|
|
MariaDB [test]> insert into t1 values('a');
|
Query OK, 1 row affected (0.00 sec)
|
|
MariaDB [test]> update t1 set c1 = 'b' where c1 like X'FF';
|
ERROR 1300 (HY000): Invalid utf8 character string: 'FF'
|
MariaDB [test]> set names 'utf8mb4';
|
Query OK, 0 rows affected (0.00 sec)
|
|
MariaDB [test]> drop table if exists t1;
|
Query OK, 0 rows affected (0.25 sec)
|
|
MariaDB [test]> create table t1(c1 char(20)) character set latin1;
|
Query OK, 0 rows affected (0.01 sec)
|
|
MariaDB [test]> insert into t1 values('a');
|
Query OK, 1 row affected (0.00 sec)
|
|
MariaDB [test]> update t1 set c1 = 'b' where c1 like X'FF';
|
Query OK, 0 rows affected (0.00 sec)
|
Rows matched: 0 Changed: 0 Warnings: 0
|
According to this bug report, it might be intentional. Assigning to bar to confirm (or object).
|
|
X'FF" is a not valid utf8 sequence. So this behavior is correct.
|
|
Which records do you expect to be returned by this LIKE condition?
|
|
X'FF' is not a valid utf8 character, but here in the where clause as a string constant, it is expected to be with collation binary.
Let's use the following example to verify this.
Recreate:
set names 'utf8mb4';
|
drop table if exists t1;
|
create table t1(c1 char(20));
|
insert into t1 values('a'),(X'FF');
|
select * from t1;
|
update t1 set c1 = 'b' where c1 like X'FF';
|
update t1 set c1 = 'b' where collation(X'FF') = 'binary';
|
select * from t1;
|
Output:
mysql> create table t1(c1 char(20));
|
Query OK, 0 rows affected (0.01 sec)
|
|
mysql> insert into t1 values('a'),(X'FF');
|
Query OK, 2 rows affected, 1 warning (0.00 sec)
|
Records: 2 Duplicates: 0 Warnings: 1
|
|
mysql> select * from t1;
|
+------+
|
| c1 |
|
+------+
|
| a |
|
| ? |
|
+------+
|
2 rows in set (0.00 sec)
|
|
mysql> update t1 set c1 = 'b' where c1 like X'FF';
|
ERROR 1300 (HY000): Invalid utf8mb4 character string: 'FF'
|
mysql> update t1 set c1 = 'b' where collation(X'FF') = 'binary';
|
Query OK, 2 rows affected (0.00 sec)
|
Rows matched: 2 Changed: 2 Warnings: 0
|
|
mysql> select * from t1;
|
+------+
|
| c1 |
|
+------+
|
| b |
|
| b |
|
+------+
|
2 rows in set (0.00 sec)
|
|
|
Can you please post the result of "SHOW CREATE TABLE t1"?
I'm curious what is the character set of the column "c1".
Thanks.
|
|
Also, please send:
SELECT c1, HEX(c1) FROM t1;
|
|
Hi Alex,
Here it is:
mysql> SELECT c1, HEX(c1) FROM t1;
|
+------+---------+
|
| c1 | HEX(c1) |
|
+------+---------+
|
| b | 62 |
|
| b | 62 |
|
+------+---------+
|
2 rows in set (0.00 sec)
|
|
mysql> show create table t1;
|
+-------+------------------------------------------------------------------------------------------+
|
| Table | Create Table |
|
+-------+------------------------------------------------------------------------------------------+
|
| t1 | CREATE TABLE `t1` (
|
`c1` char(20) DEFAULT NULL
|
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
|
+-------+------------------------------------------------------------------------------------------+
|
1 row in set (0.00 sec)
|
|
|
Thanks, Dylan.
Still can't reproduce the problem.
Can you please run this script and send its output:
SET NAMES utf8mb4;
|
DROP TABLE IF EXISTS t1;
|
CREATE TABLE t1 (c1 CHAR(20) CHARACTER SET utf8mb4);
|
INSERT INTO t1 VALUES ('a'),(X'FF');
|
SELECT c1, HEX(c1) FROM t1;
|
Thanks.
|
|
Okey, I think I reproduced the problem.
The above script returns this result in MariaDB-10.1.4 and higher:
+------+---------+
|
| c1 | HEX(c1) |
|
+------+---------+
|
| a | 61 |
|
| ? | 3F |
|
+------+---------+
|
Notice, the wrong byte sequence was replaced to question mark on INSERT, and a warning was produced.
(in the versions before 10.1.4 it was replaced to an empty string, with a warning)
So if you want to delete the bad string, you can actually do:
DELETE FROM t1 WHERE c1='?';
|
This behavior is intentional. I can't see bugs here.
Why did you need to insert X'FF' into an utf8mb4 column?
Perhaps you need a different data type.
What does X'FF' stand for? Is it byte with value 0xFF, or did you mean the character ÿ (U+00FF LATIN SMALL LETTER Y WITH DIAERESIS) ?
- In case if you need to store bytes with arbitrary values (including 0xFF), then consider using BINARY/VARBINARY/BLOB
- If you need to store U+00FF LATIN SMALL LETTER Y WITH DIAERESIS, then the correct syntax would be either of these:
INSERT INTO t1 VALUES ('ÿ');
|
INSERT INTO t1 VALUES (_latin1 X'FF');
|
INSERT INTO t1 VALUES (_ucs2 X'00FF');
|
INSERT INTO t1 VALUES (X'C3BF');
|
INSERT INTO t1 VALUES (_utf8m X'C3BF');
|
INSERT INTO t1 VALUES (_utf8mb4 X'C3BF');
|
|
|
Hi Alex,
Actually this X'FF' is meaningless. This is from random test using RQG(Random Query Generator).
It generates random input for LIKE operator, maybe valid or invalid.
This original statement got different result between MariaDB 10.1.10 and MySQL 5.7, which is the original reason that I raise this issue.
Thanks for all the information.
I can accept it if this behavior is by design and already known.
Dylan
|
|
Hi Dylan,
This change was made under terms of:
MDEV-6566 Different INSERT behaviour on bad bytes with and without character set conversion
Thanks.
|