[MDEV-8922] Bug#20238729 MySQL sometimes produces no warning when it's unable to interpret a character in a given character set Created: 2015-10-08  Updated: 2022-09-08

Status: Open
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 5.5, 10.0, 10.1
Fix Version/s: 5.5, 10.1

Type: Bug Priority: Minor
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File charset.diff    
Issue Links:
Relates
relates to MDEV-6643 Improve performance of string process... Stalled
Sprint: 5.5.58

 Description   

Merge this patch from MySQL
https://github.com/mysql/mysql-server/commit/33a2e5abd



 Comments   
Comment by Alexander Barkov [ 2017-10-13 ]

This patch adds warnings in some cases where they are questionable (but probably tolerable):

Inserting a string literal with bad multi-byte sequences into a BLOB:

set names utf8; -- notice, not utf8mb4
drop table if exists t1;
create table t1 (a blob);
insert into t1 values ('a?b'); -- a string with a non-BMP character in the middle (utf8mb4)
show warnings;

+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1300 | Invalid utf8 character string: 'F09F98' |
+---------+------+-----------------------------------------+

The string can be the result of mysql_escape_string(), and it is stored in a BLOB. It is not used as a text string.

In the above example, the non-BMP character was replaced with a question mark, because JIRA does not support non-BMP characters.
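A minimal Python sketch of why the warning above fires, assuming the server's `utf8` means the 3-byte-maximum encoding (utf8mb3), so the 4-byte lead byte 0xF0 of a non-BMP character is ill-formed. The helper names `utf8mb3_char_len` and `invalid_utf8mb3_warning` are hypothetical, and the "report at most 3 bytes from the first bad position" rule is an assumption chosen to match the 'F09F98' shown above. The hidden character is assumed to be U+1F600 (F0 9F 98 80), since JIRA replaced the original.

```python
def utf8mb3_char_len(data: bytes, pos: int) -> int:
    """Length of the well-formed utf8mb3 character at data[pos], or 0 if ill-formed."""
    lead = data[pos]
    if lead < 0x80:
        return 1                      # ASCII
    if 0xC2 <= lead <= 0xDF:
        length = 2
    elif 0xE0 <= lead <= 0xEF:
        length = 3                    # still inside the BMP
    else:
        return 0                      # stray continuation byte, overlong lead, or 4-byte (non-BMP) lead
    chunk = data[pos:pos + length]
    if len(chunk) < length:
        return 0                      # truncated sequence
    try:
        chunk.decode("utf-8")         # strict decoder rejects bad continuations, overlongs, surrogates
    except UnicodeDecodeError:
        return 0
    return length


def invalid_utf8mb3_warning(data: bytes, csname: str = "utf8"):
    """Return a warning text in the style of error 1300, or None if data is well-formed."""
    pos = 0
    while pos < len(data):
        length = utf8mb3_char_len(data, pos)
        if length == 0:
            # Assumption: report at most 3 bytes starting at the first bad position.
            bad = data[pos:pos + 3]
            return f"Invalid {csname} character string: '{bad.hex().upper()}'"
        pos += length
    return None


# 'a', U+1F600 (4 bytes in UTF-8), 'b'
print(invalid_utf8mb3_warning(b"a\xf0\x9f\x98\x80b"))
```

Note that the check only looks at whether the bytes form valid characters; it has no idea the destination is a BLOB, which is exactly why the warning is questionable here.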

Inserting an 8-bit string literal with unassigned characters into a BLOB:

set names cp1251;
drop table if exists t1;
create table t1 (a blob);
set @stmt=CONCAT('INSERT INTO t1 VALUES ("', 0x98,'")'); -- 0x98 is not assigned in cp1251
prepare stmt from @stmt;
show warnings;

+---------+------+---------------------------------------+
| Level   | Code | Message                               |
+---------+------+---------------------------------------+
| Warning | 1300 | Invalid cp1251 character string: '98' |
+---------+------+---------------------------------------+

As in the previous example, the string can be the result of mysql_escape_string(), and it is stored in a BLOB. It is not used as a text string.

Passing an 8-bit character string literal with unassigned characters to HEX():

set names cp1251;
set @stmt=CONCAT('SELECT HEX(_cp1251"', 0x98,'") AS x');
prepare stmt from @stmt;
show warnings;

+---------+------+---------------------------------------+
| Level   | Code | Message                               |
+---------+------+---------------------------------------+
| Warning | 1300 | Invalid cp1251 character string: '98' |
+---------+------+---------------------------------------+

In this example, the literal IS used as text (note the _cp1251 character set introducer). However, it should be acceptable to use unassigned characters in a single character set environment, i.e. when the client, the server, and all tables and columns use the same character set.
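One hypothetical way to express the rule of thumb argued in this comment: complain about bad or unassigned bytes only when a real conversion between different character sets is needed, and stay silent when the bytes go to a binary destination or stay within one character set. This is an illustrative Python sketch, not MariaDB's actual logic; `should_warn_on_bad_bytes` and the `BINARY` constant are assumptions, and the third case assumes an all-cp1251 environment.

```python
BINARY = "binary"  # charset name used for BLOB columns and _binary literals


def should_warn_on_bad_bytes(literal_cs: str, target_cs: str) -> bool:
    """Hypothetical policy sketch, not MariaDB's actual code."""
    if target_cs == BINARY:
        return False   # bytes are stored as-is in a BLOB; nothing is interpreted
    if literal_cs == target_cs:
        return False   # single character set environment: no conversion takes place
    return True        # a conversion is needed, so bad bytes cannot be converted faithfully


cases = [
    ("utf8", BINARY),      # first example: utf8 literal into a BLOB
    ("cp1251", BINARY),    # second example: cp1251 literal into a BLOB
    ("cp1251", "cp1251"),  # third example, assuming an all-cp1251 environment
    ("cp1251", "utf8"),    # a conversion to a different character set
]
for literal_cs, target_cs in cases:
    print(literal_cs, "->", target_cs, should_warn_on_bad_bytes(literal_cs, target_cs))
```

Under this policy only the last case would produce a warning; the trade-off is that genuinely corrupted text stored in a single-charset setup would also pass silently.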

Generated at Thu Feb 08 07:30:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.