[MDEV-6752] Trailing incomplete characters are not replaced to question marks on conversion

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 5.5.39, 10.0.13
Fix Version/s: 10.0.14
Component/s: Character Sets
Labels: None

Description

This script:

SET NAMES utf8;
SET @query=CONCAT(_binary"INSERT INTO t1 VALUES('", 0xC2, "\'),('",0xC223,"')");
SELECT @query;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8mb4);
PREPARE stmt FROM @query;
EXECUTE stmt;
SHOW WARNINGS;
SELECT HEX(a),a FROM t1;

returns the following output:

MariaDB [test]> SHOW WARNINGS;
+---------+------+---------------------------------------------------------+
| Level   | Code | Message                                                 |
+---------+------+---------------------------------------------------------+
| Warning | 1265 | Data truncated for column 'a' at row 1                  |
| Warning | 1366 | Incorrect string value: '\xC2#' for column 'a' at row 2 |
+---------+------+---------------------------------------------------------+
2 rows in set (0.00 sec)

MariaDB [test]> SELECT HEX(a),a FROM t1;
+--------+------+
| HEX(a) | a    |
+--------+------+
|        |      |
| 3F23   | ?#   |
+--------+------+
2 rows in set (0.00 sec)

Notice:

0xC2 is an incomplete UTF-8 character (a valid mbhead not followed by an mbtail).
0xC223 is an invalid sequence (a valid mbhead followed by a 7-bit ASCII character instead of an mbtail).
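
The distinction between an incomplete and an invalid sequence can be illustrated with a minimal Python sketch. It is not MariaDB's implementation: the function names `expected_tails`, `is_tail` and `classify` are hypothetical, and the checks are simplified (overlong forms and surrogates are not rejected).

```python
def expected_tails(lead: int) -> int:
    """Return how many mbtail bytes a lead byte (mbhead) requires, or -1 if it cannot start a character."""
    if lead < 0x80:
        return 0          # 7-bit ASCII, no tail needed
    if 0xC2 <= lead <= 0xDF:
        return 1          # 2-byte character
    if 0xE0 <= lead <= 0xEF:
        return 2          # 3-byte character
    if 0xF0 <= lead <= 0xF4:
        return 3          # 4-byte character
    return -1             # a stray tail byte or a byte never used as a lead


def is_tail(byte: int) -> bool:
    """An mbtail is any byte in the range 0x80..0xBF."""
    return 0x80 <= byte <= 0xBF


def classify(data: bytes) -> str:
    """Classify a byte string as 'valid', 'incomplete' or 'invalid' (simplified rules)."""
    i = 0
    while i < len(data):
        need = expected_tails(data[i])
        if need < 0:
            return "invalid"
        tails = data[i + 1:i + 1 + need]
        if not all(is_tail(b) for b in tails):
            return "invalid"      # mbhead followed by something that is not an mbtail
        if len(tails) < need:
            return "incomplete"   # input ended before all mbtails arrived
        i += 1 + need
    return "valid"


print(classify(b"\xc2"))      # incomplete
print(classify(b"\xc2#"))     # invalid
```

Under these simplified rules, 0xC2 alone is reported as incomplete and 0xC223 as invalid, matching the two cases listed above.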

Observations:

The second row was handled correctly: the mbhead was replaced with a question mark and '#' was appended to it.
The first row was not: the mbhead was not replaced with '?', the value was simply truncated.
The warnings are different. The warning for the second row is more descriptive.

The same effect can be achieved using a Latin1 terminal window.
The idea is exactly the same. It just uses direct Latin1 input instead of creating a bad sequence using CONCAT and executing it with a prepared statement.

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET latin1);
INSERT INTO t1 VALUES ('Â'),('Â#');
SHOW WARNINGS;
SELECT HEX(a),a FROM t1;
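
Why the Latin1 terminal reproduces the same bytes can be checked with a short Python sketch: in Latin1, the character 'Â' (U+00C2) is the single byte 0xC2, so the server, told by SET NAMES utf8 to expect UTF-8, receives the same byte sequences as in the CONCAT example. The variable names here are illustrative only.

```python
# U+00C2 is LATIN CAPITAL LETTER A WITH CIRCUMFLEX, the character typed in the Latin1 terminal.
first_row = "\u00C2".encode("latin-1")
second_row = "\u00C2#".encode("latin-1")

print(first_row.hex(), second_row.hex())  # c2 c223
```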

The column can use any character set other than utf8, so that a conversion takes place.

The expected behaviour would be to replace trailing incomplete characters with question marks,
so that the first row returns '?' instead of an empty string, with a more descriptive warning similar
to the one returned for the second row.
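
The expected result can be sketched in Python. This only models the desired outcome with Python's own decoder and is not how the server performs the conversion. The helper name `convert_with_question_marks` is hypothetical, and the sketch assumes the input contains no genuine U+FFFD character, since that would also be turned into '?'.

```python
def convert_with_question_marks(data: bytes, target: str = "latin-1") -> bytes:
    """Decode UTF-8 input, turning every ill-formed sequence into '?', then encode to the target charset."""
    # errors="replace" substitutes U+FFFD for each ill-formed sequence,
    # including an incomplete character at the very end of the input.
    text = data.decode("utf-8", errors="replace")
    # Map the replacement character to '?'; characters the target charset
    # cannot represent also become '?' through errors="replace".
    return text.replace("\ufffd", "?").encode(target, errors="replace")


print(convert_with_question_marks(b"\xc2").hex())    # 3f
print(convert_with_question_marks(b"\xc2#").hex())   # 3f23
```

With this model both rows behave consistently: the first row yields '?' (3F) rather than an empty string, and the second row yields '?#' (3F23), as in the output shown above.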

Activity

There are no comments yet on this issue.

People

Assignee: Alexander Barkov
Reporter: Alexander Barkov
Votes: 0
Watchers: 1

Dates

Created: 2014-09-18 09:32
Updated: 2015-01-28 18:21
Resolved: 2014-09-18 11:45
