[MDEV-10867] PREPARE..EXECUTE is not consistent about non-ASCII characters Created: 2016-09-22  Updated: 2017-04-09  Resolved: 2016-10-04

Status: Closed
Project: MariaDB Server
Component/s: Character Sets, Prepared Statements
Affects Version/s: 5.5, 10.0, 10.1, 10.2
Fix Version/s: 10.2.3, 10.3.0

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None


 Description   

I start a 8-bit console.
In gnome-terminal I make sure that the console character set is properly set to iso-8859-1:
Terminal -> Character Set Encoding -> Western (ISO-8859-1).

Now I run this command in shell:

$ LANG=en_US.iso88591 mysql test

and check that the client correctly detected the session character set as latin1:

SHOW VARIABLES LIKE 'character\_set\_%';

+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | latin1 | <--
| character_set_connection | latin1 | <--
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | latin1 | <--
| character_set_server     | latin1 |
| character_set_system     | utf8   |
+--------------------------+--------+

Now I run:

SELECT HEX('ä');

+----------+
| HEX('ä') |
+----------+
| E4       |
+----------+

It correctly returns 0xE4, which is a latin1 code for "U+00E4 SMALL LETTER A WITH DIAERESIS".

Now I use a prepared statement with a user variable as a source:

SET @src='SELECT HEX(''ä'')';
PREPARE stmt FROM @src; EXECUTE stmt;

+----------+
| HEX('ä') |
+----------+
| E4       |
+----------+

It also returns the same latin1 code. So far so good.

Now I use a prepared statement with a string literal as a source:

PREPARE stmt FROM 'SELECT HEX(''ä'')';
EXECUTE stmt;

+-----------+
| HEX('ä') |
+-----------+
| C3A4      |
+-----------+

It returns a different result. C3A4 is a utf8 code for "U+00E4 SMALL LETTER A WITH DIAERESIS". Conversion from latin1 to utf8 happened.



 Comments   
Comment by Alexander Barkov [ 2016-09-23 ]

A similar problem is repeatable using a non-BMP character.
Due to a bug in JIRA, the real BMP character was replaced to 'X' in the below script.
To get the real character displayed on a utf8 Linux console, run the following query:

SELECT _utf8mb4 0xF09F988E;

Then copy and paste the character instead of all 'X' below.

1. Convensional execution works fine:

SET NAMES utf8mb4;
SELECT 'X' AS c;

+------+
| c    |
+------+
| X     |
+------+

2. Prepared execution using a user variable as a source also works fine:

SET @src='SELECT ''X'' AS c';
PREPARE stmt FROM @src;
EXECUTE stmt;

+------+
| c    |
+------+
| X     |
+------+

3. Prepared execution using a string literal as a source return a wrong result:

PREPARE stmt FROM 'SELECT ''X'' AS c';
EXECUTE stmt;

+---+
| c |
+---+
| ? |
+---+

Generated at Thu Feb 08 07:45:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.