Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10867

PREPARE..EXECUTE is not consistent about non-ASCII characters

Details

    Description

      I start a 8-bit console.
      In gnome-terminal I make sure that the console character set is properly set to iso-8859-1:
      Terminal -> Character Set Encoding -> Western (ISO-8859-1).

      Now I run this command in shell:

      $ LANG=en_US.iso88591 mysql test
      

      and check that the client correctly detected the session character set as latin1:

      SHOW VARIABLES LIKE 'character\_set\_%';
      

      +--------------------------+--------+
      | Variable_name            | Value  |
      +--------------------------+--------+
      | character_set_client     | latin1 | <--
      | character_set_connection | latin1 | <--
      | character_set_database   | latin1 |
      | character_set_filesystem | binary |
      | character_set_results    | latin1 | <--
      | character_set_server     | latin1 |
      | character_set_system     | utf8   |
      +--------------------------+--------+
      

      Now I run:

      SELECT HEX('ä');
      

      +----------+
      | HEX('ä') |
      +----------+
      | E4       |
      +----------+
      

      It correctly returns 0xE4, which is a latin1 code for "U+00E4 SMALL LETTER A WITH DIAERESIS".

      Now I use a prepared statement with a user variable as a source:

      SET @src='SELECT HEX(''ä'')';
      PREPARE stmt FROM @src; EXECUTE stmt;
      

      +----------+
      | HEX('ä') |
      +----------+
      | E4       |
      +----------+
      

      It also returns the same latin1 code. So far so good.

      Now I use a prepared statement with a string literal as a source:

      PREPARE stmt FROM 'SELECT HEX(''ä'')';
      EXECUTE stmt;
      

      +-----------+
      | HEX('ä') |
      +-----------+
      | C3A4      |
      +-----------+
      

      It returns a different result. C3A4 is a utf8 code for "U+00E4 SMALL LETTER A WITH DIAERESIS". Conversion from latin1 to utf8 happened.

      Attachments

        Activity

          bar Alexander Barkov created issue -
          bar Alexander Barkov made changes -
          Field Original Value New Value
          Description I start a 8-bit console.
          In gnome-terminal I make sure that the console character set is properly set to iso-8859-1:
          Terminal -> Character Set Encoding -> Western (ISO-8859-1).

          Now I run this command in shell:
          {noformat}
          $ LANG=en_US.iso88591 mysql test
          {noformat}
          and check that the cloent correctly detected the session character set as latin1:

          {code:sql}
          SHOW VARIABLES LIKE 'character\_set\_%';
          {code}
          {noformat}
          +--------------------------+--------+
          | Variable_name | Value |
          +--------------------------+--------+
          | character_set_client | latin1 | <--
          | character_set_connection | latin1 | <--
          | character_set_database | latin1 |
          | character_set_filesystem | binary |
          | character_set_results | latin1 | <--
          | character_set_server | latin1 |
          | character_set_system | utf8 |
          +--------------------------+--------+
          {noformat}

          Now I run:
          {code:sql}
          SELECT HEX('ä');
          {code}
          {noformat}
          +----------+
          | HEX('ä') |
          +----------+
          | E4 |
          +----------+
          {noformat}
          It correctly returns 0xE4, which is a latin1 code for "U+00E4 SMALL LETTER A WITH DIAERESIS".


          Now I use a prepared statement with a user variable as a source:
          {code:sql}
          SET @src='SELECT HEX(''ä'')';
          PREPARE stmt FROM @src; EXECUTE stmt;
          {code}
          {noformat}
          +----------+
          | HEX('ä') |
          +----------+
          | E4 |
          +----------+
          {noformat}
          It also returns the same latin1 code. So far so good.

          Now I use a prepared statement with a string literal as a source:
          {code:sql}
          PREPARE stmt FROM 'SELECT HEX(''ä'')';
          EXECUTE stmt;
          {code}
          {noformat}
          +-----------+
          | HEX('ä') |
          +-----------+
          | C3A4 |
          +-----------+
          {noformat}
          It returns a different result. C3A4 is a utf8 code for "U+00E4 SMALL LETTER A WITH DIAERESIS". Conversion from latin1 to utf8 happened.

          I start a 8-bit console.
          In gnome-terminal I make sure that the console character set is properly set to iso-8859-1:
          Terminal -> Character Set Encoding -> Western (ISO-8859-1).

          Now I run this command in shell:
          {noformat}
          $ LANG=en_US.iso88591 mysql test
          {noformat}
          and check that the client correctly detected the session character set as latin1:

          {code:sql}
          SHOW VARIABLES LIKE 'character\_set\_%';
          {code}
          {noformat}
          +--------------------------+--------+
          | Variable_name | Value |
          +--------------------------+--------+
          | character_set_client | latin1 | <--
          | character_set_connection | latin1 | <--
          | character_set_database | latin1 |
          | character_set_filesystem | binary |
          | character_set_results | latin1 | <--
          | character_set_server | latin1 |
          | character_set_system | utf8 |
          +--------------------------+--------+
          {noformat}

          Now I run:
          {code:sql}
          SELECT HEX('ä');
          {code}
          {noformat}
          +----------+
          | HEX('ä') |
          +----------+
          | E4 |
          +----------+
          {noformat}
          It correctly returns 0xE4, which is a latin1 code for "U+00E4 SMALL LETTER A WITH DIAERESIS".


          Now I use a prepared statement with a user variable as a source:
          {code:sql}
          SET @src='SELECT HEX(''ä'')';
          PREPARE stmt FROM @src; EXECUTE stmt;
          {code}
          {noformat}
          +----------+
          | HEX('ä') |
          +----------+
          | E4 |
          +----------+
          {noformat}
          It also returns the same latin1 code. So far so good.

          Now I use a prepared statement with a string literal as a source:
          {code:sql}
          PREPARE stmt FROM 'SELECT HEX(''ä'')';
          EXECUTE stmt;
          {code}
          {noformat}
          +-----------+
          | HEX('ä') |
          +-----------+
          | C3A4 |
          +-----------+
          {noformat}
          It returns a different result. C3A4 is a utf8 code for "U+00E4 SMALL LETTER A WITH DIAERESIS". Conversion from latin1 to utf8 happened.

          bar Alexander Barkov added a comment - - edited

          A similar problem is repeatable using a non-BMP character.
          Due to a bug in JIRA, the real BMP character was replaced to 'X' in the below script.
          To get the real character displayed on a utf8 Linux console, run the following query:

          SELECT _utf8mb4 0xF09F988E;
          

          Then copy and paste the character instead of all 'X' below.

          1. Convensional execution works fine:

          SET NAMES utf8mb4;
          SELECT 'X' AS c;
          

          +------+
          | c    |
          +------+
          | X     |
          +------+
          

          2. Prepared execution using a user variable as a source also works fine:

          SET @src='SELECT ''X'' AS c';
          PREPARE stmt FROM @src;
          EXECUTE stmt;
          

          +------+
          | c    |
          +------+
          | X     |
          +------+
          

          3. Prepared execution using a string literal as a source return a wrong result:

          PREPARE stmt FROM 'SELECT ''X'' AS c';
          EXECUTE stmt;
          

          +---+
          | c |
          +---+
          | ? |
          +---+
          

          bar Alexander Barkov added a comment - - edited A similar problem is repeatable using a non-BMP character. Due to a bug in JIRA, the real BMP character was replaced to 'X' in the below script. To get the real character displayed on a utf8 Linux console, run the following query: SELECT _utf8mb4 0xF09F988E; Then copy and paste the character instead of all 'X' below. 1. Convensional execution works fine: SET NAMES utf8mb4; SELECT 'X' AS c; +------+ | c | +------+ | X | +------+ 2. Prepared execution using a user variable as a source also works fine: SET @src= 'SELECT ' 'X' ' AS c' ; PREPARE stmt FROM @src; EXECUTE stmt; +------+ | c | +------+ | X | +------+ 3. Prepared execution using a string literal as a source return a wrong result: PREPARE stmt FROM 'SELECT ' 'X' ' AS c' ; EXECUTE stmt; +---+ | c | +---+ | ? | +---+
          bar Alexander Barkov made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          bar Alexander Barkov made changes -
          Fix Version/s 10.3.0 [ 22127 ]
          Fix Version/s 10.2 [ 14601 ]
          Resolution Fixed [ 1 ]
          Status In Progress [ 3 ] Closed [ 6 ]
          bar Alexander Barkov made changes -
          Fix Version/s 10.2.3 [ 22115 ]
          bar Alexander Barkov made changes -
          Component/s Prepared Statements [ 10804 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 77177 ] MariaDB v4 [ 150950 ]

          People

            bar Alexander Barkov
            bar Alexander Barkov
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.