Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15624

Changing the default character set to utf8mb4 changes query evaluation in a very surprising way

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.2.13, 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL)
    • 5.5.60, 10.0.35, 10.1.33, 10.2.15, 10.3.6
    • Character Sets
    • Darwin crest.fritz.box 17.4.0 Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 x86_64

    Description

      Changing the default character set to utf8mb4 changes query evaluation in a very surprising way. (Please feel free to set a better title).

      This is actually a followup bug report to this: https://bitbucket.org/zzzeek/sqlalchemy/issues/4222/query-yields-different-result-via-sqla (see there for more detail)

      Here's the problem: This query: ```select seq, replace(uuid(), "-", "") from seq_0_to_9;``` has a completely different result, wether you connect to mysql with a character set of utf8 or utf8mb4.

      Here's an example:

      % mysql --default-character-set=utf8mb4 yeepa_test --execute 'select seq, replace(uuid(), "-", "") from seq_0_to_9;'
      +-----+----------------------------------+
      | seq | replace(uuid(), "-", "")         |
      +-----+----------------------------------+
      |   0 | 49734e8c2d5111e881616c4008b0d40e |
      |   1 | 49734e8c2d5111e881616c4008b0d40e |
      |   2 | 49734e8c2d5111e881616c4008b0d40e |
      |   3 | 49734e8c2d5111e881616c4008b0d40e |
      |   4 | 49734e8c2d5111e881616c4008b0d40e |
      |   5 | 49734e8c2d5111e881616c4008b0d40e |
      |   6 | 49734e8c2d5111e881616c4008b0d40e |
      |   7 | 49734e8c2d5111e881616c4008b0d40e |
      |   8 | 49734e8c2d5111e881616c4008b0d40e |
      |   9 | 49734e8c2d5111e881616c4008b0d40e |
      +-----+----------------------------------+
      

      This returns 10 UUIDs that are all the same. *I believe this to be wrong*

      Calling that same query with utf8 as the format yields 10 different uuids - which I believe to be correct:

      % mysql --default-character-set=utf8 yeepa_test --execute 'select seq, replace(uuid(), "-", "") from seq_0_to_9;' 
      +-----+----------------------------------+
      | seq | replace(uuid(), "-", "")         |
      +-----+----------------------------------+
      |   0 | 797f1cf02d5111e881616c4008b0d40e |
      |   1 | 797f1d042d5111e881616c4008b0d40e |
      |   2 | 797f1d0e2d5111e881616c4008b0d40e |
      |   3 | 797f1d0f2d5111e881616c4008b0d40e |
      |   4 | 797f1d182d5111e881616c4008b0d40e |
      |   5 | 797f1d222d5111e881616c4008b0d40e |
      |   6 | 797f1d232d5111e881616c4008b0d40e |
      |   7 | 797f1d2c2d5111e881616c4008b0d40e |
      |   8 | 797f1d2d2d5111e881616c4008b0d40e |
      |   9 | 797f1d362d5111e881616c4008b0d40e |
      +-----+----------------------------------+
      

      This also happens on on at least on current Fedora. (I can look up the details tomorrow)

      On a further note, this result is

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov added a comment - - edited

            The problem is repeatable with DISTINCT. All SELECT queries in the below scripts return one row, while they are expected to return three rows:

            SET NAMES utf8mb4;
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (id INT);
            INSERT INTO t1 VALUES (1),(2),(3);
            SELECT DISTINCT INSERT(uuid(), 9, 1, "X") FROM t1;
            

            SET NAMES utf8;
            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (id INT);
            INSERT INTO t1 VALUES (1),(2),(3);
            SELECT DISTINCT REPLACE(uuid_short(), '0', CAST('o' AS CHAR CHARACTER SET ucs2)) AS c FROM t1;
            SELECT DISTINCT INSERT(uuid_short(), 1, 1, CAST('0' AS CHAR CHARACTER SET ucs2)) AS c FROM t1;
            SELECT DISTINCT CONCAT(uuid_short(), CAST('0' AS CHAR CHARACTER SET ucs2)) AS c FROM t1;
            

            bar Alexander Barkov added a comment - - edited The problem is repeatable with DISTINCT . All SELECT queries in the below scripts return one row, while they are expected to return three rows: SET NAMES utf8mb4; DROP TABLE IF EXISTS t1; CREATE TABLE t1 (id INT ); INSERT INTO t1 VALUES (1),(2),(3); SELECT DISTINCT INSERT (uuid(), 9, 1, "X" ) FROM t1; SET NAMES utf8; DROP TABLE IF EXISTS t1; CREATE TABLE t1 (id INT ); INSERT INTO t1 VALUES (1),(2),(3); SELECT DISTINCT REPLACE (uuid_short(), '0' , CAST ( 'o' AS CHAR CHARACTER SET ucs2)) AS c FROM t1; SELECT DISTINCT INSERT (uuid_short(), 1, 1, CAST ( '0' AS CHAR CHARACTER SET ucs2)) AS c FROM t1; SELECT DISTINCT CONCAT(uuid_short(), CAST ( '0' AS CHAR CHARACTER SET ucs2)) AS c FROM t1;

            Out of interest, could you link the commit / the commits you fixed this in? I can't seem to find them at https://github.com/MariaDB/server

            dwt Martin Häcker added a comment - Out of interest, could you link the commit / the commits you fixed this in? I can't seem to find them at https://github.com/MariaDB/server

            dwt, you should be able to see the commit link in the JIRA issue itself, right panel, "Development" section.

            https://github.com/mariadb/server/commit/6beb08c7b67ed7610e95c0350f9f93005db1e055

            elenst Elena Stepanova added a comment - dwt , you should be able to see the commit link in the JIRA issue itself, right panel, "Development" section. https://github.com/mariadb/server/commit/6beb08c7b67ed7610e95c0350f9f93005db1e055

            Got it, Thanks!

            dwt Martin Häcker added a comment - Got it, Thanks!

            You didn't see it because https://github.com/MariaDB/server displays the current branch, with is 10.3.

            I pushed the change to 5.5, so it's visible here: https://github.com/MariaDB/server/tree/5.5

            Note, the patch has not been propagated to 10.3 yet. It will be, when we merge from 5.5 up to 10.3 next time.

            bar Alexander Barkov added a comment - You didn't see it because https://github.com/MariaDB/server displays the current branch, with is 10.3. I pushed the change to 5.5, so it's visible here: https://github.com/MariaDB/server/tree/5.5 Note, the patch has not been propagated to 10.3 yet. It will be, when we merge from 5.5 up to 10.3 next time.

            People

              bar Alexander Barkov
              dwt Martin Häcker
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.