Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6598

GROUP_CONCAT accepts malformed SEPARATOR string

    XMLWordPrintable

Details

    • Bug
    • Status: Confirmed (View Workflow)
    • Minor
    • Resolution: Unresolved
    • 5.3.12, 5.5.39, 10.0.13
    • 10.6
    • None
    • None

    Description

      SET NAMES utf8;
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
      INSERT INTO t1 VALUES ('a'),('b');
      SELECT
        GROUP_CONCAT(a SEPARATOR '��') AS a,
        GROUP_CONCAT(a SEPARATOR 0xFF) AS b,
        GROUP_CONCAT(a SEPARATOR X'FF') AS c,
        GROUP_CONCAT(a SEPARATOR B'11111111') AS d
      FROM t1;

      The output is:

      +--------+------+------+------+
      | a      | b    | c    | d    |
      +--------+------+------+------+
      | a��b     | a�b   | a�b   | a�b   |
      +--------+------+------+------+

      Note, '��' is a 4-byte UTF-8 character, which is not supported by MariaDB's utf8 (one should use utf8mb4 to make 4-byte characters work).

      So all separators in the above query are wrong MariaDB's utf8 sequences.

      They should be replaced to question marks, instead of being copied
      into the result as is.

      The same result is returned if I do "SET NAMES binary":

      SET NAMES binary;
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
      INSERT INTO t1 VALUES ('a'),('b');
      SELECT
        GROUP_CONCAT(a SEPARATOR '��') AS a,
        GROUP_CONCAT(a SEPARATOR 0xFF) AS b,
        GROUP_CONCAT(a SEPARATOR X'FF') AS c,
        GROUP_CONCAT(a SEPARATOR B'11111111') AS d
      FROM t1;

      The output is:

      +--------+------+------+------+
      | a      | b    | c    | d    |
      +--------+------+------+------+
      | a��b     | a�b   | a�b   | a�b   |
      +--------+------+------+------+

      The invalid separators are in the output again.

      Note, CHARSET() for all above GROUP_COINCATs:

      SET NAMES binary;
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
      INSERT INTO t1 VALUES ('a'),('b');
      SELECT
        CHARSET(GROUP_CONCAT(a SEPARATOR '��')) AS a,
        CHARSET(GROUP_CONCAT(a SEPARATOR 0xFF)) AS b,
        CHARSET(GROUP_CONCAT(a SEPARATOR X'FF')) AS c,
        CHARSET(GROUP_CONCAT(a SEPARATOR B'11111111')) AS d
      FROM t1;

      is reported as "utf8", not "binary":

      +------+------+------+------+
      | a    | b    | c    | d    |
      +------+------+------+------+
      | utf8 | utf8 | utf8 | utf8 |
      +------+------+------+------+

      It should be fixed:

      • either to return BINARY
        This is probably not a good idea, because the SEPATATOR syntax accepts only quoted string,
        hex and bit literals, but does not accept neither expressions nor literals with character set introducers.
        Therefore, the syntax implies that SEPARATOR should not affect the result character set.
      • or to replace bad separators to question marks.
      • or allow strings with introducers (or even all constant expressions) in SEPARATOR and make separator participate in character set and collation aggregation

      Attachments

        Activity

          People

            bar Alexander Barkov
            bar Alexander Barkov
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.