[MDEV-6598] GROUP_CONCAT accepts malformed SEPARATOR string Created: 2014-08-18  Updated: 2023-11-28

Status: Confirmed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.3.12, 5.5.39, 10.0.13
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Minor
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: None


 Description   

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
INSERT INTO t1 VALUES ('a'),('b');
SELECT
  GROUP_CONCAT(a SEPARATOR '��') AS a,
  GROUP_CONCAT(a SEPARATOR 0xFF) AS b,
  GROUP_CONCAT(a SEPARATOR X'FF') AS c,
  GROUP_CONCAT(a SEPARATOR B'11111111') AS d
FROM t1;

The output is:

+--------+------+------+------+
| a      | b    | c    | d    |
+--------+------+------+------+
| a��b     | a�b   | a�b   | a�b   |
+--------+------+------+------+

Note, '��' is a 4-byte UTF-8 character, which is not supported by MariaDB's utf8 (one should use utf8mb4 to make 4-byte characters work).

So all separators in the above query are wrong MariaDB's utf8 sequences.

They should be replaced to question marks, instead of being copied
into the result as is.

The same result is returned if I do "SET NAMES binary":

SET NAMES binary;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
INSERT INTO t1 VALUES ('a'),('b');
SELECT
  GROUP_CONCAT(a SEPARATOR '��') AS a,
  GROUP_CONCAT(a SEPARATOR 0xFF) AS b,
  GROUP_CONCAT(a SEPARATOR X'FF') AS c,
  GROUP_CONCAT(a SEPARATOR B'11111111') AS d
FROM t1;

The output is:

+--------+------+------+------+
| a      | b    | c    | d    |
+--------+------+------+------+
| a��b     | a�b   | a�b   | a�b   |
+--------+------+------+------+

The invalid separators are in the output again.

Note, CHARSET() for all above GROUP_COINCATs:

SET NAMES binary;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
INSERT INTO t1 VALUES ('a'),('b');
SELECT
  CHARSET(GROUP_CONCAT(a SEPARATOR '��')) AS a,
  CHARSET(GROUP_CONCAT(a SEPARATOR 0xFF)) AS b,
  CHARSET(GROUP_CONCAT(a SEPARATOR X'FF')) AS c,
  CHARSET(GROUP_CONCAT(a SEPARATOR B'11111111')) AS d
FROM t1;

is reported as "utf8", not "binary":

+------+------+------+------+
| a    | b    | c    | d    |
+------+------+------+------+
| utf8 | utf8 | utf8 | utf8 |
+------+------+------+------+

It should be fixed:

  • either to return BINARY
    This is probably not a good idea, because the SEPATATOR syntax accepts only quoted string,
    hex and bit literals, but does not accept neither expressions nor literals with character set introducers.
    Therefore, the syntax implies that SEPARATOR should not affect the result character set.
  • or to replace bad separators to question marks.
  • or allow strings with introducers (or even all constant expressions) in SEPARATOR and make separator participate in character set and collation aggregation

Generated at Thu Feb 08 07:13:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.