MariaDB Server / MDEV-34725

JSON_ARRAYAGG corrupts Unicode value

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 10.9.8
    • Fix Version/s: N/A
    • Component/s: Character Sets, JSON
    • Labels: None
    • Environment: OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024

    Description

      In this example, JSON_ARRAYAGG changes U+00A0 (non-breaking space), which UTF-8 encodes as the two bytes 0xC2 0xA0, into the single byte 0xA0:

      MariaDB [t]> CREATE TABLE `t1` (`a` text);
      Query OK, 0 rows affected (0.001 sec)
       
      MariaDB [t]> INSERT INTO `t1` (`a`) VALUES (UNHEX('58C2A059')); -- X Y
      Query OK, 1 row affected (0.000 sec)
       
      MariaDB [t]> SELECT HEX(`a`) FROM `t1`;
      +----------+
      | HEX(`a`) |
      +----------+
      | 58C2A059 |
      +----------+
      1 row in set (0.000 sec)
       
      MariaDB [t]> SELECT `a` FROM `t1`;
      +------+
      | a    |
      +------+
      | X Y  |
      +------+
      1 row in set (0.000 sec)
       
      MariaDB [t]> SELECT HEX(JSON_ARRAY(`a`)) FROM `t1`;
      +----------------------+
      | HEX(JSON_ARRAY(`a`)) |
      +----------------------+
      | 5B2258C2A059225D     |
      +----------------------+
      1 row in set (0.000 sec)
       
      MariaDB [t]> SELECT JSON_ARRAY(`a`) FROM `t1`;
      +-----------------+
      | JSON_ARRAY(`a`) |
      +-----------------+
      | ["X Y"]         |
      +-----------------+
      1 row in set (0.000 sec)
       
      MariaDB [t]> SELECT HEX(JSON_ARRAYAGG(`a`)) FROM `t1`;
      +-------------------------+
      | HEX(JSON_ARRAYAGG(`a`)) |
      +-------------------------+
      | 5B2258A059225D          |
      +-------------------------+
      1 row in set (0.000 sec)
       
      MariaDB [t]> SELECT JSON_ARRAYAGG(`a`) FROM `t1`;
      +--------------------+
      | JSON_ARRAYAGG(`a`) |
      +--------------------+
      | ["X?Y"]            |
      +--------------------+
      1 row in set (0.000 sec)
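      One possible reading of the two hex results (an interpretation, not something established in this issue): 5B2258A059225D is exactly ["X Y"] encoded in latin1, where U+00A0 is the single byte 0xA0, while 5B2258C2A059225D is the same text in UTF-8. A client expecting UTF-8 cannot decode a lone 0xA0, which would account for the "?". The standalone queries below show only that conversion; they do not touch `t1` and do not reproduce the bug. They assume a MariaDB server with the built-in utf8mb4 and latin1 character sets.

```sql
-- U+00A0 (non-breaking space) in UTF-8 is the two bytes C2 A0.
SELECT HEX(_utf8mb4 X'C2A0');                           -- C2A0

-- The same character in latin1 is the single byte A0.
SELECT HEX(CONVERT(_utf8mb4 X'C2A0' USING latin1));     -- A0

-- Converting the whole test value gives the bytes found inside
-- the JSON_ARRAYAGG result in the report (58 A0 59).
SELECT HEX(CONVERT(_utf8mb4 X'58C2A059' USING latin1)); -- 58A059
```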
      


        Activity

          Daniel Black (danblack) added a comment:

          I'm not seeing this on 10.5, 10.6, 10.11 (below), 11.4 (https://sqlize.online/sql/mariadb114/10c1f8e4ab2be588741941f1a3393ac8/) or the very latest (11.8.0 beta).

          MariaDB [test]> SELECT HEX(JSON_ARRAYAGG(`a`)) FROM `t1`;
          +-------------------------+
          | HEX(JSON_ARRAYAGG(`a`)) |
          +-------------------------+
          | 5B2258C2A059225D        |
          +-------------------------+
          1 row in set (0.003 sec)
           
          MariaDB [test]> SELECT JSON_ARRAYAGG(`a`) FROM `t1`;
          +--------------------+
          | JSON_ARRAYAGG(`a`) |
          +--------------------+
          | ["X Y"]            |
          +--------------------+
          1 row in set (0.002 sec)
           
          MariaDB [test]> \s
          --------------
          client/mariadb  Ver 15.1 Distrib 10.11.11-MariaDB, for Linux (x86_64) using  EditLine wrapper
           
          Connection id:		3
          Current database:	test
          Current user:		dan@localhost
          SSL:			Not in use
          Current pager:		stdout
          Using outfile:		''
          Using delimiter:	;
          Server:			MariaDB
          Server version:		10.11.11-MariaDB Source distribution
          Protocol version:	10
          Connection:		Localhost via UNIX socket
          Server characterset:	latin1
          Db     characterset:	latin1
          Client characterset:	utf8mb3
          Conn.  characterset:	utf8mb3
          

          So I'm assuming it's been fixed in our maintained versions. If a charset or aggregation setting is having an effect, please reopen this.
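          A minimal diagnostic sketch for the charset question above, assuming the `t1` table from the description: it compares the session and column character sets with the character set MariaDB reports for each expression used in the report. It only reads settings and changes nothing; which setting, if any, matters here is not established in this issue.

```sql
-- Client, connection, results, server and database character sets
-- (the same values as the "characterset" lines printed by \s).
SHOW VARIABLES LIKE 'character_set%';

-- Character set and collation declared for column `a`.
SHOW CREATE TABLE `t1`;

-- Character set of each expression from the report. The aggregate is
-- queried separately so the non-aggregated column is not mixed with it.
SELECT CHARSET(`a`), CHARSET(JSON_ARRAY(`a`)) FROM `t1`;
SELECT CHARSET(JSON_ARRAYAGG(`a`)) FROM `t1`;
```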


          People

            Assignee: Unassigned
            Reporter: Rogers (libertyit)
            Votes: 0
            Watchers: 4

