Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34872

Inconsistency of json fields character sets

Details

    Description

      In CREATE TABLE JSON fields default to utf8mb4, even if not specified, resulting in its overriding the charset of the table. However, the overriding does not happen in ALTER TABLE when the table charset is changed, resulting in the inconsistency shown below:

      CREATE TABLE t ( cj JSON ) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci;
      select character_set_name from information_schema.columns where table_name = 't' and column_name='cj';
      character_set_name
      utf8mb4
      alter table t CONVERT TO CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci;
      select character_set_name from information_schema.columns where table_name = 't' and column_name='cj';
      character_set_name
      utf8mb3
      

      Attachments

        Issue Links

          Activity

            ycp Yuchen Pei added a comment -

            > JSON columns behave exactly as TEXT columns, except that they get utf8mb4 charset by default.

            Sure, but the "except" part is the main point of this issue.

            ycp Yuchen Pei added a comment - > JSON columns behave exactly as TEXT columns, except that they get utf8mb4 charset by default. Sure, but the "except" part is the main point of this issue.

            I'd say that for all practical purposes it behaves as if COLLATE utf8mb4_bin was explicitly specified.
            It behaves in CREATE TABLE as a column with explicitly specified COLLATE clause. And it behaves in following ALTER TABLE also as a column with explicitly specified COLLATE clause, as far as I can see. There is no inconsistency.

            serg Sergei Golubchik added a comment - I'd say that for all practical purposes it behaves as if COLLATE utf8mb4_bin was explicitly specified. It behaves in CREATE TABLE as a column with explicitly specified COLLATE clause. And it behaves in following ALTER TABLE also as a column with explicitly specified COLLATE clause, as far as I can see. There is no inconsistency.

            I agree with @serg. Everything seems to be working as expected.

            bar Alexander Barkov added a comment - I agree with @serg. Everything seems to be working as expected.
            ycp Yuchen Pei added a comment -

            > I'd say that for all practical purposes it behaves as if COLLATE utf8mb4_bin was explicitly specified.

            But it is not explicitly specified, and the kb documentation on CREATE TABLE linked in a previous comment says that "[DEFAULT] CHARACTER SET (or [DEFAULT] CHARSET) is used to set a default character set for the table. This is the character set used for all columns where an explicit character set is not specified"

            The following example shows that non-JSON columns follow the documentation, but JSON columns don't.

            CREATE TABLE t1 ( a text, b varchar(30), c enum(''), d json ) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci;
            select column_name, character_set_name from information_schema.columns where table_name = 't1';
            column_name	character_set_name
            a	utf8mb3
            b	utf8mb3
            c	utf8mb3
            d	utf8mb4
            CREATE TABLE t2 ( a text, b varchar(30), c enum(''), d json ) character set latin1 collate latin1_general_ci;
            select column_name, character_set_name from information_schema.columns where table_name = 't2';
            column_name	character_set_name
            a	latin1
            b	latin1
            c	latin1
            d	utf8mb4

            The question remains unanswered (at least in this MDEV, but let me know if it has been answered elsewhere): what is the justification for json having a default character set that overrides the table character set in table creation?

            If this difference is expected, then the kb needs to be updated to reflect it.

            ycp Yuchen Pei added a comment - > I'd say that for all practical purposes it behaves as if COLLATE utf8mb4_bin was explicitly specified. But it is not explicitly specified, and the kb documentation on CREATE TABLE linked in a previous comment says that " [DEFAULT] CHARACTER SET (or [DEFAULT] CHARSET) is used to set a default character set for the table. This is the character set used for all columns where an explicit character set is not specified" The following example shows that non-JSON columns follow the documentation, but JSON columns don't. CREATE TABLE t1 ( a text, b varchar(30), c enum(''), d json ) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci; select column_name, character_set_name from information_schema.columns where table_name = 't1'; column_name character_set_name a utf8mb3 b utf8mb3 c utf8mb3 d utf8mb4 CREATE TABLE t2 ( a text, b varchar(30), c enum(''), d json ) character set latin1 collate latin1_general_ci; select column_name, character_set_name from information_schema.columns where table_name = 't2'; column_name character_set_name a latin1 b latin1 c latin1 d utf8mb4 The question remains unanswered (at least in this MDEV, but let me know if it has been answered elsewhere): what is the justification for json having a default character set that overrides the table character set in table creation? If this difference is expected, then the kb needs to be updated to reflect it.
            serg Sergei Golubchik added a comment - updated https://mariadb.com/kb/en/json-data-type/ and https://mariadb.com/kb/en/create-table/

            People

              serg Sergei Golubchik
              ycp Yuchen Pei
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.