Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32192

@@character_set_collations to resolve IMPLICIT collation on JOINs

Details

    Description

      If character_set_collations is for resolving implicit collation types then in can be used in a JOIN too.

      join.test

      create table t520 (t varchar(30) character set  utf8mb4 collate utf8mb4_unicode_520_ci);
      create table t1400 (t varchar(30) character set  utf8mb4 collate utf8mb4_uca1400_ai_ci);
      insert into t520 values ('bob'),('jack'), ('jane');
      insert into t1400 values ('bob'),('jack'), ('jane');
      --error ER_CANT_AGGREGATE_2COLLATIONS
      select * from t1400 join t520 on t1400.t = t520.t;
      ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_uca1400_ai_ci,IMPLICIT) and (utf8mb4_unicode_520_ci,IMPLICIT) for operation '='
       
      SET @@character_set_collations = 'utf8mb4=uca1400_ai_ci';
      select * from t1400 join t520 on t1400.t = t520.t;
       
      ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_uca1400_ai_ci,IMPLICIT) and (utf8mb4_unicode_520_ci,IMPLICIT) for operation '='
      

      Last case shouldn't be an error as the resolution mechanism is possible.

      recent documentation addition by greenman indicates its use "in all cases".
      https://mariadb.com/kb/en/setting-character-sets-and-collations/

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov added a comment - - edited

            You mix two different phenomena.
            1. Implicit collation assignment during CREATE TABLE. Here the word "implicit" is used in its usual English meaning: take the collaton automatically from some defaults.
            2. IMPLICIT collation derivation of an expression. This derivation takes effect when mixing two or more expressions for comparison or for result in CASE-alike expressions and in UNION. Here the word "IMPLICIT" means a special term from the SQL Standard.

            Implicit collation assignment during CREATE TABLE

            A column collation gets assigned at CREATE TABLE time.
            1. If there is a COLLATE clause in the column definition, the the collation is taken from that COLLATE clause
            2. If there is CHARACTER SET without COLLATE, the collation is implicitly chosen as the default collation of the specified character set. In this case @@character_set_collations is used.
            3. If there are no CHARACTER SET / COLLATE clauses, the collation is implicitly taken from the higher level defaults.

            After CREATE TABLE happened, it does not matter how the column was created: with or without COLLATE.

            IMPLICIT collation derivation when mixing expressions

            Every expression of a string data type has a collation precedence level, which determines which side wins when mixing expressions of different collations:

            • for comparison
            • for result, such as CASE-alike expressions, UNION

            The SQL standard calls this precedence level as collation derivation.

            MariaDB implements the following collation derivation levels (some of them are standard, some of them are MariaDB extensions):

            • IGNORABLE - NULL literal (the weakest)
            • NUMERIC - when a number or a datetime acts in string context
            • COERCIBLE - a string literal
            • SYSCONST - functions DATABASE(), USER()
            • IMPLICIT - table column, CAST( AS CHAR)
            • NONE - a concatenation of two expressions of different collations
            • EXPLICIT - an explicit COLLATE clause (the strongest)

            You can see these collation derivation levels in "Illegal mix" error messages.

            A table column has always IMPLICIT collation derivation.
            No matter how it was created in CREATE TABLE (without or with COLLATE).

            Why this MDEV was closed as Won't do

            The new variable @@character_set_collations is intended to change the default collation when a column specified in CREATE TABLE has the CHARACTER SET clause but does not have the COLLATE clause:

            CREATE TABLE t1
            (
              -- This had been latin1_swedish_ci for years - the hard coded default
              -- Now it honors @@character_set_collations - a soft default
              c1 VARCHAR(32) CHARACTER SET latin1
            );
            

            The new variable is not intended to resolve illegal mix of collations.

            In your example you have two columns of different collations. The right way to resolve this conflict is to use an explicit COLLATE clause.

            bar Alexander Barkov added a comment - - edited You mix two different phenomena. 1. Implicit collation assignment during CREATE TABLE. Here the word "implicit" is used in its usual English meaning: take the collaton automatically from some defaults. 2. IMPLICIT collation derivation of an expression. This derivation takes effect when mixing two or more expressions for comparison or for result in CASE-alike expressions and in UNION. Here the word "IMPLICIT" means a special term from the SQL Standard. Implicit collation assignment during CREATE TABLE A column collation gets assigned at CREATE TABLE time. 1. If there is a COLLATE clause in the column definition, the the collation is taken from that COLLATE clause 2. If there is CHARACTER SET without COLLATE, the collation is implicitly chosen as the default collation of the specified character set. In this case @@character_set_collations is used. 3. If there are no CHARACTER SET / COLLATE clauses, the collation is implicitly taken from the higher level defaults. After CREATE TABLE happened, it does not matter how the column was created: with or without COLLATE. IMPLICIT collation derivation when mixing expressions Every expression of a string data type has a collation precedence level, which determines which side wins when mixing expressions of different collations: for comparison for result, such as CASE-alike expressions, UNION The SQL standard calls this precedence level as collation derivation . MariaDB implements the following collation derivation levels (some of them are standard, some of them are MariaDB extensions): IGNORABLE - NULL literal (the weakest) NUMERIC - when a number or a datetime acts in string context COERCIBLE - a string literal SYSCONST - functions DATABASE(), USER() IMPLICIT - table column, CAST( AS CHAR) NONE - a concatenation of two expressions of different collations EXPLICIT - an explicit COLLATE clause (the strongest) You can see these collation derivation levels in "Illegal mix" error messages. A table column has always IMPLICIT collation derivation. No matter how it was created in CREATE TABLE (without or with COLLATE). Why this MDEV was closed as Won't do The new variable @@character_set_collations is intended to change the default collation when a column specified in CREATE TABLE has the CHARACTER SET clause but does not have the COLLATE clause: CREATE TABLE t1 ( -- This had been latin1_swedish_ci for years - the hard coded default -- Now it honors @@character_set_collations - a soft default c1 VARCHAR (32) CHARACTER SET latin1 ); The new variable is not intended to resolve illegal mix of collations. In your example you have two columns of different collations. The right way to resolve this conflict is to use an explicit COLLATE clause.

            People

              bar Alexander Barkov
              danblack Daniel Black
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.