Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35041

Simple comparison causes "Illegal mix of collations" even with default server settings

Details

    Description

      A default server installation on UTF-8 system has these character-set-related settings (taken from 11.5.2 Fedora 38 RPM):

      > show variables like 'character%';
      +--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
      | Variable_name            | Value                                                                                                                                   |
      +--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
      | character_set_client     | utf8mb3                                                                                                                                 |
      | character_set_collations | utf8mb3=utf8mb3_uca1400_ai_ci,ucs2=ucs2_uca1400_ai_ci,utf8mb4=utf8mb4_uca1400_ai_ci,utf16=utf16_uca1400_ai_ci,utf32=utf32_uca1400_ai_ci |
      | character_set_connection | utf8mb3                                                                                                                                 |
      | character_set_database   | latin1                                                                                                                                  |
      | character_set_filesystem | binary                                                                                                                                  |
      | character_set_results    | utf8mb3                                                                                                                                 |
      | character_set_server     | latin1                                                                                                                                  |
      | character_set_system     | utf8mb3                                                                                                                                 |
      | character_sets_dir       | /usr/share/mariadb/charsets/                                                                                                            |
      +--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
      

      With this, even fairly basic operations appear to be impossible:

      MariaDB [(none)]> select COLUMN_TYPE INTO @col_type from INFORMATION_SCHEMA.COLUMNS where TABLE_SCHEMA = 'mysql' limit 1;
      Query OK, 1 row affected (0.003 sec)
       
      MariaDB [(none)]> select @col_type != 'binary(128)';
      ERROR 1267 (HY000): Illegal mix of collations (utf8mb3_general_ci,COERCIBLE) and (utf8mb3_uca1400_ai_ci,COERCIBLE) for operation '<>'
      

      The example above was taken from Spider initialization, which now produces the error upon startup.

      In MTR, we don't see it, because it changes clients' default-character-set to latin1.
      But by restoring utf8mb3 value, we get the same:

      --exec $MYSQL --default-character-set=utf8mb3 -e "select COLUMN_TYPE INTO @col_type from INFORMATION_SCHEMA.COLUMNS where TABLE_SCHEMA = 'mysql' limit 1; select @col_type != 'binary(128)'"
      

      --------------
      select @col_type != 'binary(128)'
      --------------
       
      ERROR 1267 (HY000) at line 1: Illegal mix of collations (utf8mb3_general_ci,COERCIBLE) and (utf8mb3_uca1400_ai_ci,COERCIBLE) for operation '<>'
      mysqltest: At line 1: exec of '/mnt8t/bld/11.5-asan/client//mariadb --defaults-file=/mnt8t/bld/11.5-asan/mysql-test/var/my.cnf --default-character-set=utf8mb3 -e "select COLUMN_TYPE INTO @col_type from INFORMATION_SCHEMA.COLUMNS where TABLE_SCHEMA = 'mysql' limit 1; select @col_type != 'binary(128)'; show variables like 'character%'"' failed, error: 256, status: 1, errno: 11
      

      I assume it is an effect of changing the default collation to UCA or surrounding changes, I didn't check which exact commit caused it.

      Attachments

        Issue Links

          Activity

            mzahan Mircea Zahan added a comment - - edited

            This has literally killed our production servers and definitely needs to be fixed and published ASAP.

            After hours of trying to figure out what happened we finally got here and fixed the problem by changing the charset in the connection string and executing set names on each new connection. Details bellow, for whoever stumbles here:

            mysql:host=localhost;dbname=dbname;charset=utf8mb4

            set names utf8mb4 collate utf8mb4_unicode_ci

            mzahan Mircea Zahan added a comment - - edited This has literally killed our production servers and definitely needs to be fixed and published ASAP. After hours of trying to figure out what happened we finally got here and fixed the problem by changing the charset in the connection string and executing set names on each new connection. Details bellow, for whoever stumbles here: mysql:host=localhost;dbname=dbname;charset= utf8mb4 set names utf8mb4 collate utf8mb4_unicode_ci

            In the test case it is <my_uca_collation_handler_utf8mb3> vs <my_collation_utf8mb3_general_ci_handler>

            sanja Oleksandr Byelkin added a comment - In the test case it is <my_uca_collation_handler_utf8mb3> vs <my_collation_utf8mb3_general_ci_handler>

            Unicode collation algorithm (UCA)

            sanja Oleksandr Byelkin added a comment - Unicode collation algorithm (UCA)
            ycp Yuchen Pei added a comment - - edited

            I can confirm that the mtr test mentioned in the description reproduces at the following commit and passes at its parent commit:

            903b5d6a83c @ MDEV-25829 Change default Unicode collation to uca1400_ai_ci
            

            The testcase can be simplified to:

            set names utf8mb3;
            select COLUMN_TYPE INTO @col_type from INFORMATION_SCHEMA.COLUMNS limit 1;
            select @col_type != 'binary(128)';
            

            Note that select @col_type != 'binary(128)' causes failure, but if we place it in WHERE it is fine:

            SET NAMES utf8mb3;
            SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_TYPE != 'binary(128)';
            

            From the description of MDEV-25829, it seems to be caused by the change of derivation from IMPLICIT to COERCIBLE for string user variables. The derivation of user variables / literals is COERCIBLE, whereas table columns are IMPLICIT. To compare,

            • in SELECT ... WHERE COLUMN_TYPE != 'binary(128)', the column has IMPLICIT utf8mb3_general_ci and the literal has COERCIVE utf8mb3_uca1400_ai_ci. The comparison is ok because one is IMPLICIT and the other COERCIBLE
            • in SELECT COLUMN_TYPE INTO @col_type ..., the column has IMPLICIT utf8mb3_general_ci which results in @col_type having COERCIBLE utf8mb3_general_ci. The subsequent SELECT statement comparison is not ok because both have COERCIBLE derivations.

            So one possible solution would be to pass the IMPLICIT derivation from the column to the user variable too in SELECT ... INTO .... I wonder whether that would cause any regressions. Here's a proof of concept:

            f31931f5dce upstream/bb-11.6-mdev-35041 MDEV-35041 [poc] Pass derivation from column to uservar in SELECT ... INTO ...
            

            This caused failures in user_var.test. In fact, it seems to be an intentional design choice to have a fixed derivation for user vars independent of where the value comes from: MDEV-28221.

            The only other solution I can think of is to make INFORMATION_SCHEMA columns default to uca1400_ai_ci too, but that is a big change according to MDEV-25829:

            Under terms of this task we won't change the collation of INFORMATION_SCHEMA columns. They'll remain utf8mb3_general_ci. They'll probably also change eventually, but not now. Just one big change at a time!

            ycp Yuchen Pei added a comment - - edited I can confirm that the mtr test mentioned in the description reproduces at the following commit and passes at its parent commit: 903b5d6a83c @ MDEV-25829 Change default Unicode collation to uca1400_ai_ci The testcase can be simplified to: set names utf8mb3; select COLUMN_TYPE INTO @col_type from INFORMATION_SCHEMA.COLUMNS limit 1; select @col_type != 'binary(128)' ; Note that select @col_type != 'binary(128)' causes failure, but if we place it in WHERE it is fine: SET NAMES utf8mb3; SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_TYPE != 'binary(128)' ; From the description of MDEV-25829 , it seems to be caused by the change of derivation from IMPLICIT to COERCIBLE for string user variables. The derivation of user variables / literals is COERCIBLE, whereas table columns are IMPLICIT. To compare, in SELECT ... WHERE COLUMN_TYPE != 'binary(128)', the column has IMPLICIT utf8mb3_general_ci and the literal has COERCIVE utf8mb3_uca1400_ai_ci. The comparison is ok because one is IMPLICIT and the other COERCIBLE in SELECT COLUMN_TYPE INTO @col_type ..., the column has IMPLICIT utf8mb3_general_ci which results in @col_type having COERCIBLE utf8mb3_general_ci. The subsequent SELECT statement comparison is not ok because both have COERCIBLE derivations. So one possible solution would be to pass the IMPLICIT derivation from the column to the user variable too in SELECT ... INTO ... . I wonder whether that would cause any regressions. Here's a proof of concept: f31931f5dce upstream/bb-11.6-mdev-35041 MDEV-35041 [poc] Pass derivation from column to uservar in SELECT ... INTO ... This caused failures in user_var.test. In fact, it seems to be an intentional design choice to have a fixed derivation for user vars independent of where the value comes from: MDEV-28221 . The only other solution I can think of is to make INFORMATION_SCHEMA columns default to uca1400_ai_ci too, but that is a big change according to MDEV-25829 : Under terms of this task we won't change the collation of INFORMATION_SCHEMA columns. They'll remain utf8mb3_general_ci. They'll probably also change eventually, but not now. Just one big change at a time!
            bar Alexander Barkov added a comment - - edited

            Currently string user variables and string literals have equal derivation level, hence the error.

            Let's add another collation derivation level for string user variables.

            I think making user variables stronger than literals is more appropriate, as this behavior will be closer to pre-MDEV-25829:
            Before MDEV-25829, string user variables had IMPLICIT collation derivation, so they were stronger than literals.

            bar Alexander Barkov added a comment - - edited Currently string user variables and string literals have equal derivation level, hence the error. Let's add another collation derivation level for string user variables. I think making user variables stronger than literals is more appropriate, as this behavior will be closer to pre- MDEV-25829 : Before MDEV-25829 , string user variables had IMPLICIT collation derivation, so they were stronger than literals.
            bar Alexander Barkov added a comment - Hi Sergei, Please review https://github.com/MariaDB/server/commit/593f35548aa554faa9d4336ba900fa938d73ac36 Thanks.
            serg Sergei Golubchik added a comment - https://github.com/MariaDB/server/commit/593f35548aa554faa9d4336ba900fa938d73ac36 is ok to push

            People

              bar Alexander Barkov
              elenst Elena Stepanova
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.