Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 5.6.2, 6.2.1
    • 6.2.3
    • ?
    • None
    • mariadb/columnstore Docker container
    • 2021-16

    Description

      I currently write some queries during migration to ColumnStore from another Database and stumbled upon a probable bug concerning collation.

      We have columns where we use WHERE clauses where the matching has to be case sensitive. Therefore I created the database with a binary collation as default settings.

      When filtering with a WHERE clause using the LIKE operator, the collation settings seem to be used. When using with IN this seem to be not the case.

      Expected result: IN operator should also be case sensitive when set in the database settings.

      MariaDB [dac]> CREATE DATABASE test CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
      Query OK, 1 row affected (0.028 sec)
       
      MariaDB [dac]> USE test;
      Database changed
      MariaDB [test]> DROP TABLE IF EXISTS collation_test;
      Query OK, 0 rows affected, 1 warning (0.325 sec)
       
      MariaDB [test]>
      MariaDB [test]> CREATE TABLE collation_test (
          -> LETTERS TEXT NULL
          -> )
          -> ENGINE=ColumnStore;
      Query OK, 0 rows affected (0.878 sec)
       
      MariaDB [test]>
      MariaDB [test]> INSERT INTO collation_test (LETTERS) VALUES ('a'), ('A'), ('b'), ('B');
      Query OK, 4 rows affected (0.827 sec)
      Records: 4  Duplicates: 0  Warnings: 0
       
      MariaDB [test]>
      MariaDB [test]> SELECT * FROM collation_test;
      +---------+
      | LETTERS |
      +---------+
      | a       |
      | A       |
      | b       |
      | B       |
      +---------+
      4 rows in set (0.396 sec)
       
      MariaDB [test]>
      MariaDB [test]> SELECT * FROM collation_test WHERE LETTERS LIKE 'a';
      +---------+
      | LETTERS |
      +---------+
      | a       |
      +---------+
      1 row in set (0.029 sec)
       
      MariaDB [test]> SELECT * FROM collation_test WHERE LETTERS IN ('a');
      +---------+
      | LETTERS |
      +---------+
      | a       |
      | A       |
      +---------+
      2 rows in set (0.047 sec)
       
      MariaDB [test]>
      

      Setting the collation explicitly does not help either

      MariaDB [test]> SELECT * FROM collation_test WHERE LETTERS IN ('a' COLLATE utf8mb4_bin);
      ERROR 1253 (42000): COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'latin1'
      

      The setting is though set to utf8mb4_bin. (checked via SHOW TABLE STATUS LIKE 'collation_test'.

      Attachments

        Activity

          This is how it works for InnoDB:

          MariaDB [test]> CREATE TABLE `collation_test` (
              ->   `LETTERS` text COLLATE utf8mb4_bin DEFAULT NULL
              -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
          Query OK, 0 rows affected (1.051 sec)
           
          MariaDB [test]> insert into collation_test values ('a'), ('A');
          Query OK, 2 rows affected (0.267 sec)
          Records: 2  Duplicates: 0  Warnings: 0
           
          MariaDB [test]> select * from collation_test where LETTERS like 'a';
          +---------+
          | LETTERS |
          +---------+
          | a       |
          +---------+
          1 row in set (0.032 sec)
           
          MariaDB [test]> select * from collation_test where LETTERS = 'a';
          +---------+
          | LETTERS |
          +---------+
          | a       |
          +---------+
          1 row in set (0.051 sec)
           
          MariaDB [test]> select * from collation_test where LETTERS in ('a');
          +---------+
          | LETTERS |
          +---------+
          | a       |
          +---------+
          1 row in set (0.002 sec)
          

          As we have a difference in behavior here for Columnstore and I can not find any obvious explanation or logic (LIKE vs IN behavior) I'd agree that this is a bug.

          valerii Valerii Kravchuk added a comment - This is how it works for InnoDB: MariaDB [test]> CREATE TABLE `collation_test` ( -> `LETTERS` text COLLATE utf8mb4_bin DEFAULT NULL -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin; Query OK, 0 rows affected (1.051 sec)   MariaDB [test]> insert into collation_test values ('a'), ('A'); Query OK, 2 rows affected (0.267 sec) Records: 2 Duplicates: 0 Warnings: 0   MariaDB [test]> select * from collation_test where LETTERS like 'a'; +---------+ | LETTERS | +---------+ | a | +---------+ 1 row in set (0.032 sec)   MariaDB [test]> select * from collation_test where LETTERS = 'a'; +---------+ | LETTERS | +---------+ | a | +---------+ 1 row in set (0.051 sec)   MariaDB [test]> select * from collation_test where LETTERS in ('a'); +---------+ | LETTERS | +---------+ | a | +---------+ 1 row in set (0.002 sec) As we have a difference in behavior here for Columnstore and I can not find any obvious explanation or logic (LIKE vs IN behavior) I'd agree that this is a bug.
          drrtuy Roman added a comment -

          valerii tdeussing_appian MCS doesn't leverage encoding settings set at database level so I am not sure the target table has encoding set. That is why MDB returns ERROR 1253 (42000): COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'latin1'. I suggest to explicitly set encoding/collation creating the table.

          drrtuy Roman added a comment - valerii tdeussing_appian MCS doesn't leverage encoding settings set at database level so I am not sure the target table has encoding set. That is why MDB returns ERROR 1253 (42000): COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'latin1'. I suggest to explicitly set encoding/collation creating the table.
          drrtuy Roman added a comment -

          After certain testing I need to confess there must be wrong collation propagation somewhere in MCS so that latin1_swedish_ci is used as the collation.
          This is a legitimate bug and we will address this.

          drrtuy Roman added a comment - After certain testing I need to confess there must be wrong collation propagation somewhere in MCS so that latin1_swedish_ci is used as the collation. This is a legitimate bug and we will address this.
          drrtuy Roman added a comment -

          Plz review.

          drrtuy Roman added a comment - Plz review.

          Build verified: 6.2.3-1 (#3675)

          Verified the fix using the test case in the bug description.

          dleeyh Daniel Lee (Inactive) added a comment - Build verified: 6.2.3-1 (#3675) Verified the fix using the test case in the bug description.

          People

            dleeyh Daniel Lee (Inactive)
            tdeussing_appian Tobias Deussing
            Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.