Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-4727

Queries which use GROUP BY with ORDER BY HEX return more rows than the rules of character identity expects in some collations.

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 5.5.1
    • Icebox
    • PrimProc
    • None

    Description

      I run this script:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (c1 VARCHAR(20) CHARACTER SET latin1 COLLATE latin1_swedish_ci) ENGINE=ColumnStore;
      INSERT INTO t1 VALUES ('a'),('A'),('ä'),('Ä'),('ã'),('Ã');
      SELECT c1, COUNT(*) FROM t1 GROUP BY c1;
      

      +------+----------+
      | c1   | COUNT(*) |
      +------+----------+
      | ä    |        2 |
      | a    |        4 |
      +------+----------+
      

      SELECT DISTINCT c1 FROM t1;
      

      +------+
      | c1   |
      +------+
      | a    |
      | ä    |
      +------+
      

      Looks good so far. In the given collation:

      • A WITH TILDE ABOVE is equal to accent-less A
      • A WITH DIAERESIS is a separate letter (sorted in the end of the Swedish alphabet after Z)

      If I do not use HEX in ORDER BY - everything is fine also.

      MariaDB [gjd]> SELECT c1, COUNT(*) FROM t1 GROUP BY c1 ORDER BY c1;
      +------+----------+
      | c1   | COUNT(*) |
      +------+----------+
      | a    |        4 |
      | ä    |        2 |
      +------+----------+
      

      Things go wrong if do ORDER BY HEX(c1):

      SELECT c1, COUNT(*) FROM t1 GROUP BY c1 ORDER BY HEX(c1);
      

      +------+----------+
      | c1   | COUNT(*) |
      +------+----------+
      | A    |        1 |
      | a    |        1 |
      | Ã    |        1 |
      | Ä    |        1 |
      | ã    |        1 |
      | ä    |        1 |
      +------+----------+
      

      SELECT DISTINCT c1 FROM t1 ORDER BY HEX(c1);
      

      +------+
      | c1   |
      +------+
      | A    |
      | a    |
      | Ã    |
      | Ä    |
      | ã    |
      | ä    |
      +------+
      

      Notice, the numbers of records in the result sets have changed. These two queries are expected to return two rows each.

      Attachments

        Activity

          People

            Unassigned Unassigned
            bar Alexander Barkov
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.