Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-4727

Queries which use GROUP BY with ORDER BY HEX return more rows than the rules of character identity expects in some collations.

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 23.10, 25.10
    • Icebox
    • Compability
    • None

    Description

      I run this script:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (c1 VARCHAR(20) CHARACTER SET latin1 COLLATE latin1_swedish_ci) ENGINE=ColumnStore;
      INSERT INTO t1 VALUES ('a'),('A'),('ä'),('Ä'),('ã'),('Ã');
      SELECT c1, COUNT(*) FROM t1 GROUP BY c1;
      

      +------+----------+
      | c1   | COUNT(*) |
      +------+----------+
      | ä    |        2 |
      | a    |        4 |
      +------+----------+
      

      SELECT DISTINCT c1 FROM t1;
      

      +------+
      | c1   |
      +------+
      | a    |
      | ä    |
      +------+
      

      Looks good so far. In the given collation:

      • A WITH TILDE ABOVE is equal to accent-less A
      • A WITH DIAERESIS is a separate letter (sorted in the end of the Swedish alphabet after Z)

      If I do not use HEX in ORDER BY - everything is fine also.

      MariaDB [gjd]> SELECT c1, COUNT(*) FROM t1 GROUP BY c1 ORDER BY c1;
      +------+----------+
      | c1   | COUNT(*) |
      +------+----------+
      | a    |        4 |
      | ä    |        2 |
      +------+----------+
      

      Things go wrong if do ORDER BY HEX(c1):

      SELECT c1, COUNT(*) FROM t1 GROUP BY c1 ORDER BY HEX(c1);
      

      +------+----------+
      | c1   | COUNT(*) |
      +------+----------+
      | A    |        1 |
      | a    |        1 |
      | Ã    |        1 |
      | Ä    |        1 |
      | ã    |        1 |
      | ä    |        1 |
      +------+----------+
      

      SELECT DISTINCT c1 FROM t1 ORDER BY HEX(c1);
      

      +------+
      | c1   |
      +------+
      | A    |
      | a    |
      | Ã    |
      | Ä    |
      | ã    |
      | ä    |
      +------+
      

      Notice, the numbers of records in the result sets have changed. These two queries are expected to return two rows each.

      Attachments

        Issue Links

          Activity

            People

              alexey.antipovsky Aleksei Antipovskii
              bar Alexander Barkov
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.