MDEV-19028

Addressing the contraction problem with Engine Independent statistics

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 10.2, 10.3, 10.4
    • 10.4
    • Optimizer
    • None

    Description

      Filing the contraction problem mentioned in MDEV-18899 as a separate issue.

      Contraction problem

      Also, the underlying code should be checked for contraction compatibility. The code copying to column_statistics.min_value should make sure not to break contractions in the middle, otherwise max_value can be very far from the actual maximum value.

      For example, consider this data in combination with Czech collation:

      CONCAT(REPEAT('x',254), 'ch')
      

      'ch' is a separate letter which is sorted between 'h' and 'i':
      http://collation-charts.org/mysql60/mysql604.utf8_czech_ci.html

      'ch' should not be broken into parts when copying to column_statistics.min_value:
      'c' cannot be the last (255th) byte in column_statistics.min_value, because it was followed by 'h' in the original full-length data. The copying code should store only the REPEAT('x',254) part.
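
      A minimal sketch of such a contraction-aware cut for min_value, written in Python for illustration only. The function name truncate_min and the CONTRACTIONS set are hypothetical and not taken from the server code; the set is a simplified assumption about the Czech collation, and the check only looks at whether a listed contraction straddles the cut point.

```python
# Assumption: a simplified list of Czech contractions; a real collation
# would supply this from its own tailoring data.
CONTRACTIONS = {"ch", "Ch", "CH"}


def truncate_min(value: str, limit: int, contractions=CONTRACTIONS) -> str:
    """Cut `value` to at most `limit` UTF-8 bytes without splitting
    a multi-byte character or a contraction."""
    used = 0
    kept = 0  # number of whole characters that fit in `limit` bytes
    for ch in value:
        n = len(ch.encode("utf-8"))
        if used + n > limit:
            break
        used += n
        kept += 1
    if kept == len(value):
        return value  # nothing was cut

    cut = kept
    for c in contractions:
        # k = how many characters of the contraction lie before the cut point
        for k in range(1, len(c)):
            start = kept - k
            if start >= 0 and value[start:start + len(c)] == c:
                cut = min(cut, start)  # back up to the start of the contraction
    return value[:cut]


stored = truncate_min("x" * 254 + "ch", 255)
print(len(stored))  # 254: the trailing 'c' is dropped together with 'h'
```

      The stored prefix is shorter than the byte limit allows, but it never ends in a character that means something different on its own than it did inside the original value.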

      For column_statistics.max_value, the copying is even harder: the code should replace 'ch' with the character that immediately follows 'ch' in the collation, which is 'i'.
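
      The max_value case could be sketched as follows, again in Python and only as an illustration. The NEXT_AFTER table is a hypothetical stand-in for a lookup of "the character that follows this contraction in the collation"; only the 'ch' to 'i' entry comes from the description above. The sketch handles only a cut that lands inside a contraction; any other cut is returned as a plain prefix.

```python
# Assumption: hypothetical successor table; only 'ch' -> 'i' is taken
# from the description, a real collation would have to provide the rest.
NEXT_AFTER = {"ch": "i"}


def chars_that_fit(value: str, limit: int) -> int:
    """Number of leading whole characters of `value` that fit in `limit` UTF-8 bytes."""
    used = 0
    kept = 0
    for ch in value:
        n = len(ch.encode("utf-8"))
        if used + n > limit:
            break
        used += n
        kept += 1
    return kept


def truncate_max(value: str, limit: int, next_after=NEXT_AFTER) -> str:
    kept = chars_that_fit(value, limit)
    if kept == len(value):
        return value  # nothing was cut
    for c, successor in next_after.items():
        for k in range(1, len(c)):
            start = kept - k
            if start >= 0 and value[start:start + len(c)] == c:
                # The cut lands inside the contraction: replace it with
                # the character that sorts right after it.
                candidate = value[:start] + successor
                if len(candidate.encode("utf-8")) > limit:
                    raise ValueError("successor does not fit in the byte limit")
                return candidate
    return value[:kept]  # plain cut; not covered by this sketch


stored = truncate_max("x" * 254 + "ch", 255)
print(stored[-2:])  # 'xi'
```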

      An example

      CREATE OR REPLACE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8, comment TEXT);
      INSERT INTO t1 VALUES ('aa','This is MIN'), ('aë','This is MAX');
      

      SELECT a,comment FROM t1 ORDER BY a;
      +------+-------------+
      | a    | comment     |
      +------+-------------+
      | aa   | This is MIN |
      | aë   | This is MAX |
      +------+-------------+
       
      SELECT CASE WHEN a='aë' THEN 'a' ELSE a END AS a_2_byte,comment FROM t1 ORDER BY 1;
      +----------+-------------+
      | a_2_byte | comment     |
      +----------+-------------+
      | a        | This is MAX |
      | aa       | This is MIN |
      +----------+-------------+
      

      The limit used is 2 bytes here, instead of 255, for simplicity.

      Notice that the original MIN and MAX values are 2 bytes ('aa') and 3 bytes ('aë') respectively.
      Now if we cut them to 2 bytes in a multi-byte safe way, we get:

      • 'aa' is still 'aa'
      • 'aë' becomes 'a', which makes it smaller than 'aa'

      So when values are cut in a multi-byte safe way, the min and max can change.
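
      The same effect can be reproduced outside the server with a short Python sketch. The helper cut_utf8 is hypothetical, and Python's default string comparison (by code point) is used here as an assumption in place of the column's collation; for these two values it gives the same order as the queries above.

```python
def cut_utf8(value: str, limit: int) -> str:
    """Cut to at most `limit` bytes, dropping a trailing partial UTF-8 sequence."""
    return value.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


values = ["aa", "a\u00eb"]           # 'aa' (2 bytes) and 'aë' (3 bytes)
before = (min(values), max(values))  # ('aa', 'aë')

cut = [cut_utf8(v, 2) for v in values]  # ['aa', 'a']
after = (min(cut), max(cut))            # ('a', 'aa')

print(before, after)
```

      The value that was the maximum before the cut ends up as the minimum after it.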

          People

            psergei Sergei Petrunia
            varun Varun Gupta (Inactive)