Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35391

Inconsistent Comparison Behavior with COMPRESS Function

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Not a Bug
    • 11.4.0
    • N/A
    • Character Sets
    • None

    Description

      A discrepancy occurs when comparing the output of the COMPRESS function in MariaDB. In the original table t0, querying with COMPRESS(c0) > c1 returns a result as expected. However, after storing COMPRESS(c0) in a derived table t1, applying the equivalent condition on t1 yields no results, even though the data and conditions are identical.

      Reproduce Steps:

      -- Step 1: Create table t0 and insert data
      CREATE TABLE t0 (c0 DATETIME, c1 VARCHAR(16), c2 VARCHAR(9));
      INSERT INTO t0 (c0, c1, c2) VALUES ('2003-01-18 19:04:14', 'o4oFjkHVT8LcDfnY', 'djXhVwU7m');
       
      -- Step 2: Query in the original table
      MariaDB [test]> SELECT c2 FROM t0 WHERE ((COMPRESS(c0)) > c1);
       
      +-----------+
      | c2        |
      +-----------+
      | djXhVwU7m |
      +-----------+
      1 row in set, 1 warning (0.012 sec)
       
      -- Step 3: Create derived table t1 with COMPRESS result
      CREATE TABLE t1 AS (SELECT (COMPRESS(c0)) AS c0, c1 AS c1, c2 AS c2 FROM t0);
       
      -- Step 4: Apply equivalent comparison in derived table t1
      MariaDB [test]> SELECT c2 FROM t1 WHERE ((c0) > c1);
      0 rows in set (0.007 sec)
      
      

      Expected Result:
      Both queries should return consistent results.

      Attachments

        Activity

          danblack Daniel Black added a comment - - edited

          SHOW WARNINGS (from first expression)
          +---------+------+-------------------------------------------------------------------------------------------------+
          | Level   | Code | Message                                                                                         |
          +---------+------+-------------------------------------------------------------------------------------------------+
          | Warning | 1300 | Invalid utf8mb4 character string: '\x9C3200\xD650\xD45\xB4P0\xB4\xB420\xB124\x01\x00$G\x03\xB1' |
          +---------+------+-------------------------------------------------------------------------------------------------+
           
           
          MariaDB [test]> SELECT COERCIBILITY(COMPRESS(c0)) as hex from t0;
          +-----+
          | hex |
          +-----+
          |   6 |
          +-----+
           
          MariaDB [test]> SELECT COERCIBILITY(CAST(COMPRESS(c0) as BINARY)) as hex from t0;
          +-----+
          | hex |
          +-----+
          |   2 |
          +-----+
          

          Based on https://mariadb.com/kb/en/coercibility/ rules the type 6 is converted up to 2 (coercibility of c1) and fails because the compressed binary isn't a valid UTF8MB4 so becomes null.

          With an explicit cast, coercibility is 2 so I assume these get a binary comparison:

          MariaDB [test]> SELECT c2 FROM t0 WHERE (CAST(COMPRESS(c0) AS BINARY) > c1);
          Field   1:  `c2`
          Org_field:  `c2`
          Catalog:    `def`
          Database:   `test`
          Table:      `t0`
          Org_table:  `t0`
          Type:       VAR_STRING
          Collation:  utf8mb4_unicode_ci (224)
          Length:     36
          Max_length: 0
          Decimals:   0
          Flags:      
           
           
          0 rows in set (0.001 sec)
          

          A 6 coercibility seems to match other functions like concat:

           SELECT COERCIBILITY(CONCAT(c0)) as hex from t0;
          | hex |
          +-----+
          |   6 |
          +-----+
           
          For the varchar column:
          SELECT COERCIBILITY(CONCAT(c1)) as hex from t0;
          +-----+
          | hex |
          +-----+
          |   2 |
          +-----+
          

          So I don't think its a bug, just the way coecibility works according to documented implementations.

          Though if COMPRESS(c0) is NULL, why did the expression become true?

          danblack Daniel Black added a comment - - edited SHOW WARNINGS ( from first expression) + ---------+------+-------------------------------------------------------------------------------------------------+ | Level | Code | Message | + ---------+------+-------------------------------------------------------------------------------------------------+ | Warning | 1300 | Invalid utf8mb4 character string: '\x9C3200\xD650\xD45\xB4P0\xB4\xB420\xB124\x01\x00$G\x03\xB1' | + ---------+------+-------------------------------------------------------------------------------------------------+     MariaDB [test]> SELECT COERCIBILITY(COMPRESS(c0)) as hex from t0; + -----+ | hex | + -----+ | 6 | + -----+   MariaDB [test]> SELECT COERCIBILITY( CAST (COMPRESS(c0) as BINARY )) as hex from t0; + -----+ | hex | + -----+ | 2 | + -----+ Based on https://mariadb.com/kb/en/coercibility/ rules the type 6 is converted up to 2 (coercibility of c1) and fails because the compressed binary isn't a valid UTF8MB4 so becomes null. With an explicit cast, coercibility is 2 so I assume these get a binary comparison: MariaDB [test]> SELECT c2 FROM t0 WHERE (CAST(COMPRESS(c0) AS BINARY) > c1); Field 1: `c2` Org_field: `c2` Catalog: `def` Database: `test` Table: `t0` Org_table: `t0` Type: VAR_STRING Collation: utf8mb4_unicode_ci (224) Length: 36 Max_length: 0 Decimals: 0 Flags:     0 rows in set (0.001 sec) A 6 coercibility seems to match other functions like concat: SELECT COERCIBILITY(CONCAT(c0)) as hex from t0; | hex | + -----+ | 6 | + -----+   For the varchar column : SELECT COERCIBILITY(CONCAT(c1)) as hex from t0; + -----+ | hex | + -----+ | 2 | + -----+ So I don't think its a bug, just the way coecibility works according to documented implementations. Though if COMPRESS(c0) is NULL , why did the expression become true?

          This is not a bug.

          In the original query:

          SELECT c2 FROM t0 WHERE ((COMPRESS(c0)) > c1);
          

          you compare a function to a column. Column has stronger collation that a function. So comparison is done according to the column collation.

          In the query with t1:

          SELECT c2 FROM t1 WHERE ((c0) > c1);
          

          you compare two columns. They have equal collation strength. But as c0 is VARBINARY, it wins. So binary comparison is done.

          To get equal results from the two queries, I suggest to use VARBINARY(16) instead of VARCHAR(16) for the column c1.

          bar Alexander Barkov added a comment - This is not a bug. In the original query: SELECT c2 FROM t0 WHERE ((COMPRESS(c0)) > c1); you compare a function to a column. Column has stronger collation that a function. So comparison is done according to the column collation. In the query with t1: SELECT c2 FROM t1 WHERE ((c0) > c1); you compare two columns. They have equal collation strength. But as c0 is VARBINARY, it wins. So binary comparison is done. To get equal results from the two queries, I suggest to use VARBINARY(16) instead of VARCHAR(16) for the column c1.
          dwenking chaos added a comment -

          Thank you for your kind explanation. I have a question: without changing the type of c1, is there any other way to apply constraints so that the results of these two SELECT statements are equal?

          dwenking chaos added a comment - Thank you for your kind explanation. I have a question: without changing the type of c1, is there any other way to apply constraints so that the results of these two SELECT statements are equal?

          You can use the COLLATE clause or CAST

          serg Sergei Golubchik added a comment - You can use the COLLATE clause or CAST

          People

            bar Alexander Barkov
            dwenking chaos
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.