Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35391

Inconsistent Comparison Behavior with COMPRESS Function

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Not a Bug
    • 11.4.0
    • N/A
    • Character Sets
    • None

    Description

      A discrepancy occurs when comparing the output of the COMPRESS function in MariaDB. In the original table t0, querying with COMPRESS(c0) > c1 returns a result as expected. However, after storing COMPRESS(c0) in a derived table t1, applying the equivalent condition on t1 yields no results, even though the data and conditions are identical.

      Reproduce Steps:

      -- Step 1: Create table t0 and insert data
      CREATE TABLE t0 (c0 DATETIME, c1 VARCHAR(16), c2 VARCHAR(9));
      INSERT INTO t0 (c0, c1, c2) VALUES ('2003-01-18 19:04:14', 'o4oFjkHVT8LcDfnY', 'djXhVwU7m');
       
      -- Step 2: Query in the original table
      MariaDB [test]> SELECT c2 FROM t0 WHERE ((COMPRESS(c0)) > c1);
       
      +-----------+
      | c2        |
      +-----------+
      | djXhVwU7m |
      +-----------+
      1 row in set, 1 warning (0.012 sec)
       
      -- Step 3: Create derived table t1 with COMPRESS result
      CREATE TABLE t1 AS (SELECT (COMPRESS(c0)) AS c0, c1 AS c1, c2 AS c2 FROM t0);
       
      -- Step 4: Apply equivalent comparison in derived table t1
      MariaDB [test]> SELECT c2 FROM t1 WHERE ((c0) > c1);
      0 rows in set (0.007 sec)
      
      

      Expected Result:
      Both queries should return consistent results.

      Attachments

        Activity

          You can use the COLLATE clause or CAST

          serg Sergei Golubchik added a comment - You can use the COLLATE clause or CAST
          dwenking chaos added a comment -

          Thank you for your kind explanation. I have a question: without changing the type of c1, is there any other way to apply constraints so that the results of these two SELECT statements are equal?

          dwenking chaos added a comment - Thank you for your kind explanation. I have a question: without changing the type of c1, is there any other way to apply constraints so that the results of these two SELECT statements are equal?

          This is not a bug.

          In the original query:

          SELECT c2 FROM t0 WHERE ((COMPRESS(c0)) > c1);
          

          you compare a function to a column. Column has stronger collation that a function. So comparison is done according to the column collation.

          In the query with t1:

          SELECT c2 FROM t1 WHERE ((c0) > c1);
          

          you compare two columns. They have equal collation strength. But as c0 is VARBINARY, it wins. So binary comparison is done.

          To get equal results from the two queries, I suggest to use VARBINARY(16) instead of VARCHAR(16) for the column c1.

          bar Alexander Barkov added a comment - This is not a bug. In the original query: SELECT c2 FROM t0 WHERE ((COMPRESS(c0)) > c1); you compare a function to a column. Column has stronger collation that a function. So comparison is done according to the column collation. In the query with t1: SELECT c2 FROM t1 WHERE ((c0) > c1); you compare two columns. They have equal collation strength. But as c0 is VARBINARY, it wins. So binary comparison is done. To get equal results from the two queries, I suggest to use VARBINARY(16) instead of VARCHAR(16) for the column c1.
          danblack Daniel Black added a comment - - edited

          SHOW WARNINGS (from first expression)
          +---------+------+-------------------------------------------------------------------------------------------------+
          | Level   | Code | Message                                                                                         |
          +---------+------+-------------------------------------------------------------------------------------------------+
          | Warning | 1300 | Invalid utf8mb4 character string: '\x9C3200\xD650\xD45\xB4P0\xB4\xB420\xB124\x01\x00$G\x03\xB1' |
          +---------+------+-------------------------------------------------------------------------------------------------+
           
           
          MariaDB [test]> SELECT COERCIBILITY(COMPRESS(c0)) as hex from t0;
          +-----+
          | hex |
          +-----+
          |   6 |
          +-----+
           
          MariaDB [test]> SELECT COERCIBILITY(CAST(COMPRESS(c0) as BINARY)) as hex from t0;
          +-----+
          | hex |
          +-----+
          |   2 |
          +-----+
          

          Based on https://mariadb.com/kb/en/coercibility/ rules the type 6 is converted up to 2 (coercibility of c1) and fails because the compressed binary isn't a valid UTF8MB4 so becomes null.

          With an explicit cast, coercibility is 2 so I assume these get a binary comparison:

          MariaDB [test]> SELECT c2 FROM t0 WHERE (CAST(COMPRESS(c0) AS BINARY) > c1);
          Field   1:  `c2`
          Org_field:  `c2`
          Catalog:    `def`
          Database:   `test`
          Table:      `t0`
          Org_table:  `t0`
          Type:       VAR_STRING
          Collation:  utf8mb4_unicode_ci (224)
          Length:     36
          Max_length: 0
          Decimals:   0
          Flags:      
           
           
          0 rows in set (0.001 sec)
          

          A 6 coercibility seems to match other functions like concat:

           SELECT COERCIBILITY(CONCAT(c0)) as hex from t0;
          | hex |
          +-----+
          |   6 |
          +-----+
           
          For the varchar column:
          SELECT COERCIBILITY(CONCAT(c1)) as hex from t0;
          +-----+
          | hex |
          +-----+
          |   2 |
          +-----+
          

          So I don't think its a bug, just the way coecibility works according to documented implementations.

          Though if COMPRESS(c0) is NULL, why did the expression become true?

          danblack Daniel Black added a comment - - edited SHOW WARNINGS ( from first expression) + ---------+------+-------------------------------------------------------------------------------------------------+ | Level | Code | Message | + ---------+------+-------------------------------------------------------------------------------------------------+ | Warning | 1300 | Invalid utf8mb4 character string: '\x9C3200\xD650\xD45\xB4P0\xB4\xB420\xB124\x01\x00$G\x03\xB1' | + ---------+------+-------------------------------------------------------------------------------------------------+     MariaDB [test]> SELECT COERCIBILITY(COMPRESS(c0)) as hex from t0; + -----+ | hex | + -----+ | 6 | + -----+   MariaDB [test]> SELECT COERCIBILITY( CAST (COMPRESS(c0) as BINARY )) as hex from t0; + -----+ | hex | + -----+ | 2 | + -----+ Based on https://mariadb.com/kb/en/coercibility/ rules the type 6 is converted up to 2 (coercibility of c1) and fails because the compressed binary isn't a valid UTF8MB4 so becomes null. With an explicit cast, coercibility is 2 so I assume these get a binary comparison: MariaDB [test]> SELECT c2 FROM t0 WHERE (CAST(COMPRESS(c0) AS BINARY) > c1); Field 1: `c2` Org_field: `c2` Catalog: `def` Database: `test` Table: `t0` Org_table: `t0` Type: VAR_STRING Collation: utf8mb4_unicode_ci (224) Length: 36 Max_length: 0 Decimals: 0 Flags:     0 rows in set (0.001 sec) A 6 coercibility seems to match other functions like concat: SELECT COERCIBILITY(CONCAT(c0)) as hex from t0; | hex | + -----+ | 6 | + -----+   For the varchar column : SELECT COERCIBILITY(CONCAT(c1)) as hex from t0; + -----+ | hex | + -----+ | 2 | + -----+ So I don't think its a bug, just the way coecibility works according to documented implementations. Though if COMPRESS(c0) is NULL , why did the expression become true?

          People

            bar Alexander Barkov
            dwenking chaos
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.