Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8785

Wrong results for EXPLAIN EXTENDED...WHERE NULLIF(latin1_col, _utf8'a' COLLATE utf8_bin) IS NOT NULL

    XMLWordPrintable

    Details

      Description

      NULLIF(a,b) is implemented according to the SQL standard as a synonym for:

      CASE WHEN a=b THEN NULL ELSE a END

      This script proves that NULLIF(a,b) is actually implemented in the CASE-style expression that operates on three! (rather that two) arguments:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a VARCHAR(10));
      INSERT INTO t1 VALUES ('a'),('A');
      SELECT a, NULLIF(a,_utf8'a' COLLATE utf8_bin) IS NULL FROM t1;
      SELECT CHARSET(NULLIF(a,_utf8'a' COLLATE utf8_bin)) FROM t1;

      returns:

      MariaDB [test]> SELECT a, NULLIF(a,_utf8'a' COLLATE utf8_bin) IS NULL FROM t1;
      +------+---------------------------------------------+
      | a    | NULLIF(a,_utf8'a' COLLATE utf8_bin) IS NULL |
      +------+---------------------------------------------+
      | a    |                                           1 |
      | A    |                                           0 |
      +------+---------------------------------------------+
      2 rows in set (0.00 sec)
       
      MariaDB [test]> SELECT CHARSET(NULLIF(a,_utf8'a' COLLATE utf8_bin)) FROM t1;
      +----------------------------------------------+
      | CHARSET(NULLIF(a,_utf8'a' COLLATE utf8_bin)) |
      +----------------------------------------------+
      | latin1                                       |
      | latin1                                       |
      +----------------------------------------------+
      2 rows in set (0.00 sec)

      Notice:

      • the values were compared using the utf8_bin collations and the results for 'a' and for 'A' are different, which is an evidence that "a" was internally converted to utf8 for comparison and then compares as utf8_bin
      • the character set for NULLIF is still latin1, therefore "a" was NOT converted to utf8 for the return value

      Summary: we have actually two "a"s, one "a" for comparison and one "a" for the return value. This is correct and conforms the Standard.

      However, if I now do EXPLAIN EXTENDED with the same NULLIF:

      EXPLAIN EXTENDED SELECT NULLIF(a,_utf8'a' COLLATE utf8_bin) IS NULL AS expr FROM t1;
      SHOW WARNINGS;

      It returns:

      +-------+------+-------------------------------------------------------------------------------------------------------------------+
      | Level | Code | Message                                                                                                           |
      +-------+------+-------------------------------------------------------------------------------------------------------------------+
      | Note  | 1003 | select isnull(nullif(convert(`test`.`t1`.`a` using utf8),(_utf8'a' collate utf8_bin))) AS `expr` from `test`.`t1` |
      +-------+------+-------------------------------------------------------------------------------------------------------------------+

      which is wrong and does not match the original NULLIF() and assumes that the return value of this NULLIF() has character set utf8.

      The expected result would be to use the CASE style in the EXPLAIN EXTENDED, something like this:

      select isnull(CASE WHEN convert(`test`.`t1`.`a` using utf8)=(_utf8'a' collate utf8_bin))) THEN NULL ELSE `test`.`t1`.`a` END) AS `expr` from `test`.`t1`

      The same problem is repeatable with FORMAT=JSON:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a VARCHAR(10));
      EXPLAIN FORMAT=JSON SELECT * FROM t1 WHERE NULLIF(a,_utf8'a' COLLATE utf8_bin);
      SHOW WARNINGS;

      The above script returns:

      +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | EXPLAIN                                                                                                                                                                                                                                                               |
      +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | {
        "query_block": {
          "select_id": 1,
          "table": {
            "table_name": "t1",
            "access_type": "ALL",
            "rows": 1,
            "filtered": 100,
            "attached_condition": "nullif(convert(t1.a using utf8),<cache>((_utf8'a' collate utf8_bin)))"
          }
        }
      } |
      +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              bar Alexander Barkov
              Reporter:
              bar Alexander Barkov
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: