Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4817

Optimizer fails to optimize expression of the form 'FOO' IS NULL

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 5.5.32
    • 5.5.33
    • None
    • None
    • openSUSE 12.1
      mariadb-5.5.28a-1.4.1.x86_64

    Description

      This is a copy of this MySQL bug: http://bugs.mysql.com/bug.php?id=69359

      I am filing it here in hopes it will get more attention than from the MySQL folks.

      Note, this bug occurs in BOTH MySQL and MariaDB.

      Restatement of bug follows...

      See

      http://stackoverflow.com/questions/16848190/mysql-why-isnt-foo-is-null-optimized-away

      for a description. Quoting that here:

      I have two tables Person and Message and the latter has a foreign key to the former. Each table has id as the primary key column, and the Person table also has a column personId which is (uniquely) indexed.

      The query below should take advantage of the personId key index, but instead MySQL requires scanning the entire Message table for some reason:

      mysql> EXPLAIN SELECT `m`.*
          -> FROM
          ->   `Message` AS `m`
          -> LEFT JOIN
          ->   `Person` AS `p` ON (`m`.`person` = `p`.`id`)
          -> WHERE
          ->   'M002649397' IS NULL OR
          ->   `p`.`personId` = 'M002649397';
      +----+-------------+-------+--------+---------------+---------+---------+----------------+--------+-------------+
      | id | select_type | table | type   | possible_keys | key     | key_len | ref            | rows   | Extra       |
      +----+-------------+-------+--------+---------------+---------+---------+----------------+--------+-------------+
      |  1 | SIMPLE      | m     | ALL    | NULL          | NULL    | NULL    | NULL           | 273220 |             |
      |  1 | SIMPLE      | p     | eq_ref | PRIMARY       | PRIMARY | 8       | pcom.m.person  |      1 | Using where |
      +----+-------------+-------+--------+---------------+---------+---------+----------------+--------+-------------+
      2 rows in set (0.00 sec)

      But when I comment out the 'M002649397' IS NULL OR clause (which has no effect on the result), the query suddenly gets more efficient:

      mysql> EXPLAIN SELECT `m`.*
          -> FROM
          ->   `Message` AS `m`
          -> LEFT JOIN
          ->   `Person` AS `p` ON (`m`.`person` = `p`.`id`)
          -> WHERE
          -> --  'M002649397' IS NULL OR
          ->   `p`.`personId` = 'M002649397';
      +----+-------------+-------+-------+--------------------+--------------------+---------+-------+------+-------------+
      | id | select_type | table | type  | possible_keys      | key                | key_len | ref   | rows | Extra       |
      +----+-------------+-------+-------+--------------------+--------------------+---------+-------+------+-------------+
      |  1 | SIMPLE      | p     | const | PRIMARY,personId   | personId           | 767     | const |    1 | Using index |
      |  1 | SIMPLE      | m     | ref   | FK9C2397E7A0F6ED11 | FK9C2397E7A0F6ED11 | 9       | const |    3 | Using where |
      +----+-------------+-------+-------+--------------------+--------------------+---------+-------+------+-------------+
      2 rows in set (0.01 sec)

      The bug is that the 'M002649397' IS NULL expression, which is always false, is not being optimized away.

      Here is a schema to test with:

      create table Message (
          type char(1) not null,
          id bigint not null auto_increment,
          createTime datetime not null,
          updateTime datetime not null,
          person bigint,
          primary key (id)
      ) ENGINE=InnoDB default charset=utf8 collate=utf8_bin;
       
      create table Person (
          id bigint not null auto_increment,
          createTime datetime not null,
          updateTime datetime not null,
          firstName varchar(255),
          lastName varchar(255),
          middleName varchar(255),
          personId varchar(255) not null unique,
          primary key (id)
      ) ENGINE=InnoDB default charset=utf8 collate=utf8_bin;
       
      create index idx_Message_createTime on Message (createTime);
       
      alter table Message
          add index FK9C2397E7A0F6ED11 (person),
          add constraint FK9C2397E7A0F6ED11
          foreign key (person)
          references Person (id);
       
      create index idx_Person_lastName on Person (lastName);

      Attachments

        Issue Links

          Activity

            The query has form

            m LEFT JOIN p ON ...

            LEFT JOIN allows only one join order:

            m, p.

            (This is because LEFT JOIN needs to find all records in `m`, regardless
            of whether they have a match in `p`).
            When we use that join order, the ON expression allows to access table `p`
            using eq_ref access method (that is, a lookup on a primary key).
            However, the condition `p`.`personId` = 'M002649397' is not useful.

            If we take a careful look at the WHERE clause, we can see that is only
            satisfied when

            `p`.`personId` = 'M002649397'

            This means that we're not interested in results of LEFT JOIN that have a
            NULL-complemented row for table `p`. In other words, we can replace the
            LEFT JOIN with INNER JOIN.

            Once we have INNER JOIN, we can use both join orders:

            • m, p
            • p, m

            When we use the join order "p, m" (like the second EXPLAIN does), we can
            use condition to limit the number of rows we get from table `p`.

            The problem here seems to be that the optimizer is unable to convert left
            join into inner join.

            When the WHERE clause is just

            `p`.`personId` = 'M002649397'

            then conversion works (check out EXPLAIN EXTENDED - it shows "JOIN")

            When the WHERE clause is

            'M002649397' IS NULL OR `p`.`personId` = 'M002649397';

            then conversion doesn't work.

            psergei Sergei Petrunia added a comment - The query has form m LEFT JOIN p ON ... LEFT JOIN allows only one join order: m, p. (This is because LEFT JOIN needs to find all records in `m`, regardless of whether they have a match in `p`). When we use that join order, the ON expression allows to access table `p` using eq_ref access method (that is, a lookup on a primary key). However, the condition `p`.`personId` = 'M002649397' is not useful. If we take a careful look at the WHERE clause, we can see that is only satisfied when `p`.`personId` = 'M002649397' This means that we're not interested in results of LEFT JOIN that have a NULL-complemented row for table `p`. In other words, we can replace the LEFT JOIN with INNER JOIN. Once we have INNER JOIN, we can use both join orders: m, p p, m When we use the join order "p, m" (like the second EXPLAIN does), we can use condition to limit the number of rows we get from table `p`. The problem here seems to be that the optimizer is unable to convert left join into inner join. When the WHERE clause is just `p`.`personId` = 'M002649397' then conversion works (check out EXPLAIN EXTENDED - it shows "JOIN") When the WHERE clause is 'M002649397' IS NULL OR `p`.`personId` = 'M002649397'; then conversion doesn't work.

            I was able to replicate:

            MariaDB [j9]> EXPLAIN SELECT `m`.*
            -> FROM
            -> `Message` AS `m`
            -> LEFT JOIN
            -> `Person` AS `p` ON (`m`.`person` = `p`.`id`)
            -> WHERE
            -> 'M002649397' IS NULL OR
            -> `p`.`personId` = 'M002649397';
            ---------------------------------------------------------------------------------+

            id select_type table type possible_keys key key_len ref rows Extra

            ---------------------------------------------------------------------------------+

            1 SIMPLE m ALL NULL NULL NULL NULL 1974  
            1 SIMPLE p eq_ref PRIMARY PRIMARY 8 j9.m.person 1 Using where

            ---------------------------------------------------------------------------------+
            2 rows in set (0.00 sec)

            MariaDB [j9]>
            MariaDB [j9]> EXPLAIN SELECT `m`.*
            -> FROM
            -> `Message` AS `m`
            -> INNER JOIN
            -> `Person` AS `p` ON (`m`.`person` = `p`.`id`)
            -> WHERE
            -> 'M002649397' IS NULL OR
            -> `p`.`personId` = 'M002649397';
            ------------------------------------------------------------------------------------------+

            id select_type table type possible_keys key key_len ref rows Extra

            ------------------------------------------------------------------------------------------+

            1 SIMPLE p const PRIMARY,personId personId 767 const 1 Using index
            1 SIMPLE m ref FK9C2397E7A0F6ED11 FK9C2397E7A0F6ED11 9 const 2  

            ------------------------------------------------------------------------------------------+
            2 rows in set (0.00 sec)

            psergei Sergei Petrunia added a comment - I was able to replicate: MariaDB [j9] > EXPLAIN SELECT `m`.* -> FROM -> `Message` AS `m` -> LEFT JOIN -> `Person` AS `p` ON (`m`.`person` = `p`.`id`) -> WHERE -> 'M002649397' IS NULL OR -> `p`.`personId` = 'M002649397'; ----- ----------- ----- ------ ------------- ------- ------- ----------- ---- ------------+ id select_type table type possible_keys key key_len ref rows Extra ----- ----------- ----- ------ ------------- ------- ------- ----------- ---- ------------+ 1 SIMPLE m ALL NULL NULL NULL NULL 1974   1 SIMPLE p eq_ref PRIMARY PRIMARY 8 j9.m.person 1 Using where ----- ----------- ----- ------ ------------- ------- ------- ----------- ---- ------------+ 2 rows in set (0.00 sec) MariaDB [j9] > MariaDB [j9] > EXPLAIN SELECT `m`.* -> FROM -> `Message` AS `m` -> INNER JOIN -> `Person` AS `p` ON (`m`.`person` = `p`.`id`) -> WHERE -> 'M002649397' IS NULL OR -> `p`.`personId` = 'M002649397'; ----- ----------- ----- ----- ------------------ ------------------ ------- ----- ---- ------------+ id select_type table type possible_keys key key_len ref rows Extra ----- ----------- ----- ----- ------------------ ------------------ ------- ----- ---- ------------+ 1 SIMPLE p const PRIMARY,personId personId 767 const 1 Using index 1 SIMPLE m ref FK9C2397E7A0F6ED11 FK9C2397E7A0F6ED11 9 const 2   ----- ----------- ----- ----- ------------------ ------------------ ------- ----- ---- ------------+ 2 rows in set (0.00 sec)

            How to fill the test dataset:
            insert into Person (personID, lastName) select concat('id-', a), concat('id-', a) from test.one_k;
            insert into Message (person) select a from test.one_k where a>0;
            insert into Message (person) select a from test.one_k where a>0;

            psergei Sergei Petrunia added a comment - How to fill the test dataset: insert into Person (personID, lastName) select concat('id-', a), concat('id-', a) from test.one_k; insert into Message (person) select a from test.one_k where a>0; insert into Message (person) select a from test.one_k where a>0;

            One may ask, doesn't MySQL (or MariaDB) optimizer has module that removes parts of WHERE condition that are known to be false. Yes, it does. The problem is that outer-to-inner join conversion step is run before the constant-condition-removal module is run (there are reasons for this).

            psergei Sergei Petrunia added a comment - One may ask, doesn't MySQL (or MariaDB) optimizer has module that removes parts of WHERE condition that are known to be false. Yes, it does. The problem is that outer-to-inner join conversion step is run before the constant-condition-removal module is run (there are reasons for this).

            Indeed

            Breakpoint 1, simplify_joins (join=0x7fff9803cfe8, join_list=0x7fff98007ca0, conds=0x7fff98008238, top=true, in_sj=false) at /home/psergey/dev2/5.5/sql/sql_select.cc:12788
            (gdb) p table->alias
            $3 = 0x7fff98006cc0 "p"

            (gdb) p dbug_print_item(conds)
            $4 = 0x14b33a0 "(isnull('M002649397') or (`j9`.`p`.`personId` = 'M002649397'))"
            (gdb) p conds->not_null_tables()
            $5 = 0
            (gdb) p ((Item_cond_or*)conds)>argument_list()>head()
            $10 = (Item_func_isnull *) 0x7fff98007df8
            (gdb) p ((Item_cond_or*)conds)>argument_list()>head()->not_null_tables()
            $11 = 0

            (gdb) set $second= ((Item_cond_or*)conds)>argument_list()>first->next->info
            (gdb) p ((Item*)$second)
            $14 = (Item_func_eq *) 0x7fff980080c0
            (gdb) p ((Item*)$second)->not_null_tables()
            $15 = 2

            psergei Sergei Petrunia added a comment - Indeed Breakpoint 1, simplify_joins (join=0x7fff9803cfe8, join_list=0x7fff98007ca0, conds=0x7fff98008238, top=true, in_sj=false) at /home/psergey/dev2/5.5/sql/sql_select.cc:12788 (gdb) p table->alias $3 = 0x7fff98006cc0 "p" (gdb) p dbug_print_item(conds) $4 = 0x14b33a0 "(isnull('M002649397') or (`j9`.`p`.`personId` = 'M002649397'))" (gdb) p conds->not_null_tables() $5 = 0 (gdb) p ((Item_cond_or*)conds) >argument_list() >head() $10 = (Item_func_isnull *) 0x7fff98007df8 (gdb) p ((Item_cond_or*)conds) >argument_list() >head()->not_null_tables() $11 = 0 (gdb) set $second= ((Item_cond_or*)conds) >argument_list() >first->next->info (gdb) p ((Item*)$second) $14 = (Item_func_eq *) 0x7fff980080c0 (gdb) p ((Item*)$second)->not_null_tables() $15 = 2

            (gdb) set $first= ((Item_cond_or*)conds)>argument_list()>head()
            (gdb) p $first->basic_const_item()
            $19 = false
            (gdb) p $first->used_tables()
            $20 = 0
            (gdb) p $first->is_expensive()
            $21 = false

            psergei Sergei Petrunia added a comment - (gdb) set $first= ((Item_cond_or*)conds) >argument_list() >head() (gdb) p $first->basic_const_item() $19 = false (gdb) p $first->used_tables() $20 = 0 (gdb) p $first->is_expensive() $21 = false

            Maybe, we could remove items that have

            item->const_item() && !item->is_expensive() && item->val_int() == 0

            from having item->not_null_tables() put into not_null_tables_cache.

            psergei Sergei Petrunia added a comment - Maybe, we could remove items that have item->const_item() && !item->is_expensive() && item->val_int() == 0 from having item->not_null_tables() put into not_null_tables_cache.

            Committed a patch. It needs to be tested and reviewed.

            psergei Sergei Petrunia added a comment - Committed a patch. It needs to be tested and reviewed.
            psergei Sergei Petrunia added a comment - - edited

            The fix was pushed into 5.5, and will be available in MariaDB 5.5.33.

            psergei Sergei Petrunia added a comment - - edited The fix was pushed into 5.5, and will be available in MariaDB 5.5.33.

            Archie, thanks for reporting this bug here. The fix should be an improvement for the MariaDB optimizer.

            psergei Sergei Petrunia added a comment - Archie, thanks for reporting this bug here. The fix should be an improvement for the MariaDB optimizer.

            Thanks for the quick turnaround!

            archie Archie Cobbs (Inactive) added a comment - Thanks for the quick turnaround!

            People

              psergei Sergei Petrunia
              archie Archie Cobbs (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.