Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34665

Simplify IN predicate processing for NULL-aware materialization involving only one column

Details

    Description

      Looks like it is possible to skip the building sorted Ordered_key structures for subselect_rowid_merge_engine when there is only one key in the materialized subquery.

      We have predicate

      left_expr IN (SELECT <subq1>)
      

      where left_expr is a scalar (not a tuple). The logic in this case looks as follows:

      if (left_expr is NULL) {
        if (subq1 produced any rows) {
          // note that we don't care if subq1 has produced
          // NULLs or not.
          NULL IN (<some values>) -> UNKNOWN, i.e. NULL.
        } else {
          NULL IN ({empty-set}) -> FALSE. 
        }
      } else {
        // left_expr is a non-NULL value
        if (subq1 output has a match for left_expr) {
          left_expr IN (..., left_expr ...) -> TRUE
        } else {
          // no "known" matches.
          if (subq1 output has a NULL) {
            left_expr IN ( ... NULL ...) -> 
             (NULL could have been a match or not)
             -> NULL.
          } else {
            // subq1 didn't produce any "UNKNOWNs" so
            // we're positive there weren't any matches
            -> FALSE.
          }
        }
      }
      

      Attachments

        Issue Links

          Activity

            oleg.smirnov Oleg Smirnov created issue -
            oleg.smirnov Oleg Smirnov made changes -
            Field Original Value New Value
            Description Looks like it is possible to skip complex logic of building sorted Ordered_key structures of subselect_rowid_merge_engine when there is only one key in the materialized subquery.
            In this scenario subselect_partial_match_engine::exec() can choose:
            - index lookup when left_expr of the IN predicate is not NULL
            - return NULL (unknown) when left_expr is NULL and the materialized table has at least one NULL value
            - return FALSE otherwise (todo: is it correct if there are no rows at all?)
            Looks like it is possible to skip complex logic of building sorted Ordered_key structures of subselect_rowid_merge_engine when there is only one key in the materialized subquery.
            In this scenario subselect_partial_match_engine::exec() can choose:
            - index lookup when left_expr of the IN predicate is not NULL
            - return NULL (unknown) when left_expr is NULL and the materialized table has at least one NULL value
            - return TRUE otherwise (todo: is it correct when there are no rows in the table?)
            oleg.smirnov Oleg Smirnov made changes -
            Description Looks like it is possible to skip complex logic of building sorted Ordered_key structures of subselect_rowid_merge_engine when there is only one key in the materialized subquery.
            In this scenario subselect_partial_match_engine::exec() can choose:
            - index lookup when left_expr of the IN predicate is not NULL
            - return NULL (unknown) when left_expr is NULL and the materialized table has at least one NULL value
            - return TRUE otherwise (todo: is it correct when there are no rows in the table?)
            Looks like it is possible to skip the building sorted Ordered_key structures for subselect_rowid_merge_engine when there is only one key in the materialized subquery.

            The logic in this case looks as follows:

            if (left_expr is NULL) {
              if (subq1 produced any rows) {
                // note that we don't care if subq1 has produced
                // NULLs or not.
                NULL IN (<some values>) -> UNKNOWN, i.e. NULL.
              } else {
                NULL IN ({empty-set}) -> FALSE.
              }
            } else {
              // left_expr is a non-NULL value
              if (subq1 output has a match for left_expr) {
                left_expr IN (..., left_expr ...) -> TRUE
              } else {
                // no "known" matches.
                if (subq1 output has a NULL) {
                  left_expr IN ( ... NULL ...) ->
                   (NULL could have been a match or not)
                   -> NULL.
                } else {
                  // subq1 didn't produce any "UNKNOWNs" so
                  // we're positive there weren't any matches
                  -> FALSE.
                }
              }
            }
            oleg.smirnov Oleg Smirnov made changes -
            Description Looks like it is possible to skip the building sorted Ordered_key structures for subselect_rowid_merge_engine when there is only one key in the materialized subquery.

            The logic in this case looks as follows:

            if (left_expr is NULL) {
              if (subq1 produced any rows) {
                // note that we don't care if subq1 has produced
                // NULLs or not.
                NULL IN (<some values>) -> UNKNOWN, i.e. NULL.
              } else {
                NULL IN ({empty-set}) -> FALSE.
              }
            } else {
              // left_expr is a non-NULL value
              if (subq1 output has a match for left_expr) {
                left_expr IN (..., left_expr ...) -> TRUE
              } else {
                // no "known" matches.
                if (subq1 output has a NULL) {
                  left_expr IN ( ... NULL ...) ->
                   (NULL could have been a match or not)
                   -> NULL.
                } else {
                  // subq1 didn't produce any "UNKNOWNs" so
                  // we're positive there weren't any matches
                  -> FALSE.
                }
              }
            }
            Looks like it is possible to skip the building sorted Ordered_key structures for subselect_rowid_merge_engine when there is only one key in the materialized subquery.

            The logic in this case looks as follows:
            {code}
            if (left_expr is NULL) {
              if (subq1 produced any rows) {
                // note that we don't care if subq1 has produced
                // NULLs or not.
                NULL IN (<some values>) -> UNKNOWN, i.e. NULL.
              } else {
                NULL IN ({empty-set}) -> FALSE.
              }
            } else {
              // left_expr is a non-NULL value
              if (subq1 output has a match for left_expr) {
                left_expr IN (..., left_expr ...) -> TRUE
              } else {
                // no "known" matches.
                if (subq1 output has a NULL) {
                  left_expr IN ( ... NULL ...) ->
                   (NULL could have been a match or not)
                   -> NULL.
                } else {
                  // subq1 didn't produce any "UNKNOWNs" so
                  // we're positive there weren't any matches
                  -> FALSE.
                }
              }
            }
            {code}
            oleg.smirnov Oleg Smirnov made changes -
            Description Looks like it is possible to skip the building sorted Ordered_key structures for subselect_rowid_merge_engine when there is only one key in the materialized subquery.

            The logic in this case looks as follows:
            {code}
            if (left_expr is NULL) {
              if (subq1 produced any rows) {
                // note that we don't care if subq1 has produced
                // NULLs or not.
                NULL IN (<some values>) -> UNKNOWN, i.e. NULL.
              } else {
                NULL IN ({empty-set}) -> FALSE.
              }
            } else {
              // left_expr is a non-NULL value
              if (subq1 output has a match for left_expr) {
                left_expr IN (..., left_expr ...) -> TRUE
              } else {
                // no "known" matches.
                if (subq1 output has a NULL) {
                  left_expr IN ( ... NULL ...) ->
                   (NULL could have been a match or not)
                   -> NULL.
                } else {
                  // subq1 didn't produce any "UNKNOWNs" so
                  // we're positive there weren't any matches
                  -> FALSE.
                }
              }
            }
            {code}
            Looks like it is possible to skip the building sorted Ordered_key structures for subselect_rowid_merge_engine when there is only one key in the materialized subquery.

            The logic in this case looks as follows.
            We have predicate
            {code}
            left_expr IN (SELECT <subq1>)
            {code}
            where left_expr is a scalar (not a tuple)

            {code}
            if (left_expr is NULL) {
              if (subq1 produced any rows) {
                // note that we don't care if subq1 has produced
                // NULLs or not.
                NULL IN (<some values>) -> UNKNOWN, i.e. NULL.
              } else {
                NULL IN ({empty-set}) -> FALSE.
              }
            } else {
              // left_expr is a non-NULL value
              if (subq1 output has a match for left_expr) {
                left_expr IN (..., left_expr ...) -> TRUE
              } else {
                // no "known" matches.
                if (subq1 output has a NULL) {
                  left_expr IN ( ... NULL ...) ->
                   (NULL could have been a match or not)
                   -> NULL.
                } else {
                  // subq1 didn't produce any "UNKNOWNs" so
                  // we're positive there weren't any matches
                  -> FALSE.
                }
              }
            }
            {code}
            oleg.smirnov Oleg Smirnov made changes -
            Summary NULL-aware materialization: implement short-curcuit for the case of only one key NULL-aware materialization: implement short-curcuit for scalar left_expr in IN-predicate
            oleg.smirnov Oleg Smirnov made changes -
            Description Looks like it is possible to skip the building sorted Ordered_key structures for subselect_rowid_merge_engine when there is only one key in the materialized subquery.

            The logic in this case looks as follows.
            We have predicate
            {code}
            left_expr IN (SELECT <subq1>)
            {code}
            where left_expr is a scalar (not a tuple)

            {code}
            if (left_expr is NULL) {
              if (subq1 produced any rows) {
                // note that we don't care if subq1 has produced
                // NULLs or not.
                NULL IN (<some values>) -> UNKNOWN, i.e. NULL.
              } else {
                NULL IN ({empty-set}) -> FALSE.
              }
            } else {
              // left_expr is a non-NULL value
              if (subq1 output has a match for left_expr) {
                left_expr IN (..., left_expr ...) -> TRUE
              } else {
                // no "known" matches.
                if (subq1 output has a NULL) {
                  left_expr IN ( ... NULL ...) ->
                   (NULL could have been a match or not)
                   -> NULL.
                } else {
                  // subq1 didn't produce any "UNKNOWNs" so
                  // we're positive there weren't any matches
                  -> FALSE.
                }
              }
            }
            {code}
            Looks like it is possible to skip the building sorted Ordered_key structures for subselect_rowid_merge_engine when there is only one key in the materialized subquery.

            We have predicate
            {code}
            left_expr IN (SELECT <subq1>)
            {code}
            where left_expr is a scalar (not a tuple). The logic in this case looks as follows:
            {code}
            if (left_expr is NULL) {
              if (subq1 produced any rows) {
                // note that we don't care if subq1 has produced
                // NULLs or not.
                NULL IN (<some values>) -> UNKNOWN, i.e. NULL.
              } else {
                NULL IN ({empty-set}) -> FALSE.
              }
            } else {
              // left_expr is a non-NULL value
              if (subq1 output has a match for left_expr) {
                left_expr IN (..., left_expr ...) -> TRUE
              } else {
                // no "known" matches.
                if (subq1 output has a NULL) {
                  left_expr IN ( ... NULL ...) ->
                   (NULL could have been a match or not)
                   -> NULL.
                } else {
                  // subq1 didn't produce any "UNKNOWNs" so
                  // we're positive there weren't any matches
                  -> FALSE.
                }
              }
            }
            {code}
            oleg.smirnov Oleg Smirnov made changes -
            Fix Version/s 10.5 [ 23123 ]
            oleg.smirnov Oleg Smirnov made changes -
            Summary NULL-aware materialization: implement short-curcuit for scalar left_expr in IN-predicate Simplify IN predicate processing for materialization involving only one column
            oleg.smirnov Oleg Smirnov made changes -
            Summary Simplify IN predicate processing for materialization involving only one column Simplify IN predicate processing for NULL-aware materialization involving only one column
            oleg.smirnov Oleg Smirnov made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            oleg.smirnov Oleg Smirnov added a comment -

            psergei, can you please review the PR?

            oleg.smirnov Oleg Smirnov added a comment - psergei , can you please review the PR ?
            oleg.smirnov Oleg Smirnov made changes -
            Assignee Oleg Smirnov [ JIRAUSER50405 ] Sergei Petrunia [ psergey ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -

            Reviewing...

            Checking what would subselect_table_scan_engine do when computing

            /*col1=*/NULL  IN  (SELECT ... FROM non_empty_select_without_nulls) 
            

            It would get into subselect_table_scan_engine::partial_match; would start a table scan and then would find it has a match after reading just one row.

            psergei Sergei Petrunia added a comment - Reviewing... Checking what would subselect_table_scan_engine do when computing /*col1=*/NULL IN (SELECT ... FROM non_empty_select_without_nulls) It would get into subselect_table_scan_engine::partial_match ; would start a table scan and then would find it has a match after reading just one row.

            Review input provided in the PR.

            psergei Sergei Petrunia added a comment - Review input provided in the PR.
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Oleg Smirnov [ JIRAUSER50405 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            oleg.smirnov Oleg Smirnov made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            oleg.smirnov Oleg Smirnov added a comment -

            Code review fixes are pushed.

            oleg.smirnov Oleg Smirnov added a comment - Code review fixes are pushed.
            oleg.smirnov Oleg Smirnov made changes -
            Assignee Oleg Smirnov [ JIRAUSER50405 ] Sergei Petrunia [ psergey ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            Review input provided

            psergei Sergei Petrunia added a comment - Review input provided
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Oleg Smirnov [ JIRAUSER50405 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            oleg.smirnov Oleg Smirnov made changes -
            Status Stalled [ 10000 ] In Testing [ 10301 ]
            oleg.smirnov Oleg Smirnov made changes -
            Assignee Oleg Smirnov [ JIRAUSER50405 ] Lena Startseva [ JIRAUSER50478 ]
            oleg.smirnov Oleg Smirnov added a comment -

            lstartseva, could you please test the branch bb-10.5-MDEV-34665? psergei has already approved the pull request.

            oleg.smirnov Oleg Smirnov added a comment - lstartseva , could you please test the branch bb-10.5- MDEV-34665 ? psergei has already approved the pull request.

            Testinп done. Ok to push. But, before pushing it will be better to add some cases for "update" and "delete". This will allow you to check that everything is fine after merging in later versions.

            lstartseva Lena Startseva added a comment - Testinп done. Ok to push. But, before pushing it will be better to add some cases for "update" and "delete". This will allow you to check that everything is fine after merging in later versions.
            lstartseva Lena Startseva made changes -
            Assignee Lena Startseva [ JIRAUSER50478 ] Oleg Smirnov [ JIRAUSER50405 ]
            Status In Testing [ 10301 ] Stalled [ 10000 ]
            oleg.smirnov Oleg Smirnov added a comment -

            OK, added tests for UPDATE and DELETE and pushed to 10.5.

            oleg.smirnov Oleg Smirnov added a comment - OK, added tests for UPDATE and DELETE and pushed to 10.5.
            oleg.smirnov Oleg Smirnov made changes -
            Fix Version/s 10.5.27 [ 29902 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            JIraAutomate JiraAutomate made changes -
            Fix Version/s 10.6.20 [ 29903 ]
            Fix Version/s 10.11.10 [ 29904 ]
            Fix Version/s 11.2.6 [ 29906 ]
            Fix Version/s 11.4.4 [ 29907 ]
            oleg.smirnov Oleg Smirnov added a comment -

            ralf.gebhardt, do you mean this commit won't make it into the upcoming releases? No problem for me to update the version but I don't know where the cut off point is. Who may know it for sure?

            oleg.smirnov Oleg Smirnov added a comment - ralf.gebhardt , do you mean this commit won't make it into the upcoming releases? No problem for me to update the version but I don't know where the cut off point is. Who may know it for sure?
            dbart Daniel Bartholomew made changes -
            Fix Version/s 10.5.28 [ 29952 ]
            Fix Version/s 10.6.21 [ 29953 ]
            Fix Version/s 10.11.11 [ 29954 ]
            Fix Version/s 11.4.5 [ 29956 ]
            Fix Version/s 10.5.27 [ 29902 ]
            Fix Version/s 10.6.20 [ 29903 ]
            Fix Version/s 10.11.10 [ 29904 ]
            Fix Version/s 11.2.6 [ 29906 ]
            Fix Version/s 11.4.4 [ 29907 ]

            People

              oleg.smirnov Oleg Smirnov
              oleg.smirnov Oleg Smirnov
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.