Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9964

Optimize DISTINCT away when it is not needed

    XMLWordPrintable

Details

    Description

      Consider two tables like this:

      CREATE TABLE `t1` (
       `i` int(11) NOT NULL,
       `j` int(11) NOT NULL,
       `l` int(11) NOT NULL,
       `m` int(11) DEFAULT NULL,
       PRIMARY KEY (`i`,`j`,`l`)
      );
       
      CREATE TABLE `t2` (
       `i` int(11) NOT NULL,
       `k` int(11) DEFAULT NULL,
       PRIMARY KEY (`i`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
      

      Now, I issue a query is this:

      SELECT DISTINCT
          t2.i, t2.k, t1.j, t1.l, t1.m
      FROM t2 LEFT JOIN t1 ON t2.i=t1.i
      WHERE
          t2.k < 123;
      

      Assuming that the table has a sizable amount of rows, then the query is slow because the DISTINCT clause will likely force the query to use a temporary table.

      Now, it is technically impossible for the query to return non-unique rows even if the DISTINCT clause is missing. If the left join matched nothing, then t2.i is unique because it is a PK. If the left join does match, then the selected combination (t2.i, t1.j, t1.l) is unique because we defined t2.i=t1.i and (t1.i, t1.j, t1.l) is a PK.

      In fact, removing DISTINCT makes the query 100x faster in my real-world example. Normally, I would simply remove the DISTINCT, but this is a query issued by software over which I have no control. It would be really cool if the optimizer could detect cases where DISTINCT is unnecessary.

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            ErikCederstrand Erik Cederstrand
            Votes:
            3 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.