Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30766

Issue with NOT IN (subquery that violates "FULL GROUP BY")

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Not a Bug
    • 10.5, 10.6, 10.11, 10.3(EOL), 10.4(EOL), 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 11.0(EOL)
    • N/A
    • Optimizer
    • None

    Description

      With the following database:

      CREATE TABLE PLAYBACK (
        SESSION_ID INTEGER primary key,
        CUSTOMER_ID INTEGER,
        START_TIME INTEGER,
        END_TIME INTEGER
      );
      INSERT INTO PLAYBACK VALUES (-1, 0, 1, 1);
      INSERT INTO PLAYBACK VALUES (0, 0, 1, 1);
      CREATE TABLE ADS (
        AD_ID INTEGER primary key,
        CUSTOMER_ID INTEGER,
        TIMESTAMP INTEGER
      );
      INSERT INTO ADS VALUES (0, 0, 1);
      INSERT INTO ADS VALUES (1, 0, 1);
      

      After running Q1

      -- Q1
      SELECT ANY_VALUE(SESSION_ID) FROM PLAYBACK JOIN ADS ON PLAYBACK.CUSTOMER_ID=ADS.CUSTOMER_ID WHERE TIMESTAMP >= START_TIME AND TIMESTAMP <= END_TIME GROUP BY PLAYBACK.CUSTOMER_ID;
      

      the output is a table with one column and one row, and the only value is 0:

      +-----------------------+
      | ANY_VALUE(SESSION_ID) |
      +-----------------------+
      |                     0 |
      +-----------------------+
      

      And we have Q2:

      -- Q2
      SELECT SESSION_ID FROM PLAYBACK WHERE SESSION_ID NOT IN (SELECT ANY_VALUE(SESSION_ID) FROM PLAYBACK JOIN ADS ON PLAYBACK.CUSTOMER_ID=ADS.CUSTOMER_ID WHERE TIMESTAMP >= START_TIME AND TIMESTAMP <= END_TIME GROUP BY PLAYBACK.CUSTOMER_ID);
      

      Q2 is basically a query of `SELECT SESSION_ID FROM PLAYBACK WHERE SESSION_ID NOT IN Q1`

      Therefore, we would expect it outputs one row with SESSION_ID of -1 . However, the output is an empty set. We can further investigate this by running Q3:

      -- Q3
      SELECT SESSION_ID, (SELECT ANY_VALUE(SESSION_ID) FROM PLAYBACK JOIN ADS ON PLAYBACK.CUSTOMER_ID=ADS.CUSTOMER_ID WHERE TIMESTAMP >= START_TIME AND TIMESTAMP <= END_TIME GROUP BY PLAYBACK.CUSTOMER_ID) AS Q1, SESSION_ID NOT IN (SELECT ANY_VALUE(SESSION_ID) FROM PLAYBACK JOIN ADS ON PLAYBACK.CUSTOMER_ID=ADS.CUSTOMER_ID WHERE TIMESTAMP >= START_TIME AND TIMESTAMP <= END_TIME GROUP BY PLAYBACK.CUSTOMER_ID) AS 'SESSION_ID NOT IN Q1' FROM PLAYBACK;
      

      The output is:

      +------------+------+----------------------+
      | SESSION_ID | Q1   | SESSION_ID NOT IN Q1 |
      +------------+------+----------------------+
      |         -1 |    0 |                    0 |
      |          0 |    0 |                    0 |
      +------------+------+----------------------+
      

      We can see the result doesn't make sense, because -1 NOT IN (0) shouldn't be false.

      Attachments

        Issue Links

          Activity

            People

              serg Sergei Golubchik
              pinhan Pinhan Zhao
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.