[MDEV-25168] NULL values from LEFT JOIN Created: 2021-03-16  Updated: 2021-03-28

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.3.27
Fix Version/s: 10.3

Type: Bug Priority: Major
Reporter: Rich Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Focal (docker) and Debian 10


Attachments: File table-create.sql    

 Description   

Fundamentally I have a query like this:

SELECT outer.id, leftJoinedSubQuery.something
FROM innoDbTable outer
LEFT JOIN (
   SELECT id, COUNT(*) something
   FROM anotherTable
   ...
   GROUP BY id
) leftJoinedSubQuery ON leftJoinedSubQuery.id = outer.id
WHERE
   -- various conditions on outer --
  AND EXISTS (
      SELECT id FROM yetAnotherTable WHERE yetAnotherTable.id = outer.id
  )

Where the leftJoinedSubQuery returns rows like this:

id            something
1             123

And the outer returns a row with ID 1 too.

Now depending on the WHERE clauses - the something value might show up as NULL for *all* rows, including the row where ID = 1.

The actual query is at
https://dba.stackexchange.com/q/287133/4628

and copied below:

SELECT c.id contact_id, engagement.sent
FROM contact c
LEFT JOIN (
    /* ➊ Inner query */
    SELECT eq.contact_id contact_id, COUNT(*) sent
    FROM mailing_job j
    INNER JOIN mailing_event_queue eq ON eq.job_id = j.id
    WHERE j.end_date > NOW() - INTERVAL 3 MONTH
    GROUP BY eq.contact_id
  ) engagement ON engagement.contact_id = c.id
 
 WHERE
  /* ➋ various conditions on table 'c' */
 do_not_email = 0 AND is_opt_out = 0 AND is_deceased = 0 AND is_deleted = 0
   AND contact_type = 'Individual'
 
   /* ➌ */
   AND EXISTS (
      SELECT gc.contact_id
      FROM group_contact gc
      WHERE gc.contact_id = c.id AND gc.group_id IN (30,386,14,6,214,5,88,361,334,18,9,17,240,7,13,10,292,291,290,12)
    )
 
   /* ➍ AND c.id=1 */
ORDER BY c.id

For example,

  • ommitting ➋ makes it work
  • ommitting ➌ makes it work
  • un-commenting ➍ makes it work

I can't reproduce this with dummy data. The tables in use have between 200k and 10M rows each and the query takes ~40s+ to run.

I'm not used to Jira, so apols if I've not done this corectly.

Note, this rewrite of the query (by dba.stackexchange.com user ypercube - see link above) returns the correct results with and without the ➍ clause. It's very slow compared to the original though.

-- Query Z
  SELECT z.contact_id, engagement.sent
  FROM
      (
      SELECT c.id contact_id
      FROM contact c
 
       WHERE
        /* ➋ various conditions on table 'c' */
       c.do_not_email = 0 AND c.is_opt_out = 0 AND c.is_deceased = 0 AND c.is_deleted = 0
         AND c.contact_type = 'Individual'
 
         /* ➌ */
         AND EXISTS (
            SELECT gc.contact_id
            FROM group_contact gc
            WHERE gc.contact_id = c.id AND gc.group_id IN (30,386,14,6,214,5,88,361,334,18,9,17,240,7,13,10,292,291,290,12)
          )
 
         /* ➍ */
         AND c.id=1
      ) AS z
      LEFT JOIN (
          /* ➊ Inner query */
          SELECT eq.contact_id contact_id, COUNT(*) sent
          FROM mailing_job j
          INNER JOIN mailing_event_queue eq ON eq.job_id = j.id
          WHERE j.end_date > NOW() - INTERVAL 3 MONTH
          GROUP BY eq.contact_id
        ) engagement ON engagement.contact_id = z.contact_id
      ;

I've attached the create table statements.



 Comments   
Comment by Rich [ 2021-03-17 ]

I just ran this on 10.4.12 9 from docker image and it works. Interestingly it's waaaaay slower, but at least it doesn't lie.

Probably can close this?

Generated at Thu Feb 08 09:35:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.