[MCOL-5543] Aggregated count on left join result is incorrect after switching join order Created: 2023-07-27  Updated: 2023-09-22  Resolved: 2023-08-02

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 23.02
Fix Version/s: 23.10.0

Type: Bug Priority: Major
Reporter: Richard Stracke Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: rm_invalid_data

Attachments: File ebi-886.sql     File import_data.sql     File t1.zip.001     File t1.zip.002     Zip Archive t2.zip     Zip Archive t3.zip     Zip Archive t4.zip    
Issue Links:
Duplicate
is duplicated by MCOL-5522 PM join exceeded the join match count... Closed

 Description   

In some cases the count throw wrong results , when join order was switched

 
select count(t1.ID)
from t1  -- 2
left join t2  on t2.ID = t1.ID -- 104
left join t3  on t3.ID = t1.ID -- 42
left join t4  on t4.ID = t1.ID -- 149
where t1.ID = 509583314; -- 71.565.312 != 2*104*42*149 = 1.301.664
 
select count(t1.ID)
from t1 -- 2
left join t2 on t2.ID = t1.ID -- 104
left join t4  on t4.ID = t1.ID -- 149          !!! switched t3 and t4 !!!
left join t3  on t3.ID = t1.ID -- 42
where t1.ID = 509583314; -- 1.301.664 = 2*104*149*42 = 1.301.664

Maybe it is caused from wrong extent elimination according trace.

 
select calSetTrace(1);
SELECT count(*)
from t1 -- 2
left join t2 on t2.ID = t1.ID -- 104
left join t3 on t3.ID = t1.ID -- 42
left join t4 on t4.ID = t1.ID -- 149
where t1.ID = 509583314; -- 71.565.312 != 2*104*42*149 = 1.301.664
 
select calGetTrace();
 
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
BPS PM t1 3311 (id) 0 3580 0 0.005 2
BPS PM t2 3314 (id) 0 520 0 0.016 0
HJS PM t2-t1 3314 - - - - ----- -
BPS PM t3 3318 (id) 0 664 0 0.022 0
HJS PM t3-t1 3318 - - - - ----- -
BPS PM t4 3321 (id) 0 4128 0 1.508 6655
HJS PM t4-t1 3321 - - - - ----- -
TAS UM - - - - - - 1.399 1
TNS UM - - - - - - 0.000 1
|
 
select calSetTrace(1);
SELECT count(*)
from t1 -- 2
left join t2 on t2.ID = t1.ID -- 104
left join t4 on t4.ID = t1.ID -- 149 !!! switched t3 and t4 !!!
left join t3 on t3.ID = t1.ID -- 42
where t1.ID = 509583314; -- 1.301.664 = 2*104*149*42 = 1.301.664
 
select calGetTrace();
 
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
BPS PM t2 3314 (id) 0 392 0 0.020 0
HJS PM t2-t1 3314 - - - - ----- -
BPS PM t4 3321 (id) 0 2000 0 0.114 0
HJS PM t4-t1 3321 - - - - ----- -
BPS PM t3 3318 (id) 0 4193 0 0.211 4291957
BPS PM t1 3311 (id) 0 3580 0 0.018 2
HJS UM t1-t3 3311 - - - - ----- -
TAS UM - - - - - - 0.001 1
TNS UM - - - - - - 0.000 1

Setup with data to reproduce is attached.



 Comments   
Comment by alexey vorovich (Inactive) [ 2023-07-27 ]

dleeyh could u please confirm that the issue still happens in latest develop after the fixes done by denis0x0D

Comment by Daniel Lee (Inactive) [ 2023-07-27 ]

Build tested: develop branch (with fix for MCOL-5522)

engine: 1a49a09af3f215dc9b177ac50cfdd234476c7d04
server: 62d6100a913699fec9ff48284a76bfe6226e70bc
buildNo: 8270

Reproduced the issue in 23.02.4. The issue no longer exist in the latest of the develop branch. The counts are now matching that from the InnoDB engine.

Release 23.02.4

MariaDB [mytest]> select count(t1.ID)
    -> from t1  -- 2
    -> left join t2  on t2.ID = t1.ID -- 104
    -> left join t3  on t3.ID = t1.ID -- 42
    -> left join t4  on t4.ID = t1.ID -- 149
    -> where t1.ID = 509583314; -- 71.565.312 != 2*104*42*149 = 1.301.664
+--------------+
| count(t1.ID) |
+--------------+
|     71565312 |
+--------------+
1 row in set (2.024 sec)
 
MariaDB [mytest]> select count(t1.ID)
    -> from t1 -- 2
    -> left join t2 on t2.ID = t1.ID -- 104
    -> left join t4  on t4.ID = t1.ID -- 149          !!! switched t3 and t4 !!!
    -> left join t3  on t3.ID = t1.ID -- 42
    -> where t1.ID = 509583314; -- 1.301.664 = 2*104*149*42 = 1.301.664
+--------------+
| count(t1.ID) |
+--------------+
|      1301664 |
+--------------+
1 row in set (0.817 sec)

develop branch (with fix for MCOL-5522)

MariaDB [mytest]> select count(t1.ID)
    -> from t1  -- 2
    -> left join t2  on t2.ID = t1.ID -- 104
    -> left join t3  on t3.ID = t1.ID -- 42
    -> left join t4  on t4.ID = t1.ID -- 149
    -> where t1.ID = 509583314; -- 1.301.664 = 2*104*149*42 = 1.301.664;
+--------------+
| count(t1.ID) |
+--------------+
|      1301664 |
+--------------+
1 row in set (1.676 sec)
 
MariaDB [mytest]> select count(t1.ID)
    -> from t1 -- 2
    -> left join t2 on t2.ID = t1.ID -- 104
    -> left join t4  on t4.ID = t1.ID -- 149          !!! switched t3 and t4 !!!
    -> left join t3  on t3.ID = t1.ID -- 42
    -> where t1.ID = 509583314; -- 1.301.664 = 2*104*149*42 = 1.301.664;
+--------------+
| count(t1.ID) |
+--------------+
|      1301664 |
+--------------+
1 row in set (1.905 sec)

Comment by Daniel Lee (Inactive) [ 2023-08-02 ]

Closing this ticket per my last comment.

Generated at Thu Feb 08 02:58:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.