[MCOL-1205] support queries with circular INNER joins Created: 2018-02-06  Updated: 2023-03-06  Resolved: 2021-07-06

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.0.12
Fix Version/s: 6.1.1

Type: Bug Priority: Blocker
Reporter: David Thompson (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 8
Labels: None

Issue Links:
Blocks
blocks MCOL-4530 TPC-H query 19: - semi joins / 'li... Closed
is blocked by MCOL-4713 Optimizer statistics Closed
Duplicate
is duplicated by MCOL-1420 Loos spanning tree check when ON clau... Closed
Issue split
split to MCOL-4699 support queries with circular OUTER j... Closed
PartOf
includes MCOL-4513 Investigate the original reasons why ... Closed
Relates
relates to MCOL-4902 Views disable certain outer join opti... Closed
Epic Link: Tech debt
Sprint: 2018-10, 2018-11, 2018-12, 2018-13, 2018-14, 2018-15, 2018-16, 2021-5, 2021-6, 2021-7, 2021-8, 2021-9

 Description   

Certain types of query will result in the following error:
ERROR 1815 (HY000) at line 6: Internal error: IDB-1003: Circular joins are not supported.

This happens if a given table is joined more than once. This is a common pattern and can be generated by bi tools including microstrategy.

dbt3 query 5 is one such example that should be supported:

execute the query #5
 
select
n_name,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
and o_orderdate >= '1993-01-01'
and o_orderdate < date_add( '1993-01-01' , interval 1 year)
group by
n_name
order by
revenue desc;



 Comments   
Comment by Dalu (Inactive) [ 2018-07-03 ]

we are facing the same issue currently, is there an ETA for this feature?

Comment by Dalu (Inactive) [ 2019-06-14 ]

no update since over a year?

Comment by Andrew Hutchings (Inactive) [ 2019-06-14 ]

Hi DanielW,

We have a PoC fix for this but it can make bad decisions on join order so we never merged it. A fix with a new optimizer for this would be a very large task and there hasn't been enough demand for us to work on a proper fix.

In the mean time there is a workaround, for the example in the description add "+0" to the c_nationkey / s_nationkey join as follows:

and c_nationkey = s_nationkey + 0

This makes a separate virtual column internally allowing the join to work, but you may still need to manually order the joins if you find this does not perform well.

Comment by Daniel Lee (Inactive) [ 2021-04-29 ]

Build tested: 5.6.1 ( Drone #2270)

Tested dbt3 query #5 on a 10g (ten gb) database. It returned an error

[centos8:root~]# mysql tpch10 -vvv </data/qa/autopilot/performance/dbt3/sql/10g/5.sql.original 
--------------
select
	n_name,
	sum(l_extendedprice * (1 - l_discount)) as revenue
from
	customer,
	orders,
	lineitem,
	supplier,
	nation,
	region
where
	c_custkey = o_custkey
	and l_orderkey = o_orderkey
	and l_suppkey = s_suppkey
	and c_nationkey = s_nationkey
	and s_nationkey = n_nationkey
	and n_regionkey = r_regionkey
	and r_name = 'MIDDLE EAST'
	and o_orderdate >= '1994-01-01'
	and o_orderdate < date_add( '1994-01-01' , interval 1 year)
group by
	n_name
order by
	revenue desc
--------------
 
ERROR 1815 (HY000) at line 6: Internal error: InetStreamSocket::readToMagic: Remote is closed
Bye

I ran the 1gb qeury again on a 1gb (one gb) database and it took 436 seconds.
The modified version of query #5 took 7 seconds to execute on a 10g.

[centos8:root~]# mysql tpch1 -vvv </data/qa/autopilot/performance/dbt3/sql/1g/5.sql.original 
--------------
select
	n_name,
	sum(l_extendedprice * (1 - l_discount)) as revenue
from
	customer,
	orders,
	lineitem,
	supplier,
	nation,
	region
where
	c_custkey = o_custkey
	and l_orderkey = o_orderkey
	and l_suppkey = s_suppkey
	and c_nationkey = s_nationkey
	and s_nationkey = n_nationkey
	and n_regionkey = r_regionkey
	and r_name = 'AMERICA'
	and o_orderdate >= '1993-01-01'
	and o_orderdate < date_add( '1993-01-01' , interval 1 year)
group by
	n_name
order by
	revenue desc
--------------
 
+---------------+--------------------+
| n_name        | revenue            |
+---------------+--------------------+
| PERU          | 8255761247026.0164 |
| CANADA        | 8246925619913.6201 |
| ARGENTINA     | 8120116982337.0609 |
| BRAZIL        | 7977931767054.3294 |
| UNITED STATES | 7772891387103.9486 |
+---------------+--------------------+
5 rows in set (7 min 36.532 sec)
 
 
Bye

I also compared the 10g dbt3 performance of this build to release 5.5.2, using the modified query #5.

10g DBT3 performance test in seconds
 
all 22 queries, with query #5 modified
 
5.6.1-1 is about 20% faster
 
           diskRun	CachedRun  Delta
5.5.2-1    341.596  273.675	   -20%
5.6.1-1    302.928  243.378    -20$

Comment by Daniel Lee (Inactive) [ 2021-07-06 ]

Build tested: 6.1.1 ( #2727)

With 32 gb memory and a 10gb dbt3 database, query #5 returned an error.

ERROR 1815 (HY000): Internal error: IDB-2001: Join or subselect exceeds memory limit.

The query worked if I used the analyze command before executing the query.

Performance is about the same as the modified query #5.

Generated at Thu Feb 08 02:27:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.