[MCOL-4530] TPC-H query 19: - semi joins / 'lineitem' and 'part' are not joined. Created: 2021-02-03  Updated: 2023-03-22  Resolved: 2023-02-28

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: 23.02.1

Type: Bug Priority: Blocker
Reporter: Todd Stoffel (Inactive) Assignee: Leonid Fedorov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MCOL-1205 support queries with circular INNER j... Closed
is blocked by MCOL-4713 Optimizer statistics Closed
Assigned for Testing: Daniel Lee Daniel Lee (Inactive)

 Description   

Query 19: IDB-1000: 'lineitem' and 'part' are not joined.

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 2 and l_quantity <= 2 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#24'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 20 and l_quantity <= 20 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#44'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 26 and l_quantity <= 26 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);



 Comments   
Comment by Roman [ 2021-04-19 ]

The permanent solution will be three different cases. One of which namely circular joins support is WIP now. Two others: non-equi JOINs and semi-join with aggregation (in this particular case) on top are not yet started.

Comment by Kuijun Cui [ 2022-01-10 ]

Hi, could you share the progress of fixing this bug?

Comment by Gregory Dorman (Inactive) [ 2022-01-10 ]

#4 is fixed. The rest in progress.

Comment by Kuijun Cui [ 2022-01-17 ]

@Gregory Dorman, thank you for the sharing progress.

Comment by alexey vorovich (Inactive) [ 2022-08-03 ]

dleeyh Please retest JOINs with 22.08

Comment by Daniel Lee (Inactive) [ 2022-08-03 ]

Build tested: 22.08-1 (#5138)

Query 5 has been fixed in the release. The other 3 still have the same issues.

Comment by Leonid Fedorov [ 2022-12-28 ]

Now we have MVP version of rewrite.
Initial query 19
Rewritten Query 19

We tried to turn in on, and got original query 19 working, with the same result, that got from manual SQL rewrite

Now we need do this action points

  • Write some amount of unit tests, covering different queries, and prove, that rewrite doesn't break anything
  • Write proper node comparator, now rewrite works only with full string equality of conditions
  • Research some approach for NOT node, and implement in rewrite
  • Write more tests
  • Get rid of recursion: MVP uses recursive calls, it's not very good for production
  • Write more tests
  • Check rewrite with inner queries
  • Write more tests
  • Check MTR and regressions can be passed
  • Merge and release it
Comment by Leonid Fedorov [ 2023-02-03 ]

for now we have:

  • nice amount of unit tests
  • non recursive algorithm that matches TCPH query and much more patterns
  • non leaking memory management, covered by ASAN
  • draft version of node equivalence comparator, that works but needs performance improvements
  • MTR and regression tests passed

what we have to do

  • more unittests
  • regression tests for TCP-H
  • support of NOT nodes
  • performant comparator
  • review of full logical operators set like XOR
  • review of full set of operators like IN, and LIKE
  • more tests
  • code review
  • feature branch testing
Comment by Andrey Piskunov (Inactive) [ 2023-02-15 ]

As a result, our rewrite transforms this query:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#23'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 2 and l_quantity <= 2 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#24'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 20 and l_quantity <= 20 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = 'Brand#44'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 26 and l_quantity <= 26 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

into this query:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
(
		p_partkey = l_partkey
                and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
)
and
(
	(
		p_brand = 'Brand#23'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= 2 and l_quantity <= 2 + 10
		and p_size between 1 and 5
	)
	or
	(
		p_brand = 'Brand#24'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= 20 and l_quantity <= 20 + 10
		and p_size between 1 and 10
	)
	or
	(
		p_brand = 'Brand#44'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= 26 and l_quantity <= 26 + 10
		and p_size between 1 and 15
	)
);

with the same answer of

revenue
3319053.2591

Comment by Daniel Lee (Inactive) [ 2023-02-28 ]

Build verified:

engine: b6808c97f1baa9ebc59c7b75dca039c3d3046820
server: d296529db9a1e31eab398b5c65fc72e33d0d6a8a
buildNo: 6960

Executed the original dbt3 (tpch) query #19 on a 1gb dbt3 database. Results from both InnoDB and ColumnStore matched. Also executed the modified query that has been supported by ColumnStore before this change. Result also matched.

[rocky8:root@rocky8~]# mariadb inn1 < /data/qa/shares/19.sql 
revenue
4024954.8299
[rocky8:root@rocky8~]# mariadb tpch1 < /data/qa/shares/19.sql 
revenue
4024954.8299
[rocky8:root@rocky8~]# mariadb tpch1 < /data/qa/shares/19_modified.sql 
revenue
4024954.8299

Generated at Thu Feb 08 02:51:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.