Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-4530

TPC-H query 19: - semi joins / 'lineitem' and 'part' are not joined.

Details

    • Bug
    • Status: Closed (View Workflow)
    • Blocker
    • Resolution: Fixed
    • None
    • 23.02.1
    • None
    • None

    Description

      Query 19: IDB-1000: 'lineitem' and 'part' are not joined.

      select
      	sum(l_extendedprice* (1 - l_discount)) as revenue
      from
      	lineitem,
      	part
      where
      	(
      		p_partkey = l_partkey
      		and p_brand = 'Brand#23'
      		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
      		and l_quantity >= 2 and l_quantity <= 2 + 10
      		and p_size between 1 and 5
      		and l_shipmode in ('AIR', 'AIR REG')
      		and l_shipinstruct = 'DELIVER IN PERSON'
      	)
      	or
      	(
      		p_partkey = l_partkey
      		and p_brand = 'Brand#24'
      		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
      		and l_quantity >= 20 and l_quantity <= 20 + 10
      		and p_size between 1 and 10
      		and l_shipmode in ('AIR', 'AIR REG')
      		and l_shipinstruct = 'DELIVER IN PERSON'
      	)
      	or
      	(
      		p_partkey = l_partkey
      		and p_brand = 'Brand#44'
      		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
      		and l_quantity >= 26 and l_quantity <= 26 + 10
      		and p_size between 1 and 15
      		and l_shipmode in ('AIR', 'AIR REG')
      		and l_shipinstruct = 'DELIVER IN PERSON'
      	);
      

      Attachments

        Issue Links

          Activity

            Build tested: 22.08-1 (#5138)

            Query 5 has been fixed in the release. The other 3 still have the same issues.

            dleeyh Daniel Lee (Inactive) added a comment - Build tested: 22.08-1 (#5138) Query 5 has been fixed in the release. The other 3 still have the same issues.

            Now we have MVP version of rewrite.
            Initial query 19
            Rewritten Query 19

            We tried to turn in on, and got original query 19 working, with the same result, that got from manual SQL rewrite

            Now we need do this action points

            • Write some amount of unit tests, covering different queries, and prove, that rewrite doesn't break anything
            • Write proper node comparator, now rewrite works only with full string equality of conditions
            • Research some approach for NOT node, and implement in rewrite
            • Write more tests
            • Get rid of recursion: MVP uses recursive calls, it's not very good for production
            • Write more tests
            • Check rewrite with inner queries
            • Write more tests
            • Check MTR and regressions can be passed
            • Merge and release it
            leonid.fedorov Leonid Fedorov added a comment - Now we have MVP version of rewrite. Initial query 19 Rewritten Query 19 We tried to turn in on, and got original query 19 working, with the same result, that got from manual SQL rewrite Now we need do this action points Write some amount of unit tests, covering different queries, and prove, that rewrite doesn't break anything Write proper node comparator, now rewrite works only with full string equality of conditions Research some approach for NOT node, and implement in rewrite Write more tests Get rid of recursion: MVP uses recursive calls, it's not very good for production Write more tests Check rewrite with inner queries Write more tests Check MTR and regressions can be passed Merge and release it
            leonid.fedorov Leonid Fedorov added a comment - - edited

            for now we have:

            • nice amount of unit tests
            • non recursive algorithm that matches TCPH query and much more patterns
            • non leaking memory management, covered by ASAN
            • draft version of node equivalence comparator, that works but needs performance improvements
            • MTR and regression tests passed

            what we have to do

            • more unittests
            • regression tests for TCP-H
            • support of NOT nodes
            • performant comparator
            • review of full logical operators set like XOR
            • review of full set of operators like IN, and LIKE
            • more tests
            • code review
            • feature branch testing
            leonid.fedorov Leonid Fedorov added a comment - - edited for now we have: nice amount of unit tests non recursive algorithm that matches TCPH query and much more patterns non leaking memory management, covered by ASAN draft version of node equivalence comparator, that works but needs performance improvements MTR and regression tests passed what we have to do more unittests regression tests for TCP-H support of NOT nodes performant comparator review of full logical operators set like XOR review of full set of operators like IN, and LIKE more tests code review feature branch testing
            Andrey Andrey Piskunov (Inactive) added a comment - - edited

            As a result, our rewrite transforms this query:

            select
            	sum(l_extendedprice* (1 - l_discount)) as revenue
            from
            	lineitem,
            	part
            where
            	(
            		p_partkey = l_partkey
            		and p_brand = 'Brand#23'
            		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
            		and l_quantity >= 2 and l_quantity <= 2 + 10
            		and p_size between 1 and 5
            		and l_shipmode in ('AIR', 'AIR REG')
            		and l_shipinstruct = 'DELIVER IN PERSON'
            	)
            	or
            	(
            		p_partkey = l_partkey
            		and p_brand = 'Brand#24'
            		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
            		and l_quantity >= 20 and l_quantity <= 20 + 10
            		and p_size between 1 and 10
            		and l_shipmode in ('AIR', 'AIR REG')
            		and l_shipinstruct = 'DELIVER IN PERSON'
            	)
            	or
            	(
            		p_partkey = l_partkey
            		and p_brand = 'Brand#44'
            		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
            		and l_quantity >= 26 and l_quantity <= 26 + 10
            		and p_size between 1 and 15
            		and l_shipmode in ('AIR', 'AIR REG')
            		and l_shipinstruct = 'DELIVER IN PERSON'
            	);
            

            into this query:

            select
            	sum(l_extendedprice* (1 - l_discount)) as revenue
            from
            	lineitem,
            	part
            where
            (
            		p_partkey = l_partkey
                            and l_shipmode in ('AIR', 'AIR REG')
            		and l_shipinstruct = 'DELIVER IN PERSON'
            )
            and
            (
            	(
            		p_brand = 'Brand#23'
            		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
            		and l_quantity >= 2 and l_quantity <= 2 + 10
            		and p_size between 1 and 5
            	)
            	or
            	(
            		p_brand = 'Brand#24'
            		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
            		and l_quantity >= 20 and l_quantity <= 20 + 10
            		and p_size between 1 and 10
            	)
            	or
            	(
            		p_brand = 'Brand#44'
            		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
            		and l_quantity >= 26 and l_quantity <= 26 + 10
            		and p_size between 1 and 15
            	)
            );
            

            with the same answer of

            revenue
            3319053.2591
            

            Andrey Andrey Piskunov (Inactive) added a comment - - edited As a result, our rewrite transforms this query: select sum (l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#23' and p_container in ( 'SM CASE' , 'SM BOX' , 'SM PACK' , 'SM PKG' ) and l_quantity >= 2 and l_quantity <= 2 + 10 and p_size between 1 and 5 and l_shipmode in ( 'AIR' , 'AIR REG' ) and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#24' and p_container in ( 'MED BAG' , 'MED BOX' , 'MED PKG' , 'MED PACK' ) and l_quantity >= 20 and l_quantity <= 20 + 10 and p_size between 1 and 10 and l_shipmode in ( 'AIR' , 'AIR REG' ) and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#44' and p_container in ( 'LG CASE' , 'LG BOX' , 'LG PACK' , 'LG PKG' ) and l_quantity >= 26 and l_quantity <= 26 + 10 and p_size between 1 and 15 and l_shipmode in ( 'AIR' , 'AIR REG' ) and l_shipinstruct = 'DELIVER IN PERSON' ); into this query: select sum (l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and l_shipmode in ( 'AIR' , 'AIR REG' ) and l_shipinstruct = 'DELIVER IN PERSON' ) and ( ( p_brand = 'Brand#23' and p_container in ( 'SM CASE' , 'SM BOX' , 'SM PACK' , 'SM PKG' ) and l_quantity >= 2 and l_quantity <= 2 + 10 and p_size between 1 and 5 ) or ( p_brand = 'Brand#24' and p_container in ( 'MED BAG' , 'MED BOX' , 'MED PKG' , 'MED PACK' ) and l_quantity >= 20 and l_quantity <= 20 + 10 and p_size between 1 and 10 ) or ( p_brand = 'Brand#44' and p_container in ( 'LG CASE' , 'LG BOX' , 'LG PACK' , 'LG PKG' ) and l_quantity >= 26 and l_quantity <= 26 + 10 and p_size between 1 and 15 ) ); with the same answer of revenue 3319053.2591

            Build verified:

            engine: b6808c97f1baa9ebc59c7b75dca039c3d3046820
            server: d296529db9a1e31eab398b5c65fc72e33d0d6a8a
            buildNo: 6960

            Executed the original dbt3 (tpch) query #19 on a 1gb dbt3 database. Results from both InnoDB and ColumnStore matched. Also executed the modified query that has been supported by ColumnStore before this change. Result also matched.

            [rocky8:root@rocky8~]# mariadb inn1 < /data/qa/shares/19.sql 
            revenue
            4024954.8299
            [rocky8:root@rocky8~]# mariadb tpch1 < /data/qa/shares/19.sql 
            revenue
            4024954.8299
            [rocky8:root@rocky8~]# mariadb tpch1 < /data/qa/shares/19_modified.sql 
            revenue
            4024954.8299
            

            dleeyh Daniel Lee (Inactive) added a comment - Build verified: engine: b6808c97f1baa9ebc59c7b75dca039c3d3046820 server: d296529db9a1e31eab398b5c65fc72e33d0d6a8a buildNo: 6960 Executed the original dbt3 (tpch) query #19 on a 1gb dbt3 database. Results from both InnoDB and ColumnStore matched. Also executed the modified query that has been supported by ColumnStore before this change. Result also matched. [rocky8:root@rocky8~]# mariadb inn1 < /data/qa/shares/19.sql revenue 4024954.8299 [rocky8:root@rocky8~]# mariadb tpch1 < /data/qa/shares/19.sql revenue 4024954.8299 [rocky8:root@rocky8~]# mariadb tpch1 < /data/qa/shares/19_modified.sql revenue 4024954.8299

            People

              leonid.fedorov Leonid Fedorov
              toddstoffel Todd Stoffel (Inactive)
              Daniel Lee Daniel Lee (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.