[MCOL-515] GROUP BY in a cross engine subquery generates bad results Created: 2017-01-18  Updated: 2017-08-16  Resolved: 2017-08-16

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.0.6, 1.0.7
Fix Version/s: 1.0.11, 1.1.0

Type: Bug Priority: Major
Reporter: Andrew Hutchings (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File Columnstore_1_0_8_aria_temp_table_view_pivot_incomplete_result_set_test_case.sql    
Issue Links:
Relates
relates to MCOL-681 Inconsistent result set when optimize... Closed
Sprint: 2017-9, 2017-15, 2017-16

 Description   

The following SSB query fails with an empty set when dateinfo is a MyISAM table and the other tables are ColumnStore (which happens when (day of year + 2) mod 6 = 2)

select   lo_shipmode, c_region,  sum(lo_extendedprice), count(*) from     
   lineorder join customer on lo_custkey = c_custkey  
     where lo_orderdate between 
          (select d_start  from 
             (select d_yearmonthnum * 100 + 1 d_start  , count(*) from dateinfo 
                                where d_weekdayfl = 1 and d_year = 1998 
                                      group by 1 order by 2 desc limit 1) 
           alias1) 
           and 
           (select d_end  from 
              (select d_yearmonthnum * 100 + 31 d_end, count(*) from dateinfo 
                                where d_weekdayfl = 1 and d_year = 1998 group by 1 order by 2 desc limit 1)
           alias2) group by 1,2 order by 1,2;

Replacing the dateinfo subqueries with fixed values the query succeeds. It appears to be an issue with the CrossEngine step.



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2017-01-18 ]

To be specific it seems that the second subquery for the BETWEEN clause being a MyISAM table causes the failure. Works fine if that table is ColumnStore even if the first BETWEEN subquery is MyISAM.

Comment by Andrew Hutchings (Inactive) [ 2017-01-18 ]

Crossengine evaluates to:

SELECT d_year, d_yearmonthnum FROM dateinfo WHERE (ssb.dateinfo.d_year = 1998) AND (ssb.dateinfo.d_weekdayfl = 1)

Modifying the query to the following shows the subquery is returning 0 for a MyISAM table, but the correct 19980301 for a ColumnStore table:

select   lo_shipmode, c_region,  sum(lo_extendedprice), count(*), (select d_start  from               (select d_yearmonthnum * 100 + 1 d_start  , count(*) from dateinfo                                  where d_weekdayfl = 1 and d_year = 1998                                        group by 1 order by 2 desc limit 1)             alias1) from         lineorder join customer on lo_custkey = c_custkey group by 1,2 order by 1,2;

Comment by Andrew Hutchings (Inactive) [ 2017-01-18 ]

Simplified test:

CREATE TABLE `t1` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL
) ENGINE=InnoDB;
 
CREATE TABLE `t2` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL
) ENGINE=Columnstore;
 
insert into t1 values (1,100),(2,200),(3,300);
insert into t1 values (1,100),(2,200),(3,300),(2,200),(2,200),(3,300);
insert into t2 values (1,100),(2,200),(3,300);
insert into t2 values (1,100),(2,200),(3,300),(2,200),(2,200),(3,300);
 
select t2.a, (select dstart from (select t1.a*100 dstart, count(*) from t1 where b=200 group by 1 order by 2 limit 1) talias) from t2 group by 1;

It is the group by in the cross engine subquery that breaks.

Comment by Adam [ 2017-04-07 ]

Andrew,

I'm running into a similar issue on 1.0.6, 1.0.7, and 1.0.8 when performing a cross engine select against a view containing a group by atop innodb tables. If I use a straight_join optimizer hint I can force the optimizer to not perform an aria temp table on disk; I get the correct result set (albeit with a query execution time of 4 sec). It is only when the optimizer decides to use a temp table on disk (aria) that the performance degrades considerably (27 sec, 10 for the temp table and 17 for sending data) and the group by result set only returns a value for the view's first pivot GROUP_CONCAT(IF()); the remaining pivot columns contain an empty string ('', not null).

The same view atop innodb tables within maraidb 10.1.19, with the same explain plan using a temp table on disk returns the correct result set in under 600 ms.

As I am able to get the cross engine select to run with a group by and return the correct result set; I'd be more inclined that the underlying problem is with the use of aria temp tables and the columnstore optimizer.

Hope this helps.

Adam

Comment by Andrew Hutchings (Inactive) [ 2017-04-07 ]

Hi Adam,

It sounds as though your problem may be slightly different. In this case the problem is that data is being lost between the group by and limit processor in ExeMgr when part of a cross join. Querying the non-ColumnStore tables is working fine, it is the post-processing that is failing. The performance should not be affected. In your case using straight_join might kick ColumnStore out of vtable mode (and therefore there will be temp tables) which would be considerably slower, I'm not 100% sure as I haven't tried it.

Also worth noting that empty string and NULL are currently the same thing in ColumnStore. There is a feature request open to solve that one too.

If you are able to put together a test case so we can reproduce your issue we can evaluate whether or not it is the same bug or open a new one if needed.

Kind Regards
Andrew

Comment by Adam [ 2017-04-07 ]

Andrew,

Thank you for the prompt response. I'll try not to hijack your bug report if this is not related. With luck the test case can reproduce the result I'm seeing and is related / helpful. I've attached a sample test case file (Columnstore_1_0_8_aria_temp_table_view_pivot_incomplete_result_set_test_case.sql).

Adam

Comment by Andrew Hutchings (Inactive) [ 2017-08-04 ]

Fix for 1.0 and 1.1.

For QA:
This should stop the occasional failures of SSB tests (every 6 days when dateinfo is MyISAM). See my comment on 18th January for a simplified test case.

For Adam:
Unfortunately your issue is something else. We do make some changes the the optimizer switches during execution which may well be related. We have fixed a few things around that since 1.0.6 so if you are still having this problem please open a new ticket.

Comment by Daniel Lee (Inactive) [ 2017-08-16 ]

Builds verified: 1.0.11-1 and 1.1.0 (GitHub source)

/root/columnstore/mariadb-columnstore-server
commit 6ed33d194819aaa5f2521c888639f44546fb7ce2
Merge: 97284ea 770537e
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Thu Aug 3 20:54:13 2017 +0100

Merge pull request #66 from mariadb-corporation/MCOL-684

MCOL-684 call fix_fields for parameters in PREPARED statements

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit d9233bde4932bd8d621888e33a38f34c181445b4
Author: David Hill <david.hill@mariadb.com>
Date: Tue Aug 15 20:41:26 2017 +0000

moved CONFIGURE option to bottom of list

diff --git a/oam/oamcpp/liboamcpp.h b/oam/oamcpp/liboamcpp.h
index 6b24a30..8c0e3ed 100644
— a/oam/oamcpp/liboamcpp.h
+++ b/oam/oamcpp/liboamcpp.h
@@ -558,7 +558,6 @@ namespace oam
HEARTBEAT_SEND,
PROCBUILDSYSTEMTABLES,
LOCALHEARTBEAT,

  • CONFIGURE,
    RECONFIGURE,
    PROCESSRESTART,
    GETSOFTWAREINFO,
    @@ -580,7 +579,8 @@ namespace oam
    MASTERDIST,
    DISABLEREP,
    PROCGLUSTERASSIGN,
  • PROCGLUSTERUNASSIGN
    + PROCGLUSTERUNASSIGN,
    + CONFIGURE
    };
Generated at Thu Feb 08 02:21:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.