[MCOL-963] self join cte queries from tpcds alternately fail with parsing error and succeed but with incorrect results Created: 2017-10-09  Updated: 2017-11-20  Resolved: 2017-11-20

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.1.0
Fix Version/s: 1.1.1, 1.1.2

Type: Bug Priority: Critical
Reporter: David Thompson (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MCOL-964 tpcds query78 alternately fails and w... Closed
Relates
relates to MCOL-964 tpcds query78 alternately fails and w... Closed
Sprint: 2017-21, 2017-22, 2017-23

 Description   

query47 and query57 in tpcds both are complex queries using 2 level CTE, self join, and window functions. For example query47:

#QUERY_47
with v1 as(
 select i_category, i_brand,
        s_store_name, s_company_name,
        d_year, d_moy,
        sum(ss_sales_price) sum_sales,
        avg(sum(ss_sales_price)) over
          (partition by i_category, i_brand,
                     s_store_name, s_company_name, d_year)
          avg_monthly_sales,
        rank() over
          (partition by i_category, i_brand,
                     s_store_name, s_company_name
           order by d_year, d_moy) rn
 from item, store_sales, date_dim, store
 where ss_item_sk = i_item_sk and
       ss_sold_date_sk = d_date_sk and
       ss_store_sk = s_store_sk and
       (
         d_year = 2000 or
         ( d_year = 2000-1 and d_moy =12) or
         ( d_year = 2000+1 and d_moy =1)
       )
 group by i_category, i_brand,
          s_store_name, s_company_name,
          d_year, d_moy),
 v2 as(
 select v1_base.i_category
        ,v1_base.d_year, v1_base.d_moy
        ,v1_base.avg_monthly_sales
        ,v1_base.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum
 from v1 v1_base, v1 v1_lag, v1 v1_lead
 where v1_base.i_category = v1_lag.i_category and
       v1_base.i_category = v1_lead.i_category and
       v1_base.i_brand = v1_lag.i_brand and
       v1_base.i_brand = v1_lead.i_brand and
       v1_base.s_store_name = v1_lag.s_store_name and
       v1_base.s_store_name = v1_lead.s_store_name and
       v1_base.s_company_name = v1_lag.s_company_name and
       v1_base.s_company_name = v1_lead.s_company_name and
       v1_base.rn = v1_lag.rn + 1 and
       v1_base.rn = v1_lead.rn - 1)
  select  *
 from v2
 where  d_year = 2000 and    
        avg_monthly_sales > 0 and
        case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1
 order by sum_sales - avg_monthly_sales, 3
 limit 100;

will fail in one execution with:
ERROR 1146 (42S02) at line 2 in file: 'query47.sql': Table 'tpcds_1.v1' doesn't exist

but succeed the next and so on. I tried to come up with a simpler example wasn't able to so will need more debugging to see what is going on.



 Comments   
Comment by David Thompson (Inactive) [ 2017-10-30 ]

Confirm that the current band aid work fors query 78 and will move to MCOL-964 as the future version of this.

Comment by David Hall (Inactive) [ 2017-10-30 ]

There are two issues here, first, the oscillating between an answer and no answer. This is caused by code in ha_calpont_execplan.cpp that notices the function calls in the order by clause and tries to re-write the query, then send it back thru the parser, but that fails and gives the error. The next time it tries, the execution plan hasn't been cleaned up (not good), so it uses it. That execution plan has already been deemed not correct, but gets run anyway, thus the incorrect results. Then the execution plan is cleaned up. Rinse and Repeat.

Once the execution plan cleanup is corrected, the retry logic fails repeatedly into an infinite loop. Also not good.

So, for this bug, I created a counter that stops the retry at 3 attempts, then switches to operating mode 0 if operating mode 2 is on, else throws an error. However, this query under operating mode 0 takes about 18 hours on my machine – it only takes 35 seconds on a reference MariaDB 10.2. The correct results are returned.

The SQL that is causing the problem is the order by sum_sales - avg_monthly_sales. Using one or the other of the columns does not cause a problem but putting them into an expression in the order by confuses the engine.

MCOL-964 is being left for further fixing of the problem

Comment by David Hall (Inactive) [ 2017-10-30 ]

Pull requests:
Engine: #308
Server: #74

Comment by Daniel Lee (Inactive) [ 2017-10-31 ]

Build tested: Github source 1.1.1-1

Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Tue Oct 31 15:59:32 2017 +0000

Merge pull request #74 from mariadb-corporation/MCOL-963

MCOL-963 Do REDO_PHASE1 no more than 3 times, then turn of vtable mod…

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 5bff4c0f133dcd10d4a34b7fd06dd141993bde25
Merge: ed018e8 21e0317
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Tue Oct 31 11:14:56 2017 +0000

Merge pull request #308 from mariadb-corporation/MCOL-963

Mcol 963

Created and loaded tpcds database and executed test case. Repeating the query and now consistently getting this error message:

ERROR 1815 (HY000): Internal error: Query cannot be processed using operational mode 1 (vtable mode)

Is this the new expected behavior? Show we say it is an internal error? Should we say this query is not supported in mode 1 instead?

Comment by Daniel Lee (Inactive) [ 2017-10-31 ]

It has been decided the error messages is good for now.

Comment by David Hall (Inactive) [ 2017-11-13 ]

The fix broke some other queries.

Comment by David Hall (Inactive) [ 2017-11-14 ]

Fix for the regressions introduced.

Generated at Thu Feb 08 02:25:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.