Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-963

self join cte queries from tpcds alternately fail with parsing error and succeed but with incorrect results

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 1.1.0
    • 1.1.1, 1.1.2
    • None
    • None
    • 2017-21, 2017-22, 2017-23

    Description

      query47 and query57 in tpcds both are complex queries using 2 level CTE, self join, and window functions. For example query47:

      #QUERY_47
      with v1 as(
       select i_category, i_brand,
              s_store_name, s_company_name,
              d_year, d_moy,
              sum(ss_sales_price) sum_sales,
              avg(sum(ss_sales_price)) over
                (partition by i_category, i_brand,
                           s_store_name, s_company_name, d_year)
                avg_monthly_sales,
              rank() over
                (partition by i_category, i_brand,
                           s_store_name, s_company_name
                 order by d_year, d_moy) rn
       from item, store_sales, date_dim, store
       where ss_item_sk = i_item_sk and
             ss_sold_date_sk = d_date_sk and
             ss_store_sk = s_store_sk and
             (
               d_year = 2000 or
               ( d_year = 2000-1 and d_moy =12) or
               ( d_year = 2000+1 and d_moy =1)
             )
       group by i_category, i_brand,
                s_store_name, s_company_name,
                d_year, d_moy),
       v2 as(
       select v1_base.i_category
              ,v1_base.d_year, v1_base.d_moy
              ,v1_base.avg_monthly_sales
              ,v1_base.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum
       from v1 v1_base, v1 v1_lag, v1 v1_lead
       where v1_base.i_category = v1_lag.i_category and
             v1_base.i_category = v1_lead.i_category and
             v1_base.i_brand = v1_lag.i_brand and
             v1_base.i_brand = v1_lead.i_brand and
             v1_base.s_store_name = v1_lag.s_store_name and
             v1_base.s_store_name = v1_lead.s_store_name and
             v1_base.s_company_name = v1_lag.s_company_name and
             v1_base.s_company_name = v1_lead.s_company_name and
             v1_base.rn = v1_lag.rn + 1 and
             v1_base.rn = v1_lead.rn - 1)
        select  *
       from v2
       where  d_year = 2000 and    
              avg_monthly_sales > 0 and
              case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1
       order by sum_sales - avg_monthly_sales, 3
       limit 100;
      

      will fail in one execution with:
      ERROR 1146 (42S02) at line 2 in file: 'query47.sql': Table 'tpcds_1.v1' doesn't exist

      but succeed the next and so on. I tried to come up with a simpler example wasn't able to so will need more debugging to see what is going on.

      Attachments

        Issue Links

          Activity

            dthompson David Thompson (Inactive) created issue -
            dthompson David Thompson (Inactive) made changes -
            Field Original Value New Value
            Summary self join cte queries from tpcds alternately fail with parsing error and succeed self join cte queries from tpcds alternately fail with parsing error and succeed but with incorrect results
            dthompson David Thompson (Inactive) made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            dthompson David Thompson (Inactive) made changes -
            Fix Version/s 1.1.1 [ 22605 ]
            LinuxJedi Andrew Hutchings (Inactive) made changes -
            Assignee David Hall [ david.hall ]
            dthompson David Thompson (Inactive) made changes -
            Sprint 2017-21 [ 199 ]
            dthompson David Thompson (Inactive) made changes -
            Rank Ranked lower
            David.Hall David Hall (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            dthompson David Thompson (Inactive) made changes -
            Sprint 2017-21 [ 199 ] 2017-21, 2017-22 [ 199, 201 ]

            Confirm that the current band aid work fors query 78 and will move to MCOL-964 as the future version of this.

            dthompson David Thompson (Inactive) added a comment - Confirm that the current band aid work fors query 78 and will move to MCOL-964 as the future version of this.
            dthompson David Thompson (Inactive) made changes -

            There are two issues here, first, the oscillating between an answer and no answer. This is caused by code in ha_calpont_execplan.cpp that notices the function calls in the order by clause and tries to re-write the query, then send it back thru the parser, but that fails and gives the error. The next time it tries, the execution plan hasn't been cleaned up (not good), so it uses it. That execution plan has already been deemed not correct, but gets run anyway, thus the incorrect results. Then the execution plan is cleaned up. Rinse and Repeat.

            Once the execution plan cleanup is corrected, the retry logic fails repeatedly into an infinite loop. Also not good.

            So, for this bug, I created a counter that stops the retry at 3 attempts, then switches to operating mode 0 if operating mode 2 is on, else throws an error. However, this query under operating mode 0 takes about 18 hours on my machine – it only takes 35 seconds on a reference MariaDB 10.2. The correct results are returned.

            The SQL that is causing the problem is the order by sum_sales - avg_monthly_sales. Using one or the other of the columns does not cause a problem but putting them into an expression in the order by confuses the engine.

            MCOL-964 is being left for further fixing of the problem

            David.Hall David Hall (Inactive) added a comment - There are two issues here, first, the oscillating between an answer and no answer. This is caused by code in ha_calpont_execplan.cpp that notices the function calls in the order by clause and tries to re-write the query, then send it back thru the parser, but that fails and gives the error. The next time it tries, the execution plan hasn't been cleaned up (not good), so it uses it. That execution plan has already been deemed not correct, but gets run anyway, thus the incorrect results. Then the execution plan is cleaned up. Rinse and Repeat. Once the execution plan cleanup is corrected, the retry logic fails repeatedly into an infinite loop. Also not good. So, for this bug, I created a counter that stops the retry at 3 attempts, then switches to operating mode 0 if operating mode 2 is on, else throws an error. However, this query under operating mode 0 takes about 18 hours on my machine – it only takes 35 seconds on a reference MariaDB 10.2. The correct results are returned. The SQL that is causing the problem is the order by sum_sales - avg_monthly_sales. Using one or the other of the columns does not cause a problem but putting them into an expression in the order by confuses the engine. MCOL-964 is being left for further fixing of the problem
            David.Hall David Hall (Inactive) made changes -

            Pull requests:
            Engine: #308
            Server: #74

            David.Hall David Hall (Inactive) added a comment - Pull requests: Engine: #308 Server: #74
            David.Hall David Hall (Inactive) made changes -
            Assignee David Hall [ david.hall ] Andrew Hutchings [ linuxjedi ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            LinuxJedi Andrew Hutchings (Inactive) made changes -
            Status In Review [ 10002 ] In Testing [ 10301 ]
            LinuxJedi Andrew Hutchings (Inactive) made changes -
            Assignee Andrew Hutchings [ linuxjedi ] Daniel Lee [ dleeyh ]

            Build tested: Github source 1.1.1-1

            Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
            Date: Tue Oct 31 15:59:32 2017 +0000

            Merge pull request #74 from mariadb-corporation/MCOL-963

            MCOL-963 Do REDO_PHASE1 no more than 3 times, then turn of vtable mod…

            /root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
            commit 5bff4c0f133dcd10d4a34b7fd06dd141993bde25
            Merge: ed018e8 21e0317
            Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
            Date: Tue Oct 31 11:14:56 2017 +0000

            Merge pull request #308 from mariadb-corporation/MCOL-963

            Mcol 963

            Created and loaded tpcds database and executed test case. Repeating the query and now consistently getting this error message:

            ERROR 1815 (HY000): Internal error: Query cannot be processed using operational mode 1 (vtable mode)

            Is this the new expected behavior? Show we say it is an internal error? Should we say this query is not supported in mode 1 instead?

            dleeyh Daniel Lee (Inactive) added a comment - Build tested: Github source 1.1.1-1 Author: Andrew Hutchings <andrew@linuxjedi.co.uk> Date: Tue Oct 31 15:59:32 2017 +0000 Merge pull request #74 from mariadb-corporation/ MCOL-963 MCOL-963 Do REDO_PHASE1 no more than 3 times, then turn of vtable mod… /root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine commit 5bff4c0f133dcd10d4a34b7fd06dd141993bde25 Merge: ed018e8 21e0317 Author: Andrew Hutchings <andrew@linuxjedi.co.uk> Date: Tue Oct 31 11:14:56 2017 +0000 Merge pull request #308 from mariadb-corporation/ MCOL-963 Mcol 963 Created and loaded tpcds database and executed test case. Repeating the query and now consistently getting this error message: ERROR 1815 (HY000): Internal error: Query cannot be processed using operational mode 1 (vtable mode) Is this the new expected behavior? Show we say it is an internal error? Should we say this query is not supported in mode 1 instead?

            It has been decided the error messages is good for now.

            dleeyh Daniel Lee (Inactive) added a comment - It has been decided the error messages is good for now.
            dleeyh Daniel Lee (Inactive) made changes -
            Resolution Fixed [ 1 ]
            Status In Testing [ 10301 ] Closed [ 6 ]

            The fix broke some other queries.

            David.Hall David Hall (Inactive) added a comment - The fix broke some other queries.
            David.Hall David Hall (Inactive) made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            David.Hall David Hall (Inactive) made changes -
            Assignee Daniel Lee [ dleeyh ] David Hall [ david.hall ]
            David.Hall David Hall (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]

            Fix for the regressions introduced.

            David.Hall David Hall (Inactive) added a comment - Fix for the regressions introduced.
            David.Hall David Hall (Inactive) made changes -
            Assignee David Hall [ david.hall ] Andrew Hutchings [ linuxjedi ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            LinuxJedi Andrew Hutchings (Inactive) made changes -
            Sprint 2017-21, 2017-22 [ 199, 201 ] 2017-21, 2017-22, 2017-23 [ 199, 201, 207 ]
            LinuxJedi Andrew Hutchings (Inactive) made changes -
            Status In Review [ 10002 ] In Testing [ 10301 ]
            LinuxJedi Andrew Hutchings (Inactive) made changes -
            Assignee Andrew Hutchings [ linuxjedi ] Daniel Lee [ dleeyh ]
            dthompson David Thompson (Inactive) made changes -
            Fix Version/s 1.1.2 [ 22631 ]
            dthompson David Thompson (Inactive) made changes -
            Resolution Fixed [ 1 ]
            Status In Testing [ 10301 ] Closed [ 6 ]

            People

              dleeyh Daniel Lee (Inactive)
              dthompson David Thompson (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.