[MCOL-240] DBT3 query 11 returned an internal error, ExeMgr aborted
Created: 2016-07-01 Updated: 2016-09-15 Resolved: 2016-09-15

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.0.1
Fix Version/s: 1.0.3

Type: Bug
Priority: Critical
Reporter: Daniel Lee (Inactive)
Assignee: Daniel Lee (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Part of: MCOL-280 Beta issues (Closed)
Sprint: 1.0.3

 Description   

Build tested: alpha 1.0.1

mscadmin> getsoft
getsoftwareinfo Fri Jul 1 18:33:11 2016

Name : mariadb-columnstore-platform
Version : 1.0
Release : 1
Architecture: x86_64
Install Date: Fri 24 Jun 2016 07:28:00 PM UTC

Create a DBT3 (TPC-H) database and load 1 GB of data.

Execute query #11:

select
  ps_partkey,
  sum(ps_supplycost * ps_availqty) as value
from
  partsupp,
  supplier,
  nation
where
  ps_suppkey = s_suppkey
  and s_nationkey = n_nationkey
  and n_name = 'PERU'
group by
  ps_partkey having
    sum(ps_supplycost * ps_availqty) > (
      select
        sum(ps_supplycost * ps_availqty) * 0.0001000000
      from
        partsupp,
        supplier,
        nation
      where
        ps_suppkey = s_suppkey
        and s_nationkey = n_nationkey
        and n_name = 'PERU'
    )
order by
  value desc;

ERROR 1815 (HY000): Internal error: Lost connection to ExeMgr. Please contact your administrator



 Comments   
Comment by Daniel Lee (Inactive) [ 2016-08-30 ]

When I started the MySQL client (mcsmysql) and pasted the query in, it worked. But when I saved the query to a file, such as /tmp/test.sql, and ran

mcsmysql mytest -vvv < /tmp/test.sql

then I got:
ERROR 1815 (HY000) at line 1: Internal error: Lost connection to ExeMgr. Please contact your administrator

ExeMgr was restarted.

I encountered this while running all 22 queries in the DBT3 test suite and eventually narrowed it down to this strange behavior.

Comment by Andrew Hutchings (Inactive) [ 2016-08-30 ]

My blind theory is that idb_cleanQuery() in sql/sql_parse.cc is mangling the query when it strips the newline characters, producing a corrupted query that spills garbage into ExeMgr and crashes it. I could be wrong, but it looks fun to debug anyway.

Comment by Daniel Lee (Inactive) [ 2016-08-30 ]

The issue may have nothing to do with stripping the newline characters. My bad for giving incorrect information in the meeting today: the single-line query worked only because I ran it against a different database, which has more data <sigh>.

The failure I am seeing is on a 1 GB database. Since the same query worked on a database with more data, I am running the same test on a 10 GB database. I will report back later.

Comment by Daniel Lee (Inactive) [ 2016-08-30 ]

The 10 GB version of the query also failed on the 10 GB DBT3 database.

In both the 1 GB and 10 GB test runs, ExeMgr did not crash; instead, it was restarted by ProcessMonitor. The reason has yet to be determined.

Comment by Andrew Hutchings (Inactive) [ 2016-08-31 ]

No problem. It will still be an interesting one to debug.

Comment by Daniel Lee (Inactive) [ 2016-08-31 ]

I investigated further and found that this is a bug in ColumnStore. After a few iterations of simplifying the original query, here is a simpler query that causes ExeMgr to restart:

select n_regionkey, sum(n_nationkey) from nation group by n_regionkey having sum(n_nationkey) > (select n_nationkey from nation where n_nationkey=100);

The problem is that the subquery returns a NULL value. If "n_nationkey=1" is used in the subquery instead, a value is returned and the query runs.

Comment by Daniel Lee (Inactive) [ 2016-09-02 ]

The subquery returned NULL because of the issue described in MCOL-281. When working on this ticket, you can adjust the subquery so that it returns NULL.

MCOL-281 caused DBT3 query #11 to trigger the bug reported in this ticket.

Comment by Andrew Hutchings (Inactive) [ 2016-09-08 ]

The crash is a segmentation fault in

void getCorrelatedFilters(ParseTree* pt, void* obj)

It is caused by the dynamic_cast on this line:

SimpleFilter* sf = dynamic_cast<SimpleFilter*>(pt->data());

Comment by Andrew Hutchings (Inactive) [ 2016-09-08 ]

Fix in https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/7

For QA, I recommend tweaking the query slightly so that the subquery has an empty result, for example by setting n_name = 'PERUA'.

Comment by Andrew Hutchings (Inactive) [ 2016-09-09 ]

Move to Daniel for QA.

Comment by Daniel Lee (Inactive) [ 2016-09-13 ]

Build tested:

mscadmin> getsoft
getsoftwareinfo Tue Sep 13 10:02:34 2016

Name : mariadb-columnstore-platform
Version : 1.0.3
Release : 1
Architecture: x86_64
Install Date: Tue 13 Sep 2016 09:26:32 AM CDT

select n_regionkey, sum(n_nationkey) from nation group by n_regionkey having sum(n_nationkey) > (select n_nationkey from nation where n_nationkey=1);
+-------------+------------------+
| n_regionkey | sum(n_nationkey) |
+-------------+------------------+
|           3 |               77 |
|           0 |               50 |
|           1 |               47 |
|           2 |               68 |
|           4 |               58 |
+-------------+------------------+
5 rows in set (0.16 sec)

MariaDB [tpch1c]> select n_regionkey, sum(n_nationkey) from nation group by n_regionkey having sum(n_nationkey) > (select n_nationkey from nation where n_nationkey=100);
ERROR 1815 (HY000): Internal error: Lost connection to ExeMgr. Please contact your administrator

Comment by Andrew Hutchings (Inactive) [ 2016-09-13 ]

Needs to be reopened due to Daniel finding a similar bug with a slightly different query.

Comment by Andrew Hutchings (Inactive) [ 2016-09-13 ]

Moving back to QA. The problem is that the first set of 1.0.3 RPMs doesn't include the fix, so Daniel's test didn't actually exercise it.

Running GDB against a non-debug build made me think it was a different issue, but it isn't.

Comment by Daniel Lee (Inactive) [ 2016-09-15 ]

mscadmin> getsoft
getsoftwareinfo Thu Sep 15 12:06:24 2016

Name : mariadb-columnstore-platform
Version : 1.0.3
Release : 1
Architecture: x86_64
Install Date: Thu 15 Sep 2016 11:39:01 AM CDT
Group : Applications
Size : 25431329
License : Copyright (c) 2016 MariaDB Corporation Ab., all rights reserved; redistributable under the terms of the GPL, see the file COPYING for details.
Signature : (none)
Source RPM : mariadb-columnstore-1.0.3-1.src.rpm
Build Date : Thu 15 Sep 2016 09:56:11 AM CDT

MariaDB [tpch1c]> select n_regionkey, sum(n_nationkey) from nation group by n_regionkey having sum(n_nationkey) > (select n_nationkey from nation where n_nationkey=1);
+-------------+------------------+
| n_regionkey | sum(n_nationkey) |
+-------------+------------------+
|           3 |               77 |
|           0 |               50 |
|           1 |               47 |
|           2 |               68 |
|           4 |               58 |
+-------------+------------------+
5 rows in set (0.63 sec)

MariaDB [tpch1c]> select n_regionkey, sum(n_nationkey) from nation group by n_regionkey having sum(n_nationkey) > (select n_nationkey from nation where n_nationkey=100);
Empty set (0.01 sec)

Generated at Thu Feb 08 02:19:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.