[MCOL-240] DBT3 query 11 returned an internal error, ExeMgr aborted Created: 2016-07-01 Updated: 2016-09-15 Resolved: 2016-09-15 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ExeMgr |
| Affects Version/s: | 1.0.1 |
| Fix Version/s: | 1.0.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Daniel Lee (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Sprint: | 1.0.3 | ||||||||
| Description |
|
Build tested: alpha 1.0.1
mscadmin> getsoft
Name : mariadb-columnstore-platform
Steps to reproduce: create a DBT3 (TPC-H) database, load 1 GB of data, and execute query #11:
select
ERROR 1815 (HY000): Internal error: Lost connection to ExeMgr. Please contact your administrator |
| Comments |
| Comment by Daniel Lee (Inactive) [ 2016-08-30 ] | ||||||||||||
|
If I start up the MySQL client (mcsmysql) and paste the query to execute, it works. But if I save the query in a file, such as /tmp/test.sql, and then run mcsmysql mytest -vvv < /tmp/test.sql, then I got: ExeMgr got restarted. I encountered this when running all 22 queries in the DBT3 test suite, and I finally narrowed it down to this strange behavior. | ||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2016-08-30 ] | ||||||||||||
|
My blind theory is that idb_cleanQuery() in sql/sql_parse.cc is messing up the query when stripping the newline codes, producing a corrupted query which spills crap into ExeMgr and causes it to crash. I could be wrong, but it looks fun to debug anyway. | ||||||||||||
| Comment by Daniel Lee (Inactive) [ 2016-08-30 ] | ||||||||||||
|
The issue may not have anything to do with the stripping of the newline codes. My bad that I provided the incorrect information during the meeting today. I executed the single line query and it worked, because I executed it on a different database, which has more data <sigh> The issue I am having is on a 1gb database. Since the same query worked on a database with more data, I am running the same test on a 10gb database. I will report back later. | ||||||||||||
| Comment by Daniel Lee (Inactive) [ 2016-08-30 ] | ||||||||||||
|
The 10g query also did not work on the 10g dbt3 database. On both 1g and 10g database test runs, ExeMgr did not crash. Instead, it was restarted by ProcessMonitor. The reason still needs to be determined. | ||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2016-08-31 ] | ||||||||||||
|
No problem. It will still be an interesting one to debug. | ||||||||||||
| Comment by Daniel Lee (Inactive) [ 2016-08-31 ] | ||||||||||||
|
I did further investigation on the issue and found that it is a bug in ColumnStore. After a few iterations of simplifying the original query, here is a simpler query that causes ExeMgr to restart:
select n_regionkey, sum(n_nationkey) from nation group by n_regionkey having sum(n_nationkey) > (select n_nationkey from nation where n_nationkey=100);
The issue is that the subquery returns a null value. If "n_nationkey = 1" is used in the subquery, a value will be returned and the query will run. | ||||||||||||
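For reference, the expected behavior of the reduced query can be checked against another SQL engine. The sketch below is an illustration only and is not part of the ticket: it assumes Python's built-in sqlite3 module and a hypothetical six-row stand-in for the nation table (the helper name run_reduced_query is made up here). In standard SQL a scalar subquery that matches no rows evaluates to NULL, so the HAVING comparison is never true and the query should return an empty result set rather than an error.

```python
import sqlite3


def run_reduced_query(nation_key):
    """Run the reduced query against a tiny in-memory stand-in for nation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nation (n_nationkey INTEGER, n_regionkey INTEGER)")
    # Hypothetical sample rows (assumption); the real DBT3 nation table
    # has 25 rows with keys 0-24, so key 100 never exists there either.
    rows = [(0, 0), (1, 1), (2, 1), (3, 1), (4, 4), (5, 0)]
    conn.executemany("INSERT INTO nation VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT n_regionkey, SUM(n_nationkey) FROM nation "
        "GROUP BY n_regionkey "
        "HAVING SUM(n_nationkey) > "
        "(SELECT n_nationkey FROM nation WHERE n_nationkey = ?) "
        "ORDER BY n_regionkey",
        (nation_key,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Subquery finds a row: the comparison is against a real value.
    print(run_reduced_query(1))
    # Subquery finds no row: it evaluates to NULL and no group qualifies.
    print(run_reduced_query(100))
```

With key 1 every group in this sample passes the HAVING filter; with key 100 the subquery is empty and no rows come back, which is the outcome a fixed ExeMgr would be expected to match.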
| Comment by Daniel Lee (Inactive) [ 2016-09-02 ] | ||||||||||||
|
The subquery returned NULL because of the issue described in
| ||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2016-09-08 ] | ||||||||||||
|
The crash is a segmentation fault in
Caused by the dynamic_cast of this line:
| ||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2016-09-08 ] | ||||||||||||
|
Fix in https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/7 For QA I recommend tweaking the query slightly so that the subquery has an empty result, such as setting the n_name = 'PERUA' | ||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2016-09-09 ] | ||||||||||||
|
Move to Daniel for QA. | ||||||||||||
| Comment by Daniel Lee (Inactive) [ 2016-09-13 ] | ||||||||||||
|
Build tested:
mscadmin> getsoft
Name : mariadb-columnstore-platform
select n_regionkey, sum(n_nationkey) from nation group by n_regionkey having sum(n_nationkey) > (select n_nationkey from nation where n_nationkey=1);
------------
------------
MariaDB [tpch1c]> select n_regionkey, sum(n_nationkey) from nation group by n_regionkey having sum(n_nationkey) > (select n_nationkey from nation where n_nationkey=100); | ||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2016-09-13 ] | ||||||||||||
|
Needs to be reopened due to Daniel finding a similar bug with a slightly different query. | ||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2016-09-13 ] | ||||||||||||
|
Moving back to QA. The issue is that the first set of 1.0.3 RPMs doesn't have the fix, so Daniel's test didn't actually test the fix. GDB with a non-debug build made me think it was a different issue, but it isn't. | ||||||||||||
| Comment by Daniel Lee (Inactive) [ 2016-09-15 ] | ||||||||||||
|
mscadmin> getsoft
Name : mariadb-columnstore-platform
MariaDB [tpch1c]> select n_regionkey, sum(n_nationkey) from nation group by n_regionkey having sum(n_nationkey) > (select n_nationkey from nation where n_nationkey=1);
------------
------------
MariaDB [tpch1c]> select n_regionkey, sum(n_nationkey) from nation group by n_regionkey having sum(n_nationkey) > (select n_nationkey from nation where n_nationkey=100); |