[MCOL-1182] Cross-engine join query failing (single node) Created: 2018-01-26  Updated: 2018-04-16  Resolved: 2018-04-16

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.0.12, 1.1.2
Fix Version/s: 1.1.4

Type: Bug Priority: Major
Reporter: Andrew Simkovsky Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 1
Labels: None
Environment:

AWS server, m4.xlarge. Ubuntu 16.04 launched from AMI ami-2581aa40.
100gb EBS as root volume with 5000 provisioned iops.


Attachments: File MCOL1182-core.ExeMgr.18965.gz     Text File log_tails.txt    
Sprint: 2018-08

 Description   

Hello, I'm having trouble with cross-engine join queries. I have a single node with MariaDB column store on it. This one machine has both UM and PM. Followed the very simple install process here: https://mariadb.com/kb/en/library/preparing-for-columnstore-installation-11x/

The install is extremely vanilla. No configuration changes after install.

I loaded some sample data. Lookup tables are innodb, but a large table with bulk data is "columnstore" engine. The intent is to query this columnstore table and join with an innodb lookup table to translate values.

The tables in question are:

 CREATE TABLE `keymap` (
  `keymap_id` int(11) NOT NULL,
  `keymap_group_id` int(11) NOT NULL,
  `key1` varchar(100) NOT NULL,
  `value` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`keymap_id`),
  KEY `Refkeymap_group29` (`keymap_group_id`),
  CONSTRAINT `Refkeymap_group29` FOREIGN KEY (`keymap_group_id`) REFERENCES `keymap_group` (`keymap_group_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
 
CREATE TABLE `study_response_1` (
  `record_id` int(11) NOT NULL,
  `col1` varchar(1000) DEFAULT NULL,
  `col2` varchar(1000) DEFAULT NULL,
  `col3` varchar(1000) DEFAULT NULL,
  `col4` varchar(1000) DEFAULT NULL
) ENGINE=Columnstore DEFAULT CHARSET=latin1'

The keymap table has about 5000 rows. The study_response_1 table has about 7.7 million rows.

If I query only the study_response_1 table by itself, it works just fine:

select col4, count(*) from study_response_1 group by col4;

But if I join keymap and study_response_1, the ExeMgr process dies and forces restart. I get this error message:

select k.value as make, count(*) 
from study_response_1 s 
left join keymap k on k.key1 = s.col25 and k.keymap_group_id = 1 
group by k.value;

ERROR 1815 (HY000): Internal error: Lost connection to ExeMgr. Please contact your administrator

I've tried this on both 1.0.12 and 1.1.2, both fresh installs, and the same thing happens. I used mariadb-columnstore-1.0.12-1-xenial.x86_64.deb.tar.gz and mariadb-columnstore-1.1.2-1-xenial.x86_64.deb.tar.gz.

The only difference between the versions is that version 1.0.12 required CrossEngineSupport to be manually set up in Columnstore.xml, but 1.1.2 already had it set up and working. That section of Columnstore.xml is here (default from 1.1.2, I didn't change anything):

        <CrossEngineSupport>
                <Host>127.0.0.1</Host>
                <Port>3306</Port>
                <User>root</User>
                <Password/>
        </CrossEngineSupport>

If I convert the keymap table to ENGINE=columnstore, then the query works just fine. So its clearly only an issue when joining two tables with the different storage engines.

I'm attaching tails of the various logs from /var/log/mariadb/columnstore.

As soon as I execute the query, I see this in warning.log:

Jan 25 21:40:10 s_columnstore@ip-172-31-47-54 messagequeue[8766]: 10.738349 |0|0|0| W 31 CAL0000: MessageQueueClient::write: error writing 2741 bytes to IOSocket: sd: 57 inet: 127.0.0.1 port: 8601. Socket error was InetStreamSocket::write error: Broken pipe -- write from InetStreamSocket: sd: 57 inet: 127.0.0.1 port: 8601

I then see this in debug.log:

Jan 25 21:40:10 s_columnstore@ip-172-31-47-54 ExeMgr[10765]: 10.775769 |4|0|0| D 16 CAL0041: Start SQL statement: select k.value as make, count(*)  from study_response_1 s  left join keymap k on k.key = s.col25 and k.keymap_group_id = 1  group by k.value; |mtab|
Jan 25 21:40:10 s_columnstore@ip-172-31-47-54 ProcessMonitor[2442]: 10.894383 |0|0|0| D 18 CAL0000: STOPPING Process: ExeMgr         
Jan 25 21:40:10 s_columnstore@ip-172-31-47-54 ProcessMonitor[2442]: 10.894468 |0|0|0| D 18 CAL0000: StatusUpdate of Process ExeMgr State = 1 PID = 0         
Jan 25 21:40:10 s_columnstore@ip-172-31-47-54 ProcMon[2442]: 10.894453 |0|0|0| D 18 CAL0000: Send SET Alarm ID 13 on device ExeMgr         
Jan 25 21:40:10 s_columnstore@ip-172-31-47-54 ProcessMonitor[2442]: 10.897010 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/ExeMgr State = AUTO_OFFLINE         
Jan 25 21:40:10 s_columnstore@ip-172-31-47-54 ProcessMonitor[2442]: 10.897247 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/ExeMgr State = AUTO_OFFLINE PID = 0  

All the rest of the log output seems to be related ExeMgr restarting itself.

Is this a bug or a misconfiguration? Any known way to work around this other than converting all my tables to columnstore?

I can provide sample data if needed for reproducing the problem.



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2018-01-27 ]

Unfortunately it looks like ExeMgr crashed when you executed that query. Can you please turn on core files and send us the core file generated?

You can turn on core files by following these instructions:

https://mariadb.com/kb/en/library/system-troubleshooting-mariadb-columnstore/#enabledisable-core-files

If you wish to upload using our FTP server the details are:

ftp://ftp.mariadb.com/uploads

Comment by Andrew Simkovsky [ 2018-01-29 ]

Thanks for the help. I've generated the core file and attached it here. File is named MCOL1182-core.ExeMgr.18965.gz

Comment by Andrew Hutchings (Inactive) [ 2018-01-29 ]

Many thanks. We will analyse this as soon as possible.

Comment by Andrew Simkovsky [ 2018-02-01 ]

Hello, any update?

Comment by Andrew Simkovsky [ 2018-02-01 ]

Hello, thank you for confirming the bug. I see that its due to be fixed in 1.1.4. Do you know when that is targeted for release?

Comment by Andrew Hutchings (Inactive) [ 2018-02-01 ]

Hi,
Yes, sorry. I diagnosed the core file when uploaded but hadn't finished the triage. The core file helped figure out where the crash is happening and it is reproducible.

Comment by Andrew Hutchings (Inactive) [ 2018-02-01 ]

We are releasing 1.1.3 first before we schedule 1.1.4. Unfortunately it is too late to fix this for 1.1.3. We typically do a 1.1 release every 1-2 months.

Comment by Andrew Simkovsky [ 2018-02-01 ]

Ok, understood. Is there any chance that a fix would be backported to 1.1.2. Or is there a simple workaround that does not involve converting all tables to the columnstore engine?

Comment by Andrew Hutchings (Inactive) [ 2018-02-24 ]

Sorry, completely missed that last comment until now.

The last digit in out releases is an incremental bugfix version, we would not re-release a bugfix version. We instead release incremental bugfix releases. We may backport to 1.0 if there is a need for it.

I'm not yet aware of a workaround beyond the one you mentioned.

Comment by Sergio Paternoster [ 2018-03-22 ]

Could this be a possible workaround until fix?

select k.value as make, count
from study_response_1 s
left join keymap k on (k.key1 = s.col25 and k.keymap_group_id = (case when k.key1 = s.col25 then 1 else 0 end))
group by k.value;

Comment by Andrew Hutchings (Inactive) [ 2018-04-09 ]

Issue appears to be doOuterJoinOnFilter() executes "delete c" on the filter when making it a nullTree (which doesn't appear to be needed) and the filter is then used for the cross engine join.

Comment by Andrew Hutchings (Inactive) [ 2018-04-09 ]

For QA:

CREATE TABLE `keymap` (
  `keymap_id` int(11) NOT NULL,
  `keymap_group_id` int(11) NOT NULL,
  `key1` varchar(100) NOT NULL,
  `value` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`keymap_id`),
  KEY `Refkeymap_group29` (`keymap_group_id`)
) ENGINE=InnoDB;
 
CREATE TABLE `study_response_1` (
  `record_id` int(11) NOT NULL,
  `col1` varchar(1000) DEFAULT NULL,
  `col2` varchar(1000) DEFAULT NULL,
  `col3` varchar(1000) DEFAULT NULL,
  `col4` varchar(1000) DEFAULT NULL
) ENGINE=Columnstore;
 
select k.value as make from study_response_1 s  left join keymap k on k.key1 = s.col2 and k.keymap_group_id = 5;

Will crash ExeMgr before patch, empty result after.

Comment by Andrew Hutchings (Inactive) [ 2018-04-09 ]

Fix not perfect yet apparently...

Comment by Andrew Hutchings (Inactive) [ 2018-04-10 ]

New patch is much more stable

Comment by Daniel Lee (Inactive) [ 2018-04-16 ]

Build 1.1.4-1 source

/root/columnstore/mariadb-columnstore-server
commit 5199dd1a096fd3457e8fc0508bf5fb24cedec435
Merge: fce3c5e e554e04
Author: David.Hall <david.hall@mariadb.com>
Date: Wed Apr 11 11:04:46 2018 -0500

Merge pull request #108 from mariadb-corporation/MCOL-1331

MCOL-1331 Fix CASE1.DM.sql

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
[root@localhost mariadb-columnstore-engine]# git show
commit ae04b8a6877c87f3ed3566f2bf721bf285ca625f
Merge: 2ab632c dbcbd6c
Author: david hill <david.hill@mariadb.com>
Date: Tue Apr 10 10:55:56 2018 -0400

Merge pull request #438 from mariadb-corporation/MCOL-1323

MCOL-1323 cpimport Splitter has incorrect SIGPIPE mapping

diff --cc writeengine/splitter/we_splitterapp.cpp
index f52f362,0077ebd..402d2b0
mode 100755,100644..100755
— a/writeengine/splitter/we_splitterapp.cpp
+++ b/writeengine/splitter/we_splitterapp.cpp

Reproduced the issue in 1.1.2-1 with only one try
Reproduced the issue in 1.1.3-1 after few tries
Verified the fix in 1.1.4-1 by repeating the same query.

Generated at Thu Feb 08 02:26:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.