[MDEV-15463] CONNECT engine JDBC produces wrong query results Created: 2018-03-04  Updated: 2018-03-05

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.2.13
Fix Version/s: 10.2

Type: Bug Priority: Major
Reporter: Robert Dyas Assignee: Olivier Bertrand
Resolution: Unresolved Votes: 0
Labels: None
Environment:

centos7



 Description   

CONNECT engine JDBC produces wrong query results when 3 remote tables are joined:

  • All tables have a "remote index" added on their primary key column (id) added via ALTER TABLE localConnectTable ADD INDEX (id)
  • All tables are declared with Memory=0 and Scrollable=yes and BLOCK_SIZE=1000
  • The remote tables have 50,000 to 500,000 records each.

The following query takes forever or times out (essentially fails):

SELECT `ftl_orders`.`id`, `ftl_orders`.`order_id`, `ftl_orders`.`created`, `ftl_order_lines`.`order_id`, `ftl_order_lines`.`quantity`, `ftl_order_lines`.`item_id` 
FROM `ftl_orders` 
JOIN `ftl_order_lines` ON ( `ftl_order_lines`.`order_id` = `ftl_orders`.`id` )  
WHERE  (  ( `ftl_orders`.`created` > '2018-03-01' )  AND  ( TRUE )  )  
LIMIT 0,10

The following query runs in 15 seconds and produced the correct results (only difference is LEFT JOIN vs JOIN... the one that works is the LEFT join below):

SELECT `ftl_orders`.`id`, `ftl_orders`.`order_id`, `ftl_orders`.`created`, `ftl_order_lines`.`order_id`, `ftl_order_lines`.`quantity`, `ftl_order_lines`.`item_id` 
FROM `ftl_orders` 
LEFT JOIN `ftl_order_lines` ON ( `ftl_order_lines`.`order_id` = `ftl_orders`.`id` )  
WHERE  (  ( `ftl_orders`.`created` > '2018-03-01' )  AND  ( TRUE )  )  
LIMIT 0,10

RESULTS:

id	order_id	created	order_id	quantity	item_id
264640	FT344314640	3/1/2018 0:11	264640	1	155914
264641	FT887314641	3/1/2018 0:12	264641	1	63664
264642	FT797314642	3/1/2018 0:20	264642	1	185649
264642	FT797314642	3/1/2018 0:20	264642	1	185650
264642	FT797314642	3/1/2018 0:20	264642	2	185654
264643	FT307314643	3/1/2018 0:32	264643	1	177806
264643	FT307314643	3/1/2018 0:32	264643	1	181239
264644	FT300314644	3/1/2018 0:46	264644	1	152669
264645	FT743314645	3/1/2018 1:11	264645	1	178042
264646	FT221314646	3/1/2018 1:12	264646	1	185037

The following query, which is the same as above but with 1 extra join, now produces INCORRECT results:

SELECT `ftl_orders`.`id`, `ftl_orders`.`order_id`, `ftl_orders`.`created`, `ftl_order_lines`.`order_id`, `ftl_order_lines`.`quantity`, `ftl_order_lines`.`item_id` 
FROM `ftl_orders` 
LEFT JOIN `ftl_order_lines` ON ( `ftl_order_lines`.`order_id` = `ftl_orders`.`id` ) 
LEFT JOIN `ftl_items` ON ( `ftl_items`.`id` = `ftl_order_lines`.`item_id` )  
WHERE  (  ( `ftl_orders`.`created` > '2018-03-01' )  AND  ( TRUE )  )  
LIMIT 0,10

RESULTS:

id	order_id	created	order_id	quantity	item_id
264640	FT344314640	3/1/2018 0:11	264640	1	155914
264641	FT887314641	3/1/2018 0:12	264641	1	63664
264641	FT887314641	3/1/2018 0:12	264641	1	63664
264641	FT887314641	3/1/2018 0:12	264641	1	63664
264641	FT887314641	3/1/2018 0:12	264641	1	63664
264641	FT887314641	3/1/2018 0:12	264641	1	63664
264641	FT887314641	3/1/2018 0:12	264641	1	63664
264641	FT887314641	3/1/2018 0:12	264641	1	63664
264641	FT887314641	3/1/2018 0:12	264641	1	63664
264641	FT887314641	3/1/2018 0:12	264641	1	63664



 Comments   
Comment by Olivier Bertrand [ 2018-03-05 ]

As allways, I cannot do much about such problems without being able to reproduce them. For this, I need the complete show create table of the CONNECT tables, the table definitions of the remote tables and, if possible, the data or a sample of the implied data.
Robert, if I still have access to your host, I may alternatively use JDBC from it.

Anyway, doing joins locally on remote tables is cumbersome, especially on large tables. If all your tables are on the same remote location, it is much better to have the remote server executing such queries by defining a table on a view, for instance:

CREATE TABLE bigjoin CONNECTION='jdbc:...' SRCDEF="
SELECT orders.id, orders.order_id, orders.created,
order_lines.order_id, order_lines.quantity, order_lines.item_id 
FROM orders JOIN order_lines ON (order_lines.order_id = orders.id )  
WHERE orders.created > '2018-03-01' LIMIT 0,10";

The SRCDEF being a query with the proper syntax, table and column names of the remote server.

I am not sure the third query produces wrong results. The second join may return several lines for each line of the first join and the 10 limit may give the impression of wrong result. You can test that by replacing SELECT by SELECT DISTINCT.

Generated at Thu Feb 08 08:21:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.