Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.2.13
-
None
-
centos7
Description
CONNECT engine JDBC produces wrong query results when 3 remote tables are joined:
- All tables have a "remote index" added on their primary key column (id) added via ALTER TABLE localConnectTable ADD INDEX (id)
- All tables are declared with Memory=0 and Scrollable=yes and BLOCK_SIZE=1000
- The remote tables have 50,000 to 500,000 records each.
The following query takes forever or times out (essentially fails):
SELECT `ftl_orders`.`id`, `ftl_orders`.`order_id`, `ftl_orders`.`created`, `ftl_order_lines`.`order_id`, `ftl_order_lines`.`quantity`, `ftl_order_lines`.`item_id` |
FROM `ftl_orders` |
JOIN `ftl_order_lines` ON ( `ftl_order_lines`.`order_id` = `ftl_orders`.`id` ) |
WHERE ( ( `ftl_orders`.`created` > '2018-03-01' ) AND ( TRUE ) ) |
LIMIT 0,10
|
The following query runs in 15 seconds and produced the correct results (only difference is LEFT JOIN vs JOIN... the one that works is the LEFT join below):
SELECT `ftl_orders`.`id`, `ftl_orders`.`order_id`, `ftl_orders`.`created`, `ftl_order_lines`.`order_id`, `ftl_order_lines`.`quantity`, `ftl_order_lines`.`item_id` |
FROM `ftl_orders` |
LEFT JOIN `ftl_order_lines` ON ( `ftl_order_lines`.`order_id` = `ftl_orders`.`id` ) |
WHERE ( ( `ftl_orders`.`created` > '2018-03-01' ) AND ( TRUE ) ) |
LIMIT 0,10
|
RESULTS:
id order_id created order_id quantity item_id
|
264640 FT344314640 3/1/2018 0:11 264640 1 155914
|
264641 FT887314641 3/1/2018 0:12 264641 1 63664
|
264642 FT797314642 3/1/2018 0:20 264642 1 185649
|
264642 FT797314642 3/1/2018 0:20 264642 1 185650
|
264642 FT797314642 3/1/2018 0:20 264642 2 185654
|
264643 FT307314643 3/1/2018 0:32 264643 1 177806
|
264643 FT307314643 3/1/2018 0:32 264643 1 181239
|
264644 FT300314644 3/1/2018 0:46 264644 1 152669
|
264645 FT743314645 3/1/2018 1:11 264645 1 178042
|
264646 FT221314646 3/1/2018 1:12 264646 1 185037
|
The following query, which is the same as above but with 1 extra join, now produces INCORRECT results:
SELECT `ftl_orders`.`id`, `ftl_orders`.`order_id`, `ftl_orders`.`created`, `ftl_order_lines`.`order_id`, `ftl_order_lines`.`quantity`, `ftl_order_lines`.`item_id` |
FROM `ftl_orders` |
LEFT JOIN `ftl_order_lines` ON ( `ftl_order_lines`.`order_id` = `ftl_orders`.`id` ) |
LEFT JOIN `ftl_items` ON ( `ftl_items`.`id` = `ftl_order_lines`.`item_id` ) |
WHERE ( ( `ftl_orders`.`created` > '2018-03-01' ) AND ( TRUE ) ) |
LIMIT 0,10
|
RESULTS:
id order_id created order_id quantity item_id
|
264640 FT344314640 3/1/2018 0:11 264640 1 155914
|
264641 FT887314641 3/1/2018 0:12 264641 1 63664
|
264641 FT887314641 3/1/2018 0:12 264641 1 63664
|
264641 FT887314641 3/1/2018 0:12 264641 1 63664
|
264641 FT887314641 3/1/2018 0:12 264641 1 63664
|
264641 FT887314641 3/1/2018 0:12 264641 1 63664
|
264641 FT887314641 3/1/2018 0:12 264641 1 63664
|
264641 FT887314641 3/1/2018 0:12 264641 1 63664
|
264641 FT887314641 3/1/2018 0:12 264641 1 63664
|
264641 FT887314641 3/1/2018 0:12 264641 1 63664
|
As allways, I cannot do much about such problems without being able to reproduce them. For this, I need the complete show create table of the CONNECT tables, the table definitions of the remote tables and, if possible, the data or a sample of the implied data.
Robert, if I still have access to your host, I may alternatively use JDBC from it.
Anyway, doing joins locally on remote tables is cumbersome, especially on large tables. If all your tables are on the same remote location, it is much better to have the remote server executing such queries by defining a table on a view, for instance:
CREATE TABLE bigjoin CONNECTION='jdbc:...' SRCDEF="
SELECT orders.id, orders.order_id, orders.created,
order_lines.order_id, order_lines.quantity, order_lines.item_id
FROM orders JOIN order_lines ON (order_lines.order_id = orders.id )
WHERE orders.created > '2018-03-01' LIMIT 0,10";
The SRCDEF being a query with the proper syntax, table and column names of the remote server.
I am not sure the third query produces wrong results. The second join may return several lines for each line of the first join and the 10 limit may give the impression of wrong result. You can test that by replacing SELECT by SELECT DISTINCT.