Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15463

CONNECT engine JDBC produces wrong query results

Details

    Description

      CONNECT engine JDBC produces wrong query results when 3 remote tables are joined:

      • All tables have a "remote index" added on their primary key column (id) added via ALTER TABLE localConnectTable ADD INDEX (id)
      • All tables are declared with Memory=0 and Scrollable=yes and BLOCK_SIZE=1000
      • The remote tables have 50,000 to 500,000 records each.

      The following query takes forever or times out (essentially fails):

      SELECT `ftl_orders`.`id`, `ftl_orders`.`order_id`, `ftl_orders`.`created`, `ftl_order_lines`.`order_id`, `ftl_order_lines`.`quantity`, `ftl_order_lines`.`item_id` 
      FROM `ftl_orders` 
      JOIN `ftl_order_lines` ON ( `ftl_order_lines`.`order_id` = `ftl_orders`.`id` )  
      WHERE  (  ( `ftl_orders`.`created` > '2018-03-01' )  AND  ( TRUE )  )  
      LIMIT 0,10
      

      The following query runs in 15 seconds and produced the correct results (only difference is LEFT JOIN vs JOIN... the one that works is the LEFT join below):

      SELECT `ftl_orders`.`id`, `ftl_orders`.`order_id`, `ftl_orders`.`created`, `ftl_order_lines`.`order_id`, `ftl_order_lines`.`quantity`, `ftl_order_lines`.`item_id` 
      FROM `ftl_orders` 
      LEFT JOIN `ftl_order_lines` ON ( `ftl_order_lines`.`order_id` = `ftl_orders`.`id` )  
      WHERE  (  ( `ftl_orders`.`created` > '2018-03-01' )  AND  ( TRUE )  )  
      LIMIT 0,10
      

      RESULTS:

      id	order_id	created	order_id	quantity	item_id
      264640	FT344314640	3/1/2018 0:11	264640	1	155914
      264641	FT887314641	3/1/2018 0:12	264641	1	63664
      264642	FT797314642	3/1/2018 0:20	264642	1	185649
      264642	FT797314642	3/1/2018 0:20	264642	1	185650
      264642	FT797314642	3/1/2018 0:20	264642	2	185654
      264643	FT307314643	3/1/2018 0:32	264643	1	177806
      264643	FT307314643	3/1/2018 0:32	264643	1	181239
      264644	FT300314644	3/1/2018 0:46	264644	1	152669
      264645	FT743314645	3/1/2018 1:11	264645	1	178042
      264646	FT221314646	3/1/2018 1:12	264646	1	185037
      

      The following query, which is the same as above but with 1 extra join, now produces INCORRECT results:

      SELECT `ftl_orders`.`id`, `ftl_orders`.`order_id`, `ftl_orders`.`created`, `ftl_order_lines`.`order_id`, `ftl_order_lines`.`quantity`, `ftl_order_lines`.`item_id` 
      FROM `ftl_orders` 
      LEFT JOIN `ftl_order_lines` ON ( `ftl_order_lines`.`order_id` = `ftl_orders`.`id` ) 
      LEFT JOIN `ftl_items` ON ( `ftl_items`.`id` = `ftl_order_lines`.`item_id` )  
      WHERE  (  ( `ftl_orders`.`created` > '2018-03-01' )  AND  ( TRUE )  )  
      LIMIT 0,10
      

      RESULTS:

      id	order_id	created	order_id	quantity	item_id
      264640	FT344314640	3/1/2018 0:11	264640	1	155914
      264641	FT887314641	3/1/2018 0:12	264641	1	63664
      264641	FT887314641	3/1/2018 0:12	264641	1	63664
      264641	FT887314641	3/1/2018 0:12	264641	1	63664
      264641	FT887314641	3/1/2018 0:12	264641	1	63664
      264641	FT887314641	3/1/2018 0:12	264641	1	63664
      264641	FT887314641	3/1/2018 0:12	264641	1	63664
      264641	FT887314641	3/1/2018 0:12	264641	1	63664
      264641	FT887314641	3/1/2018 0:12	264641	1	63664
      264641	FT887314641	3/1/2018 0:12	264641	1	63664
      

      Attachments

        Activity

          As allways, I cannot do much about such problems without being able to reproduce them. For this, I need the complete show create table of the CONNECT tables, the table definitions of the remote tables and, if possible, the data or a sample of the implied data.
          Robert, if I still have access to your host, I may alternatively use JDBC from it.

          Anyway, doing joins locally on remote tables is cumbersome, especially on large tables. If all your tables are on the same remote location, it is much better to have the remote server executing such queries by defining a table on a view, for instance:

          CREATE TABLE bigjoin CONNECTION='jdbc:...' SRCDEF="
          SELECT orders.id, orders.order_id, orders.created,
          order_lines.order_id, order_lines.quantity, order_lines.item_id 
          FROM orders JOIN order_lines ON (order_lines.order_id = orders.id )  
          WHERE orders.created > '2018-03-01' LIMIT 0,10";
          

          The SRCDEF being a query with the proper syntax, table and column names of the remote server.

          I am not sure the third query produces wrong results. The second join may return several lines for each line of the first join and the 10 limit may give the impression of wrong result. You can test that by replacing SELECT by SELECT DISTINCT.

          bertrandop Olivier Bertrand added a comment - As allways, I cannot do much about such problems without being able to reproduce them. For this, I need the complete show create table of the CONNECT tables, the table definitions of the remote tables and, if possible, the data or a sample of the implied data. Robert, if I still have access to your host, I may alternatively use JDBC from it. Anyway, doing joins locally on remote tables is cumbersome, especially on large tables. If all your tables are on the same remote location, it is much better to have the remote server executing such queries by defining a table on a view, for instance: CREATE TABLE bigjoin CONNECTION='jdbc:...' SRCDEF=" SELECT orders.id, orders.order_id, orders.created, order_lines.order_id, order_lines.quantity, order_lines.item_id FROM orders JOIN order_lines ON (order_lines.order_id = orders.id ) WHERE orders.created > '2018-03-01' LIMIT 0,10"; The SRCDEF being a query with the proper syntax, table and column names of the remote server. I am not sure the third query produces wrong results. The second join may return several lines for each line of the first join and the 10 limit may give the impression of wrong result. You can test that by replacing SELECT by SELECT DISTINCT.

          People

            bertrandop Olivier Bertrand
            rdyas Robert Dyas
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.