[MDEV-11832] CONNECT Engine - ODBC: SRCDEF Views Performance Created: 2017-01-18  Updated: 2017-05-06  Resolved: 2017-05-06

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Connect
Fix Version/s: N/A

Type: Task Priority: Minor
Reporter: Juan Telleria Assignee: Olivier Bertrand
Resolution: Fixed Votes: 0
Labels: connect-engine

Issue Links:
Relates
relates to MDEV-11954 CONNECT Engine - ODBC: Too long value... Closed

 Description   

CONNECT » ODBC Engine's Documentation states in section "Accessing specified views"
that when Creating CONNECT - ODBC Tables which access to views in the foreign server, WHERE clause is performed locally, instead of in the foreing server accessed via ODBC (As it would be done by default).

MariaDB Documentation » CONNECT » ODBC

Therefore, I would like to suggest that WHERE is also performed in the "Foreing Server", as it allows:

  • To reduce the data which is sent by the data source, preventing to have to send the hole table though the network for being able to filter it.

An example would be as follows:

CREATE TABLE custnum 
ENGINE=CONNECT 
TABLE_TYPE=ODBC 
CONNECTION='DSN=MS Access Database;DBQ=C:/Program Files/Microsoft Office/Office/1033/FPNWIND.MDB;'
SRCDEF=
'select country, count(*) as customers from customers group by country';

SELECT
*
FROM
custnum
WHERE
country IN (SELECT country FROM mylocalcountries);

or

SELECT
*
FROM
custnum
WHERE
country = 'UK';

Maybe this could be achieved by creating a "derived table" in the Foreing Server (which contains the View), to which the WHERE clause is applied afterwards.

Thank you

Hope my suggestion is useful!



 Comments   
Comment by Juan Telleria [ 2017-01-31 ]

Also SRCDEF clause has a max. character limitation, which does not allow to perform big queries

Comment by Olivier Bertrand [ 2017-02-02 ]

The simplest solution is to add support for a cond_push where clause to tables based on SRCDEF.

Here is a solution I propose that I have successfully implemented and tested:

CREATE TABLE custnum 
ENGINE=CONNECT 
TABLE_TYPE=ODBC 
CONNECTION='DSN=MS Access Database;DBQ=C:/Program Files/Microsoft Office/Office/1033/FPNWIND.MDB;'
SRCDEF='select country, count(*) as customers from customers where %s group by country';

The %s in the srcdef is a place holder for an eventual where clause made by cond_push. If the select query does not specify a where clause or a where clause not acceptable, it is filled by a dummy where clause (1=1)

Therefore if executing:

SELECT * FROM custnum WHERE country in ('UK','USA');

The query sent to the remote server will be:

select country, count(*) as customers from customers where country in ('UK','USA') group by country

Note: A first solution I tested was to replace the place holder by " WHERE " || clause and by nothing when they were no cond_push clauses. This avoids the dummy 1=1 clause but prevent to specify a more complex SRCDEF such as:

SELECT * FROM t1 WHERE COL=2 AND %s

If you agree on this solution I will also implement it for MYSQL and JDBC tables.

Comment by Juan Telleria [ 2017-02-03 ]

I am quite busy, but I will try to "give a twist" to your implementation by putting several examples in order to check "What would happen" to make it as robust as possible.

Comment by Juan Telleria [ 2017-02-03 ]

Check List:

  • ¿When a WHERE Clause can not be applied in the Foreing Server, and it becomes Dummy. Would the SQL Parser drop a "Warning" such as "Condition [...] was convert into Dummy".
  • It might happen that the WHERE Clause is not applicable in SRCDEF View, but yes in the new materialized table, so it might be possible to transfer the table from the Remote Server to the Local MariaDB Server, and filter the table in it.
  • ¿Functions in the WHERE Clause must correspond to those from the Remote Server or from the Local Server?
  • When I perform a query which contains a subquery such as:

select 
      country,
      count(*) as customers 
from 
      customers 
where 
      country in (SELECT Countries FROM t_MyCountries) 
group by 
      country

Would the parser transform it into:

select 
      country,
      count(*) as customers 
from 
      customers 
where 
      country in ('UK', 'Canada') 
group by 
      country

Maybe a good result could be achieved by means of CONCAT functions

  • If several conditions are executed in the WHERE Clause ¿Those which is possible to execute them in the WHERE Clause from the Foreing Server would be executed in such Server; and those which is possible to execute them in the Local Server are executed in the Local Server?

For example:

    select 
      country,
      GROUP_CONCAT(`Language` SEPARATOR ' / ') AS `Languages`
      count(*) as customers 
from 
      customers AS `t1`  -- Remote Server.
      LEFT JOIN retails AS `t2` USING (CustNum)  -- Local Server.
where 
     -- Executed in Remote Server:
     ( country /* Remote Column */ in ('UK', 'Canada') OR country = 'USA')
     -- Executed in Local Server (Based on the returned table):
     AND `RetailName` LIKE '%AMAZON%'
group by 
      country
}

  • ¿What would happen If we want to filter our query by means of a HAVING clause?
    ¿Would it be performed in the Remote Server or the Local Server?

Or it might depend....

  • %s symbol in SRCDEF could give place to an error, as MariaDB DATE_FORMAT function can use such symbol to express seconds. A more robust identifier in my opinion would be:

CREATE TABLE custnum 
     ENGINE=CONNECT 
     TABLE_TYPE=ODBC 
     CONNECTION='DSN=MS Access Database;DBQ=C:/Program Files/Microsoft      Office/Office/1033/FPNWIND.MDB;'
     SRCDEF=
     '
     select 
          country,
          count(*) as customers 
     from 
          customers 
     where 
          (%_cond_push_)
          AND language = 'English'
     group by 
          country
     ';

  • Joins to Tables which consist or Views. ¿Are consistent with the SQL Standard? Or LEFT Joins are converted to INNER JOINS. I mean, if there are no elements to JOIN, would the Local Table receive a NULL.

In our example, would the result set contain all Customers from our Local Server, and a Boolean Column which indicates us if it is also in the Remote Server's Table.

select 
      A.CustNum AS `Local CustNum`,
      IF(B.CustNum IS NOT NULL, 1, 0) AS `ThereIs Remote CustNum`
from 
      t_customers_local AS A -- Local Server
      LEFT JOIN t_customers_remote AS B  -- Remote Server.
}

We shall make it as consistent as possible with the SQL Standard.

Thank you

Great work!

Comment by Juan Telleria [ 2017-02-03 ]
  • And if we created a CONNECT ODBC Table View, with a Column definition. ¿How would the Engine behave?

CREATE TABLE t_custnum
(
     My_Renamed_Country VARCHAR(45),
     My_Renamed_Count INT
)
     ENGINE=CONNECT 
     TABLE_TYPE=ODBC 
     CONNECTION='DSN=MS Access Database;DBQ=C:/Program Files/Microsoft      Office/Office/1033/FPNWIND.MDB;'
     SRCDEF=
     'select 
          country,
          count(*) as customers 
     from 
          customers 
     where 
          %s 
     group by 
          country
     '; 

     SELECT
          *
     FROM
          t_cusnum
     WHERE
          My_Renamed_Count > 5

¿Would it be performed in local or remote server?

Comment by Juan Telleria [ 2017-02-03 ]

Oliver, I just realized about a bug with CONNECT ODBC Engine (MDEV-11986):

JOINS are performed always as INNER JOINs and not LEFT JOINs.

It's OK, only people shall be conscious about such behavior.

Comment by Olivier Bertrand [ 2017-02-03 ]

When a WHERE Clause can not be applied in the Foreing Server, and it becomes Dummy. Would the SQL Parser drop a "Warning" such as "Condition [...] was convert into Dummy".
This depends on the remote server parser.

It might happen that the WHERE Clause is not applicable in SRCDEF View, but yes in the new materialized table, so it might be possible to transfer the table from the Remote Server to the Local MariaDB Server, and filter the table in it.
Of course, using the MEMORY option. However, this is rarely needed. It must be understood how all this work. The SELECT statement is firstly parsed by MariaDB. It will be executed on the result set retrieved by CONNECT from the remote server. When a where clause is eligible to be included in the query sent to the remote server this will restrict the size of the data returned by the remote server. However the where clause still exist locally and will be executed whether or not it, or part of it, was included in the sent query. This means that the query result will be always correct whether the cond_push process returned something or not. The only thing that can change is concerning performance, not result.

Functions in the WHERE Clause must correspond to those from the Remote Server or from the Local Server
This is a problem because the select query is firstly parsed by the local MariaDB parser, and the constructed query sent to the remote server will be parsed by the remote parser. This is why where clauses containing functions are not kept in the cond_push process and are executed only locally. Thus, functions in the select statement must be MariaDB functions. However, functions in the srcdef string must be remote functions.

When I perform a query which contains a subquery...
Where clause containing sub-queries are not eligible for cond_push. Therefore they are always executed locally. Not transformation can be done.

If several conditions are executed in the WHERE Clause ¿Those which is possible to execute them in the WHERE Clause from the Foreing Server would be executed in such Server; and those which is possible to execute them in the Local Server are executed in the Local Server
This is automatically done by the process I descibed above.

SELECT query with column definition
Aliases are not permitted by SQL in WHERE clauses anyway. Just in USING clauses. However, it is possible to put the place holder in a HAVING clause when applicable. Unfortunately, this works only with data sources allowing aliases in HAVING clause and many, including Oracle or MS Access, do not.

%s symbol in SRCDEF could give place to an error, as MariaDB DATE_FORMAT function can use such symbol to express seconds
Sure enough but changing the place holder symbol would make the code much more complicated. However, instead of looking for "%s" in the srcdef for an eventual place holder I can look for " %s ". Will this solve that problem?

Comment by Juan Telleria [ 2017-02-04 ]

Just '%s' symbol is ok.

Thank you

Comment by Juan Telleria [ 2017-02-06 ]

So in conclusion:

  • Use just '%s' symbol as stated.
  • When connecting to other MariaDB / MySQL server, it would be a good idea that Functions in the WHERE Clause are also executed in the Foreing Server (Whenever it is possible). Maybe a table_option could be included for this purpose. Or simply when using FederatedX engine make that Functions are executed in the Foreing Server.
  • When connecting to other RDBMS (Such as SQL Server, Oracle Database), ¿Could IN clause be also executed in the Foreing Server? This clause is supported by all RDBMS (I think is part of SQL Standard), as well as other basic clauses such as LIKE.

SELECT
     *
FROM
     t_Foreing_Server_Table_Name
WHERE
     Column_Name IN ('Value1', 'Value2', [...], 'ValueN')

or

SELECT
     *
FROM
     t_Foreing_Server_Table_Name
WHERE
     Column_Name IN (@Var_Session_Variable_Name)

SELECT
     *
FROM
     t_Foreing_Server_Table_Name
WHERE
     Column_Name IN (Local_Variable_Name)

Where in the second case, variables consist in a string of type:

  • @Var_Session_Variable_Name = 'Value1', 'Value2', [...], 'ValueN'
  • Local_Variable_Name = 'Value1', 'Value2', [...], 'ValueN'

That could be created by:

SELECT
     GROUP_CONCAT(CONCAT("'", Column_Name, "'") 
INTO 
     @Var_Session_Variable_Name
FROM
     t_Local_Server_Table_Name

With this previous observations I would consider this little development as closed.
Thank you.

Comment by Olivier Bertrand [ 2017-02-06 ]

Currently WHERE clause with functions are not kept with ODBC and JDBC but they are kept and executed remotely by MYSQL tables; functions being the same. IN clauses are also eligible for remote execution.

About SRCDEF tables, I realized that the problem is far more complex than expected when SRCDEF contains aliases, joins and/or GROUP BY clauses. I am working at a more general solution, but this is not trivial and will take time.

Comment by Juan Telleria [ 2017-02-07 ]

Thank you

Comment by Juan Telleria [ 2017-02-08 ]

What about CONNECT Engine internally creating a derived table:

SELECT
     *
FROM
     (
          SELECT
               *
          FROM
               t_Table
     ) AS `t_SRCDEF`
WHERE
     [...]
HAVING
     [...]

Comment by Olivier Bertrand [ 2017-05-06 ]

This now implemented and documented. This was fixed some time ago and should be available in current versions.

Comment by Juan Telleria [ 2017-05-06 ]

Thank you

Generated at Thu Feb 08 07:53:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.