[MDEV-11954] CONNECT Engine - ODBC: Too long value for SRCDEF Created: 2017-01-31  Updated: 2017-02-02  Resolved: 2017-01-31

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.1.19
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Juan Telleria Assignee: Olivier Bertrand
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Windows 7


Issue Links:
Relates
relates to MDEV-11832 CONNECT Engine - ODBC: SRCDEF Views P... Closed
relates to MDEV-11955 CONNECT Engine - ODBC Table Views: Ef... Closed

 Description   

If I create the following table:

 
DROP TABLE IF EXISTS scada_data_processing.`t1_temp_aux_eo_ots_with_consumption_subset_population`;
 
CREATE TABLE scada_data_processing.`t1_temp_aux_eo_ots_with_consumption_subset_population` 
(
  `NumOT` varchar(36) NOT NULL,
  `OT_With_Consumption` BOOLEAN
) ENGINE=CONNECT
  TABLE_TYPE=ODBC
  CONNECTION="exp_operativa"  
  OPTION_LIST="UseDSN=Yes,User=************,Password=************"
  SRCDEF =
  "
    SELECT
      A.[NumOT] AS NumOT,
      CASE
        WHEN
          count(CASE WHEN B.[NumOT] IS NOT NULL THEN 1 ELSE 0 END) >=1
        THEN
          1
        ELSE
          0
      END AS OT_With_Consumption
    FROM
      exp_operativa.GPA99.T_OTS AS A
      LEFT JOIN exp_operativa.GPA99.T_MOVEMENTS AS B
        ON A.[NumOT] = B.[NumOT]
    WHERE
      [NumOT] IN ('401591205', '401761515', '401761682', '401761698', '401761721', '401762159', '401774167', '401774169', '401774171', '401774179', '401774182', '401774186', '401774193', ..., '501032891');
      AND B.[Enabled] = 1
    GROUP BY
      A.[NumOT]
  ";

I obtain error when creating the table "Too long value for SRCDEF". Could this limitation be removed by creating (For example) a Server Variable that controls this parameter.

Thank you.

Kind regards,
Juan



 Comments   
Comment by Olivier Bertrand [ 2017-01-31 ]

I have created the table on my machine with no problem but not replacing ....

Note that the limitation on string options is not due to CONNECT but is a MariaDB limitation. During the parsing of a query (here the CREATE TABLE statement) there are many places in sql_yacc.cc that raise this message with code such as:

if ((yyvsp[(3) - (3)].lex_str).length > ENGINE_OPTION_MAX_LENGTH)
   ...

Looking at the declaration of ENGINE_OPTION_MAX_LENGTH I see in create_options.h:

enum { ENGINE_OPTION_MAX_LENGTH=32767 };

What is strange is that in your create table statement is that the SRCDEF is not as large except if ... of the IN statement contains many more values.

I don't know whether it is possible to change this MariaDB limitation.

Comment by Juan Telleria [ 2017-02-01 ]

Of course, the IN clause could contain even more than 100.000 elements. I included only a little extract in the example.

My objective is about time efficiency when dealing with Tables which contain big amount of data in the Remote Server. So I have mainly 3 options:

  • Do regular SELECT using REMOTE Indexes (Not always Possible): Works Perfectly. Fast.
  • By a prepared statement, create a custom SRCDEF table_option with an IN clause: Fastest option, but with the character limitation you stated. An example could be as follows (It took 30 [Sec.]):

 SET @@group_concat_max_len = 18446744073709547520;
 
SELECT
      GROUP_CONCAT(DISTINCT CONCAT("'", MachineUT, "'") SEPARATOR ', ')
      INTO @Var_CONNECT_MachineUT_Current_Population
FROM
      scada_data_processing.`t0_windone_wtg_envelopes_population`;
    
SET @@group_concat_max_len = 4000;
 
SELECT 
    CONCAT("'", MIN(DateTimeOFF), "'") AS `WTG Population's First DateTimeOFF`,
    CONCAT("'", MAX(DateTimeOFF), "'") AS `WTG Population's Last DateTimeOFF`
INTO 
    @Var_CONNECT_MIN_DateTimeOFF_Population,
    @Var_CONNECT_MAX_DateTimeOFF_Population
FROM
    scada_data_processing.`t0_DISPOIN_imported_data`;
 
SET @Var_CONNECT_Create_EO_OTS_STOPS_Table =
CONCAT('
CREATE TABLE  scada_data_processing.`t0_temp_aux_eo_ots_stops_subset_population` 
(
   `NumOT` varchar(36) NOT NULL,
   `OTDescription` varchar(200) DEFAULT NULL,
   `UT` varchar(120) DEFAULT NULL,
   `OTTypeCode` varchar(20) DEFAULT NULL,
   `Subtipo` varchar(120) DEFAULT NULL,
   `LCNId` varchar(6) DEFAULT NULL,
   `StopStartDate` datetime(6) NOT NULL,
   `StopEndDate` datetime(6) NOT NULL,
   `CauseCode` varchar(12) NOT NULL
) ENGINE=CONNECT
  TABLE_TYPE=ODBC
  CONNECTION="exp_operativa"  
  OPTION_LIST="UseDSN=Yes,User=***********,Password=*************"
  SRCDEF =
"
  SELECT
     A.[NumOT],
     [OTDescription],
     [UT],
     [OTTypeCode],
     [Subtipo],
     [LCNId],
     [StopStartDate],
     [StopEndDate],
     [CauseCode]
FROM
     exp_operativa.GPA99.T_OTS AS A
     LEFT JOIN exp_operativa.GPA99.T_STOPS AS B
          ON A.[NumOT] = B.[NumOT]
     LEFT JOIN exp_operativa.GPA99.T_MACHINES AS C
          ON A.[UT] = C.[MachineUT]
WHERE
     (A.[UT] IN (', @Var_CONNECT_MachineUT_Current_Population, '))
     AND ([StopStartDate] IS NOT NULL)
     AND ([StopStartDate] BETWEEN 
          CONVERT(DATETIME,', @Var_CONNECT_MIN_DateTimeOFF_Population, ')
          AND CONVERT(DATETIME,', @Var_CONNECT_MAX_DateTimeOFF_Population, '))
     AND 
          ([StopEndDate] <= DATEADD(HOUR, 24, CONVERT(DATETIME,',                
          @Var_CONNECT_MAX_DateTimeOFF_Population, ')))
     AND (A.[Enabled] = 1 AND B.[Enabled] = 1 AND C.[Enabled] = 1)
     AND (ISNULL([MainStartDate], CAST(-53690 as DATETIME)) <= [StopStartDate])
     AND ([StopEndDate] <= ISNULL([MainEndDate], CAST(2958463 as DATETIME)))
"');
 
PREPARE Prepare_Statement_EO_OTS_Subset_Population 
      FROM @Var_CONNECT_Create_EO_OTS_STOPS_Table;
        
DROP TABLE IF EXISTS    scada_data_processing.`t0_temp_aux_eo_ots_stops_subset_population`;
 
EXECUTE Prepare_Statement_EO_OTS_Subset_Population;
 
DEALLOCATE PREPARE Prepare_Statement_EO_OTS_Subset_Population;

  • By using Views via SRCDEF Clause, and including large WHERE conditionals from SRCDEF (In my case the IN clause), outside the SRCDEF query, in a regular SELECT...WHERE clause: For being able to do further filtering over the View Definition, I would have to transfer such data all over the network to my server, and the filter it. The same example took 3 [min] with my data, almost 6 time more than by the previous method.

So maybe one possible improvement for CONNECT ODBC Query Efficiency could be the following:
*That in ODBC CONNECT Tables with Views (Using SRCDEF Clause), when performing in them a simple SELECT (Without JOINS), the WHERE clause is also done in the Remote Server, attaching such where clause to the View from SRCDEF as an AND in WHERE clause.

This would save the time required to transfer all the Views Data from the Remote Server to the local server for being able to do further filtering in it*

The previous example would be now as follows:

SELECT 
    CONCAT("'", MIN(DateTimeOFF), "'") AS `WTG Population's First DateTimeOFF`,
    CONCAT("'", MAX(DateTimeOFF), "'") AS `WTG Population's Last DateTimeOFF`
INTO 
    @Var_CONNECT_MIN_DateTimeOFF_Population,
    @Var_CONNECT_MAX_DateTimeOFF_Population
FROM
    scada_data_processing.`t0_DISPOIN_imported_data`;
 
SET @Var_CONNECT_Create_EO_OTS_STOPS_Table =
CONCAT('
CREATE TABLE  scada_data_processing.`t0_temp_aux_eo_ots_stops_subset_population` 
(
   `NumOT` varchar(36) NOT NULL,
   `OTDescription` varchar(200) DEFAULT NULL,
   `UT` varchar(120) DEFAULT NULL,
   `OTTypeCode` varchar(20) DEFAULT NULL,
   `Subtipo` varchar(120) DEFAULT NULL,
   `LCNId` varchar(6) DEFAULT NULL,
   `StopStartDate` datetime(6) NOT NULL,
   `StopEndDate` datetime(6) NOT NULL,
   `CauseCode` varchar(12) NOT NULL
) ENGINE=CONNECT
  TABLE_TYPE=ODBC
  CONNECTION="exp_operativa"  
  OPTION_LIST="UseDSN=Yes,User=***********,Password=*************"
  SRCDEF =
"
  SELECT
     A.[NumOT],
     [OTDescription],
     [UT],
     [OTTypeCode],
     [Subtipo],
     [LCNId],
     [StopStartDate],
     [StopEndDate],
     [CauseCode]
FROM
     exp_operativa.GPA99.T_OTS AS A
     LEFT JOIN exp_operativa.GPA99.T_STOPS AS B
          ON A.[NumOT] = B.[NumOT]
     LEFT JOIN exp_operativa.GPA99.T_MACHINES AS C
          ON A.[UT] = C.[MachineUT]
WHERE
     ([StopStartDate] IS NOT NULL)
     AND ([StopStartDate] BETWEEN 
          CONVERT(DATETIME,', @Var_CONNECT_MIN_DateTimeOFF_Population, ')
          AND CONVERT(DATETIME,', @Var_CONNECT_MAX_DateTimeOFF_Population,'))
     AND ([StopEndDate] <= DATEADD(HOUR, 24, CONVERT(DATETIME,',                
             @Var_CONNECT_MAX_DateTimeOFF_Population, ')))
     AND (A.[Enabled] = 1 AND B.[Enabled] = 1 AND C.[Enabled] = 1)
     AND (ISNULL([MainStartDate], CAST(-53690 as DATETIME)) <= [StopStartDate])
     AND ([StopEndDate] <= ISNULL([MainEndDate], CAST(2958463 as DATETIME)))
"');
 
PREPARE Prepare_Statement_EO_OTS_Subset_Population 
      FROM @Var_CONNECT_Create_EO_OTS_STOPS_Table;
        
DROP TABLE IF EXISTS    scada_data_processing.`t0_temp_aux_eo_ots_stops_subset_population`;
 
EXECUTE Prepare_Statement_EO_OTS_Subset_Population;
 
DEALLOCATE PREPARE Prepare_Statement_EO_OTS_Subset_Population;
 
SELECT
     *
FROM
     scada_data_processing.`t0_temp_aux_eo_ots_stops_subset_population`
WHERE
     MachineID IN
     (
          SELECT
               GROUP_CONCAT(DISTINCT CONCAT("'", MachineUT, "'") SEPARATOR ', ')
          FROM
                scada_data_processing.`t0_windone_wtg_envelopes_population`
      ) AS `derived_t1`

So in the example the optimizer would:

  • First, execute the derived table in MariaDB (Local Server), for being a regular InnoDB / Aria table (Not CONNECT).
  • In the main SELECT, pass the result from the derived table to the Remote Server, adding the WHERE clause from the SELECT to the SRCDEF statement executed in the Remote Server.

*So in summary, my suggestion is that in a table which has a CONNECT Engine (With an SRCDEF Clause), when performing a simple SELECT over it, CONNECT:
Checks the FROM clause from the SELECT and:

  1. If it is the only table from the FROM clause (No JOINS), it attaches the WHERE clause from the SELECT to the view's SRCDEF clause's WHERE.
  2. If there are several tables in the FROM clause, all data from the View is transfered to Local Server, and then further filtering is done over it.

With the main advantage that complex joins could be included in the View for being performed and remote server, and further filtering over such new materialized table (Created from joins), is also performed in the remote server.

Even more, the results data subset could be saved in our local server into a Temporary Table for working with them, adding indexes into such table and doing further complex joins with our servers data.*

Comment by Juan Telleria [ 2017-02-01 ]

However, if CONNECT ODBC is left just as it is now implemented (Without changes), I will still be really satisfied, as it has an awesome implementation and works perfectly.

Simply I exposed my time performance concerns to try to give it a twist and make it even better as regards time consumption efficiency.

Thank you

Comment by Olivier Bertrand [ 2017-02-01 ]

A third possibility could be to use a "sending command" table. With it you can send any query that will be executed by the remote server. I don't think there is a length limitation to the length of a string included in a query. The problem is to get back the result of the query ("sending command" tables only tell you that the command was executed successfully or not.

So the sent query should be something such as:

CREATE TABLE foo as (your big select query);

Just remains to create locally a table based on this foo remote result table and to read it.

I did not try it yet but can do it if you are interested and publish an example of how to do it.

Comment by Juan Telleria [ 2017-02-01 ]

Of course, If you want I could take some time at the weekend for doing and example.

I also tried yesterday to use an ODBC CONNECT Table for "Sending Queries", but I left it aside for not being able to have the result set back (As my REMOTE Server is Read Only).

However, a pretty good and ingenious idea just occurred to me:

How about creating a special type of CONNECT ODBC table which can send Foreing Server Queries in the WHERE Clause of the SELECT, but at the same time can return Results Back.

This new implementation would be really simple, and would not require to modify any other code (With subsequent bugs that could carry on).

A conceptual example would be as follows:

Connect Engine Table Create Statement

CREATE TABLE Connect_ODBC_Table_Name 
(
  Result_Column_1 data_type,
  Result_Column_2 data_type,
  [...]
  Result_Column_N data_type
)
ENGINE = CONNECT
TABLE_TYPE = ODBC
CONNECTION='[...(Connection String)...]'
OPTION_LIST='Execsrc=1';

  • NOTE: "Execsrc" could have instead number 2 for example.

SELECT Statement to CONNECT Engine Table

SELECT
    Result_Column_1,
    Result_Column_2,
    [...]
    Result_Column_N
FROM
    Connect_ODBC_Table_Name
WHERE
    EXECSRC_Command = '  [...(SQL Command Executed in the Remote Server)...] ';

Observations:

  • "EXECSRC_Command" would be a reserved CONNECT Column name, used for sending commands to Remote Server. This column, although it isn't shown in the CREATE TABLE, it would always exist "hidden" when OPTION_LIST='Execsrc=1' is used.
  • The SQL Command Executed in the Remote Server (In the WHERE Clause) shall always be a string: Which could be created via CONCAT or other means.
  • Obviously the CONNECT ODBC Table should have the Column Names that correspond to the Query which is going to be executed in the Remote Server, unless an empty table is created:

    CREATE TABLE Connect_ODBC_Table_Name 
         ENGINE = CONNECT
         TABLE_TYPE = ODBC
         CONNECTION='[...(Connection String)...]'
         OPTION_LIST='Execsrc=1';
    

    SELECT
        Result_Alias_Column_1,
        Result_Alias_Column_2,
        [...]
        Result_Alias_Column_N
    FROM
        Connect_ODBC_Table_Name
    WHERE
        EXECSRC_Command = '  [...(SQL Command Executed in the Remote Server)...] ';
    

    SELECT
        *
    FROM
        Connect_ODBC_Table_Name
    WHERE
        EXECSRC_Command = '  [...(SQL Command Executed in the Remote Server)...] ';
    

I would be pleased to do the documentation if required.

Kind regards,
Juan Telleria

Comment by Olivier Bertrand [ 2017-02-01 ]

I don't understand some of your example. In the observations you say:
Obviously the CONNECT ODBC Table should have the Column Names that correspond to the Query
but this is followed by a create table with no columns and a query with aliases that are coming from nowhere.

Anyway, the main problem seems the EXECSRC_Command hidden column. If it is not included in the columns of the table, it will raise a syntax error in the MariaDB parsing process. However, it may perhaps be included as a CONNECT special column.

Comment by Juan Telleria [ 2017-02-01 ]

I just realized that a "sending command" table ¿Might not be feasible? As when we do the following statement:

SELECT
    Result_Column_1,
    Result_Column_2,
    [...]
    Result_Column_N
FROM
    Connect_ODBC_Table_Name
WHERE
    EXECSRC_Command = '  [...(SQL Command Executed in the Remote Server)...] ';

The SQL Command to be executed in the remote server I suppose (Maybe not) that is subject to the length of the "EXECSRC_Command" column, which has a maximum length for VARCHAR of 65535 characters (Not unlimited).

However, a more detailed example would be as follows:

Imagine I create the following CONNECT ODBC table:

CREATE TABLE t_MyMachines
(
     MachineID INT NOT NULL,
     Machine_Description VARCHAR(255) NULL,
     Model VARCHAR(45) NULL,
     Cost DOUBLE NULL,
     EXECSRC_Command VARCHAR(65535) FLAG=0   -- ¿Shall be explicitly included?
) 
     ENGINE = CONNECT
     TABLE_TYPE = ODBC
     CONNECTION='[...(Connection String)...]'
     OPTION_LIST='Execsrc=1';

The SELECT would be:

SELECT
    MachineID,
    Machine_Description,
    Model
FROM
    Connect_ODBC_Table_Name
WHERE
    EXECSRC_Command =
    "
        SELECT
            MachineID,
            Machine_Description,
            Model,
            Price *1.21 AS Cost
        FROM
            t_MyMachines_Remote_Server
        WHERE
            Year = 2015
    "
    AND Cost >= 173;

In the SELECT:

  • We would perform the command specified in EXECSRC_Command = " [...] " in the remote server.
  • The filtering with the "Cost" Column in the WHERE Clause in the Local Server.
  • And only show with the SELECT the MachineID, Machine_Description and Model columns from CONNECT Table t_MyMachines.

But in case I create a CONNECT Table without Column Definition:

CREATE TABLE t_MyMachines
     ENGINE = CONNECT
     TABLE_TYPE = ODBC
     CONNECTION='[...(Connection String)...]'
     OPTION_LIST='Execsrc=1';

The SELECT would be:

SELECT
    MachineID,
    Machine_Description,
    Model,
    Cost
FROM
    Connect_ODBC_Table_Name
WHERE
    EXECSRC_Command =
    "
        SELECT
            MachineID,
            Machine_Description,
            Model,
            Price *1.21 AS Cost
        FROM
            t_MyMachines_Remote_Server
        WHERE
            Year = 2015
    "
    AND Cost >= 173;

So the CONNECT ODBC Table would return to the local server a table which contains the following columns:

  • MachineID,
  • Machine_Description,
  • Model,
  • Cost
Comment by Juan Telleria [ 2017-02-01 ]

Thank you

Comment by Olivier Bertrand [ 2017-02-01 ]

I appreciate your suggestion, unfortunately it is not feasible. One reason is that ordinary ODBC (or JDBC or MYSQL) tables and the corresponding "send command" tables are quite distinct objects. They are implemented in C++ by different classes having few things in common.

For instance, "send command" tables are written to retrieve just one record made of specific items while ordinary tables are written to fetch in a result set, and bound their columns to the result set values. Mixing both of them in a common class would be a kluge.

However, retrieving a result set from an ordinary "send command" table proved to be fairly simple. For example I made and tested this (using MYSQL table type instead of ODBC but that would be almost identical):

create table makeres (
command varchar(128) not null,
number int(5) not null flag=1,
message varchar(255) flag=2)
engine=connect table_type=mysql
connection='mysql://root@localhost/connect'
option_list='Execsrc=1';
select * from makeres where command='create or replace table res as select 5+7 as result';

This replies:

command number message
create or replace table res as select 5+7 as result 1 Affected rows

Now to get the result, simply create the table to read the remote made table:

create or replace table getres
engine=connect table_type=mysql
connection='mysql://root@localhost/connect' tabname=res;
select * from getres;

And you get:

result
12

Now if you send another query:

select * from makeres where command="create or replace table res as select 7 as number, 'seven' as name";

You can still get the result repeating:

create or replace table getres
engine=connect table_type=mysql
connection='mysql://root@localhost/connect' tabname=res;
select * from getres;

To obtain:

number name
7 seven

Of course the getres table must be re-created to get the new columns, which have not to be specified. They are automatically retrieved from the remote res table.

I think all this can also be automized in a procedure.

Comment by Juan Telleria [ 2017-02-02 ]

Thank you Oliver,

I totally agree that it is not feasible, apart for being far away from Standard SQL.

I appreciate your example about how to send & retrieve commands via a "Send Command Table" based on CREATE TABLE Table_Name AS SELECT [...], I'll give it a try by Speaking with my Remote Database Administrator.

Last suggestion which might be interesting: What about an implementation similar to that of TRANSACT-SQL's OPENQUERY():
OPENQUERY (Transact-SQL)

In MariaDB could be::

CREATE TABLE t_CONNECT_Table_Name
     ENGINE = CONNECT
     TABLE_TYPE = ODBC
     CONNECTION='[...(Connection String)...]'
     OPTION_LIST='Execsrc=1';

And:

SELECT 
     *
FROM 
     OPENQUERY(t_CONNECT_Table_Name, '  [...(SRCDEF Query Exec. Rem. Serv.)...] ') 

Comment by Olivier Bertrand [ 2017-02-02 ]

MariaDB does not accept tables with no columns. When some CONNECT tables seem to be created with no columns, this is because the discovery process is called allowing to retrieve columns definitions from another table or pre defined specifications.

In your example the create table would fail saying:

 "Cannot get columns from t_CONNECT_Table_Name"

because the target table name is by default the table name.

Comment by Juan Telleria [ 2017-02-02 ]

Never mind, thank you Oliver

I'll give a try to your suggestion

Generated at Thu Feb 08 07:53:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.