[MDEV-9542] CONNECT engine incorrect datatype mapping with ODBC TDS driver Created: 2016-02-09  Updated: 2016-02-17  Resolved: 2016-02-16

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.1.11
Fix Version/s: 10.0.24, 10.1.12

Type: Bug Priority: Major
Reporter: Robert Dyas Assignee: Olivier Bertrand
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS7 x86_64 on Google Cloud Platform (standard install) with unixODBC 2.3.1 and freetds v 0.95.19


Attachments: Text File out.txt     Text File sfout.txt    

 Description   

When using the CONNECT engine with the FreeTDS driver, a couple of data types are not mapped correctly.

For example, columns declared as DATE on the remote MS SQL Server come back as character strings, and remote columns declared as DOUBLE or FLOAT come back with zero decimal places.

Note that remote DATETIME does come back correctly as TIMESTAMP in Connect, and remote NUMERIC and DECIMAL types also seem to come back correctly.

In the example below, next_bill_date is a DATE on the remote MS server, discount is a FLOAT on the remote server (a percentage) and my_double is a DOUBLE on the remote MS server.

So, for example, if the value of the discount field is 0.15, it shows in isql as 0.15 but comes back in CONNECT as 0. Likewise, a double value of 23.45 comes back as 23.
Remote types explicitly declared as NUMERIC or DECIMAL appear to work correctly.

show columns in ms_cust_info;
+----------------+---------------+------+-----+-------------------+-----------------------------+
| Field          | Type          | Null | Key | Default           | Extra                       |
+----------------+---------------+------+-----+-------------------+-----------------------------+
| id             | bigint(19)    | NO   |     | NULL              |                             |
| company        | varchar(40)   | YES  |     | NULL              |                             |
| created_on     | timestamp     | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| next_bill_date | varchar(10)   | YES  |     | NULL              |                             |
| discount       | double(8,0)   | YES  |     | NULL              |                             |
| sales          | decimal(8,2)  | YES  |     | NULL              |                             |
| percentage     | decimal(17,3) | YES  |     | NULL              |                             |
| my_double      | double(8,0)   | YES  |     | NULL              |                             |
+----------------+---------------+------+-----+-------------------+-----------------------------+
8 rows in set (0.00 sec)



 Comments   
Comment by Robert Dyas [ 2016-02-09 ]

isql result vs. connect result:

SQL> select my_double from cust_info
+-------------------------+
| my_double               |
+-------------------------+
| 5000.0100000000002      |
|                         |
|                         |
+-------------------------+
SQLRowCount returns 3
3 rows fetched
 
MariaDB [db11003]> select my_double from ms_cust_info;
+-----------+
| my_double |
+-----------+
|      5000 |
|      NULL |
|      NULL |
+-----------+
3 rows in set (0.44 sec)

Comment by Olivier Bertrand [ 2016-02-11 ]

CONNECT tries to translate the SQL types to MariaDB types using the function:

/***********************************************************************/
/*  TranslateSQLType: translate a SQL Type to a PLG type.              */
/***********************************************************************/
int TranslateSQLType(int stp, int prec, int& len, char& v, bool& w)
  {
  int type;
 
  switch (stp) {
    case SQL_WVARCHAR:                      //  (-9)
      w = true;
      /* fall through */
    case SQL_VARCHAR:                       //   12
      v = 'V';
      type = TYPE_STRING;
      break;
    case SQL_WCHAR:                         //  (-8)
      w = true;
      /* fall through */
    case SQL_CHAR:                          //    1
      type = TYPE_STRING;
      break;
    case SQL_WLONGVARCHAR:                  // (-10)
      w = true;
      /* fall through */
    case SQL_LONGVARCHAR:                   //  (-1)
      v = 'V';
      type = TYPE_STRING;
      len = MY_MIN(abs(len), GetConvSize());
      break;
    case SQL_NUMERIC:                       //    2
    case SQL_DECIMAL:                       //    3
//    type = (prec || len > 20) ? TYPE_DOUBLE
//         : (len > 10) ? TYPE_BIGINT : TYPE_INT;
      type = TYPE_DECIM;
      break;
    case SQL_INTEGER:                       //    4
      type = TYPE_INT;
      break;
    case SQL_SMALLINT:                      //    5
      type = TYPE_SHORT;
      break;
    case SQL_TINYINT:                       //  (-6)
    case SQL_BIT:                           //  (-7)
      type = TYPE_TINY;
      break;
    case SQL_FLOAT:                         //    6
    case SQL_REAL:                          //    7
    case SQL_DOUBLE:                        //    8
      type = TYPE_DOUBLE;
      break;
    case SQL_DATETIME:                      //    9
      type = TYPE_DATE;
      len = 19;
      break;
    case SQL_TYPE_DATE:                     //   91
      type = TYPE_DATE;
      len = 10;
      v = 'D';
      break;
    case SQL_INTERVAL:                      //   10
    case SQL_TYPE_TIME:                     //   92
      type = TYPE_STRING;
      len = 8 + ((prec) ? (prec+1) : 0);
      v = 'T';
      break;
    case SQL_TIMESTAMP:                     //   11
    case SQL_TYPE_TIMESTAMP:                //   93
      type = TYPE_DATE;
      len = 19 + ((prec) ? (prec+1) : 0);
      v = 'S';
      break;
    case SQL_BIGINT:                        //  (-5)
      type = TYPE_BIGINT;
      break;
    case SQL_UNKNOWN_TYPE:                  //    0
    case SQL_BINARY:                        //  (-2)
    case SQL_VARBINARY:                     //  (-3)
    case SQL_LONGVARBINARY:                 //  (-4)
    case SQL_GUID:                          // (-11)
    default:
      type = TYPE_ERROR;
      len = 0;
    } // endswitch type
 
  return type;
  } // end of TranslateSQLType

Of course, CONNECT does not directly know the types that have been assigned to the columns, only the types returned by the ODBC driver used to access the table.

For instance, if a column is of type DATE, the driver should say its type is SQL_TYPE_DATE and CONNECT will make a DATE column for it. If not, it could be a bug in CONNECT, or the ODBC driver may simply have returned a wrong SQL type for it. I cannot tell what happens in your case, as I do not have information about your source table and cannot test with your environment and driver.

Could you execute your test in ODBC test mode to check what the driver returns when asking for the column description?

Note also that this occurs when creating the CONNECT table using discovery, that is, without specifying the columns. When discovery produces a wrong column definition, an easy fix is to manually define the columns with the proper types.
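As an illustration of the manual workaround, here is a sketch of a CONNECT table with explicitly defined columns, based on the remote table shown later in this thread (the DSN and credentials match the isql test and are placeholders; adjust them to your environment):

```sql
-- Hypothetical example: define the columns explicitly instead of using discovery.
CREATE TABLE ms_cust_info (
  id BIGINT NOT NULL,
  company VARCHAR(40),
  created_on DATETIME,
  next_bill_date DATE,
  discount DOUBLE,          -- remote FLOAT: plain DOUBLE keeps the decimals
  sales DECIMAL(6,2),
  percentage DECIMAL(15,3),
  my_double DOUBLE
) ENGINE=CONNECT TABLE_TYPE=ODBC TABNAME='cust_info'
  CONNECTION='DSN=TDS;UID=sa;PWD=pass';
```

With explicit column definitions, CONNECT does not call the catalog functions for discovery, so whatever the driver reports for decimal digits no longer matters.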

Comment by Robert Dyas [ 2016-02-11 ]

RE "Could you execute your test in ODBC test mode to check what the driver returns when asking for the column description?"

I'd be happy to - how do I do that?

Comment by Olivier Bertrand [ 2016-02-11 ]

I know how to do it on Windows but not on Linux. However, instead of this, could you get more information by retrieving the remote table columns definition:

1. First, execute the function SQLColumns with isql and show the result. This should show what the TDS driver returns.
2. Then do it with CONNECT, by creating a table based on the remote table with the additional option CATFUNC=columns.
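For the second step, a minimal sketch of such a catalog table (the table name `cust_info_cols` is illustrative, and the DSN/credentials are the ones from the isql test in this thread):

```sql
-- Hypothetical example: a catalog table that lists the remote columns
-- as reported by the ODBC driver, instead of mapping the data itself.
CREATE TABLE cust_info_cols
ENGINE=CONNECT TABLE_TYPE=ODBC TABNAME='cust_info'
CATFUNC=columns CONNECTION='DSN=TDS;UID=sa;PWD=pass';

SELECT * FROM cust_info_cols;
```

Selecting from this table returns one row per remote column, with the SQL data type code and decimal digits the driver reports.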

By the way, does this behaviour occur only with the TDS driver, or also with other ODBC drivers?
I would also be interested to see the CREATE TABLE statement you use with CONNECT.

Comment by Robert Dyas [ 2016-02-12 ]

The remote table column definitions in MS SQL Server are:

ID bigint not null
company varchar(40) null
created_on datetime null
next_bill_date date null
discount float null
sales numeric(6,2) null
percentage numeric(15,3) null
my_double float null

The above is from the MS SQL Server Management Studio.

Note also that we have just started using the Simba Salesforce ODBC driver, and auto discovery there also declares double(8,0) rather than double. A fix that I think would work for both drivers is for all floats/doubles created via auto discovery to be declared as plain double (rather than double(8,0)), since one can always declare them manually.

Also, I think it would be much better for the default behavior of auto-discovery to simply skip binary and unknown types, and possibly varchar columns exceeding a certain (configurable) length, as they can always be manually declared. This option could be configurable via my.cnf.

Output from the following command is attached as out.txt:
echo "help cust_info" | iusql -v TDS sa pass > out.txt

Comment by Robert Dyas [ 2016-02-12 ]

Output from CATFUNC=columns

+-----------+--------------+------------+----------------+-----------+-----------+-------------+---------------+
----------------+-------+----------+---------+
| Table_Cat | Table_Schema | Table_Name | Column_Name    | Data_Type | Type_Name | Column_Size | Buffer_Length |
 Decimal_Digits | Radix | Nullable | Remarks |
+-----------+--------------+------------+----------------+-----------+-----------+-------------+---------------+
----------------+-------+----------+---------+
| test1     | dbo          | cust_info  | id             |        -5 | bigint    |          19 |             8 |
              0 |    10 |        0 |         |
| test1     | dbo          | cust_info  | company        |        12 | varchar   |          40 |            40 |
              0 |     0 |        1 |         |
| test1     | dbo          | cust_info  | created_on     |        11 | datetime  |          23 |            16 |
              3 |     0 |        1 |         |
| test1     | dbo          | cust_info  | next_bill_date |        -9 | date      |          10 |            20 |
              0 |     0 |        1 |         |
| test1     | dbo          | cust_info  | discount       |         6 | float     |          15 |             8 |
              0 |    10 |        1 |         |
| test1     | dbo          | cust_info  | sales          |         2 | numeric   |           6 |             8 |
              2 |    10 |        1 |         |
| test1     | dbo          | cust_info  | percentage     |         2 | numeric   |          15 |            17 |
              3 |    10 |        1 |         |
| test1     | dbo          | cust_info  | my_double      |         6 | float     |          15 |             8 |
              0 |    10 |        1 |         |
+-----------+--------------+------------+----------------+-----------+-----------+-------------+---------------+
----------------+-------+----------+---------+

Comment by Robert Dyas [ 2016-02-12 ]

I'm not sure, but I suspect the issue is that floats don't have a specific number of decimal digits (being floating point), so Decimal_Digits is 0 in the CATFUNC=columns output, and auto-discovery is using that number rather than just mapping to float (or double, to be safe).

I will post output from the Simba Salesforce ODBC driver, which fails completely on auto-discovery but works with manually specified columns.

Comment by Robert Dyas [ 2016-02-12 ]

The following file has the output from CATFUNC=columns using the Simba Salesforce ODBC driver.
Auto-discovery simply errors out with this driver, but the driver seems to work with manually specified columns.

Here is the CATFUNC output: sfout.txt

Note there are some other issues that I'll post in a different thread (ORDER BY fails on the connect table, but if the connect table is joined with a local table, then ORDER BY works properly).

Comment by Olivier Bertrand [ 2016-02-16 ]

This was caused by Connect not handling NULL values returned by catalog functions.

Note however that Connect only knows what is returned by the ODBC drivers, and many of them sometimes return unreliable values.

Therefore, discovery is a convenient facility with limitations, and it should be regarded as normal to edit the CREATE TABLE statement it produces to meet the exact requirements of the Connect table based on a remote table.

In particular, in the case of double values, you may want to specify the number of decimals to be handled by the table.

About sorting the table, please consult the documentation about how to handle this point.

Comment by Robert Dyas [ 2016-02-17 ]

Yes Olivier, we came to the same conclusion about wild column mappings [e.g. varchar(1300) for a Fax number from a Salesforce driver] so we are using the catfunc=columns to auto-map to our own requirements and this is working great! Thank you for your assistance - much appreciated!

Generated at Thu Feb 08 07:35:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.