[MDEV-15558] Save real data type in addition to existing data type Created: 2018-03-13 Updated: 2023-12-22 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Data types |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major |
| Reporter: | Diego Dupin | Assignee: | Alexander Barkov |
| Resolution: | Unresolved | Votes: | 3 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Description |
Saving "real" data typeData type are actually not the "real" data type, but "storage" data type. BOOLEAN type means something compare to storage data type BIT(1) : that means Connector may return true/false and not 0/1. Connectors actually don't know the "real" data type, and it's needed.
MariaDB saves JSON like a string, then connector has no information that this field is a JSON object. That additional byte would permit a lot of new functionalities: BOOLEAN, JSON ok, but that may permit new types like IPV4 / IPV6 / UUID that are store are text. Another minor annoyance is that SHOW CREATE could return initial type :
Possible non breaking change of protocolCurrent protocol for Column definition packet is :
First field always send string<lenenc> catalog (always 'def') = 0x03 0x64 0x65 0x66 This field can be used to transmit to driver this additional metadata byte. |
| Comments |
| Comment by Rasmus Johansson (Inactive) [ 2018-05-25 ] | |||||||||||||||||||||||||||||||||||||||||
|
serg can you review this proposal for 10.4? | |||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2018-05-25 ] | |||||||||||||||||||||||||||||||||||||||||
|
There is no "real" data type. There are data types, and data type aliases. Alias is, exactly, just a different name for something. It's not a new or some "real" data type. The issue here seems to be not the "real" data type, but that the connector cannot tell JSON object from a string. It cannot do it based on the data type, because SQL:2016 (part 2, section 4.46 JSON data handling in SQL) says explicitly
It's easy to pass some flag over to connectors, if the server would know iftself what columns are supposed to contain JSON items. Perhaps the standard expected that it'll be done with a UDT? | |||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-05-25 ] | |||||||||||||||||||||||||||||||||||||||||
|
is JSON in the create table not a pretty strong indicator that column is supposed to contain JSON items. This information gets lost , it is not in frm and anywhere. But it should be there I think | |||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-05-25 ] | |||||||||||||||||||||||||||||||||||||||||
|
It is a problem for at least JDBC and .NET. they have to guess what is the datatype, and they support richer type systems than MySQL or MariaDB does If server supported annotation, it would help | |||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2018-05-28 ] | |||||||||||||||||||||||||||||||||||||||||
|
The same problem exists for the GEOMETRY data type variants.
Notice, it reports both columns as GEOMETRY. The precise data types POINT and POLYGON are not visible. The good news is that GEOMETRY at least exposes its additional attributes in INFORMATION_SCHEMA.GEOMETRY_COLUMNS. So for example one can use a query like this to know the exact types:
But I think that the exact data type should be available on the client automatically, without having to use I_S:
Note, DESC and SHOW CREATE actually report the exact data type:
But for connectors, it's a new query again, not any better than using I_S. | |||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-05-28 ] | |||||||||||||||||||||||||||||||||||||||||
|
Bar, the problem is that we do not want to issue an I_S query for every column in resultset metadata. It should be sent with resultset metadata, And BTW, the missing protocol-level documentation for Geometry type is probably the reason for missing support by the connectors. | |||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2018-05-28 ] | |||||||||||||||||||||||||||||||||||||||||
|
I agree, I also prefer to have the exact data type name automatically on the client side. I'm not aware of any special protocol-level features of GEOMETRY. Perhaps there are no any. We need to ask holyfoot to make sure. | |||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-05-28 ] | |||||||||||||||||||||||||||||||||||||||||
|
Resultset metadata should be sufficient to determine data type, or alias, or whatever people used during their CREATE TABLE. an extra I_S query is not an option, cannot be done within connector without increasing latency by the factor of 2? | |||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2018-05-28 ] | |||||||||||||||||||||||||||||||||||||||||
|
I propose we make GEOMETRY report the exact data type first. Then we can extrapolate this for all new data types. | |||||||||||||||||||||||||||||||||||||||||
| Comment by Diego Dupin [ 2018-06-22 ] | |||||||||||||||||||||||||||||||||||||||||
|
bar for geometry, connector can still know type (data type MYSQL_TYPE_GEOMETRY=255), then parse blob to know geometry sub type (WKBPoint / WKBLineString, ...). | |||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-06-22 ] | |||||||||||||||||||||||||||||||||||||||||
|
and for BOOLEAN , the connector does not know it is BOOLEAN, for example | |||||||||||||||||||||||||||||||||||||||||
| Comment by Rasmus Johansson (Inactive) [ 2018-08-03 ] | |||||||||||||||||||||||||||||||||||||||||
|
How should this work? Should the application that sends the data for storage in the database inform what datatype it was using for each piece of data before converting it to the storage datatype? Isn't that what the connectors are supposed to do? | |||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-08-03 ] | |||||||||||||||||||||||||||||||||||||||||
|
ratzpo, The original type information is lost with that, but sometimes you want driver to know the actual type. An example is ResultSet.getObject() in JDBC. It should return a Boolean, not an Integer for a column declared as BOOL. | |||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2018-08-08 ] | |||||||||||||||||||||||||||||||||||||||||
|
I don't think that returning the original type alias that the user used is the solution. One can store any arbitrary data in a "JSON" column, and one can store 5 in a "boolean" column. An alias is just that, an alias, it does not, by definition, carry any additional metadata, so it's wrong to make any conclusions based on what alias the user used. Not mentioning that aliases don't survive mysqldump|mysql. If the connector needs to distinguish between a boolean and an integer, perhaps they should be two different types? How is it solved elsewhere? | |||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-08-08 ] | |||||||||||||||||||||||||||||||||||||||||
|
All Connectors already distinguish between boolean and integer, for couple of decades. There is a lot of hacks, the extra parameter to the connector tinyInt1IsBoolean etc And as of the alias.- maybe we should not discard "all" the information from the original DDL. And perhaps, "aliases" should survive mysqldump. This is all what this MDEV is about. User can help by adding check constraints. | |||||||||||||||||||||||||||||||||||||||||
| Comment by Martha C Henry [ 2018-08-25 ] | |||||||||||||||||||||||||||||||||||||||||
|
I don't know squat about all of the technical details for the connector that y'all are discussing. Let me say this about the issue. I vote yes on somehow keeping the aliases in the DDL and mysqldump. If you are not going to keep the alias in the DDL, I would suggest getting rid of the alias entirely. Lucky me - I use ORM/Hibernate. So in my property definition for a database column I can specify boolean and json_array. That way, our programmers can't screw up the database by setting a bit or tinyint(1) column or json_array column to an invalid value. In mysql, I usually use BIT for boolean - 0/1 Obviously, there is a lot of interpretation possible for Bit/TinyInt(1)/Boolean by the client if usage is boolean
From what I can tell, mysql converts BIT to TINYINT(1), which is what you are doing in MariaDB with BOOLEAN tinyInt1isBit Default: true transformedBitIsBoolean Default: false Thank you to @diego.dupin for telling me about this thread | |||||||||||||||||||||||||||||||||||||||||
| Comment by Diego Dupin [ 2018-09-13 ] | |||||||||||||||||||||||||||||||||||||||||
|
I correct my answer about what bar indicate for Geometry type : For null value, connector doesn't know geometry type : Resultset :
javascript standard is GeoJSON, and then expected response from driver is { type: "Point", coordinates: [] }, not NULL, but connector doesn't know. | |||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-09-14 ] | |||||||||||||||||||||||||||||||||||||||||
|
diego dupin, as for non-breaking change of the Column definition packet. Thus, "boo" == Boolean, "jso" = json It could be renamed to type-hint from "catalog". I doubt think there will ever be both catalogs and schemas either in MySQL or MariaDB | |||||||||||||||||||||||||||||||||||||||||
| Comment by Georg Richter [ 2018-10-23 ] | |||||||||||||||||||||||||||||||||||||||||
|
In the packet we have one byte which contains the length of integer values:
How about to increase it's value to 0xE and send 2 bytes at the end of the packet? This would work also with older drivers, which would just ignore additional information. | |||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-10-23 ] | |||||||||||||||||||||||||||||||||||||||||
|
@georg 2 bytes can not describe custom/user-defined datatypes ( we do not have any now, but we could in the future) | |||||||||||||||||||||||||||||||||||||||||
| Comment by Garet Claborn [ 2022-11-16 ] | |||||||||||||||||||||||||||||||||||||||||
|
I very much agree the higher order type should be preserved as much as possible. There are numerous tools which I cannot use with MariaDB simply because JSON is stored as LONGTEXT and my own framework has no way of knowing the user stored JSON there. Very frustrating. We resort to use a "types" table of 3 unsigned integers id, parent, pointer and 1 name This allows us to
If MariaDB were to gain user-defined types; feel free to use that haha. Adding UDT data to information schema and returning the higher-order type directly rather than the built-in, would allow the server to internally continue to use its own primitives | |||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2022-11-16 ] | |||||||||||||||||||||||||||||||||||||||||
|
gclaborn, apparently, JSON already implemented, and type information is passed down. | |||||||||||||||||||||||||||||||||||||||||
| Comment by Martha C Henry [ 2022-11-16 ] | |||||||||||||||||||||||||||||||||||||||||
|
Vladislav Vaintroub - can you give a link to a driver that uses the information? | |||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2022-11-17 ] | |||||||||||||||||||||||||||||||||||||||||
|
martiehen- a link to a driver that uses this information https://github.com/mariadb-corporation/mariadb-connector-nodejs . Why does this driver use this information - it wants to return JSON as JSON, and correct representations of subtypes of Geometry data (points, multilines etc). If you want to contribute to your driver, you can take a look how Capabilities.MARIADB_CLIENT_EXTENDED_TYPE_INFO is used in this driver. The information can also be retrieved with MariaDB ConnectorC API , with the new API mariadb_field_attr() . It is exposed with the command line client, if you start it with --column-type-info switch.. In the example below . notice "BLOB(format=json)" in the output.
| |||||||||||||||||||||||||||||||||||||||||
| Comment by Martha C Henry [ 2022-11-17 ] | |||||||||||||||||||||||||||||||||||||||||
|
Awesome! Thank you! I will check it out |