Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15558

Save real data type in addition to existing data type

Details

    • New Feature
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • Data types
    • None

    Description

      Saving "real" data type

      Data type are actually not the "real" data type, but "storage" data type.
      Storing an additional byte in frm files to save "real" data type information would solve lots of issues.

      BOOLEAN type means something compare to storage data type BIT(1) : that means Connector may return true/false and not 0/1.
      JSON type means that data is not just a string, but that string can be a parse to JSON object.

      Connectors actually don't know the "real" data type, and it's needed.
      Example with JSON: MySQL server save JSON field differently. The result is having a dedicated MYSQL_TYPE_JSON Data type that is sent to connectors.
      Connectors may then handle the data differently. The most popular javascript driver use that information to create a JSON object from the string.

      connection.query("insert into user(name, additional) value (?, ?)", ['john', {game:'myGame', level:23}]);
      connection.query("select from user", (err, rows) => {
      	//with mysql
              //rows[0] = {name:'john', additional: {game:'myGame', level:23}}
       
      	//with mariadb :
              //rows[0] = {name:'john', additional: "{game:'myGame', level:23}"}
      });

      MariaDB saves JSON like a string, then connector has no information that this field is a JSON object.

      That additional byte would permit a lot of new functionalities: BOOLEAN, JSON ok, but that may permit new types like IPV4 / IPV6 / UUID that are store are text.

      Another minor annoyance is that SHOW CREATE could return initial type :

      CREATE TABLE `jsondata` (val1 JSON);
      SHOW CREATE TABLE `testj`.`jsondata`;
      //=> CREATE TABLE `jsondata` (
        `val1` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8
      //=> information about JSON is lost

      Possible non breaking change of protocol

      Current protocol for Column definition packet is :

      string<lenenc> catalog (always 'def')
      string<lenenc> schema
      string<lenenc> table alias
      string<lenenc> table
      string<lenenc> column alias
      string<lenenc> column
      int<lenenc> length of fixed fields (=0xC)
      int<2> character set number
      int<4> max. column size
      int<1> Field types
      int<2> Field detail flag
      int<1> decimals
      int<2> - unused -

      First field always send string<lenenc> catalog (always 'def') = 0x03 0x64 0x65 0x66
      no driver does use this information, because catalog doesn't exist in Mariadb.

      This field can be used to transmit to driver this additional metadata byte.
      This would permit that existing driver will receive the "storage" data type

      Attachments

        Issue Links

          Activity

            gclaborn Garet Claborn added a comment -

            I very much agree the higher order type should be preserved as much as possible. There are numerous tools which I cannot use with MariaDB simply because JSON is stored as LONGTEXT and my own framework has no way of knowing the user stored JSON there. Very frustrating.

            We resort to use a "types" table of 3 unsigned integers id, parent, pointer and 1 name

            This allows us to

            • make arbitrarily complex mappings between fully namespaced types,
            • aliases of those types
            • pointer interpreted by the software system or the type itself
            • even compose a type whose children are a linked list or tree of imported types

            If MariaDB were to gain user-defined types; feel free to use that haha. Adding UDT data to information schema and returning the higher-order type directly rather than the built-in, would allow the server to internally continue to use its own primitives

            gclaborn Garet Claborn added a comment - I very much agree the higher order type should be preserved as much as possible. There are numerous tools which I cannot use with MariaDB simply because JSON is stored as LONGTEXT and my own framework has no way of knowing the user stored JSON there. Very frustrating. We resort to use a "types" table of 3 unsigned integers id, parent, pointer and 1 name This allows us to make arbitrarily complex mappings between fully namespaced types, aliases of those types pointer interpreted by the software system or the type itself even compose a type whose children are a linked list or tree of imported types If MariaDB were to gain user-defined types; feel free to use that haha. Adding UDT data to information schema and returning the higher-order type directly rather than the built-in, would allow the server to internally continue to use its own primitives
            wlad Vladislav Vaintroub added a comment - - edited

            gclaborn, apparently, JSON already implemented, and type information is passed down.
            MariaDB has an extensible type system, and server-side plugin API to add new data types (no it is not SQL)
            Examples of new types are JSON, and there is UUID, and there is INET4/6. This was done in a way that preserves the protocol, and presents data types as strings or blobs to the down-level clients that are unaware of type info. If clients indicate they can read type info (i.e know this MariaDB functionality, the information is passed down to the client. Now you can complain about the drivers that are not using this information, but the datatypes are as true as it gets.

            wlad Vladislav Vaintroub added a comment - - edited gclaborn , apparently, JSON already implemented, and type information is passed down. MariaDB has an extensible type system, and server-side plugin API to add new data types (no it is not SQL) Examples of new types are JSON, and there is UUID, and there is INET4/6. This was done in a way that preserves the protocol, and presents data types as strings or blobs to the down-level clients that are unaware of type info. If clients indicate they can read type info (i.e know this MariaDB functionality, the information is passed down to the client. Now you can complain about the drivers that are not using this information, but the datatypes are as true as it gets.

            Vladislav Vaintroub - can you give a link to a driver that uses the information?

            martiehen Martha C Henry added a comment - Vladislav Vaintroub - can you give a link to a driver that uses the information?
            wlad Vladislav Vaintroub added a comment - - edited

            martiehen- a link to a driver that uses this information https://github.com/mariadb-corporation/mariadb-connector-nodejs . Why does this driver use this information - it wants to return JSON as JSON, and correct representations of subtypes of Geometry data (points, multilines etc). If you want to contribute to your driver, you can take a look how Capabilities.MARIADB_CLIENT_EXTENDED_TYPE_INFO is used in this driver.

            The information can also be retrieved with MariaDB ConnectorC API , with the new API mariadb_field_attr() . It is exposed with the command line client, if you start it with --column-type-info switch.. In the example below . notice "BLOB(format=json)" in the output.

            shell> mysql --column-type-info -uroot
            ...
            MariaDB [test]> create table t(j json);
            Query OK, 0 rows affected (0.006 sec)
             
            MariaDB [test]> select * from t;
            Field   1:  `j`
            Org_field:  `j`
            Catalog:    `def`
            Database:   `test`
            Table:      `t`
            Org_table:  `t`
            Type:       BLOB (format=json)
            Collation:  utf8mb4_general_ci (45)
            

            wlad Vladislav Vaintroub added a comment - - edited martiehen - a link to a driver that uses this information https://github.com/mariadb-corporation/mariadb-connector-nodejs . Why does this driver use this information - it wants to return JSON as JSON, and correct representations of subtypes of Geometry data (points, multilines etc). If you want to contribute to your driver, you can take a look how Capabilities.MARIADB_CLIENT_EXTENDED_TYPE_INFO is used in this driver. The information can also be retrieved with MariaDB ConnectorC API , with the new API mariadb_field_attr() . It is exposed with the command line client, if you start it with --column-type-info switch.. In the example below . notice "BLOB(format=json)" in the output. shell> mysql --column-type-info -uroot ... MariaDB [test]> create table t(j json); Query OK, 0 rows affected (0.006 sec)   MariaDB [test]> select * from t; Field 1: `j` Org_field: `j` Catalog: `def` Database: `test` Table: `t` Org_table: `t` Type: BLOB (format=json) Collation: utf8mb4_general_ci (45)

            Awesome! Thank you! I will check it out

            martiehen Martha C Henry added a comment - Awesome! Thank you! I will check it out

            People

              bar Alexander Barkov
              diego dupin Diego Dupin
              Votes:
              3 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.