Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12325

Unexpected data type and truncation when using CTE

Details

    Description

      I have a table with a company employee hierarchy:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1
      (
        id INT,     -- employee ID
        mid INT,    -- manager ID, or NULL if top level employee
        name TEXT   -- employee name
      );
      INSERT INTO t1 VALUES (0,NULL, 'Name');
      INSERT INTO t1 VALUES (1,0,    'Name1');
      INSERT INTO t1 VALUES (2,0,    'Name2');
      INSERT INTO t1 VALUES (11,1,   'Name11');
      INSERT INTO t1 VALUES (12,1,   'Name12');
      

      Now I want to print the company hierarchy as a table that additionally includes employee hierarchy levels and manager names. I use a recursive CTE for that:

      DROP TABLE IF EXISTS t2;
      CREATE TABLE t2 AS
      WITH RECURSIVE
      cteReports (level, id, mid, name) AS
      (
        SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL
        UNION ALL
        SELECT r.level + 1, e.id, e.mid, e.name FROM t1 e
        INNER JOIN cteReports r ON e.mid = r.id
      )
      SELECT
        level, id, mid, name,
        (SELECT name FROM t1 WHERE id= cteReports.mid) AS mname
      FROM cteReports 
      ORDER BY level, mid;
      SHOW CREATE TABLE t2;
      SELECT * FROM t2;
      

      It creates a table with the expected structure:

      +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Table | Create Table                                                                                                                                                                                                          |
      +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | t2    | CREATE TABLE `t2` (
        `level` bigint(1) NOT NULL DEFAULT 0,
        `id` int(11) DEFAULT NULL,
        `mid` int(11) DEFAULT NULL,
        `name` text DEFAULT NULL,
        `mname` text DEFAULT NULL
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
      +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      

      and with correct data:

      +-------+------+------+--------+-------+
      | level | id   | mid  | name   | mname |
      +-------+------+------+--------+-------+
      |     1 |    0 | NULL | Name   | NULL  |
      |     2 |    1 |    0 | Name1  | Name  |
      |     2 |    2 |    0 | Name2  | Name  |
      |     3 |   11 |    1 | Name11 | Name1 |
      |     3 |   12 |    1 | Name12 | Name1 |
      +-------+------+------+--------+-------+
      

      Now I want for some reasons to add some big number to manager IDs (mid):

      DROP TABLE IF EXISTS t2;
      CREATE TABLE t2 AS
      WITH RECURSIVE
      cteReports (level, id, mid, name) AS
      (
        SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL
        UNION ALL
        SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e
        INNER JOIN cteReports r ON e.mid = r.id
      )
      SELECT
        level, id, mid, name,
        (SELECT name FROM t1 WHERE id= cteReports.mid) AS mname
      FROM cteReports 
      ORDER BY level, mid;
      SHOW CREATE TABLE t2;
      SELECT * FROM t2;
      

      It creates exactly the same table to the one in the previous script and truncates data silently:

      +-------+------+------------+--------+-------+
      | level | id   | mid        | name   | mname |
      +-------+------+------------+--------+-------+
      |     1 |    0 |       NULL | Name   | NULL  |
      |     2 |    1 | 2147483647 | Name1  | NULL  |
      |     2 |    2 | 2147483647 | Name2  | NULL  |
      |     3 |   11 | 2147483647 | Name11 | NULL  |
      |     3 |   12 | 2147483647 | Name12 | NULL  |
      +-------+------+------------+--------+-------+
      

      This looks wrong. Instead if INT, the expected column type for mid should be BIGINT or DECIMAL, and no data truncation should happen.

      Attachments

        Issue Links

          Activity

            PostgreSQL returns an error on attempt to execute the same script:

            ERROR:  recursive query "ctereports" column 3 has type integer in non-recursive term but type bigint overall
            LINE 5:   SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL
            

            bar Alexander Barkov added a comment - PostgreSQL returns an error on attempt to execute the same script: ERROR: recursive query "ctereports" column 3 has type integer in non-recursive term but type bigint overall LINE 5: SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL

            If I run a similar script in SQL Server, it also returns an error (with the same idea like in PostgreSQL):

            DROP TABLE t1;
            DROP TABLE t2;
             
            CREATE TABLE t1
            (
              id INT,
              mid INT,
              name VARCHAR(64)
            );
            INSERT INTO t1 VALUES (0,NULL, 'Name');
            INSERT INTO t1 VALUES (1,0,    'Name1');
            INSERT INTO t1 VALUES (2,0,    'Name2');
            INSERT INTO t1 VALUES (11,1,   'Name11');
            INSERT INTO t1 VALUES (12,1,   'Name12');
             
             
            WITH
            cteReports (level, id, mid, name) AS
            (
              SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL
              UNION ALL
              SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e
              INNER JOIN cteReports r ON e.mid = r.id
            )
            SELECT
              level, id, mid, name,
              (SELECT name FROM t1 WHERE id= cteReports.mid) AS mname
            INTO t2
            FROM cteReports 
            ORDER BY level, mid;
            

            Message number: 240, Severity 16, State: 1, Line: 1, Msg: 'Types don't match between the anchor and the recursive part in column "mid" of recursive query "cteReports".'
            

            bar Alexander Barkov added a comment - If I run a similar script in SQL Server, it also returns an error (with the same idea like in PostgreSQL): DROP TABLE t1; DROP TABLE t2;   CREATE TABLE t1 ( id INT , mid INT , name VARCHAR (64) ); INSERT INTO t1 VALUES (0, NULL , 'Name' ); INSERT INTO t1 VALUES (1,0, 'Name1' ); INSERT INTO t1 VALUES (2,0, 'Name2' ); INSERT INTO t1 VALUES (11,1, 'Name11' ); INSERT INTO t1 VALUES (12,1, 'Name12' );     WITH cteReports ( level , id, mid, name ) AS ( SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL UNION ALL SELECT r. level + 1, e.id, e.mid + 1000000000000, e. name FROM t1 e INNER JOIN cteReports r ON e.mid = r.id ) SELECT level , id, mid, name , ( SELECT name FROM t1 WHERE id= cteReports.mid) AS mname INTO t2 FROM cteReports ORDER BY level , mid; Message number: 240, Severity 16, State: 1, Line: 1, Msg: 'Types don't match between the anchor and the recursive part in column "mid" of recursive query "cteReports".'
            cvicentiu Vicențiu Ciorbaru added a comment - - edited

            We should experiment with Postgres and SQL server to see exactly what "matching types" mean and how we should handle such cases. Possible options are:

            • Report error if types are not identical (or perhaps not compatible, leading to data loss)
            • Allow queries where recursive part has smaller domain that is included by non-recursive part.
            cvicentiu Vicențiu Ciorbaru added a comment - - edited We should experiment with Postgres and SQL server to see exactly what "matching types" mean and how we should handle such cases. Possible options are: Report error if types are not identical (or perhaps not compatible, leading to data loss) Allow queries where recursive part has smaller domain that is included by non-recursive part.

            So now we allow the second option and it looks like the Standard does not say explicitly that we are wrong.
            The Standard is explicit here: the type of a recursive CTE is determined exclusively by non-recursive selects of the specification.

            Alexander,
            Do you want the same behaviour as in Postgres?

            igor Igor Babaev (Inactive) added a comment - So now we allow the second option and it looks like the Standard does not say explicitly that we are wrong. The Standard is explicit here: the type of a recursive CTE is determined exclusively by non-recursive selects of the specification. Alexander, Do you want the same behaviour as in Postgres?

            Igor, thanks for clarifying.

            Now it silently truncates data. There is a big chance to loose data unintentionally.

            I think returning an error (like PostgreSQL and SQL Server do) would be better than the current behavior.

            But I'd prefer collecting all data types (like a regular UNION does), from both non-recursive and recursive parts.
            Our regular UNION implementation is much more flexible comparing to the Standard. It's confusing that UNIONs inside CTE and UNIONs outside of CTE behave differently.

            bar Alexander Barkov added a comment - Igor, thanks for clarifying. Now it silently truncates data. There is a big chance to loose data unintentionally. I think returning an error (like PostgreSQL and SQL Server do) would be better than the current behavior. But I'd prefer collecting all data types (like a regular UNION does), from both non-recursive and recursive parts. Our regular UNION implementation is much more flexible comparing to the Standard. It's confusing that UNIONs inside CTE and UNIONs outside of CTE behave differently.

            Alexander,
            Any ideas how such an implementation could work?
            The Standard discards the recursive part because it can't figure out what type it could be of: the recursive references does not allow to do it.
            Suppose you have two selects in the specification: a non-recursive S1 and a recursive S2. You take the type of S1 T0 and use it to calculate the type of S2 T1. If T0=T1 the type of the union will be T1. If not the type of the union will be T2. If T2=T3 you are done. If not, you have to calculate the type of S2 again and it might happen to be a new type T4. And so on and so on.

            igor Igor Babaev (Inactive) added a comment - Alexander, Any ideas how such an implementation could work? The Standard discards the recursive part because it can't figure out what type it could be of: the recursive references does not allow to do it. Suppose you have two selects in the specification: a non-recursive S1 and a recursive S2. You take the type of S1 T0 and use it to calculate the type of S2 T1. If T0=T1 the type of the union will be T1. If not the type of the union will be T2. If T2=T3 you are done. If not, you have to calculate the type of S2 again and it might happen to be a new type T4. And so on and so on.
            bar Alexander Barkov added a comment - - edited

            Sounds complicated indeed.
            Let's go with returning an error?

            Only the same or a sub-type should be allowed in the recursive part in this case, right?

            How to we detect "same or sub-type"? It's not that easy.
            It should take into account at least the following:
            1. cmp_type() should generally be the same, but with many exceptions:
            a. fill the difference between TIME, DATE, DATETIME)
            b. allow INT->DECIMAL conversion
            c. allow INT->DOUBLE conversion
            d. perhaps allow INT->STRING conversion
            2. unsigned_flag should be the same (but not necessarily for INT->DECIMAL, INT->DOUBLE and INT->BIGINT conversion)
            3. collation.collation should be the same, collation aggregation should not cause ILLEGAL MIX.
            4. decimals should be the same or smaller
            5. max_length should be the same or smaller
            6. ENUM/SET/GEOMETRY will need some special treatment.

            It's easier to call Item_type_holder::join_type() on every iteration on the recursive part.
            If type or attributes have changed, then return an error.

            bar Alexander Barkov added a comment - - edited Sounds complicated indeed. Let's go with returning an error? Only the same or a sub-type should be allowed in the recursive part in this case, right? How to we detect "same or sub-type"? It's not that easy. It should take into account at least the following: 1. cmp_type() should generally be the same, but with many exceptions: a. fill the difference between TIME, DATE, DATETIME) b. allow INT->DECIMAL conversion c. allow INT->DOUBLE conversion d. perhaps allow INT->STRING conversion 2. unsigned_flag should be the same (but not necessarily for INT->DECIMAL, INT->DOUBLE and INT->BIGINT conversion) 3. collation.collation should be the same, collation aggregation should not cause ILLEGAL MIX. 4. decimals should be the same or smaller 5. max_length should be the same or smaller 6. ENUM/SET/GEOMETRY will need some special treatment. It's easier to call Item_type_holder::join_type() on every iteration on the recursive part. If type or attributes have changed, then return an error.

            I'm doing UNION related refactoring for pluggable data types which is touching the relevant code. To avoid merge troubles, I'll fix this bug after my refactoring and will ask you to review, Igor. Is it fine with you?

            bar Alexander Barkov added a comment - I'm doing UNION related refactoring for pluggable data types which is touching the relevant code. To avoid merge troubles, I'll fix this bug after my refactoring and will ask you to review, Igor. Is it fine with you?

            Alexander,

            What I consider as a real compatibility problem of 10.2 is the following.

            create table t1 (a int);
            insert into t values (3), (1), (2);
            create view v1(c) as select a+1 from t1;
             
            MariaDB [test]> show columns from t1;
            +-------+---------+------+-----+---------+-------+
            | Field | Type    | Null | Key | Default | Extra |
            +-------+---------+------+-----+---------+-------+
            | a     | int(11) | YES  |     | NULL    |       |
            +-------+---------+------+-----+---------+-------+
             
            MariaDB [test]> show columns from v1;
            +-------+------------+------+-----+---------+-------+
            | Field | Type       | Null | Key | Default | Extra |
            +-------+------------+------+-----+---------+-------+
            | c     | bigint(12) | YES  |     | NULL    |       |
            +-------+------------+------+-----+---------+-------+
            

            This is not a result of some bug in the code for views.
            The type of (a+1) is considered as bigint everywhere in the server. And this is not compliant with the Standard and this is not compatible with the other servers.

            So you have to resolve this problem before resolving the problem of the proper type aggregation for UNION.
            (Maybe You've already resolved this problem for 10.3?)

            igor Igor Babaev (Inactive) added a comment - Alexander, What I consider as a real compatibility problem of 10.2 is the following. create table t1 (a int); insert into t values (3), (1), (2); create view v1(c) as select a+1 from t1;   MariaDB [test]> show columns from t1; +-------+---------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+---------+------+-----+---------+-------+ | a | int(11) | YES | | NULL | | +-------+---------+------+-----+---------+-------+   MariaDB [test]> show columns from v1; +-------+------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+------------+------+-----+---------+-------+ | c | bigint(12) | YES | | NULL | | +-------+------------+------+-----+---------+-------+ This is not a result of some bug in the code for views. The type of (a+1) is considered as bigint everywhere in the server. And this is not compliant with the Standard and this is not compatible with the other servers. So you have to resolve this problem before resolving the problem of the proper type aggregation for UNION. (Maybe You've already resolved this problem for 10.3?)

            It's considered as bigint to guarantee that the result will fit into the column without overflow.
            Which data type should it be according to the Standard? I'm afraid if the Standard says it must be INT, we won't change to INT.
            Anyway, my changes in the UNION code are not related to data type of a+1. The order doesn't matter.

            bar Alexander Barkov added a comment - It's considered as bigint to guarantee that the result will fit into the column without overflow. Which data type should it be according to the Standard? I'm afraid if the Standard says it must be INT, we won't change to INT. Anyway, my changes in the UNION code are not related to data type of a+1. The order doesn't matter.

            igor, please take over this bug.

            bar Alexander Barkov added a comment - igor , please take over this bug.

            Got struck by this today, trying to use some examples to perform cycle checks, and ORDER BY SYBLING equivalent, found in various PostgreSQL related "how to port CONNECT BY PRIOR to CTE" posts and manual snippets. Not having a native array type like postgreSQL, I tried to collect ID paths in either a comma separated string, to check with FIND_IN_ROW, using CONCAT to append, or JSON_ARRAY_CREATE in the base query and JSON_APPEND in the recursive part, but ended up only having the first ID collected in both cases.

            To make matters worse: no warning about the truncation was found in SHOW WARNINGS, and this worked even with STRICT sql mode, which should not allow silent truncation at all. My minimal test query came down to

            with recursive r1 (id, msg) as (
              SELECT 1 as id, 'x' as msg
            UNION
              SELECT 2 as id, 'abc' as msg FROM r1 WHERE r1.id = 1
            ) select id, msg from r1;
            

            With expected result – which is also what PostgreSQL and Oracle 18xe return:

            +------+------+
            | id   | msg  |
            +------+------+
            |    1 | x    |
            |    2 | abc  |
            +------+------+
            

            But actual result:

            +------+------+
            | id   | msg  |
            +------+------+
            |    1 | x    |
            |    2 | a    |
            +------+------+
            

            hholzgra Hartmut Holzgraefe added a comment - Got struck by this today, trying to use some examples to perform cycle checks, and ORDER BY SYBLING equivalent, found in various PostgreSQL related "how to port CONNECT BY PRIOR to CTE" posts and manual snippets. Not having a native array type like postgreSQL, I tried to collect ID paths in either a comma separated string, to check with FIND_IN_ROW, using CONCAT to append, or JSON_ARRAY_CREATE in the base query and JSON_APPEND in the recursive part, but ended up only having the first ID collected in both cases. To make matters worse: no warning about the truncation was found in SHOW WARNINGS, and this worked even with STRICT sql mode, which should not allow silent truncation at all. My minimal test query came down to with recursive r1 (id, msg) as ( SELECT 1 as id, 'x' as msg UNION SELECT 2 as id, 'abc' as msg FROM r1 WHERE r1.id = 1 ) select id, msg from r1; With expected result – which is also what PostgreSQL and Oracle 18xe return: +------+------+ | id | msg | +------+------+ | 1 | x | | 2 | abc | +------+------+ But actual result: +------+------+ | id | msg | +------+------+ | 1 | x | | 2 | a | +------+------+

            On the other hand a simple UNION returns the expected result

            MariaDB [(none)]> SELECT 1 as id, 'x' as msg
                ->  UNION ALL 
                -> SELECT 2 as id, 'abc' as msg;
            +----+-----+
            | id | msg |
            +----+-----+
            |  1 | x   |
            |  2 | abc |
            +----+-----+
            

            which makes it even more surprising that within the body of a `WITH RECURSION ... (...)` it does not.

            hholzgra Hartmut Holzgraefe added a comment - On the other hand a simple UNION returns the expected result MariaDB [(none)]> SELECT 1 as id, 'x' as msg -> UNION ALL -> SELECT 2 as id, 'abc' as msg; +----+-----+ | id | msg | +----+-----+ | 1 | x | | 2 | abc | +----+-----+ which makes it even more surprising that within the body of a `WITH RECURSION ... (...)` it does not.

            MySQL 8.0.15 also doesn't produce expected output, but at least fails with

            "ERROR 1406 (22001): Data too long for column 'msg' at row 1"

            hholzgra Hartmut Holzgraefe added a comment - MySQL 8.0.15 also doesn't produce expected output, but at least fails with "ERROR 1406 (22001): Data too long for column 'msg' at row 1"

            In a normal UNION case, the type of column is the 'union type' of all respective columns at that position.
            This works for a normal SELECT ... UNION ... SELECT

            It looks like with CTE:s the expression for column 3, 'mid' in the following query

            SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL
            UNION ALL
            SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e
            INNER JOIN cteReports r ON e.mid = r.id...

            Is always taken from the SELECT 1,id,mid part and not of an union of types.
            This happens even if I swap the parts of the UNION's.
            If I change the mid to mid + 1.0 or mid + 10000000000, the type of the mid column
            changes according to the type of this column, not the union of columns

            This can be seen with the following query:
            CREATE or replace TABLE t2 AS
            WITH RECURSIVE
            cteReports (level, id, mid2, name) AS
            (
            SELECT r.level + 1, e.id, "bbb", e.name FROM t1 e
            INNER JOIN cteReports r ON e.mid = r.id
            UNION ALL
            SELECT 1, id, "aa", name FROM t1 WHERE mid IS NULL
            )
            SELECT
            level, id, mid, name, "aa"
            FROM cteReports
            ORDER BY level, mid;
            show create table t2;

            Where mid gets the type: `mid` varchar(2) DEFAULT NULL,

            monty Michael Widenius added a comment - In a normal UNION case, the type of column is the 'union type' of all respective columns at that position. This works for a normal SELECT ... UNION ... SELECT It looks like with CTE:s the expression for column 3, 'mid' in the following query SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL UNION ALL SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e INNER JOIN cteReports r ON e.mid = r.id... Is always taken from the SELECT 1,id,mid part and not of an union of types. This happens even if I swap the parts of the UNION's. If I change the mid to mid + 1.0 or mid + 10000000000, the type of the mid column changes according to the type of this column, not the union of columns This can be seen with the following query: CREATE or replace TABLE t2 AS WITH RECURSIVE cteReports (level, id, mid2, name) AS ( SELECT r.level + 1, e.id, "bbb", e.name FROM t1 e INNER JOIN cteReports r ON e.mid = r.id UNION ALL SELECT 1, id, "aa", name FROM t1 WHERE mid IS NULL ) SELECT level, id, mid, name, "aa" FROM cteReports ORDER BY level, mid; show create table t2; Where mid gets the type: `mid` varchar(2) DEFAULT NULL,

            Here's a proof that with the current procedure of determining the type of a union we can't reach a fixed point to determine the type of a recursive function in a general case:

            MariaDB [test]> create table t_0 as select 'a' as col;
            Query OK, 1 row affected (0.037 sec)
            Records: 1  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> show create table t_0;
            +-------+------------------------------------------------------------------------------------------------------------+
            | Table | Create Table                                                                                               |
            +-------+------------------------------------------------------------------------------------------------------------+
            | t_0   | CREATE TABLE `t_0` (
              `col` varchar(1) CHARACTER SET utf8 NOT NULL
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
            +-------+------------------------------------------------------------------------------------------------------------+
            1 row in set (0.000 sec)
             
            MariaDB [test]> create table t_1 as select col from t_0 union select concat(t_0.col,'b') from t_0;
            Query OK, 2 rows affected (0.035 sec)
            Records: 2  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> show create table t_1;
            +-------+----------------------------------------------------------------------------------------------------------------+
            | Table | Create Table                                                                                                   |
            +-------+----------------------------------------------------------------------------------------------------------------+
            | t_1   | CREATE TABLE `t_1` (
              `col` varchar(2) CHARACTER SET utf8 DEFAULT NULL
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
            +-------+----------------------------------------------------------------------------------------------------------------+
            1 row in set (0.001 sec)
             
            MariaDB [test]> create table t_2 as select col from t_1 union select concat(t_1.col,'b') from t_1;Query OK, 3 rows affected (0.033 sec)
            Records: 3  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> show create table t_2;
            +-------+----------------------------------------------------------------------------------------------------------------+
            | Table | Create Table                                                                                                   |
            +-------+----------------------------------------------------------------------------------------------------------------+
            | t_2   | CREATE TABLE `t_2` (
              `col` varchar(3) CHARACTER SET utf8 DEFAULT NULL
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
            +-------+----------------------------------------------------------------------------------------------------------------+
            1 row in set (0.001 sec)
            

            igor Igor Babaev (Inactive) added a comment - Here's a proof that with the current procedure of determining the type of a union we can't reach a fixed point to determine the type of a recursive function in a general case: MariaDB [test]> create table t_0 as select 'a' as col; Query OK, 1 row affected (0.037 sec) Records: 1 Duplicates: 0 Warnings: 0   MariaDB [test]> show create table t_0; +-------+------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+------------------------------------------------------------------------------------------------------------+ | t_0 | CREATE TABLE `t_0` ( `col` varchar(1) CHARACTER SET utf8 NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+------------------------------------------------------------------------------------------------------------+ 1 row in set (0.000 sec)   MariaDB [test]> create table t_1 as select col from t_0 union select concat(t_0.col,'b') from t_0; Query OK, 2 rows affected (0.035 sec) Records: 2 Duplicates: 0 Warnings: 0   MariaDB [test]> show create table t_1; +-------+----------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+----------------------------------------------------------------------------------------------------------------+ | t_1 | CREATE TABLE `t_1` ( `col` varchar(2) CHARACTER SET utf8 DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+----------------------------------------------------------------------------------------------------------------+ 1 row in set (0.001 sec)   MariaDB [test]> create table t_2 as select col from t_1 union select concat(t_1.col,'b') from t_1;Query OK, 3 rows affected (0.033 sec) Records: 3 Duplicates: 0 Warnings: 0   MariaDB [test]> show create table t_2; +-------+----------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+----------------------------------------------------------------------------------------------------------------+ | t_2 | CREATE TABLE `t_2` ( `col` varchar(3) CHARACTER SET utf8 DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+----------------------------------------------------------------------------------------------------------------+ 1 row in set (0.001 sec)

            Returning an error message when the types of the recursive and non-recursive parts are not the same (as PostgreSQL and others do) is possible, but not trivial. The fact is we support mutually recursive CTE when recursive CTE A uses recursive CTE B that uses recursive CTE C that uses recursive CTE D that uses A. We do it in full compliance with the Standard. A jump to another type may happen in the definition of any of these CTE. Other databases (including MySQL) do not support mutually recursive CTE so it's easier for them to report an error in this case.

            igor Igor Babaev (Inactive) added a comment - Returning an error message when the types of the recursive and non-recursive parts are not the same (as PostgreSQL and others do) is possible, but not trivial. The fact is we support mutually recursive CTE when recursive CTE A uses recursive CTE B that uses recursive CTE C that uses recursive CTE D that uses A. We do it in full compliance with the Standard. A jump to another type may happen in the definition of any of these CTE. Other databases (including MySQL) do not support mutually recursive CTE so it's easier for them to report an error in this case.
            monty Michael Widenius added a comment - - edited

            I will try to here summary the things learned from this MDEV:

            For recursive CTE:s, according to the SQL standard, the type of the created columns are taken from the non recursive part of a CTE.
            This means that for the query:

            WITH RECURSIVE
            cteReports (level, id, mid, name) AS
            (
              SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL
              UNION ALL
              SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e
              INNER JOIN cteReports r ON e.mid = r.id
            )
            SELECT
              level, id, mid, name,
              (SELECT name FROM t1 WHERE id= cteReports.mid) AS mname
            FROM cteReports 
            ORDER BY level, mid;
            

            The type of the result from the CTE is taken from:

            SELECT 1, id, mid, name FROM t1;
            

            This means that the type of mid should be int (as it it is defined in t1).

            If we follow this logic, the bug is that we don't get a warning or error (depending on strict mode) when trying to store 2e.mid + 1000000000000" into an int column. Adding this error should be trivial and I will look into creating a patch for this. This is, as far as I understand, what the SQL standard would require.

            An extension to the SQL standard that could be considered in case of recursive CTE's:
            When there are expression for a column, we could extend the type to the 'biggest possible' of that type. This means that we would do the following type conversations for the result:
            Int -> bigint
            char/varchar -> blob
            decimal(X,Y) -> decimal(65,Y)

            For the moment we are considering to follow the SQL standard and give an error if the result of an expression causes an overflow.

            For the cases that one gets an error, one can always fix it by adding a cast for the 'non recursive part' of the CTE:

            WITH RECURSIVE 
            cteReports (level, id, mid, name) AS 
            (
              SELECT 1, id, cast(mid as double), name FROM t1 WHERE mid IS NULL 
              UNION ALL
              SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e 
               INNER JOIN cteReports r ON e.mid = r.id 
            ) .....
            

            monty Michael Widenius added a comment - - edited I will try to here summary the things learned from this MDEV: For recursive CTE:s, according to the SQL standard, the type of the created columns are taken from the non recursive part of a CTE. This means that for the query: WITH RECURSIVE cteReports (level, id, mid, name) AS ( SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL UNION ALL SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e INNER JOIN cteReports r ON e.mid = r.id ) SELECT level, id, mid, name, (SELECT name FROM t1 WHERE id= cteReports.mid) AS mname FROM cteReports ORDER BY level, mid; The type of the result from the CTE is taken from: SELECT 1, id, mid, name FROM t1; This means that the type of mid should be int (as it it is defined in t1). If we follow this logic, the bug is that we don't get a warning or error (depending on strict mode) when trying to store 2e.mid + 1000000000000" into an int column. Adding this error should be trivial and I will look into creating a patch for this. This is, as far as I understand, what the SQL standard would require. An extension to the SQL standard that could be considered in case of recursive CTE's: When there are expression for a column, we could extend the type to the 'biggest possible' of that type. This means that we would do the following type conversations for the result: Int -> bigint char/varchar -> blob decimal(X,Y) -> decimal(65,Y) For the moment we are considering to follow the SQL standard and give an error if the result of an expression causes an overflow. For the cases that one gets an error, one can always fix it by adding a cast for the 'non recursive part' of the CTE: WITH RECURSIVE cteReports (level, id, mid, name) AS ( SELECT 1, id, cast(mid as double), name FROM t1 WHERE mid IS NULL UNION ALL SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e INNER JOIN cteReports r ON e.mid = r.id ) .....

            Please review a556c4d972eaa09bb8ad586b9bda6cfb2a7b574a (in bb-10.3-monty)

            monty Michael Widenius added a comment - Please review a556c4d972eaa09bb8ad586b9bda6cfb2a7b574a (in bb-10.3-monty)

            The standard SQL:2016 says that in a recursive CTE column types are defined by the non-recursive part in the part 2, section 7.17 <query expression>, Syntax Rules, paragraph 20) b) iv) 3) B).

            That says that if query expression body contains a UNION and is the result of an anchor expression and i-th column is recursively referred to then the declared type of the i-th column is the same as the declared type of the i-th column of the non-recursive part of the UNION.

            serg Sergei Golubchik added a comment - The standard SQL:2016 says that in a recursive CTE column types are defined by the non-recursive part in the part 2, section 7.17 <query expression>, Syntax Rules, paragraph 20) b) iv) 3) B). That says that if query expression body contains a UNION and is the result of an anchor expression and i -th column is recursively referred to then the declared type of the i -th column is the same as the declared type of the i -th column of the non-recursive part of the UNION.

            monty, the behavior you've implemented is inconsistent with existing cases. For example in

            select 'a' union all select repeat('b',seq) from seq_1_to_5;
            

            the result column is MEDIUMBLOB with the length as reported in the protocol being 50331645. When going over it as in

            select 'a' union all select repeat('b',seq) from seq_50331640_to_50331644;
            

            the result is silently truncated.
            In a CREATE ... SELECT case the strict mode defines whether it should be a warning or an error.

            The patch changes the behavior for recursive CTEs only, this is inconsistent. All cases when the generated value is longer than the protocol reported length should behave similarly.

            serg Sergei Golubchik added a comment - monty , the behavior you've implemented is inconsistent with existing cases. For example in select 'a' union all select repeat( 'b' ,seq) from seq_1_to_5; the result column is MEDIUMBLOB with the length as reported in the protocol being 50331645. When going over it as in select 'a' union all select repeat( 'b' ,seq) from seq_50331640_to_50331644; the result is silently truncated. In a CREATE ... SELECT case the strict mode defines whether it should be a warning or an error. The patch changes the behavior for recursive CTEs only, this is inconsistent. All cases when the generated value is longer than the protocol reported length should behave similarly.

            Fix pushed to 10.3

            This patch adds code to abort the CTE if the calculated values in the
            recursive part does not fit in the fields in the created temporary table.
            This affects strict mode and one is not using CREATE ... IGNORE or INSERT IGNORE

            The new code only affects recursive CTE, so it should not cause any notable
            problems for old applications.

            monty Michael Widenius added a comment - Fix pushed to 10.3 This patch adds code to abort the CTE if the calculated values in the recursive part does not fit in the fields in the created temporary table. This affects strict mode and one is not using CREATE ... IGNORE or INSERT IGNORE The new code only affects recursive CTE, so it should not cause any notable problems for old applications.

            People

              monty Michael Widenius
              bar Alexander Barkov
              Votes:
              2 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.