Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12325

Unexpected data type and truncation when using CTE

Details

    Description

      I have a table with a company employee hierarchy:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1
      (
        id INT,     -- employee ID
        mid INT,    -- manager ID, or NULL if top level employee
        name TEXT   -- employee name
      );
      INSERT INTO t1 VALUES (0,NULL, 'Name');
      INSERT INTO t1 VALUES (1,0,    'Name1');
      INSERT INTO t1 VALUES (2,0,    'Name2');
      INSERT INTO t1 VALUES (11,1,   'Name11');
      INSERT INTO t1 VALUES (12,1,   'Name12');
      

      Now I want to print the company hierarchy as a table that additionally includes employee hierarchy levels and manager names. I use a recursive CTE for that:

      DROP TABLE IF EXISTS t2;
      CREATE TABLE t2 AS
      WITH RECURSIVE
      cteReports (level, id, mid, name) AS
      (
        SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL
        UNION ALL
        SELECT r.level + 1, e.id, e.mid, e.name FROM t1 e
        INNER JOIN cteReports r ON e.mid = r.id
      )
      SELECT
        level, id, mid, name,
        (SELECT name FROM t1 WHERE id= cteReports.mid) AS mname
      FROM cteReports 
      ORDER BY level, mid;
      SHOW CREATE TABLE t2;
      SELECT * FROM t2;
      

      It creates a table with the expected structure:

      +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Table | Create Table                                                                                                                                                                                                          |
      +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | t2    | CREATE TABLE `t2` (
        `level` bigint(1) NOT NULL DEFAULT 0,
        `id` int(11) DEFAULT NULL,
        `mid` int(11) DEFAULT NULL,
        `name` text DEFAULT NULL,
        `mname` text DEFAULT NULL
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
      +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      

      and with correct data:

      +-------+------+------+--------+-------+
      | level | id   | mid  | name   | mname |
      +-------+------+------+--------+-------+
      |     1 |    0 | NULL | Name   | NULL  |
      |     2 |    1 |    0 | Name1  | Name  |
      |     2 |    2 |    0 | Name2  | Name  |
      |     3 |   11 |    1 | Name11 | Name1 |
      |     3 |   12 |    1 | Name12 | Name1 |
      +-------+------+------+--------+-------+
      

      Now I want for some reasons to add some big number to manager IDs (mid):

      DROP TABLE IF EXISTS t2;
      CREATE TABLE t2 AS
      WITH RECURSIVE
      cteReports (level, id, mid, name) AS
      (
        SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL
        UNION ALL
        SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e
        INNER JOIN cteReports r ON e.mid = r.id
      )
      SELECT
        level, id, mid, name,
        (SELECT name FROM t1 WHERE id= cteReports.mid) AS mname
      FROM cteReports 
      ORDER BY level, mid;
      SHOW CREATE TABLE t2;
      SELECT * FROM t2;
      

      It creates exactly the same table to the one in the previous script and truncates data silently:

      +-------+------+------------+--------+-------+
      | level | id   | mid        | name   | mname |
      +-------+------+------------+--------+-------+
      |     1 |    0 |       NULL | Name   | NULL  |
      |     2 |    1 | 2147483647 | Name1  | NULL  |
      |     2 |    2 | 2147483647 | Name2  | NULL  |
      |     3 |   11 | 2147483647 | Name11 | NULL  |
      |     3 |   12 | 2147483647 | Name12 | NULL  |
      +-------+------+------------+--------+-------+
      

      This looks wrong. Instead if INT, the expected column type for mid should be BIGINT or DECIMAL, and no data truncation should happen.

      Attachments

        Issue Links

          Activity

            monty Michael Widenius added a comment - - edited

            I will try to here summary the things learned from this MDEV:

            For recursive CTE:s, according to the SQL standard, the type of the created columns are taken from the non recursive part of a CTE.
            This means that for the query:

            WITH RECURSIVE
            cteReports (level, id, mid, name) AS
            (
              SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL
              UNION ALL
              SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e
              INNER JOIN cteReports r ON e.mid = r.id
            )
            SELECT
              level, id, mid, name,
              (SELECT name FROM t1 WHERE id= cteReports.mid) AS mname
            FROM cteReports 
            ORDER BY level, mid;
            

            The type of the result from the CTE is taken from:

            SELECT 1, id, mid, name FROM t1;
            

            This means that the type of mid should be int (as it it is defined in t1).

            If we follow this logic, the bug is that we don't get a warning or error (depending on strict mode) when trying to store 2e.mid + 1000000000000" into an int column. Adding this error should be trivial and I will look into creating a patch for this. This is, as far as I understand, what the SQL standard would require.

            An extension to the SQL standard that could be considered in case of recursive CTE's:
            When there are expression for a column, we could extend the type to the 'biggest possible' of that type. This means that we would do the following type conversations for the result:
            Int -> bigint
            char/varchar -> blob
            decimal(X,Y) -> decimal(65,Y)

            For the moment we are considering to follow the SQL standard and give an error if the result of an expression causes an overflow.

            For the cases that one gets an error, one can always fix it by adding a cast for the 'non recursive part' of the CTE:

            WITH RECURSIVE 
            cteReports (level, id, mid, name) AS 
            (
              SELECT 1, id, cast(mid as double), name FROM t1 WHERE mid IS NULL 
              UNION ALL
              SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e 
               INNER JOIN cteReports r ON e.mid = r.id 
            ) .....
            

            monty Michael Widenius added a comment - - edited I will try to here summary the things learned from this MDEV: For recursive CTE:s, according to the SQL standard, the type of the created columns are taken from the non recursive part of a CTE. This means that for the query: WITH RECURSIVE cteReports (level, id, mid, name) AS ( SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL UNION ALL SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e INNER JOIN cteReports r ON e.mid = r.id ) SELECT level, id, mid, name, (SELECT name FROM t1 WHERE id= cteReports.mid) AS mname FROM cteReports ORDER BY level, mid; The type of the result from the CTE is taken from: SELECT 1, id, mid, name FROM t1; This means that the type of mid should be int (as it it is defined in t1). If we follow this logic, the bug is that we don't get a warning or error (depending on strict mode) when trying to store 2e.mid + 1000000000000" into an int column. Adding this error should be trivial and I will look into creating a patch for this. This is, as far as I understand, what the SQL standard would require. An extension to the SQL standard that could be considered in case of recursive CTE's: When there are expression for a column, we could extend the type to the 'biggest possible' of that type. This means that we would do the following type conversations for the result: Int -> bigint char/varchar -> blob decimal(X,Y) -> decimal(65,Y) For the moment we are considering to follow the SQL standard and give an error if the result of an expression causes an overflow. For the cases that one gets an error, one can always fix it by adding a cast for the 'non recursive part' of the CTE: WITH RECURSIVE cteReports (level, id, mid, name) AS ( SELECT 1, id, cast(mid as double), name FROM t1 WHERE mid IS NULL UNION ALL SELECT r.level + 1, e.id, e.mid + 1000000000000, e.name FROM t1 e INNER JOIN cteReports r ON e.mid = r.id ) .....

            Please review a556c4d972eaa09bb8ad586b9bda6cfb2a7b574a (in bb-10.3-monty)

            monty Michael Widenius added a comment - Please review a556c4d972eaa09bb8ad586b9bda6cfb2a7b574a (in bb-10.3-monty)

            The standard SQL:2016 says that in a recursive CTE column types are defined by the non-recursive part in the part 2, section 7.17 <query expression>, Syntax Rules, paragraph 20) b) iv) 3) B).

            That says that if query expression body contains a UNION and is the result of an anchor expression and i-th column is recursively referred to then the declared type of the i-th column is the same as the declared type of the i-th column of the non-recursive part of the UNION.

            serg Sergei Golubchik added a comment - The standard SQL:2016 says that in a recursive CTE column types are defined by the non-recursive part in the part 2, section 7.17 <query expression>, Syntax Rules, paragraph 20) b) iv) 3) B). That says that if query expression body contains a UNION and is the result of an anchor expression and i -th column is recursively referred to then the declared type of the i -th column is the same as the declared type of the i -th column of the non-recursive part of the UNION.

            monty, the behavior you've implemented is inconsistent with existing cases. For example in

            select 'a' union all select repeat('b',seq) from seq_1_to_5;
            

            the result column is MEDIUMBLOB with the length as reported in the protocol being 50331645. When going over it as in

            select 'a' union all select repeat('b',seq) from seq_50331640_to_50331644;
            

            the result is silently truncated.
            In a CREATE ... SELECT case the strict mode defines whether it should be a warning or an error.

            The patch changes the behavior for recursive CTEs only, this is inconsistent. All cases when the generated value is longer than the protocol reported length should behave similarly.

            serg Sergei Golubchik added a comment - monty , the behavior you've implemented is inconsistent with existing cases. For example in select 'a' union all select repeat( 'b' ,seq) from seq_1_to_5; the result column is MEDIUMBLOB with the length as reported in the protocol being 50331645. When going over it as in select 'a' union all select repeat( 'b' ,seq) from seq_50331640_to_50331644; the result is silently truncated. In a CREATE ... SELECT case the strict mode defines whether it should be a warning or an error. The patch changes the behavior for recursive CTEs only, this is inconsistent. All cases when the generated value is longer than the protocol reported length should behave similarly.

            Fix pushed to 10.3

            This patch adds code to abort the CTE if the calculated values in the
            recursive part does not fit in the fields in the created temporary table.
            This affects strict mode and one is not using CREATE ... IGNORE or INSERT IGNORE

            The new code only affects recursive CTE, so it should not cause any notable
            problems for old applications.

            monty Michael Widenius added a comment - Fix pushed to 10.3 This patch adds code to abort the CTE if the calculated values in the recursive part does not fit in the fields in the created temporary table. This affects strict mode and one is not using CREATE ... IGNORE or INSERT IGNORE The new code only affects recursive CTE, so it should not cause any notable problems for old applications.

            People

              monty Michael Widenius
              bar Alexander Barkov
              Votes:
              2 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.