[MDEV-12325] Unexpected data type and truncation when using CTE Created: 2017-03-22 Updated: 2023-09-19 Resolved: 2022-08-08 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data types, Optimizer - CTE |
| Affects Version/s: | 10.2, 10.3 |
| Fix Version/s: | 10.3.36, 10.4.26, 10.5.17, 10.6.9, 10.7.5 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Alexander Barkov | Assignee: | Michael Widenius |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Epic Link: | Data type cleanups | ||||||||
| Description |
|
I have a table with a company employee hierarchy:
Now I want to print the company hierarchy as a table that additionally includes employee hierarchy levels and manager names. I use a recursive CTE for that:
It creates a table with the expected structure:
and with correct data:
Now I want for some reasons to add some big number to manager IDs (mid):
It creates exactly the same table to the one in the previous script and truncates data silently:
This looks wrong. Instead if INT, the expected column type for mid should be BIGINT or DECIMAL, and no data truncation should happen. |
| Comments |
| Comment by Alexander Barkov [ 2017-03-22 ] | ||||||||||||||||||||||||||||||||||||||||
|
PostgreSQL returns an error on attempt to execute the same script:
| ||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2017-03-22 ] | ||||||||||||||||||||||||||||||||||||||||
|
If I run a similar script in SQL Server, it also returns an error (with the same idea like in PostgreSQL):
| ||||||||||||||||||||||||||||||||||||||||
| Comment by Vicențiu Ciorbaru [ 2017-03-22 ] | ||||||||||||||||||||||||||||||||||||||||
|
We should experiment with Postgres and SQL server to see exactly what "matching types" mean and how we should handle such cases. Possible options are:
| ||||||||||||||||||||||||||||||||||||||||
| Comment by Igor Babaev [ 2017-03-22 ] | ||||||||||||||||||||||||||||||||||||||||
|
So now we allow the second option and it looks like the Standard does not say explicitly that we are wrong. Alexander, | ||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2017-03-23 ] | ||||||||||||||||||||||||||||||||||||||||
|
Igor, thanks for clarifying. Now it silently truncates data. There is a big chance to loose data unintentionally. I think returning an error (like PostgreSQL and SQL Server do) would be better than the current behavior. But I'd prefer collecting all data types (like a regular UNION does), from both non-recursive and recursive parts. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Igor Babaev [ 2017-03-23 ] | ||||||||||||||||||||||||||||||||||||||||
|
Alexander, | ||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2017-03-23 ] | ||||||||||||||||||||||||||||||||||||||||
|
Sounds complicated indeed. Only the same or a sub-type should be allowed in the recursive part in this case, right? How to we detect "same or sub-type"? It's not that easy. It's easier to call Item_type_holder::join_type() on every iteration on the recursive part. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2017-04-26 ] | ||||||||||||||||||||||||||||||||||||||||
|
I'm doing UNION related refactoring for pluggable data types which is touching the relevant code. To avoid merge troubles, I'll fix this bug after my refactoring and will ask you to review, Igor. Is it fine with you? | ||||||||||||||||||||||||||||||||||||||||
| Comment by Igor Babaev [ 2017-04-26 ] | ||||||||||||||||||||||||||||||||||||||||
|
Alexander, What I consider as a real compatibility problem of 10.2 is the following.
This is not a result of some bug in the code for views. So you have to resolve this problem before resolving the problem of the proper type aggregation for UNION. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2017-04-26 ] | ||||||||||||||||||||||||||||||||||||||||
|
It's considered as bigint to guarantee that the result will fit into the column without overflow. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2019-03-21 ] | ||||||||||||||||||||||||||||||||||||||||
|
igor, please take over this bug. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2019-10-01 ] | ||||||||||||||||||||||||||||||||||||||||
|
Got struck by this today, trying to use some examples to perform cycle checks, and ORDER BY SYBLING equivalent, found in various PostgreSQL related "how to port CONNECT BY PRIOR to CTE" posts and manual snippets. Not having a native array type like postgreSQL, I tried to collect ID paths in either a comma separated string, to check with FIND_IN_ROW, using CONCAT to append, or JSON_ARRAY_CREATE in the base query and JSON_APPEND in the recursive part, but ended up only having the first ID collected in both cases. To make matters worse: no warning about the truncation was found in SHOW WARNINGS, and this worked even with STRICT sql mode, which should not allow silent truncation at all. My minimal test query came down to
With expected result – which is also what PostgreSQL and Oracle 18xe return:
But actual result:
| ||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2019-10-01 ] | ||||||||||||||||||||||||||||||||||||||||
|
On the other hand a simple UNION returns the expected result
which makes it even more surprising that within the body of a `WITH RECURSION ... (...)` it does not. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2019-10-01 ] | ||||||||||||||||||||||||||||||||||||||||
|
MySQL 8.0.15 also doesn't produce expected output, but at least fails with "ERROR 1406 (22001): Data too long for column 'msg' at row 1" | ||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2022-07-18 ] | ||||||||||||||||||||||||||||||||||||||||
|
In a normal UNION case, the type of column is the 'union type' of all respective columns at that position. It looks like with CTE:s the expression for column 3, 'mid' in the following query SELECT 1, id, mid, name FROM t1 WHERE mid IS NULL Is always taken from the SELECT 1,id,mid part and not of an union of types. This can be seen with the following query: Where mid gets the type: `mid` varchar(2) DEFAULT NULL, | ||||||||||||||||||||||||||||||||||||||||
| Comment by Igor Babaev [ 2022-07-18 ] | ||||||||||||||||||||||||||||||||||||||||
|
Here's a proof that with the current procedure of determining the type of a union we can't reach a fixed point to determine the type of a recursive function in a general case:
| ||||||||||||||||||||||||||||||||||||||||
| Comment by Igor Babaev [ 2022-07-18 ] | ||||||||||||||||||||||||||||||||||||||||
|
Returning an error message when the types of the recursive and non-recursive parts are not the same (as PostgreSQL and others do) is possible, but not trivial. The fact is we support mutually recursive CTE when recursive CTE A uses recursive CTE B that uses recursive CTE C that uses recursive CTE D that uses A. We do it in full compliance with the Standard. A jump to another type may happen in the definition of any of these CTE. Other databases (including MySQL) do not support mutually recursive CTE so it's easier for them to report an error in this case. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2022-08-05 ] | ||||||||||||||||||||||||||||||||||||||||
|
I will try to here summary the things learned from this MDEV: For recursive CTE:s, according to the SQL standard, the type of the created columns are taken from the non recursive part of a CTE.
The type of the result from the CTE is taken from:
This means that the type of mid should be int (as it it is defined in t1). If we follow this logic, the bug is that we don't get a warning or error (depending on strict mode) when trying to store 2e.mid + 1000000000000" into an int column. Adding this error should be trivial and I will look into creating a patch for this. This is, as far as I understand, what the SQL standard would require. An extension to the SQL standard that could be considered in case of recursive CTE's: For the moment we are considering to follow the SQL standard and give an error if the result of an expression causes an overflow. For the cases that one gets an error, one can always fix it by adding a cast for the 'non recursive part' of the CTE:
| ||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2022-08-05 ] | ||||||||||||||||||||||||||||||||||||||||
|
Please review a556c4d972eaa09bb8ad586b9bda6cfb2a7b574a (in bb-10.3-monty) | ||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2022-08-06 ] | ||||||||||||||||||||||||||||||||||||||||
|
The standard SQL:2016 says that in a recursive CTE column types are defined by the non-recursive part in the part 2, section 7.17 <query expression>, Syntax Rules, paragraph 20) b) iv) 3) B). That says that if query expression body contains a UNION and is the result of an anchor expression and i-th column is recursively referred to then the declared type of the i-th column is the same as the declared type of the i-th column of the non-recursive part of the UNION. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2022-08-06 ] | ||||||||||||||||||||||||||||||||||||||||
|
monty, the behavior you've implemented is inconsistent with existing cases. For example in
the result column is MEDIUMBLOB with the length as reported in the protocol being 50331645. When going over it as in
the result is silently truncated. The patch changes the behavior for recursive CTEs only, this is inconsistent. All cases when the generated value is longer than the protocol reported length should behave similarly. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2022-08-08 ] | ||||||||||||||||||||||||||||||||||||||||
|
Fix pushed to 10.3 This patch adds code to abort the CTE if the calculated values in the The new code only affects recursive CTE, so it should not cause any notable |