[MDEV-24263] REPLACE() a string with NULL leads to unexpected result for non UTF-8 char sets Created: 2020-11-21 Updated: 2023-04-27 |
|
| Status: | Confirmed |
| Project: | MariaDB Server |
| Component/s: | Character Sets |
| Affects Version/s: | 10.1, 10.2, 10.3, 10.4, 10.5 |
| Fix Version/s: | 10.4, 10.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Lau | Assignee: | Oleksandr Byelkin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
I expect both fields of the following query to be NULL, however only the utf8mb4_expected field actually is. Instead, the latin1_unexpected field returns the original string of foo, which is unexpected.
This seems to be in issue in all versions of MariaDB and all versions of MySQL below 8.0.x (works on MySQL 8.0.21). |
| Comments |
| Comment by Sergei Golubchik [ 2020-11-21 ] | |||||||||||||||||||||||||||
|
on the contrary, latin1 result is correct, there is no bar in foo, so the replacement doesn't take place, it does not matter what the value of the replacement string is. In the utf8mb4 case the code returns NULL too early, before making sure that the replacement value should be used. | |||||||||||||||||||||||||||
| Comment by Lau [ 2020-11-21 ] | |||||||||||||||||||||||||||
|
That is incorrect. REPLACE() should return NULL if any of its arguments is NULL. From Changes in MySQL 8.0.0 (2016-09-12, Development Milestone):
I checked the behavior of other string methods again, and they all work as expected and return NULL if any argument is NULL.
The closest one to REPLACE() of the functions I check is INSERT(), which will only insert a new string into an existing string, if the start position argument is within the length of the existing string. It also returns NULL if the new string to insert is NULL and the start position is larger than the existing string (meaning even if no string would be inserted). Therefore, the function behaves exactly as expected, by returning NULL if any argument is NULL, independent of whether the argument is actually used or not. This is also stated in the MySQL docs:
Therefore, the correct order is to first check all arguments for NULL and return NULL if any arguments is actually NULL. If no argument is NULL, then execute the actual logic of the function. And this needs to be consistent across all possible char sets/collations. | |||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2020-11-21 ] | |||||||||||||||||||||||||||
|
Technically REPLACE() should behave the way it is documented, which can be anything. Alas, this detail is not documented (yet).
in the first case NULL is not needed to determine the value of the expression and it does not make the result NULL. In the second case it is and it does. You can think of NULL as of "unknown value", it's neither true, nor false, it's unknown.
In the +, =, CONCAT(), SUBSTRING(), and LOCATE() case the result is clearly unknown if one of the arguments is unknown. INSERT() can be argued either way, but it is explicitly documented to return NULL if any argument is NULL. In the REPLACE() case the unknown value surely does not affect the result if the replacement never takes place. This is also the clear intention of the code — it does not evaluate the third argument until it knows that the replacement, indeed, will take place. | |||||||||||||||||||||||||||
| Comment by Lau [ 2020-11-21 ] | |||||||||||||||||||||||||||
|
Of course any issue can be argued in at least two directions. However, the point I am making here is, that the current behavior is unexpected. Not just from the standpoint, that there are different results for UTF-8 and other charsets, but also from the standpoint that if this string function returns a non-null value even if a parameter is NULL, than this is the only string function I know of that does that (if you know others, that conditionally return NULL for a NULL parameter, please state them). I recently took inventory over all functions our provider supports and documented them internally: from the 99 SQL functions we currently translate, all that take arguments return NULL if any argument is NULL, with the exception of course for the special case of non-UTF-8 charsets in the REPLACE() function that I outlined and reported as a bug. Its true, that there is no official standard here. However, I looked at how SQL Server and PostgreSQL handle REPLACE():
PostgreSQL: As for MySQL, I made my case that the expected design can be seen in how the similar INSERT() function handles arguments. While it is true, that there is no explicit documentation yet for REPLACE() and NULL handling in the MySQL docs, the previously referenced release notes should be enough evidence for what the intended design is, and that returning a non-null value here has been categorized as a bug. That being said, since I am the lead developer of the Pomelo.EntityFrameworkCore.MySql provider and I do my best to keep the provider not just compatible with all officially supported version of MySQL but also with all officially supported version of MariaDB, any help in reasonable cases like this one is greatly appreciated. Since we have over 18,000 tests in place, we usually find all divergences between MariaDB and MySQL, and also the ones between each individual version update, and take the time to report them and then deal with them in our provider. | |||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2020-12-14 ] | |||||||||||||||||||||||||||
|
There are string functions that don't necessarily return NULL if some arguments are NULL:
And while CONCAT_WS is a special case, ELT clearly follows the logic "the result is not NULL unless NULL argument is used". FIELD and MAKE_SET just don't return NULL at all, so it looks. It's rather inconsistent, unfortunately. I'd rather pick one rule and stick with it. The difference in REPLACE behavior with single-byte and multi-byte character sets is unquestionably a bug. We will fix it. The question only is, what the behavior should be. And I lean towards the logical "the result is not NULL unless NULL argument is used" rule. | |||||||||||||||||||||||||||
| Comment by Lau [ 2020-12-15 ] | |||||||||||||||||||||||||||
While this might make sense for MariaDB as an island, I would strife for as much conformity with MySQL as reasonably possible. Since MySQL has categorized the old behavior as a bug, I think that MariaDB should implement the current MySQL behavior as well. After MySQL, I would strife for conformity with other major database systems. As already stated previously, Microsoft SQL Server and PostgreSQL implement the behavior in the same way that MySQL now does. Differences in details like this will inevitably lead to bugs that are very hard to track down (and therefore costly), when migrating from MySQL to MariaDB. | |||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2020-12-16 ] | |||||||||||||||||||||||||||
|
Okay, you win. I've tried SQL Server and PostreSQL, but also SQLite, DB2 and Firebase. Everybody returns NULL if the last argument is NULL. I still think it's illogical that REPLACE('a', 'b', NULL) IS UNKNOWN, as far as I can see it's perfectly known. But if every other implementation I've tried does the illogical thing, so will we. |