[MDEV-24147] Encoding fails with user created function Created: 2020-11-05  Updated: 2021-01-31  Resolved: 2021-01-31

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Tarquin Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

CentOS
Server version: 10.3.17-MariaDB-log MariaDB Server



 Description   

My hosting service runs MariaDB on a CentOS server. I connect via SSH (utf-8) or via PHP (with encoding everywhere set to utf-8 as well). The same issue happens no matter which of these approaches I use.

> show session variables like 'character_set%';
+--------------------------+------------------------------+
| Variable_name            | Value                        |
+--------------------------+------------------------------+
| character_set_client     | utf8                         |
| character_set_connection | utf8                         |
| character_set_database   | latin1                       |
| character_set_filesystem | binary                       |
| character_set_results    | utf8                         |
| character_set_server     | latin1                       |
| character_set_system     | utf8                         |
| character_sets_dir       | /usr/share/mariadb/charsets/ |
+--------------------------+------------------------------+
> DELIMITER //
> CREATE FUNCTION return_string( str TEXT ) RETURNS TEXT CHARSET 'utf8'
> BEGIN
> RETURN str;
> END //
> DELIMITER ;
> SELECT return_string('ā');
ERROR 1366 (22007): Incorrect string value: '\xC4\x81' for column ``.``.`str` at row 1

Unless I am mistaken, this is a bug. It works correctly in MySQL. MariaDB seems to get confused about which charset it is using with the user function. A simple "SELECT 'ā';" works correctly.

SHOW CREATE FUNCTION return_string;
shows the expected status of the function as being utf-8.

This works correctly if you use this instead:
CREATE FUNCTION return_string( str TEXT CHARSET 'utf8' ) RETURNS TEXT CHARSET 'utf8'
specifying the input encoding too. However, the connection was already in utf-8, the function is created to return values in utf-8 and the database tables (in the case where I discovered this issue) were in utf-8. So it seems really unexpected/buggy for it to try to interpret the function input as latin1. Especially when MySQL treats the input as utf-8 (at least in the slightly older version that I have access to).



 Comments   
Comment by Elena Stepanova [ 2021-01-03 ]

As documented,

If the character set and collation are not specifically set, the database defaults at the time of creation will be used.

As your output shows, you have character_set_database=latin1, so that's what is used.
It works the same way in MySQL, but in MySQL due to different defaults you are more likely to have utf8<something> as character_set_database, which probably causes the difference you observe.

Comment by Tarquin [ 2021-01-31 ]

Thanks @elenst, great catch, that was exactly the difference. (Sorry, missed that message when it came in.)
MySQL install has character_set_database=utf8

Generated at Thu Feb 08 09:27:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.