[MDEV-11928] Session charset applied to COLUMN_GET() first arg in 10.1 Created: 2017-01-27  Updated: 2017-03-07  Resolved: 2017-03-07

Status: Closed
Project: MariaDB Server
Component/s: Dynamic Columns
Affects Version/s: 10.1.20
Fix Version/s: 10.1.21

Type: Bug Priority: Major
Reporter: Tom Assignee: Oleksandr Byelkin
Resolution: Won't Fix Votes: 0
Labels: None

Sprint: 10.1.22

 Description   

I added this as a comment to MDEV-9429 and then realized it wasn't going to be seen and was in fact different.

The gist of it

SELECT 2 * COLUMN_GET(
    COLUMN_GET(
        COLUMN_CREATE('a', COLUMN_CREATE('b', 123)),
        'a' AS char
    ), 
    'b' AS char
)

returns 246 in 10.0 and -64 in 10.1.

I'm guessing this was added to the docs as resolution for MDEV-9429:

If you are trying to get a nested dynamic column as a string use 'as BINARY' as the last argument of COLUMN_GET (otherwise problems with character set conversion and illegal symbols are possible)...

but the wording doesn't exactly apply. I am not trying to get a nested column as a string. I have a previously functioning ORM that generates Maria-specific SQL, which wrote an SQL expression involving a nested dynamic column's value. The integer value of the expression is a function of server version.

I don't understand how the server works but it appears the server was changed so that in

  • column_get(column_get(…, 'foo' as char), …)

Maria 10.1 modifies the value of the 1st param of the outer column_get() by converting from the client charset (utf8) to binary? It's not clear to me:

  • why it should introduce this conversion when the context doesn't call for it (friendly context-appropriate conversion is apparent elsewhere in mysql, e.g. 2 * "2" gives 4)
  • how this change is an improvement
  • why give a spurious integer answer (and a warning) instead of no answer or an error

1. Before upgrade

In good-old 10.0, char was understood when it was inside a

MariaDB [(none)]> show variables like 'version';
+---------------+--------------------------------+
| Variable_name | Value                          |
+---------------+--------------------------------+
| version       | 10.0.27-MariaDB-1~trusty-wsrep |
+---------------+--------------------------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]>  SELECT (COLUMN_GET(COLUMN_GET(COLUMN_CREATE('child', COLUMN_CREATE('int', 123)), 'child' AS char), 'int' AS char) * 2);
+-----------------------------------------------------------------------------------------------------------------+
| (COLUMN_GET(COLUMN_GET(COLUMN_CREATE('child', COLUMN_CREATE('int', 123)), 'child' AS char), 'int' AS char) * 2) |
+-----------------------------------------------------------------------------------------------------------------+
|                                                                                                             246 |
+-----------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> SELECT (COLUMN_GET(COLUMN_GET(COLUMN_CREATE('child', COLUMN_CREATE('int', 123)), 'child' AS binary), 'int' AS char) * 2);
+-------------------------------------------------------------------------------------------------------------------+
| (COLUMN_GET(COLUMN_GET(COLUMN_CREATE('child', COLUMN_CREATE('int', 123)), 'child' AS binary), 'int' AS char) * 2) |
+-------------------------------------------------------------------------------------------------------------------+
|                                                                                                               246 |
+-------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

2. After upgrade

Now we need to do more work in our ORM

MariaDB [(none)]> show variables like 'version';
+---------------+-----------------+
| Variable_name | Value           |
+---------------+-----------------+
| version       | 10.1.20-MariaDB |
+---------------+-----------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> SELECT (COLUMN_GET(COLUMN_GET(COLUMN_CREATE('child', COLUMN_CREATE('int', 123)), 'child' AS char), 'int' AS char) * 2);
+-----------------------------------------------------------------------------------------------------------------+
| (COLUMN_GET(COLUMN_GET(COLUMN_CREATE('child', COLUMN_CREATE('int', 123)), 'child' AS char), 'int' AS char) * 2) |
+-----------------------------------------------------------------------------------------------------------------+
|                                                                                                             -64 |
+-----------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
 
MariaDB [(none)]> SHOW WARNINGS;
+---------+------+---------------------------------------+
| Level   | Code | Message                               |
+---------+------+---------------------------------------+
| Warning | 1300 | Invalid utf8 character string: '\xF6' |
+---------+------+---------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> SELECT (COLUMN_GET(COLUMN_GET(COLUMN_CREATE('child', COLUMN_CREATE('int', 123)), 'child' AS binary), 'int' AS char) * 2);
+-------------------------------------------------------------------------------------------------------------------+
| (COLUMN_GET(COLUMN_GET(COLUMN_CREATE('child', COLUMN_CREATE('int', 123)), 'child' AS binary), 'int' AS char) * 2) |
+-------------------------------------------------------------------------------------------------------------------+
|                                                                                                               246 |
+-------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)



 Comments   
Comment by Elena Stepanova [ 2017-02-01 ]

Thanks for the report.

Comment by Oleksandr Byelkin [ 2017-03-03 ]

It looks like signed/unsigned problem. Actually converting number to character and back looks for me dangerous in any case.

Comment by Oleksandr Byelkin [ 2017-03-03 ]

So problem is probably in setup which changed:
show variables like 'version';
Variable_name Value
version 10.1.20-MariaDB-debug
SELECT 2 * COLUMN_GET(
COLUMN_GET(
COLUMN_CREATE('a', COLUMN_CREATE('b', 123)),
'a' AS char
),
'b' AS char
);
2 * COLUMN_GET(
COLUMN_GET(
COLUMN_CREATE('a', COLUMN_CREATE('b', 123)),
'a' AS char
),
'b' AS char
)
246

The problem in your setup that it can't make utf-8 character from number. So I need to know what configuration used to reproduce the problem. or you will see yourself what to change to fix if you really need such dangerous tricks.

Comment by Oleksandr Byelkin [ 2017-03-07 ]

It looks like a bug in 10.0, it do not react on charset properly. Correct assignment should be with charset mentioning:
SELECT (COLUMN_GET(COLUMN_GET(COLUMN_CREATE('child', COLUMN_CREATE('int', 123)), 'child' AS char charset latin1), 'int' AS char charset latin1) * 2);

but 10.0 ignore them and even following processed as latin1:
SELECT (COLUMN_GET(COLUMN_GET(COLUMN_CREATE('child', COLUMN_CREATE('int', 123)), 'child' AS char charset utf8), 'int' AS char charset utf8) * 2);

Comment by Tom [ 2017-03-07 ]

Please confirm my understanding. We can avoid the dangers you mention if we access nested dynamic column values using AS binary in all but the outermost COLUMN_GET()

Thus, for example

COLUMN_GET(
    COLUMN_GET(
        COLUMN_GET(
            `my_blob_column`, 
            'foo' AS binary
        ),
        'bar' AS binary
    ),
    'baz' AS whatever
)

Where whatever is some type suitable to the app.

I made this assumption in the fix to the ORM

Comment by Oleksandr Byelkin [ 2017-03-07 ]

Thanks Alexandr Barkov for finding what changed in 10.1, it was fix of MDEV-7661 which bring correct behavior.

To make make conversion number->symbol stable and unreliable on external changes it is better to mention charset explicitly or in any other way (SET NAMES for examples) to set such charset where number you want has sens as a character, otherwise (like with number 123 and utf8) it will be replaced with '?' (what we have here).

Comment by Oleksandr Byelkin [ 2017-03-07 ]

Yes, binary or latin1 or something where all numbers correspond to some characters (actually yes, binary probably is better)

Comment by Oleksandr Byelkin [ 2017-03-07 ]

So I close the issue as it is actually consequence of bug fixed. If you have more questions ask and reopen it please.

Generated at Thu Feb 08 07:53:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.