[MDEV-8402] Bug #77473 Truncated data with subquery & UTF8 Created: 2015-06-30  Updated: 2016-06-29  Resolved: 2016-06-10

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 5.5, 10.0, 10.1
Fix Version/s: 5.5.50, 10.0.26, 10.1.15

Type: Bug Priority: Major
Reporter: Hartmut Holzgraefe Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: upstream-fixed

Sprint: 5.5.50

 Description   

REPEAT() and RPAD() results in a subquery get truncated if the length parameter is between 32768 and 65536 for 2-byte UTF-8 sequences, or between 21845 and 65535 . Below and above that point the result is correct.

set names utf8;
select length(data) as len from ( select repeat('ä', ...) as data ) as Sub;
 
parameter -> result
 
36766 -> 65532
32767 -> 65534
32678 -> 65534
...
65535 -> 65534
65536 -> 131072
65537 -> 131074

This is not a problem of the length function in the outer query, it is already the string returned from the inner query that gets truncated as can be seen with

mysql -Ne "set names utf8; select data from (select repeat('é', 40000) as data ) as Sub;" | wc
      1       1   65535

Looks as if the intermediate result length is calculated by character count and not by byte length, and so the wrong data type seems to be used to store the intermediate result. E.g. when using a unicode character that has a 3 byte UTF-8 representation:

select length(data) as len from ( select repeat('☃', ...) as data ) as Sub;
 
parameter -> result
 
21844 -> 65532
21845 -> 65535
21846 -> 65535 
...
65535 -> 65535
65536 -> 196608



 Comments   
Comment by Hartmut Holzgraefe [ 2015-06-30 ]

Upstream bug is http://bugs.mysql.com/77473

Comment by Elena Stepanova [ 2015-07-01 ]

There is also a warning issued when the problem occurs. In the output below, the third result is erroneous.

set names latin1;
select length(data) as len from ( select repeat('ä', 65535) as data ) as Sub;
len
131070
select length(data) as len from ( select repeat('ä', 65536) as data ) as Sub;
len
131072
set names utf8;
select length(data) as len from ( select repeat('ä', 65535) as data ) as Sub;
len
65534
Warnings:
Warning	1366	Incorrect string value: '\xC3\xA4\xC3\xA4\xC3\xA4...' for column 'data' at row 1
select length(data) as len from ( select repeat('ä', 65536) as data ) as Sub;
len
131072

Comment by Hartmut Holzgraefe [ 2016-03-07 ]

Oracle fix is here: https://github.com/mysql/mysql-server/commit/03cb5110c701afa43ddea082998ee6a54e731bf6

Comment by Elena Stepanova [ 2016-03-07 ]

Fixed in 5.7, see the commit above.

Generated at Thu Feb 08 07:26:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.