[MXS-1542] Invalid data from UTF16 strings in avrorouter Created: 2017-11-24  Updated: 2023-11-03  Resolved: 2018-01-04

Status: Closed
Project: MariaDB MaxScale
Component/s: avrorouter
Affects Version/s: 2.1.11
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: markus makela Assignee: markus makela
Resolution: Won't Fix Votes: 0
Labels: None

Sprint: 2017-49

 Description   

The avrorouter produces invalid data when a table uses the utf16 character set. The following test case should write the inserted strings to the Avro output, but it produces empty strings instead.

CREATE TABLE t1 (data varchar(30) NOT NULL) DEFAULT CHARSET=utf16;
INSERT INTO t1 VALUES ("Hello World"), ("Բարեւ աշխարհ"), ("こんにちは世界"), ("你好,世界"), ("Привет мир");

Changing the charset to utf8mb4 generates correct output.



 Comments   
Comment by markus makela [ 2018-01-04 ]

The Avro file format does not appear to support UTF-16. The Avro specification defines the binary encoding of strings as follows:

a string is encoded as a long followed by that many bytes of UTF-8 encoded character data.
For example, the three-character string "foo" would be encoded as the long value 3 (encoded as hex 06) followed by the UTF-8 encoding of 'f', 'o', and 'o' (the hex bytes 66 6f 6f):
06 66 6f 6f
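
The encoding quoted above can be reproduced with a short sketch. This is only an illustration of the rule as stated in the specification text, not code from MaxScale or from an Avro library; the function names `zigzag_varint` and `encode_avro_string` are hypothetical. It assumes Avro's usual representation of a long (zigzag followed by a variable-length encoding), which is why the length 3 appears as hex 06.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer the way Avro encodes a long.

    Hypothetical helper for illustration: zigzag maps small negative and
    positive values to small unsigned values, which are then written
    7 bits at a time, least significant group first.
    """
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # more bytes follow
        z >>= 7
    out.append(z)  # final byte, high bit clear
    return bytes(out)


def encode_avro_string(s: str) -> bytes:
    """Length prefix (in bytes, not characters) followed by UTF-8 data."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


print(encode_avro_string("foo").hex(" "))  # 06 66 6f 6f
```

The length prefix counts UTF-8 bytes, so the format has no place to record a different character encoding; data that arrives as UTF-16 would need to be converted before it could be written as a valid Avro string.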

Comment by markus makela [ 2018-01-04 ]

As this would require a significant change in the way the files are stored, this will not be fixed.
