[MDEV-24100] Failed to read test report file: Invalid byte 2 of 3-byte UTF-8 sequence. Created: 2020-10-06  Updated: 2020-11-24  Resolved: 2020-11-23

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.2.37

Type: Bug Priority: Critical
Reporter: Alexey Bychko (Inactive) Assignee: Rasmus Johansson (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: XML File mtr-normal.xml    
Issue Links:
Duplicate
Relates
relates to MDEV-23649 An invalid XML character in mysql_te... Closed

 Description   

The XML test reports are sometimes invalid when they contain non-ASCII characters.

abychko@Alexeys-Mini ~ % xmllint Downloads/mtr-normal.xml
Downloads/mtr-normal.xml:2749: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE5 0x73 0x27 0x29
-slave-bin.000001	#	Annotate_rows	1	#	INSERT INTO t5(b) VALUES ('g�s')

We use the XML reports from MTR widely, so the non-ASCII output needs to be fixed everywhere.

Original line from the XML:

-slave-bin.000001	#	Annotate_rows	1	#	INSERT INTO t5(b) VALUES ('gås')
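
The byte dump reported by xmllint can be reproduced with a short sketch. It assumes the statement was written to the report in latin1, which is consistent with the bytes 0xE5 0x73 0x27 0x29 shown above; the helper name first_invalid_utf8 is hypothetical and is not part of MTR.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, next four bytes) at the first invalid UTF-8
    sequence in data, or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the byte that could not be decoded.
        return exc.start, data[exc.start:exc.start + 4]
    return None


# Assumption: the report contains the statement encoded as latin1.
line = "INSERT INTO t5(b) VALUES ('gås')".encode("latin-1")
result = first_invalid_utf8(line)
if result is not None:
    offset, context = result
    print(offset, " ".join(f"0x{b:02X}" for b in context))
```

The same string encoded as UTF-8 passes the check, so the issue is the encoding of the bytes written to the report, not the character å itself.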



 Comments   
Comment by Alexey Bychko (Inactive) [ 2020-10-06 ]

This is probably a ticket for all versions, because every version can dump bytes that are not valid UTF-8.

Comment by Marko Mäkelä [ 2020-11-18 ]

I would suggest the following deterministic test case:

SELECT _latin1 0xe527;

Write the same line to both a .test and a .result file. The diff command will report a difference because 2 lines are missing from the .result file. That difference will contain the invalid UTF-8 sequence 0xe5 0x27. The byte 0xe5 starts a 3-byte UTF-8 sequence and must be followed by two continuation bytes (0x80 to 0xbf), but 0x27 is an ASCII byte.
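
Why 0xe5 0x27 is rejected can be checked with a minimal sketch of the UTF-8 lead-byte rules. The function names are hypothetical and only illustrate the encoding rules; this is not how MTR or xmllint validates input.

```python
def expected_utf8_length(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte, or 0 if the byte
    cannot start a sequence."""
    if lead < 0x80:
        return 1  # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or invalid lead byte


def is_continuation(byte: int) -> bool:
    """True for bytes of the form 10xxxxxx."""
    return 0x80 <= byte <= 0xBF


payload = bytes.fromhex("e527")  # the bytes produced by _latin1 0xe527

print(expected_utf8_length(payload[0]))  # length announced by 0xe5
print(is_continuation(payload[1]))       # whether 0x27 may follow it
print(repr(payload.decode("latin-1")))   # the same bytes read as latin1

try:
    payload.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)
```

Read as latin1, the two bytes are the character å followed by an apostrophe, which matches the reported failure: byte 2 of the 3-byte sequence announced by 0xe5 is not a continuation byte.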

Generated at Thu Feb 08 09:27:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.