[MDEV-32264] mariadb-dump doesn't utf-8 encode database name Created: 2023-09-27  Updated: 2023-10-12  Resolved: 2023-10-12

Status: Closed
Project: MariaDB Server
Component/s: N/A
Affects Version/s: 10.11
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Kjell Rilbe Assignee: Vladislav Vaintroub
Resolution: Won't Fix Votes: 0
Labels: backup, unicode
Environment:

Windows Server, x64, English locale, cmd.exe, any code page.


Issue Links:
Relates
relates to MDEV-26713 Windows - improve i18n support Closed

 Description   

Please refer to my Stack Overflow question about this issue.

In short, in Windows using `cmd.exe`, if I try to use a database name that contains the character "ö" with Unicode code point U+00F6, the command fails with these two errors:

mariadb-dump.exe: Error: 'Illegal mix of collations (utf8mb3_general_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '='' when trying to dump tablespaces
mariadb-dump.exe: Got error: 1300: "Invalid utf8mb4 character string: 'dbf\xF6retag'" when selecting the database

This is a server with `character-set-server=utf8mb4` and the same for the database charset.

Regardless of the code page (`chcp`) used in the `cmd.exe` session, the "ö" on the command line is apparently correctly interpreted as the Unicode character U+00F6. But when the database name is sent to the server, that code point is put into the name string "as is" (the single byte \xF6 shown in the error above) instead of being encoded as utf-8.

The correct utf-8 encoding is \xC3\xB6. I can get a working command line if I replace the "ö" with "Ã¶", because those two characters have Unicode code points U+00C3 and U+00B6, resulting in a string that contains "ö" if interpreted as utf-8.
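The byte-level difference described above can be reproduced outside MariaDB. The following is a minimal Python sketch, not taken from `mariadb-dump` itself; it uses `latin-1` as a stand-in for "code point copied as a single byte", which is an assumption about what the client does, and the helper `is_valid_utf8` is hypothetical.

```python
# Illustration only: how U+00F6 looks as raw bytes under two encodings.
name = "dbföretag"

# One byte per character (assumed stand-in for the client's behavior):
# U+00F6 becomes the single byte F6, as in the server's error message.
raw = name.encode("latin-1")

# What a utf8mb4 connection expects: U+00F6 becomes the two bytes C3 B6.
utf8 = name.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The workaround: type the UTF-8 bytes as if each were its own character.
workaround = utf8.decode("latin-1")

print(raw, is_valid_utf8(raw))
print(utf8, is_valid_utf8(utf8))
print(workaround)
```

The single-byte form is not valid UTF-8, which matches error 1300, while the "Ã¶" spelling produces exactly the bytes the server accepts.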

I am unable to determine if the problem lies within `mariadb-dump.exe`, in a client library/connector that it makes use of, or if it's a problem at the server side.



 Comments   
Comment by Kjell Rilbe [ 2023-09-27 ]

Sorry, forgot to mention that this affects MariaDB version 10.11.5, Win64

Comment by Kjell Rilbe [ 2023-09-28 ]

The same problem applies to table names on the command line.

The work-around is not useful if any of the characters have Unicode codepoints that are special characters in the code page used in the cmd.exe session. For example, capital "Ö" is problematic. That said, table names are not case sensitive, so that particular example is easy to get around.

Comment by Vladislav Vaintroub [ 2023-09-30 ]

krilbe, can you please also describe the exact version of your Windows, as in the output of the "ver" command?

Comment by Vladislav Vaintroub [ 2023-09-30 ]

krilbe, can you also specify the exact Windows version? There will be some difference in behavior between Windows 10 1903 and earlier versions.

Comment by Kjell Rilbe [ 2023-10-01 ]

Sure:

Microsoft Windows [Version 10.0.17763.4499]

This is Windows Server 2019

Comment by Vladislav Vaintroub [ 2023-10-02 ]

Ok, I checked: non-ASCII database names work on modern enough Windows, i.e. anything since Windows 10 1903, due to MDEV-26713.

But Windows Server 2019 is based on build 1809, which makes it "old" in terms of MDEV-26713. MariaDB will soon stop supporting Windows 2019, as mainstream support by Microsoft ends Jan 2024.
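To check which side of that boundary a machine falls on, the build number in the `ver` output can be compared against the first build of Windows 10 1903. This is a rough Python sketch; the threshold 18362 is not stated in this issue and is supplied here as an assumption, and the function names are hypothetical. It also assumes the English `ver` format shown above.

```python
import re

# Assumption (not from this issue): Windows 10 version 1903 is build 18362.
MIN_BUILD_1903 = 18362


def windows_build(ver_output: str) -> int:
    """Extract the build number from `ver` output such as
    'Microsoft Windows [Version 10.0.17763.4499]'."""
    match = re.search(r"Version (\d+)\.(\d+)\.(\d+)", ver_output)
    if match is None:
        raise ValueError(f"unrecognized ver output: {ver_output!r}")
    return int(match.group(3))


def is_modern_enough(ver_output: str) -> bool:
    """True if the build is at or above the assumed 1903 threshold."""
    return windows_build(ver_output) >= MIN_BUILD_1903


reported = "Microsoft Windows [Version 10.0.17763.4499]"
print(windows_build(reported), is_modern_enough(reported))
```

For the version reported in this issue the check comes out negative, consistent with Windows Server 2019 being treated as "old" here.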

A workaround for older versions of Windows may be --default-character-set=latin1, at least this fixes misinterpretation of dbname on the command line. You have found a second workaround already.
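As a sketch of the first workaround, a script could build the command line like this. It is only an illustration: the helper `dump_command` is hypothetical, connection options such as user and password are left out, and whether `latin1` is appropriate for the dumped data itself is not addressed in this issue.

```python
def dump_command(database: str, charset: str = "latin1") -> list[str]:
    """Hypothetical helper: argument list for mariadb-dump using the
    suggested --default-character-set workaround."""
    return [
        "mariadb-dump.exe",
        f"--default-character-set={charset}",
        database,
    ]


# The database name from this report; nothing is executed here.
print(dump_command("dbföretag"))
```

The resulting list could be passed to `subprocess.run` on an affected machine.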

Comment by Vladislav Vaintroub [ 2023-10-12 ]

I'm closing for now. It never worked correctly on old Windows (which unfortunately includes Windows Server 2019), but there are workarounds. On newer versions of Windows, it already works well.
