[MDEV-32264] mariadb-dump doesn't utf-8 encode database name Created: 2023-09-27 Updated: 2023-10-12 Resolved: 2023-10-12 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | N/A |
| Affects Version/s: | 10.11 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Kjell Rilbe | Assignee: | Vladislav Vaintroub |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | backup, unicode | ||
| Environment: |
Windows Server, x64, English locale, cmd.exe, any code page. |
||
| Description |
|
Please refer to my Stack Overflow question about this issue. In short, in Windows using `cmd.exe`, if I try to use a database name that contains the character "ö" with Unicode code point U+00F6, the command fails with these two errors:
This is a server with `character-set-server=utf8mb4` and the same for the database charset. Regardless of the code page (`chcp`) being used in the `cmd.exe` session, the "ö" on the command line is apparently correctly interpreted as the Unicode character U+00F6, but when sending the database name to the server it puts that Unicode code point into the database name string "as is", instead of encoding it into UTF-8. The correct UTF-8 encoding is \xC3\xB6. I can get a working command line if I replace the "ö" with "Ã¶", because those two characters have Unicode code points U+00C3 and U+00B6, resulting in a string that contains "ö" if interpreted as UTF-8. I am unable to determine if the problem lies within `mariadb-dump.exe`, in a client library/connector that it makes use of, or if it's a problem on the server side. |
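The byte-level mismatch described above can be reproduced outside MariaDB. The following is a minimal Python sketch, assuming the client passes each code point through as a single byte (Latin-1 style); that is a reading of the report, not confirmed behavior of `mariadb-dump.exe`. The function names are hypothetical.

```python
def utf8_bytes(name: str) -> bytes:
    """Bytes a utf8mb4 server expects for this name."""
    return name.encode("utf-8")


def raw_codepoint_bytes(name: str) -> bytes:
    """Bytes produced if each code point is sent 'as is' as one byte.

    Assumption: this models the faulty path as a Latin-1 encode.
    """
    return name.encode("latin-1")


def workaround_spelling(name: str) -> str:
    """Characters to type so that the 'as is' bytes equal the UTF-8 bytes."""
    return name.encode("utf-8").decode("latin-1")


print(utf8_bytes("ö"))           # b'\xc3\xb6'
print(raw_codepoint_bytes("ö"))  # b'\xf6'
print(workaround_spelling("ö"))  # Ã¶
```

The single byte 0xF6 is not valid UTF-8 on its own, whereas typing "Ã¶" yields exactly the two bytes 0xC3 0xB6.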
| Comments |
| Comment by Kjell Rilbe [ 2023-09-27 ] | |
|
Sorry, forgot to mention that this affects MariaDB version 10.11.5, Win64 | |
| Comment by Kjell Rilbe [ 2023-09-28 ] | |
|
The same problem applies to table names on the command line. The workaround is not useful if any of the characters have Unicode code points that are special characters in the code page used in the `cmd.exe` session. For example, capital "Ö" is problematic. That said, table names are not case sensitive, so that particular example is easy to get around. | |
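To illustrate why capital "Ö" is harder to work around, here is a small Python sketch under the same Latin-1 assumption as the workaround itself. It checks whether the substitute spelling would contain a control character, which is one plausible meaning of "special character" here; the helper name is hypothetical.

```python
import unicodedata


def workaround_needs_control_char(name: str) -> bool:
    """True if the Latin-1 re-reading of the UTF-8 bytes contains a
    control character (Unicode category 'Cc'), which is hard to type."""
    substitute = name.encode("utf-8").decode("latin-1")
    return any(unicodedata.category(ch) == "Cc" for ch in substitute)


print(workaround_needs_control_char("ö"))  # False
print(workaround_needs_control_char("Ö"))  # True
```

"Ö" encodes to the bytes 0xC3 0x96 in UTF-8, and U+0096 is a C1 control character, so there is no printable character to type in its place.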
| Comment by Vladislav Vaintroub [ 2023-09-30 ] | |
|
krilbe, can you please also describe the exact version of your Windows, as in output of the "ver" command | |
| Comment by Vladislav Vaintroub [ 2023-09-30 ] | |
|
krilbe, can you also specify the exact Windows version? There will be some difference in behavior between Windows 10 1903 and earlier versions. | |
| Comment by Kjell Rilbe [ 2023-10-01 ] | |
|
Sure:
This is Windows Server 2019 | |
| Comment by Vladislav Vaintroub [ 2023-10-02 ] | |
|
OK, I checked: a non-ASCII database name works fine on modern enough Windows, i.e. anything since Windows 10 1903, due to But Windows Server 2019 is based on build 1809, which makes it "old" in terms of A workaround for older versions of Windows may be `--default-character-set=latin1`; at least this fixes the misinterpretation of the database name on the command line. You have found a second workaround already. | |
| Comment by Vladislav Vaintroub [ 2023-10-12 ] | |
|
I'm closing for now. It never worked correctly on old Windows (which unfortunately includes Windows Server 2019), but there are workarounds. On newer versions of Windows, it already works well. |