  1. MariaDB Server
  2. MDEV-32264

mariadb-dump doesn't utf-8 encode database name

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Won't Fix
    • 10.11
    • N/A
    • N/A
    • Windows Server, x64, English locale, cmd.exe, any code page.

    Description

      Please refer to my Stack Overflow question about this issue.

      In short, in Windows using `cmd.exe`, if I try to use a database name that contains the character "ö" with Unicode code point U+00F6, the command fails with these two errors:

      mariadb-dump.exe: Error: 'Illegal mix of collations (utf8mb3_general_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '='' when trying to dump tablespaces
      mariadb-dump.exe: Got error: 1300: "Invalid utf8mb4 character string: 'dbf\xF6retag'" when selecting the database
      

      This is a server with `character-set-server=utf8mb4` and the same for the database charset.

      Regardless of the code page (`chcp`) being used in the `cmd.exe` session, the "ö" on the command line is apparently correctly interpreted as the Unicode character U+00F6, but when sending the database name to the server it puts that Unicode code point into the database name string "as is", instead of encoding it as UTF-8.

      The correct UTF-8 encoding is \xC3\xB6. I can get a working command line if I replace the "ö" with "Ã¶", because those two characters have Unicode code points U+00C3 and U+00B6, resulting in a string that contains "ö" if interpreted as UTF-8.
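
      The byte-level difference described above can be reproduced outside MariaDB. The following is a minimal Python sketch, not the client's actual code path: it assumes the name reaches the server as one byte per code point (which is what a Latin-1 encoding produces), and the helper `is_valid_utf8` is a hypothetical name used only for illustration.

```python
# The database name from the report; \u00f6 is "ö".
name = "dbf\u00f6retag"

# What a utf8mb4 connection expects: "ö" as the two bytes C3 B6.
utf8_bytes = name.encode("utf-8")       # b'dbf\xc3\xb6retag'

# What the error message shows instead: the code point value F6 as a
# single byte. Encoding as Latin-1 yields exactly that byte sequence.
raw_bytes = name.encode("latin-1")      # b'dbf\xf6retag'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The reported workaround: type the characters whose code points equal
# the UTF-8 bytes, so that one byte per code point gives valid UTF-8.
workaround = utf8_bytes.decode("latin-1")   # "dbf" + U+00C3 + U+00B6 + "retag"

print(is_valid_utf8(utf8_bytes))   # True
print(is_valid_utf8(raw_bytes))    # False, matching server error 1300
print(workaround.encode("latin-1") == utf8_bytes)   # True
```

      The lone byte F6 is never valid in UTF-8, which is consistent with the "Invalid utf8mb4 character string" error quoted above.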

      I am unable to determine if the problem lies within `mariadb-dump.exe`, in a client library/connector that it makes use of, or if it's a problem at the server side.

          Activity

            krilbe Kjell Rilbe added a comment -

            Sorry, forgot to mention that this affects MariaDB version 10.11.5, Win64

            krilbe Kjell Rilbe added a comment -

            The same problem applies to table names on the command line.

            The work-around is not useful if any of the characters have Unicode codepoints that are special characters in the code page used in the cmd.exe session. For example, capital "Ö" is problematic. That said, table names are not case sensitive, so that particular example is easy to get around.
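
            One plausible reading of why capital "Ö" is harder than "ö": its second UTF-8 byte is 0x96, and U+0096 is a control character rather than something that can be typed. The sketch below illustrates that reading only; `workaround_chars` and `is_typeable` are hypothetical helper names, and "typeable" is approximated here as "contains no Unicode control characters".

```python
import unicodedata


def workaround_chars(ch: str) -> str:
    """Characters whose code points equal the UTF-8 bytes of ch."""
    return ch.encode("utf-8").decode("latin-1")


def is_typeable(s: str) -> bool:
    """Rough approximation: no character is in Unicode category Cc (control)."""
    return all(unicodedata.category(c) != "Cc" for c in s)


lower = workaround_chars("\u00f6")   # U+00C3 U+00B6, both printable
upper = workaround_chars("\u00d6")   # U+00C3 U+0096, second is a control char

print([hex(ord(c)) for c in lower], is_typeable(lower))   # ['0xc3', '0xb6'] True
print([hex(ord(c)) for c in upper], is_typeable(upper))   # ['0xc3', '0x96'] False
```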


            wlad Vladislav Vaintroub added a comment -

            krilbe, can you please also give the exact version of your Windows, as shown by the "ver" command?

            wlad Vladislav Vaintroub added a comment -

            krilbe, can you also specify the exact Windows version? There will be some difference in behavior between Windows 10 1903 and earlier versions.
            krilbe Kjell Rilbe added a comment - edited

            Sure:

            Microsoft Windows [Version 10.0.17763.4499]
            

            This is Windows Server 2019


            wlad Vladislav Vaintroub added a comment -

            OK, I checked: non-ASCII database names work on sufficiently modern Windows, i.e. anything since Windows 10 1903, due to MDEV-26713.

            But Windows Server 2019 is based on version 1809, which makes it "old" in terms of MDEV-26713. MariaDB will soon stop supporting Windows Server 2019, as mainstream support by Microsoft ends in January 2024.

            A workaround for older versions of Windows may be --default-character-set=latin1; at least this fixes the misinterpretation of the database name on the command line. You have already found a second workaround.
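
            The version cutoff in this comment can be checked against the output of the "ver" command. This is a minimal sketch under two assumptions not stated in the thread: that Windows 10 version 1903 corresponds to build 10.0.18362, and that the first dotted triple in the "ver" output is the major.minor.build number. `MIN_UTF8_BUILD`, `windows_build` and `needs_workaround` are hypothetical names.

```python
import re

# Assumption: Windows 10 version 1903 is build 10.0.18362.
MIN_UTF8_BUILD = (10, 0, 18362)


def windows_build(ver_output: str) -> tuple:
    """Extract (major, minor, build) from the text printed by `ver`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", ver_output)
    if match is None:
        raise ValueError("no version number found in: %r" % ver_output)
    return tuple(int(part) for part in match.groups())


def needs_workaround(ver_output: str) -> bool:
    """True if this Windows predates the 1903 cutoff named in the comment."""
    return windows_build(ver_output) < MIN_UTF8_BUILD


# The output reported for Windows Server 2019 in this issue.
reported = "Microsoft Windows [Version 10.0.17763.4499]"
print(windows_build(reported))      # (10, 0, 17763)
print(needs_workaround(reported))   # True
```

            For the reported system the check comes out as "old", which matches the conclusion that one of the workarounds is needed there.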

            wlad Vladislav Vaintroub added a comment -

            I'm closing this for now. It never worked correctly on old Windows (which unfortunately includes Windows Server 2019), but there are workarounds. On newer versions of Windows, it already works well.

            People

              wlad Vladislav Vaintroub
              krilbe Kjell Rilbe