MDEV-8765 (MariaDB Server)

mysqldump silently corrupts 4-byte UTF-8 data

Details

    Description

      Bug for Oracle MySQL: https://bugs.mysql.com/bug.php?id=71746

      This also affects MariaDB 10.0:

      [dvaneeden@dve-mac msb_ma10_0_20]$ ./my sqldump --skip-extended-insert unicodedata | grep DOLPHIN
      INSERT INTO `ucd` VALUES ('1F42C','?','DOLPHIN','So','0','ON','','','','','N','','','','','');
      [dvaneeden@dve-mac msb_ma10_0_20]$ ./my sqldump --skip-extended-insert --default-character-set=utf8mb4 unicodedata | grep DOLPHIN
      INSERT INTO `ucd` VALUES ('1F42C','🐬','DOLPHIN','So','0','ON','','','','','N','','','','','');
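
The '?' in the first dump is what one would expect if the client character set can hold at most 3 bytes per character, since U+1F42C needs 4 bytes in UTF-8. The following is only a rough Python sketch of that effect, not the actual conversion code in mysqldump or the server; the helper name `to_utf8mb3` and the rule "replace anything above U+FFFF with '?'" are assumptions made for illustration.

```python
def to_utf8mb3(text: str) -> str:
    """Simulate a lossy conversion to a 3-byte-max UTF-8 character set.

    Assumption for illustration: every code point above U+FFFF (which
    needs 4 bytes in UTF-8) is replaced with '?'.
    """
    return "".join(ch if ord(ch) <= 0xFFFF else "?" for ch in text)


dolphin = "\U0001F42C"  # DOLPHIN, the row shown in the dump above

print(len(dolphin.encode("utf-8")))  # 4
print(to_utf8mb3(dolphin))           # ?
print(to_utf8mb3("Daniël"))          # Daniël
```

Characters inside the Basic Multilingual Plane pass through unchanged, which is why the data loss can go unnoticed until a row with a 4-byte character is restored.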

          Activity

            dveeden Daniël van Eeden created issue -
            elenst Elena Stepanova made changes -
            Fix Version/s 10.0 [ 16000 ]
            Assignee Alexander Barkov [ bar ]
            Labels upstream
            danblack Daniel Black added a comment -

            Upstream fixed this in commit ebaff9fffc958030a57d8ea7f1f2d527cac1df64.

            MariaDB needs to change the #define of MYSQL_UNIVERSAL_CLIENT_CHARSET in include/my_global.h to utf8mb4.
            mysqldump is the only place where this macro is used.

            This is a really trivial fix to prevent backup corruption, even if utf8mb4 isn't the default.

            danblack Daniel Black made changes -
            Labels upstream upstream-fixed
            danblack Daniel Black made changes -
            Labels upstream-fixed beginner-friendly upstream-fixed
            svoj Sergey Vojtovich made changes -
            Labels beginner-friendly upstream-fixed beginner-friendly contribution foundation upstream-fixed
            svoj Sergey Vojtovich made changes -
            Priority Major [ 3 ] Critical [ 2 ]

            svoj Sergey Vojtovich added a comment -

            Raised priority as there's a pull request now.

            rutuja Rutuja Surve (Inactive) added a comment -

            @Sergey, could you add the link to the pull request here?

            svoj Sergey Vojtovich added a comment -

            rutuja, there's a link on the right side under the "Development" section:
            https://github.com/MariaDB/server/pull/547

            teodor Teodor Mircea Ionita (Inactive) added a comment -

            Hi, I confirm this on both 5.5 and 10.3 using the Unicode dataset available at:
            https://github.com/dveeden/mysqlunicodedata

            The fix in the associated PR#547 does fix the dump issue: instead of a garbage '?', mysqldump exports the proper UTF-8 symbols after patching, without the need for an explicit --default-character-set.

            As for the actual fix in the PR, at least the mysqldump* tests need adjusting. However, I can't speak for the overall implications of switching MYSQL_UNIVERSAL_CLIENT_CHARSET to utf8mb4 for the entire suite. Someone better suited should evaluate that. Thank you!
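
On builds without the patch, the report above uses --default-character-set=utf8mb4 on the command line. A minimal Python sketch of a backup wrapper that always passes that option might look like the following. The function names `build_dump_command` and `dump_database` are hypothetical, and the sketch assumes `mysqldump` is on the PATH and that connection credentials come from an option file.

```python
import subprocess
from typing import List


def build_dump_command(database: str, charset: str = "utf8mb4") -> List[str]:
    """Build a mysqldump argument list that names the client charset explicitly."""
    return [
        "mysqldump",
        "--skip-extended-insert",
        f"--default-character-set={charset}",
        database,
    ]


def dump_database(database: str, out_path: str) -> None:
    """Run mysqldump and write its raw output bytes to out_path.

    Needs a reachable server; raises CalledProcessError if mysqldump fails.
    """
    with open(out_path, "wb") as out:
        subprocess.run(build_dump_command(database), stdout=out, check=True)


print(build_dump_command("unicodedata"))
```

Writing the output in binary mode keeps Python from re-encoding the dump, so the bytes on disk are exactly what mysqldump produced.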
            bar Alexander Barkov made changes -
            issue.field.resolutiondate 2018-10-12 07:51:28.0 2018-10-12 07:51:28.09
            bar Alexander Barkov made changes -
            Fix Version/s 10.3.11 [ 23141 ]
            Fix Version/s 10.0 [ 16000 ]
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]
            bar Alexander Barkov made changes -
            Fix Version/s 10.4.0 [ 23115 ]
            bar Alexander Barkov added a comment - - edited

            This issue is critical for the JSON data type, which is an alias to longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin.

            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 71405 ] MariaDB v4 [ 149587 ]

            People

              bar Alexander Barkov
              dveeden Daniël van Eeden
