  1. MariaDB Connector/J
  2. CONJ-988

UTF-16 surrogates are incorrectly computed

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Not a Bug
    • 3.0.6
    • N/A
    • Other
    • None

    Description

      The code to compute the surrogate pair looks like this (in org.mariadb.jdbc.client.socket.impl.PacketWriter):

                    int surrogatePairs =
                        ((currChar << 10) + nextChar) + (0x010000 - (0xD800 << 10) - 0xDC00);
      

      According to the Unicode standard, however, the computation should look like this (https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates):

          code = 0x10000;
          code += (units[0] & 0x03FF) << 10;
          code += (units[1] & 0x03FF);
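
      The mask-and-shift formula above can be sketched as a small Java method. This is only an illustration: the class and method names are hypothetical and not part of Connector/J, and the surrogate-range check is an added assumption. The result is cross-checked against the JDK's Character.toCodePoint.

```java
// Hypothetical helper class; the name is illustrative and not part of Connector/J.
final class SurrogateDecodeSketch {

    // Combines a UTF-16 high/low surrogate pair into one code point,
    // following the mask-and-shift formula quoted above.
    static int decode(char high, char low) {
        // Assumption for this sketch: reject anything that is not a valid pair.
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        int code = 0x10000;             // hexadecimal: first supplementary code point
        code += (high & 0x03FF) << 10;  // upper 10 bits come from the high surrogate
        code += (low & 0x03FF);         // lower 10 bits come from the low surrogate
        return code;
    }

    public static void main(String[] args) {
        char high = (char) 0xD83D;
        char low = (char) 0xDE00;
        int code = decode(high, low);
        // Cross-check against the JDK's own conversion.
        System.out.println(code == Character.toCodePoint(high, low)); // prints "true"
        System.out.printf("U+%X%n", code);                            // prints "U+1F600"
    }
}
```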
      

      Not too surprisingly, the two computations do not produce the same result.
      Example: \udbc0\udd89

      public class MyClass {
          public static void main(String[] args) {
              char current = 0xdbc0;
              char next = 0xdd89;
              int c = 10000;
              c += (current & 0x3ff) << 10;
              c += (next & 0x3ff);

              int surrogatePairs =
                  ((current << 10) + next) + (0x010000 - (0xD800 << 10) - 0xDC00);

              System.out.println(c + " VS. " + surrogatePairs);
          }
      }
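
      Note that the program above starts from the decimal literal 10000, whereas the reference formula uses the hexadecimal 0x10000. The following is a minimal sketch, with hypothetical class and method names that are not taken from Connector/J, which uses the hexadecimal constant and compares the two forms over every valid surrogate pair. Under that assumption the two forms agree, which appears consistent with the "Not a Bug" resolution of this issue.

```java
// Hypothetical comparison class; names are illustrative and not part of Connector/J.
final class SurrogateFormulaComparison {

    // Mask-and-shift form from the Unicode reference quoted in the report.
    static int maskForm(char high, char low) {
        return 0x10000 + ((high & 0x03FF) << 10) + (low & 0x03FF);
    }

    // Single-constant form quoted from PacketWriter in the report.
    static int offsetForm(char high, char low) {
        return ((high << 10) + low) + (0x010000 - (0xD800 << 10) - 0xDC00);
    }

    // Counts the valid surrogate pairs for which the two forms differ.
    static int countMismatches() {
        int mismatches = 0;
        for (int high = 0xD800; high <= 0xDBFF; high++) {
            for (int low = 0xDC00; low <= 0xDFFF; low++) {
                if (maskForm((char) high, (char) low) != offsetForm((char) high, (char) low)) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        char high = 0xDBC0;
        char low = 0xDD89;
        // prints "1048969 VS. 1048969"
        System.out.println(maskForm(high, low) + " VS. " + offsetForm(high, low));
        // prints "mismatches: 0"
        System.out.println("mismatches: " + countMismatches());
    }
}
```

      The forms coincide because, for a high surrogate in 0xD800..0xDBFF, masking with 0x03FF gives the same value as subtracting 0xD800, and likewise masking a low surrogate in 0xDC00..0xDFFF gives the same value as subtracting 0xDC00. With the decimal 10000, the first value would instead be 993433, which is 65536 - 10000 = 55536 less than 1048969.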
      

          People

            diego dupin Diego Dupin
            axel.doerfler Axel Dörfler