[CONJ-988] UTF-16 surrogates are incorrectly computed

The code that computes a code point from a UTF-16 surrogate pair looks like this (in org.mariadb.jdbc.client.socket.impl.PacketWriter):

    int surrogatePairs =
        ((currChar << 10) + nextChar) + (0x010000 - (0xD800 << 10) - 0xDC00);

According to the Unicode standard, however, it should look like this (https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates):

    code = 0x10000;
    code += (units[0] & 0x03FF) << 10;
    code += (units[1] & 0x03FF);
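The Unicode formula above can be sketched as a small Java method and cross-checked against the JDK's Character.toCodePoint(char, char), which decodes a surrogate pair into a code point. This is a minimal sketch: the class and method names (SurrogateDecode, decodeSurrogatePair) are hypothetical, and only the Character methods are real JDK API. For the pair \udbc0\udd89 used in the example below, both computations in this sketch give U+100189.

```java
// Hypothetical names; only Character.isHighSurrogate, Character.isLowSurrogate
// and Character.toCodePoint are real JDK API.
public class SurrogateDecode {

    // Decodes a UTF-16 surrogate pair with the masking formula from the Unicode book:
    // code = 0x10000 + ((high & 0x3FF) << 10) + (low & 0x3FF)
    static int decodeSurrogatePair(char high, char low) {
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        int code = 0x10000;            // hex 0x10000 = 65536, first supplementary code point
        code += (high & 0x03FF) << 10; // upper 10 bits come from the high surrogate
        code += (low & 0x03FF);        // lower 10 bits come from the low surrogate
        return code;
    }

    public static void main(String[] args) {
        char high = 0xDBC0;
        char low = 0xDD89;
        int sketch = decodeSurrogatePair(high, low);
        int jdk = Character.toCodePoint(high, low); // JDK reference decoder
        System.out.printf("U+%X vs. U+%X%n", sketch, jdk); // prints "U+100189 vs. U+100189"
    }
}
```

The explicit surrogate checks are a design choice of the sketch; Character.toCodePoint itself does not validate its arguments.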

Not surprisingly, the PacketWriter computation and the Unicode formula do not produce the same result.
Example with the surrogate pair \udbc0\udd89:

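public class MyClass {
    public static void main(String[] args) {
        char current = 0xdbc0; // high surrogate
        char next = 0xdd89;    // low surrogate

        // Unicode formula; starts from decimal 10000 (the formula quoted above starts from hex 0x10000)
        int c = 10000;
        c += (current & 0x3ff) << 10;
        c += (next & 0x3ff);

        // Computation used in PacketWriter
        int surrogatePairs =
            ((current << 10) + next) + (0x010000 - (0xD800 << 10) - 0xDC00);

        System.out.println(c + " VS. " + surrogatePairs);
    }
}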
