[CONJ-988] UTF-16 surrogates are incorrectly computed Created: 2022-07-05 Updated: 2022-07-26 Resolved: 2022-07-25 |
|
| Status: | Closed |
| Project: | MariaDB Connector/J |
| Component/s: | Other |
| Affects Version/s: | 3.0.6 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Axel Dörfler | Assignee: | Diego Dupin |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Description |
|
The code to compute the surrogate pair looks like this (in org.mariadb.jdbc.client.socket.impl.PacketWriter):
According to the Unicode standard, however, it should look like this (https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates):
Not too surprisingly, the two computations don't come to the same results.
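For reference, the mask-and-shift form of the UTF-16 decoding rule can be sketched in Java roughly as follows. This is only an illustration, not the connector's code or the snippet originally attached to this issue; the class and method names (`SurrogateDecodeSketch`, `decode`) are hypothetical.

```java
public class SurrogateDecodeSketch {
    // Hypothetical helper: combine a high and a low surrogate into a code point
    // using the mask-and-shift form of the UTF-16 decoding rule.
    static int decode(char high, char low) {
        int highBits = high & 0x3FF; // lower 10 bits of the high surrogate
        int lowBits = low & 0x3FF;   // lower 10 bits of the low surrogate
        return 0x10000 + (highBits << 10) + lowBits;
    }

    public static void main(String[] args) {
        // U+1F600 is encoded in UTF-16 as the pair D83D DE00.
        int codePoint = decode('\uD83D', '\uDE00');
        System.out.printf("U+%X%n", codePoint); // prints "U+1F600"
    }
}
```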
|
| Comments |
| Comment by Diego Dupin [ 2022-07-25 ] | ||
|
Hmm, "surrogatePairs" is badly named; "code point" would have been more appropriate, and in fact:
would be better replaced by:
Never mind, I'll change that. | ||
| Comment by Axel Dörfler [ 2022-07-25 ] | ||
|
I can confirm that Character.toCodePoint() is the exact same code as your version. It does not, however, produce the same results as the computation in the standard. Well, it actually does for 0x10000 as you mention, but that's just one unique case; it doesn't give the same result for pretty much any other value. | ||
| Comment by Diego Dupin [ 2022-07-25 ] | ||
|
I mean, this is equal in all cases: currChar is in the range U+D800 to U+DBFF inclusive => currChar & 0x3ff strictly equals currChar - 0xD800
((currChar << 10) + nextChar) + (0x010000 - (0xD800 << 10) - 0xDC00) | ||
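The equivalence claimed here can also be checked exhaustively, since there are only 1024 x 1024 valid surrogate pairs. The following is a minimal sketch under the assumption that the mask form and the single-offset form (written with its parentheses balanced) are the two computations being compared; the class and method names are hypothetical, and `Character.toCodePoint` from the JDK serves as the reference.

```java
public class SurrogateEquivalenceCheck {
    // Mask-and-shift form: keep the lower 10 bits of each surrogate.
    static int maskForm(int high, int low) {
        return 0x10000 + ((high & 0x3FF) << 10) + (low & 0x3FF);
    }

    // Single-offset form: shift and add first, then apply one combined constant.
    static int offsetForm(int high, int low) {
        return ((high << 10) + low) + (0x010000 - (0xD800 << 10) - 0xDC00);
    }

    // Counts surrogate pairs for which either form disagrees with the JDK.
    static long countMismatches() {
        long mismatches = 0;
        for (int high = 0xD800; high <= 0xDBFF; high++) {
            for (int low = 0xDC00; low <= 0xDFFF; low++) {
                int expected = Character.toCodePoint((char) high, (char) low);
                if (maskForm(high, low) != expected || offsetForm(high, low) != expected) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches()); // prints "mismatches: 0"
    }
}
```

The forms agree because 0xD800 and 0xDC00 both have their lower 10 bits clear, so masking with 0x3FF and subtracting the range base give the same value inside each surrogate range.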
| Comment by Axel Dörfler [ 2022-07-26 ] | ||
|
You're absolutely right! Damn, I'm not sure what I was thinking yesterday; to test, I used values for 'c' other than 0x10000 (in the range from 0x10000 to 0x10ffff), instead of using different surrogate pairs with the correct base. Sorry for the noise and my stupidity, at least the code got a bit cleaner as a result! Thanks for your patience! | ||
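A test that avoids this pitfall derives real surrogate pairs from supplementary code points and decodes them again with the base fixed at 0x10000. The sketch below assumes the mask-based decoding form; the class and method names are hypothetical, while `Character.highSurrogate` and `Character.lowSurrogate` are standard JDK methods.

```java
public class SurrogateRoundTripCheck {
    // Hypothetical decoder using the mask-and-shift form with base 0x10000.
    static int decode(char high, char low) {
        return 0x10000 + ((high & 0x3FF) << 10) + (low & 0x3FF);
    }

    // Encodes every supplementary code point to a surrogate pair via the JDK,
    // decodes it again, and counts code points that do not round-trip.
    static int countRoundTripFailures() {
        int failures = 0;
        for (int codePoint = 0x10000; codePoint <= 0x10FFFF; codePoint++) {
            char high = Character.highSurrogate(codePoint);
            char low = Character.lowSurrogate(codePoint);
            if (decode(high, low) != codePoint) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("round-trip failures: " + countRoundTripFailures());
    }
}
```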
| Comment by Diego Dupin [ 2022-07-26 ] | ||
|
No problem, mistakes happen, and double-checking is always a good idea! |