[MDEV-24732] CP1252 is not Latin-1 Created: 2021-01-29 Updated: 2021-05-17 Resolved: 2021-05-17 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Character Sets |
| Affects Version/s: | 10.1, 10.3, 10.5 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Marc Prat Masó | Assignee: | Sergei Golubchik |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | character-set | ||
| Environment: |
Ubuntu 20.04, Python 3.7 with MariaDB 10.3 and/or mariadb:10.1 (Docker) |
||
| Description |
|
In MySQL, Latin-1 is an alias for cp1252, whereas the standard Latin-1 (e.g. in Python) is ISO-8859-1. Proposed solution: add an ISO-8859-1 charset and perhaps update the documentation to avoid confusion. This duplicates an issue reported against MySQL: https://bugs.mysql.com/bug.php?id=101556 |
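To make the reported difference concrete, the following is a minimal sketch using only Python's built-in codecs. It assumes Python's `cp1252` codec is a fair stand-in for the server-side charset; that is an approximation, since Python leaves five bytes of cp1252 undefined, and the helper name `differing_bytes` is made up for illustration.

```python
import codecs


def differing_bytes():
    """Return every byte value that Python's latin-1 and cp1252 codecs decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        iso = raw.decode("latin-1")  # ISO-8859-1 maps every byte to the same code point
        try:
            win = raw.decode("cp1252")
        except UnicodeDecodeError:
            win = None  # byte is undefined in Python's cp1252 table
        if iso != win:
            diffs.append(value)
    return diffs


# Python resolves the name "latin-1" to its ISO-8859-1 codec, not to cp1252.
print(codecs.lookup("latin-1").name)
print([hex(b) for b in differing_bytes()])
```

Under these assumptions the two codecs agree on ASCII and on 0xA0 through 0xFF, and disagree only in the 0x80 through 0x9F block, which is where the euro sign, curly quotes, and similar characters live in cp1252.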
| Comments |
| Comment by Sergei Golubchik [ 2021-02-17 ] |
|
Python isn't the standard. Microsoft is the standard for CP1252; they also call it "Windows Latin 1" or "ANSI Latin 1", depending on where you look. It seems the issue you're describing is that the Python connector treats MariaDB's latin1 as "ISO Latin 1", while it is, in fact, "ANSI Latin 1". Looks like a bug in the MySQL Python connector then, doesn't it? |
| Comment by Marc Prat Masó [ 2021-02-17 ] |
|
As I understand it, this is what you are saying: Also, as you mention, I will report this bug to Python too. |
| Comment by Sergei Golubchik [ 2021-02-17 ] |
|
Adding iso8859-1 as a new charset is possible. But it doesn't look meaningful to me, see, for example, here or here. ISO-8859-1 is a strict subset of CP1252, that is, anything that ISO-8859-1 can represent, CP1252 can represent too. The problem is not that the string is stored in the wrong charset; the problem is that it is interpreted by the client in the wrong charset. The client interprets strings in cp1252 as strings in iso-8859-1. The fix is to stop doing that: the client should interpret strings in cp1252 as strings in cp1252. |
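The client-side misinterpretation described above can be sketched in a few lines of Python. This is an illustration only: the sample string and the variable names are made up, no database or connector is involved, and Python's `cp1252` codec stands in for the bytes a latin1 column would hold.

```python
# Bytes as a cp1252-encoded column might hold them (hypothetical sample value).
stored = "price: 5€".encode("cp1252")

# A client that assumes ISO-8859-1 turns byte 0x80 into the C1 control U+0080.
wrong = stored.decode("latin-1")

# A client that assumes cp1252 recovers the euro sign.
right = stored.decode("cp1252")

print(repr(wrong))
print(repr(right))

# Printable ISO-8859-1 text (ASCII plus 0xA0-0xFF) has the same bytes in both
# charsets, which is the sense in which cp1252 can represent all of it.
printable = bytes(list(range(0x00, 0x80)) + list(range(0xA0, 0x100)))
same_bytes = printable.decode("latin-1").encode("cp1252") == printable
print(same_bytes)
```

The stored bytes are identical in both cases; only the decoding step on the client differs, which is why a new server-side charset would not change the outcome.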
| Comment by Marc Prat Masó [ 2021-02-18 ] |
|
Yes, of course, for the end user this offers nothing new if they stick to the same charset. |
| Comment by Sergei Golubchik [ 2021-04-19 ] |
|
The Python connector used the incorrect "Latin 1" charset: it used "ISO Latin 1" instead of "ANSI Latin 1". I agree this bug could also happen with other software; it's unfortunate that the name "Latin 1" is so overloaded. But we cannot help it, can we? Existing charsets cannot be renamed. Adding a new iso8859_1 charset won't remove the ambiguity from latin1. What problem does the current definition of latin1 cause? How does it affect selects and inserts? |