[MDEV-3435] LP:488040 - Support for contractions between non-ASCII characters and Croatian collation Created: 2009-11-25 Updated: 2012-10-04 Resolved: 2012-10-04 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Ante Karamatić (Inactive) | Assignee: | Michael Widenius |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | Launchpad |
| Attachments: |
|
| Description |
|
From Neven Jacmenovic: The feature we desperately need in MariaDB is proper support for a Croatian utf8 collation based on the Croatian alphabet (http://en.wikipedia.org/wiki/Gajica), so we can finally sort Croatian words (names, etc.) properly. MySQL doesn't support it. Without it, we can't consider MySQL server, or MariaDB for that matter, as a choice for, e.g., a government migration to an open-source platform in the near future. Most, if not all, of those organizations now use MS SQL instead of open-source solutions. AFAIK the countries that would benefit from the same implementation (alongside Croatia) are Bosnia, Serbia (for the Latin script) and Montenegro (for the Latin script).

MySQL already has a built-in latin2 Croatian collation (latin2_croatian_ci) and a CP1250 Croatian collation (cp1250_croatian_ci). However, those implementations lack digraph support, i.e. support for single letters written with two characters (http://www.collation-charts.org/mysql60/mysql604.latin2_croatian_ci.html), so they are useless. Without proper support for digraphs, we will never be able to use ORDER BY properly (a-b-c-č-ć-d-dž-đ-e-f-g-h-i-j-k-l-lj-m-n-nj-o-p-r-s-š-t-u-v-z-ž). The closest to Croatian is the Slovenian collation (utf8_slovenian_ci) built into MySQL, but it also lacks digraphs, so it's not possible to adapt it (http://www.collation-charts.org/mysql60/mysql604.utf8_slovenian_ci.html). Right now, we are forced to use the utf8_general_ci collation, which of course doesn't know how to order the Croatian alphabet properly.

I've attached a mysqldump with the Croatian alphabet. The valid ordering should be: a-b-c-č-ć-d-dž-đ-e-f-g-h-i-j-k-l-lj-m-n-nj-o-p-r-s-š-t-u-v-z-ž. I submitted an S4 feature request to MySQL some time ago, and the MySQL dev team started talking about it, but nothing happened (http://bugs.mysql.com/44523). Please, MariaDB developers, make our native language suck less! |
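The core of the request is that dž, lj and nj are contractions: each must sort as one letter in the alphabet order given above, not as two separate characters. The following minimal Python sketch illustrates the idea outside the server. It is not MariaDB's implementation, and the helper `croatian_key` is hypothetical. It works as follows:

* It greedily matches the three digraphs.
* It folds case.
* It uses NFKC to normalize decomposed input and the single-code-point digraph letters (such as U+01C6).
* It places characters outside the 30-letter alphabet (q, w, x, y, digits) after ž by code point. This placement is an assumption.

Greedy matching is itself a simplification that a real collation may need to refine.

```python
import unicodedata

# Hypothetical helper, not MariaDB code: contraction-aware sort keys
# for the Croatian alphabet order given in the report.
CROATIAN_ALPHABET = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i", "j", "k",
    "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t", "u", "v", "z", "ž",
]
RANK = {letter: i for i, letter in enumerate(CROATIAN_ALPHABET)}
DIGRAPHS = {"dž", "lj", "nj"}


def croatian_key(word: str) -> list[tuple[int, int]]:
    """Return a sort key that treats dž, lj and nj as single letters.

    NFKC folds decomposed input and single-code-point digraphs (e.g. U+01C6)
    into the two-letter forms; lower() makes the key case-insensitive.
    Assumption: characters outside the alphabet sort after ž by code point.
    """
    s = unicodedata.normalize("NFKC", word).lower()
    key = []
    i = 0
    while i < len(s):
        # Greedy contraction match: prefer a two-character digraph.
        pair = s[i:i + 2]
        if pair in DIGRAPHS:
            key.append((RANK[pair], 0))
            i += 2
            continue
        ch = s[i]
        if ch in RANK:
            key.append((RANK[ch], 0))
        else:
            key.append((len(CROATIAN_ALPHABET), ord(ch)))
        i += 1
    return key


words = ["noga", "ljeto", "đak", "dan", "nje", "džep", "lopta", "duga"]
print(sorted(words))
# → ['dan', 'duga', 'džep', 'ljeto', 'lopta', 'nje', 'noga', 'đak']
print(sorted(words, key=croatian_key))
# → ['dan', 'duga', 'džep', 'đak', 'lopta', 'ljeto', 'noga', 'nje']
```

The first line shows the failure described in the report. Under plain code-point order, đak lands after every other word, and ljeto and nje sort before lopta and noga. With the contractions ranked as single letters, lj follows l and nj follows n, as in the alphabet.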
| Comments |
| Comment by Ante Karamatić (Inactive) [ 2009-11-25 ] |
|
test_croatian_order.sql |
| Comment by Ante Karamatić (Inactive) [ 2009-11-28 ] |
|
As explained at: http://www.collation-charts.org/articles/croatian.htm this patch does more than just add support for Croatian UTF8 collation. It was based on Alexander's patch for mysql 5.1 (http://www.collation-charts.org/articles/utf8_croatian_ci.diff) and you could probably get it by pulling from mysql 6. |
| Comment by Michael Widenius [ 2009-11-30 ] |
|
re: [Bug 488040] [NEW] Support for contractions between non-ASCII characters and Croatian collation

Hi!

>>>>> "Ante" == Ante Karamatić <Ante> writes:

Ante> Public bug reported:
Ante> The feature we desperately need in MariaDB is proper support for
<cut>

Croatian character sets are pushed into MariaDB 5.1-merge and should

Regards, |
| Comment by Ante Karamatić (Inactive) [ 2009-12-02 ] |
|
There's an update for this bug. Patch is attached. Explained at: http://www.collation-charts.org/ "Dec 2, 2009. An updated version of the Croatian collation patch for MySQL-5.1 is available. It works a little bit more accurate when optimizing a LIKE query for UCS2 columns, in case of non-ASCII contractions: SELECT a FROM t1 WHERE a LIKE 'dž%'; The previous version could potentially lose some rows." |
| Comment by Rasmus Johansson (Inactive) [ 2010-02-11 ] |
|
Launchpad bug id: 488040 |