[MDEV-3435] LP:488040 - Support for contractions between non-ASCII characters and Croatian collation Created: 2009-11-25  Updated: 2012-10-04  Resolved: 2012-10-04

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Ante Karamatić (Inactive) Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: Launchpad

Attachments: XML File LPexportBug488040.xml     File LPexportBug488040_ctype-ucs2.c.v0-v1.diff     File LPexportBug488040_maria.croatian.diff     File LPexportBug488040_test_croatian_order.sql    

 Description   

From Neven Jacmenovic:

The feature we desperately need in MariaDB is proper support for a Croatian utf8 collation based on the Croatian alphabet (http://en.wikipedia.org/wiki/Gajica) so we can finally sort Croatian words (names etc.) properly. MySQL doesn't support it, and without it we can't consider MySQL server, or MariaDB for that matter, as a choice for e.g. government migration to an open-source platform in the near future. Most, if not all, of those organizations now use MS SQL instead of open-source solutions.

AFAIK the countries which would benefit from the same implementation (alongside Croatia) are: Bosnia, Serbia (for Latin script) and Montenegro (for Latin script).

MySQL already has a built-in latin2 Croatian collation (latin2_croatian_ci) and a CP1250 Croatian collation (cp1250_croatian_ci), but those implementations lack support for digraphs, i.e. single letters written with two characters (http://www.collation-charts.org/mysql60/mysql604.latin2_croatian_ci.html), which makes them useless for us. Without proper digraph support, we will never be able to use ORDER BY properly (a-b-c-č-ć-d-dž-đ-e-f-g-h-i-j-k-l-lj-m-n-nj-o-p-r-s-š-t-u-v-z-ž).

The closest to Croatian is the Slovenian collation (utf8_slovenian_ci) built into MySQL, but it also lacks digraphs, so it can't be adapted (http://www.collation-charts.org/mysql60/mysql604.utf8_slovenian_ci.html).

Right now we are forced to use the utf8_general_ci collation, which, of course, doesn't know how to order the Croatian alphabet properly. I've attached a mysqldump with the Croatian alphabet. The valid ordering is: a-b-c-č-ć-d-dž-đ-e-f-g-h-i-j-k-l-lj-m-n-nj-o-p-r-s-š-t-u-v-z-ž.
"DŽ", "NJ" and "LJ" are SINGLE letters.

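To make the requested ordering concrete, here is a minimal Python sketch of a sort key that treats "dž", "lj" and "nj" as single letters. It is only an illustration of the idea, not the attached patch and not how the server implements collations; the names `croatian_key`, `WEIGHT` and `DIGRAPHS` are made up for this example. It matches digraphs greedily, so it cannot tell apart the rare words where the two characters belong to different letters.

```python
# Croatian alphabet in the order given above; "dž", "lj", "nj" are single letters.
ALPHABET = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h",
            "i", "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s",
            "š", "t", "u", "v", "z", "ž"]
WEIGHT = {letter: rank for rank, letter in enumerate(ALPHABET, start=1)}
DIGRAPHS = {letter for letter in ALPHABET if len(letter) == 2}


def croatian_key(word):
    """Return a list of weights, treating each digraph as one letter."""
    text = word.lower()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # Contraction: two characters map to a single weight.
            key.append(WEIGHT[pair])
            i += 2
        else:
            # Characters outside the alphabet (q, w, x, y, digits, ...) are
            # placed after all letters; a simplification for this sketch.
            key.append(WEIGHT.get(text[i], len(ALPHABET) + ord(text[i])))
            i += 1
    return key


words = ["đak", "džep", "dud", "luk", "ljeto", "lopta",
         "njuška", "nos", "noj", "čaj", "ćup", "cvijet"]
ordered = sorted(words, key=croatian_key)
print(ordered)
```

With this key, "džep" sorts between "dud" and "đak", and "ljeto" sorts after "luk", which is the behaviour a contraction-aware collation is expected to give for ORDER BY.
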
I submitted an S4 feature request to MySQL some time ago, and the MySQL dev team started discussing it, but nothing happened (http://bugs.mysql.com/44523).

Please, MariaDB developers, make our native language suck less!



 Comments   
Comment by Ante Karamatić (Inactive) [ 2009-11-25 ]

Re: Support for contractions between non-ASCII characters and Croatian collation

Comment by Ante Karamatić (Inactive) [ 2009-11-25 ]

test_croatian_order.sql
LPexportBug488040_test_croatian_order.sql

Comment by Ante Karamatić (Inactive) [ 2009-11-28 ]

Re: Support for contractions between non-ASCII characters and Croatian collation
As explained at:

http://www.collation-charts.org/articles/croatian.htm

this patch does more than just add support for a Croatian UTF8 collation. It is based on Alexander's patch for MySQL 5.1 (http://www.collation-charts.org/articles/utf8_croatian_ci.diff), and you could probably get it by pulling from MySQL 6.

Comment by Ante Karamatić (Inactive) [ 2009-11-28 ]

maria.croatian.diff
LPexportBug488040_maria.croatian.diff

Comment by Michael Widenius [ 2009-11-30 ]

re: [Bug 488040] [NEW] Support for contractions between non-ASCII characters and Croatian collation

Hi!

>>>>> "Ante" == Ante Karamatić <Ante> writes:

Ante> Public bug reported:
>> From Neven Jacmenovic:

Ante> The feature we desperately need in MariaDB is proper support for
Ante> Croatian utf8 collation based on Croatian alphabet
Ante> (http://en.wikipedia.org/wiki/Gajica) so we can finally sort croatian
Ante> words (names etc) properly. MySQL don't have support for it, without
Ante> this, we can't consider MySQL server or MariaDB for that matter, a
Ante> choice for eg. government migration to open-source platform in near
Ante> future. Most, if not all of those organizations now use MS SQL instead
Ante> of open source solutions.

<cut>

The Croatian collations have been pushed into MariaDB 5.1-merge and should
be in the default MariaDB 5.1 tomorrow.

Regards,
Monty

Comment by Ante Karamatić (Inactive) [ 2009-12-02 ]

Re: Support for contractions between non-ASCII characters and Croatian collation
There's an update for this bug; the patch is attached. It is explained at:

http://www.collation-charts.org/

"Dec 2, 2009. An updated version of the Croatian collation patch for MySQL-5.1 is available. It works a little bit more accurate when optimizing a LIKE query for UCS2 columns, in case of non-ASCII contractions:

SELECT a FROM t1 WHERE a LIKE 'dž%';

The previous version could potentially lose some rows."
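
As a rough illustration of how a LIKE prefix optimization can lose rows once contractions exist, here is a minimal Python sketch. It does not reproduce the specific defect fixed by the patch, which is not described here. It assumes that LIKE compares character by character, while an index range is computed on contraction-aware sort keys; all function names are hypothetical. It uses the shorter prefix 'd' rather than 'dž' to make the effect visible.

```python
# Croatian alphabet order; "dž", "lj", "nj" each carry a single weight.
ALPHABET = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h",
            "i", "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s",
            "š", "t", "u", "v", "z", "ž"]
WEIGHT = {letter: rank for rank, letter in enumerate(ALPHABET, start=1)}
DIGRAPHS = {"dž", "lj", "nj"}


def sort_key(text):
    """Contraction-aware sort key: a digraph contributes one weight."""
    text = text.lower()
    key, i = [], 0
    while i < len(text):
        if text[i:i + 2] in DIGRAPHS:
            key.append(WEIGHT[text[i:i + 2]])
            i += 2
        else:
            # Unknown characters sort after the alphabet (sketch only).
            key.append(WEIGHT.get(text[i], len(ALPHABET) + ord(text[i])))
            i += 1
    return key


def like_prefix(value, prefix):
    """Character-wise prefix match, the assumed meaning of LIKE 'prefix%'."""
    return value.lower().startswith(prefix.lower())


def naive_range_scan(rows, prefix):
    """Keep rows whose sort key begins with the sort key of the prefix."""
    bound = sort_key(prefix)
    return [row for row in rows if sort_key(row)[:len(bound)] == bound]


rows = ["dan", "dud", "džep", "džungla", "đak"]
expected = [row for row in rows if like_prefix(row, "d")]
scanned = naive_range_scan(rows, "d")
lost = [row for row in expected if row not in scanned]
print(lost)
```

In this sketch the rows starting with "dž" match the pattern character by character but fall outside the key range built from "d", because "dž" has its own weight. A range optimizer therefore has to treat a prefix that ends in a possible contraction head with care.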

Comment by Ante Karamatić (Inactive) [ 2009-12-02 ]

ctype-ucs2.c.v0-v1.diff
LPexportBug488040_ctype-ucs2.c.v0-v1.diff

Comment by Rasmus Johansson (Inactive) [ 2010-02-11 ]

Launchpad bug id: 488040
