[MDEV-4929] Add Myanmar (Burmese) collation - Jira

Alexander Barkov created issue - 2013-08-20 17:31

Sergei Golubchik made changes - 2013-08-20 18:13

Field	Original Value	New Value
Description	There is a long standing feature request from Myanmar users: http://bugs.mysql.com/bug.php?id=22008 The users are still waiting for the collation: https://lists.askmonty.org/pipermailp/info/2013-August/001006.html This task needs ~~MDEV-4928~~ to be merged from MySQL-5.6 first.	There is a long standing feature request from Myanmar users: http://bugs.mysql.com/bug.php?id=22008 The users are still waiting for the collation. This task needs ~~MDEV-4928~~ to be merged from MySQL-5.6 first.

Sergei Golubchik made changes - 2013-08-20 18:13

Link

This issue is blocked by ~~MDEV-4928~~ [ ~~MDEV-4928~~ ]

Alexander Barkov made changes - 2013-09-19 10:22

Issue Type

Bug [ 1 ]

Task [ 3 ]

Sithu Thwin (Inactive) added a comment - 2013-10-06 04:44

I want to help, but I don't know how. I've already checked out the latest source code from Launchpad. Can anyone guide me on how and where to start?

Alexander Barkov added a comment - 2013-10-14 10:43 - edited

Hi Sithu. Sorry for a late reply.

We're releasing 10.0.5 soon, which will include "~~MDEV-4928~~ Collation customization improvements",
which is a prerequisite for the Myanmar collation.
After the release I can give you instructions on how to configure mysqld to support Myanmar
as a dynamically loadable collation, so you can test it. If everything is fine with it, then
we can include it into the next release (10.0.6) as a built-in collation, so it will work out
of the box.

Another option is not to wait for 10.0.5. You can download the 10.0.5 sources
from Launchpad lp:~maria-captains/maria/10.0 (see https://launchpad.net/maria/10.0 for details),
so we can start testing right now.

Please let me know when you're ready.
Thanks.

Alexander Barkov made changes - 2013-10-14 10:51

Comment

[ I meant "Sorry for a late reply" of course :) ]

Sithu Thwin (Inactive) added a comment - 2013-10-14 11:03

I've already downloaded the latest source code from the maria-captains Launchpad
repo. Where and how do I start testing?
Thanks
Sithu

Alexander Barkov added a comment - 2013-10-14 12:20

Index.xml with the Myanmar collation definition.

Alexander Barkov made changes - 2013-10-14 12:20

Attachment

Index.xml.gz [ 23900 ]

Alexander Barkov added a comment - 2013-10-14 12:24 - edited

Please do the following steps:

1. Download Index.xml.gz attached to this issue.
2. Unpack it: gunzip Index.xml.gz
3. Go to the /share/charsets directory of your MariaDB-10.0.5 installation.
4. Save the existing Index.xml: mv Index.xml Index.xml.orig
5. Put the downloaded Index.xml in place of the old one.
6. Restart the MariaDB server.
7. Run this query:
SHOW COLLATION LIKE 'utf8_m%';
It should report the new collation utf8_myanmar_ci.
8. Try creating tables, inserting some data, and checking the sorting order.

Thanks.
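
The file-replacement part of the steps above could be scripted roughly as follows. This is only a sketch under assumptions: the function name `install_index_xml` and both path arguments are hypothetical, the charsets directory depends on where MariaDB is installed, and restarting the server and running the SHOW COLLATION query are left to the tester.

```python
import gzip
import shutil
from pathlib import Path


def install_index_xml(charsets_dir: str, downloaded_gz: str) -> Path:
    """Back up Index.xml as Index.xml.orig, then unpack the download in its place.

    Both arguments are illustrative; pass the share/charsets directory of
    your own installation and the path of the downloaded Index.xml.gz.
    """
    charsets = Path(charsets_dir)
    current = charsets / "Index.xml"
    backup = charsets / "Index.xml.orig"

    # Keep the very first backup so the stock file is never overwritten.
    if current.exists() and not backup.exists():
        shutil.move(str(current), str(backup))

    # Equivalent of "gunzip Index.xml.gz" followed by copying the result.
    with gzip.open(downloaded_gz, "rb") as src, open(current, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return current
```

Restoring the original collation list is then a matter of moving Index.xml.orig back to Index.xml and restarting the server.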

Sithu Thwin (Inactive) added a comment - 2013-10-14 23:14

I downloaded the source code using the instructions at this link:
https://mariadb.com/kb/en/Getting_the_MariaDB_Source_Code/
I got the maria source tree. I found the VERSION file in it, and it shows 10.0.4.
I branched with the command bzr branch lp:maria/10.0 . I got a new directory, 10.0, and
its VERSION file still shows 10.0.4.
I built the MariaDB server from the 10.0 branch and installed it. The MySQL version still
shows 10.0.4.
I replaced Index.xml, and when I run SHOW COLLATION LIKE 'utf8_m%';
I get

Empty set (0.00 sec)

at the mysql prompt.
What am I doing wrong?
Thanks,
Sithu

Alexander Barkov added a comment - 2013-10-15 09:28

Please try this SQL script:

drop table if exists t1;
create table t1 (a varchar(10) character set utf8 collate utf8_myanmar_ci);
show warnings;

What does "show warnings" return?

Thanks.

Sithu Thwin (Inactive) added a comment - 2013-10-15 09:51

Which database should I use? mysql or information_schema?
Thanks

Alexander Barkov added a comment - 2013-10-15 09:55

None of them. Please use "test" or some other non-system database you have access to.

Alexander Barkov added a comment - 2013-10-15 09:58

Sorry, the "SHOW WARNINGS" output in the above comment does not look right.

I guess something happened during copy-and-paste.
Can you please paste again?

Alexander Barkov added a comment - 2013-10-15 09:59

Did you restart the server after replacing Index.xml ?

Sithu Thwin (Inactive) added a comment - 2013-10-15 10:01

Sorry, the first output was not the right one. It was from before replacing
Index.xml.

Here is the output after replacing Index.xml:

MariaDB [mysql]> drop table if exists t1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

MariaDB [mysql]> create table t1 (a varchar(10) character set utf8 collate utf8_myanmar_ci);
Query OK, 0 rows affected (0.35 sec)

MariaDB [mysql]> show warnings;
Empty set (0.00 sec)

MariaDB [mysql]>

Sithu Thwin (Inactive) added a comment - 2013-10-15 10:08

I ran killall mysqld before starting mysql.
Here is the output on the test database.

MariaDB [test]> drop table if exists t1;
Query OK, 0 rows affected (0.18 sec)

MariaDB [test]> create table t1 (a varchar(10) character set utf8 collate utf8_myanmar_ci);
Query OK, 0 rows affected (0.38 sec)

MariaDB [test]> show warnings;
Empty set (0.00 sec)

MariaDB [test]>

Sithu Thwin (Inactive) made changes - 2013-10-15 10:14

Comment

[ Here is output

MariaDB [mysql]> drop table if exists t1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

MariaDB [mysql]> create table t1 (a varchar(10) character set utf8 collate
utf8_myanmar_ci);
ERROR 1273 (HY000): Unknown collation: 'utf8_myanmar_ci'
MariaDB [mysql]> show warnings;
+-------+------+--------------------------------------+
+-------+------+--------------------------------------+
+-------+------+--------------------------------------+
1 row in set (0.00 sec)

MariaDB [mysql]>

]

Alexander Barkov added a comment - 2013-10-15 10:23

So it worked fine. The table has been created.
You can now insert some data into it and try sorting.

I guess "SHOW COLLATION LIKE 'utf8_m%';" now will also report utf8_myanmar_ci
without problems.

Sithu Thwin (Inactive) added a comment - 2013-10-15 10:31

Ah, It's working now. Here is the output for SHOW COLLATION LIKE 'utf8_m%';

MariaDB [test]> SHOW COLLATION LIKE 'utf8_m%';
+-----------------+---------+-----+---------+----------+---------+
| Collation       | Charset | Id  | Default | Compiled | Sortlen |
+-----------------+---------+-----+---------+----------+---------+
| utf8_myanmar_ci | utf8    | 220 |         |          |       8 |
+-----------------+---------+-----+---------+----------+---------+
1 row in set (0.00 sec)

MariaDB [test]>

I will start testing and check whether the sorting works, then report the results.

Sithu Thwin (Inactive) added a comment - 2013-10-15 12:26

Some words are not sorted correctly. I will check and make changes.
BTW, in Myanmar we have more than 4 ethnic languages with different sorting or encoding that use the same characters. I want to know whether those languages can be added to the collation.

Thanks

Alexander Barkov added a comment - 2013-10-15 12:50

> Some words are not sorted correctly.

Can you please give an example of what is not sorted correctly?
Please dump your table and attach the dump to this issue,
so I can reproduce it on my machine.

> I will check and make changes.

Are you going to change Index.xml?
It might be tricky. The Myanmar collation definition is quite complex.
Can you please send the dump first?

> BTW, in Myanmar we have more than 4 ethnic languages with different sorting or encoding that use the same characters.
> I want to know whether those languages can be added to the collation.

I need to check. What are these languages?

Sithu Thwin (Inactive) added a comment - 2013-10-15 13:34

See the attached screenshot. The top word, ၎င်း (u104e + u1004 + u103a + u1038),
should not be sorted as the first word. That word has at least 3 forms and is
sometimes even spelled with Myanmar digit 4. I've attached my testing script
and SQL dump file. The SQL dump's content is taken from hunspell my_MM.dic.

There might also be other issues. I will check against the Myanmar dictionary
sorting guide and report anything I find that is sorted incorrectly.

Other languages are Shan, Mon, Kayin, Kayah(Kayini).

Thanks.

Sithu Thwin (Inactive) made changes - 2013-10-15 13:34

Attachment		sorting.zip [ 23905 ]
Attachment		sorting.png [ 23906 ]

Alexander Barkov added a comment - 2013-10-15 14:15

The collation engine is currently limited to 6 code points in a single collation element.
The first word is not sorted correctly because it should be sorted near:

u+101C u+100A u+103A u+1038 u+1000 u+1031 u+102C u+1004 u+103A u+1038

which makes 10 code points. I'll check how to fix this.

Is anything else sorted in a wrong way?
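
The length mismatch described above can be checked with a few lines of Python. This is an illustrative sketch only: the constant `MAX_CODE_POINTS` simply restates the limit of 6 quoted in this comment, and `parse_code_points` is a hypothetical helper, not part of MariaDB.

```python
MAX_CODE_POINTS = 6  # limit quoted in the comment above, not read from the server


def parse_code_points(spec: str) -> str:
    """Turn a list such as 'u+101C u+100A' into the corresponding string."""
    return "".join(chr(int(token[2:], 16)) for token in spec.split())


# Sequence the abbreviated word should be sorted near.
anchor = parse_code_points(
    "u+101C u+100A u+103A u+1038 u+1000 u+1031 u+102C u+1004 u+103A u+1038"
)
# The abbreviated word that currently sorts first.
word = parse_code_points("u+104E u+1004 u+103A u+1038")

print(len(anchor), "code points in the anchor; limit is", MAX_CODE_POINTS)
print("anchor fits within the limit:", len(anchor) <= MAX_CODE_POINTS)
print(len(word), "code points in the word itself")
```

The word itself is short; it is the 10-code-point sequence it has to be placed next to that exceeds the limit.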

Sithu Thwin (Inactive) added a comment - 2013-10-17 07:27

The images in sorting.zip are the sorting guide, scanned from the Myanmar
Dictionary 5-book series printed/published around 1970. More than 30,000
people participated in collecting words for those books. It is the best reference
book in Myanmar.

Great SA (ဿ u+103F) must be sorted exactly after the SA (သ u+101E) group.
(ဌ္ဋ u+100C u+1039 u+100B) must be sorted exactly after the (ဌ u+100C) group.
It is just one letter made from a combination of 3 glyphs.

In the SQL dump file, the consonants, syllables and medials tables are written and
ordered as in the images from sorting.zip.

ဧ (U+1027) is sorted in two different ways, depending on how the word sounds.

For example, in the word ဧချင်း (u+1027 u+1001 u+103B u+1004 u+103A u+1038)
it is treated as အေး (u+1021 u+1031 u+1038) ။
In ဧရာ it is treated as အေ (u+1021 u+1031).
I think this cannot be sorted without a wordlist dictionary.

Sithu Thwin (Inactive) made changes - 2013-10-17 07:27

Attachment		sorting.zip [ 23910 ]
Attachment		great_sa.jpg [ 23911 ]
Attachment		sort.sql [ 23912 ]

Alexander Barkov added a comment - 2013-10-17 14:57 - edited

Thanks for the information.

I have a question about the order in the table syllables:

From my understanding, the record with sid=27 must be greater than the record with sid=108.

The record with sid=108 is consonant u+1021 followed by vowel "u+102C u+101A u+103A".
The record with sid=27 is consonant u+1021 followed by vowel "u+102D u+102F u+101A u+103A".

The collation definition was taken from the Common Locale Data Repository:
http://unicode.org/repos/cldr/tags/release-23/common/collation/my.xml
(Please download this file for reference).

According to this collation definition, the above two vowels
are defined in the same big group, relative to u+1034,
in this order:

<reset>\u1034</reset> (Index.xml:461, my.xml:37)
...
<s>\u102C\u101A\u103A</s> (Index.xml:927, my.xml:503)
...
<s>\u102D\u102F\u101A\u103A</s> (Index.xml:941, my.xml:517)

which means \u102C\u101A\u103A is smaller than \u102D\u102F\u101A\u103A,
which means the record sid=27 is bigger than the record sid=108.

Can you please confirm this?
It seems "sid" is not in alphabetic order in the table "syllables".

Also, I found that sid 28, 29, 30, 31 and 32 are not in alphabetic order either:

The records should be in this ascending order:

sid	Code points	where defined
109	1021 + 101B 103A	my.xml:519
110	1021 + 102C 101B 103A	my.xml:521
111	1021 + 1031 101B 103A	my.xml:529
28	1021 + 102D 102F 101B 103A	my.xml:535
112	1021 + 101C 103A	my.xml:537
29	1021 + 102D 102F 101C 103A	my.xml:553
113	1021 + 101E 103A	my.xml:563
30	1021 + 102D 102F 101E 103A	my.xml:580
114	1021 + 101F 103A	my.xml:582
31	1021 + 102D 102F 101F 103A	my.xml:598
32	1021 + 102D 102F 1020 103A	my.xml:607

Is that correct?
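
The reasoning above (a rule that appears later in the tailoring sorts after an earlier one) can be sketched in Python. The sid values and my.xml line numbers are copied from this comment; the dictionary name `MY_XML_LINE` and the function `collation_order` are made up for illustration, and the sketch assumes, as a simplification, that all of these rules hang off the same reset point.

```python
# sid -> line in my.xml where the syllable's vowel sequence is defined,
# copied from the comment above.
MY_XML_LINE = {
    108: 503, 27: 517,
    109: 519, 110: 521, 111: 529, 28: 535,
    112: 537, 29: 553, 113: 563, 30: 580,
    114: 582, 31: 598, 32: 607,
}


def collation_order(sids):
    """Sort sids by the position of their rule in the tailoring.

    Simplifying assumption: every rule belongs to the same reset chain,
    so a later line means a greater sort position.
    """
    return sorted(sids, key=lambda sid: MY_XML_LINE[sid])


print(collation_order(MY_XML_LINE))
```

Sorting by rule position puts sid 108 before sid 27 and interleaves sids 28 to 32 with 109 to 114, which is the ascending order listed in the table.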

Sithu Thwin (Inactive) added a comment - 2013-10-18 11:57

See the attached image; I've written the Unicode code point references in the
image.
Your understanding is right according to the unicode.org collation definition.
But some parts of that collation definition might be wrong.
For example, u+1034 is never used in the Myanmar language. It is Mon vowel O. So
everything in that group might be related to the Mon language.

Myanmar language sorting is mostly based on phonetic order. Sometimes
phonetic order and alphabetic order are not the same.

regards,
Sithu

Sithu Thwin (Inactive) made changes - 2013-10-18 11:57

Attachment

vowel O exception.jpg [ 23918 ]

Alexander Barkov added a comment - 2013-10-18 21:08

So are you Ok with this sorting order:

sid	Code points	where defined
109	1021 + 101B 103A	my.xml:519
110	1021 + 102C 101B 103A	my.xml:521
111	1021 + 1031 101B 103A	my.xml:529
28	1021 + 102D 102F 101B 103A	my.xml:535
112	1021 + 101C 103A	my.xml:537
29	1021 + 102D 102F 101C 103A	my.xml:553
113	1021 + 101E 103A	my.xml:563
30	1021 + 102D 102F 101E 103A	my.xml:580
114	1021 + 101F 103A	my.xml:582
31	1021 + 102D 102F 101F 103A	my.xml:598
32	1021 + 102D 102F 1020 103A	my.xml:607

If we fix the other problems and keep these rules for 103A,
would such a collation work for you?

Sithu Thwin (Inactive) added a comment - 2013-10-21 10:52

I'm inviting language and IT professionals to join the discussion in this ticket. Please
wait for their response, or I will reply with more detailed information about
Myanmar sorting. I'm trying to contact Myanmar language professors or
tutors from the university.

regards,
Sithu

Alexander Barkov added a comment - 2013-10-30 10:57

select words from km_alphabet where id <100 order by id

Alexander Barkov made changes - 2013-10-30 10:57

Attachment

my100.txt [ 24101 ]

Alexander Barkov added a comment - 2013-10-30 11:07 - edited

Hello Sithu,
Any news about the correct Myanmar order?

In the meantime, I ran an experiment:
1. I created a file with the results of this SQL query:

select words from km_alphabet where id <100 order by id;
(see attached).

2. Opened this link (ICU locale explorer for the Myanmar collation):
http://demo.icu-project.org/icu-bin/locexp?_=my_MM&d_=en&x=col

3. Copied the file to the clipboard and pasted it into the "source" section.
4. Checked the "Hide Collation Key" checkbox (for simpler results output).
5. Pressed the "Sort" button.
6. Checked the results in the "Original" and "Collated" fields.

The results in the "Collated" field appeared in this order:

02: က
04: ကကတိုး
03: ကကတစ်
05: ကကူရံ
09: ကကြိုးတန်ဆာ
10: ကကြိုးတန်ဆာဆင်
11: ကခုန်
14: ကချေသည်
15: ကချော်ကချင်
16: ကချော်ကချွတ်
12: ကချင်
13: ကချင်ပြည်နယ်
64: ကစား
65: ကစားဒိုင်
66: ကစားဝိုင်း
67: ကစားသမား
68: ကစီ
69: ကစီတင်
70: ကစီရည်
71: ကစော်
61: ကစစ်
63: ကစဉ်ကလျား
62: ကစဉ့်ကလျား
74: ကစွဲကစောင်း
72: ကစွန်း
73: ကစွန်းဥ
75: ကဆုန်
76: ကဆွဲ့ကနွဲ့
85: ကညာ
77: ကညင်
78: ကညင်ဆီ
79: ကညင်တိုင်
80: ကညင်နီ
81: ကညင်ပျံ
82: ကညင်ဖြူ
83: ကညစ်
84: ကညစ်သွား
86: ကညွတ်
87: ကညှပ်
94: ကတိ
95: ကတိကဝတ်
96: ကတိခံ
97: ကတိစောင့်
98: ကတိတည်
99: ကတိထား
91: ကတင်
92: ကတည်းက
93: ကတန်းကရမ်;
01: ကို
08: ကက်
06: ကက္ကုကမည်းပွင့်
07: ကက္ကော်တကန်
17: ကင်
60: ကင်္ကာ
18: ကင်ညှပ်
19: ကင်ပလစ်
20: ကင်ပျစ်
21: ကင်ပွန်း
22: ကင်း
23: ကင်းကစီ
24: ကင်းကိုင်
25: ကင်းကွာ
26: ကင်းခိုး
27: ကင်းချုပ်
28: ကင်းခြေများ
31: ကင်းခွေး
29: ကင်းခွန်
30: ကင်းခွန်ကင်းခ
32: ကင်းစ
33: ကင်းစား
34: ကင်းစီး
36: ကင်းစောင့်
37: ကင်းစောင့်ထား
35: ကင်းစုန်း
40: ကင်းတဲ
38: ကင်းတပ်
39: ကင်းတပ်ဥပဒေ
41: ကင်းထား
42: ကင်းထိုး
43: ကင်းထိုးမြင်းစီး
44: ကင်းထောက်
45: ကင်းနားသန်
46: ကင်းပုစွန်
47: ကင်းဖလောင်ကောင်
48: ကင်းဘူ
49: ကင်းမလက်မည်း
50: ကင်းမြီးကောက်
51: ကင်းရုံ
53: ကင်းလုလင်
52: ကင်းလိပ်ချော
54: ကင်းဝန်
55: ကင်းဝန်းကင်းပတ်လှည့်
58: ကင်းသား
56: ကင်းသင်း
57: ကင်းသန်း
59: ကင်းအုပ်
88: ကဏ္ဍ
89: ကဏ္ဍဇာ
90: ကဏ္ဏမူ

Can you please check if this order is correct? Thanks.

We can reproduce the same order in MariaDB.
But if this order does not look correct, then the collation definition in CLDR is wrong.
We cannot add a collation without having a correct definition.

Sithu Thwin (Inactive) added a comment - 2013-11-02 19:13

Sorry for the delay. My son has been in hospital for about a week with dengue
fever.

The sorting order of the 100 words above is correct, because the CLDR definition is
generally correct.
But some exceptional words might be sorted wrongly in some cases.
I'm still looking for someone with more professional knowledge of the Myanmar language.
Thanks

Alexander Barkov added a comment - 2013-11-05 11:22

Sithu, I hope your son gets well soon.

Please tell me when you are ready to do some more testing.
I made a few enhancements in the collation code, and in
the collation definition.

Sithu Thwin (Inactive) added a comment - 2013-11-05 13:06

Thank you, my son is fine now.

I'm ready to test. Do I need to update the bzr source code? I will update
the source code on my computer tomorrow.

Alexander Barkov added a comment - 2013-11-12 15:05

Index.xml, version 2.

Alexander Barkov made changes - 2013-11-12 15:05

Attachment

Index.xml.gz [ 24305 ]

Alexander Barkov added a comment - 2013-11-12 15:08

Hi Sithu,

Please build MariaDB from the latest revision from:
https://launchpad.net/maria/10.0
and put the new version of Index.xml which I've just attached.
Then create your tables and test sorting order again.
Thanks.

Alexander Barkov added a comment - 2013-11-18 13:36

Hi Sithu, did you have a chance to try the latest version? Thanks.


Sithu Thwin (Inactive) added a comment - 2013-11-22 10:51

The Internet connection in my country is very slow. I'm trying to update
the bzr source code, but the update never completes, so I have not been able
to test yet. I will send the test results as soon as I've updated my source
code. I'm trying to figure out how to fix the Internet connection.

Thanks.


Sithu Thwin (Inactive) added a comment - 2013-12-03 10:08

Sorry for the long wait. I've now tested the new version, and I don't see any improvement. The MariaDB version is 10.0.7.
Which parts did you change in Index.xml? ၎င်း (u104e + u1004 + u103a + u1038) is still in first place in the sort order.


Alexander Barkov added a comment - 2013-12-16 13:39

Make sure to use the version uploaded on 2013-11-12.

There are two differences:
1. This line makes it use Unicode-5.2.0, which has more Myanmar characters defined for sorting.
<collation name="utf8_myanmar_ci" id="220" shift-after-method="expand" version="5.2.0">

2. This line is no longer commented out:
<reset>\u101C\u100A\u103A\u1038\u1000\u1031\u102C\u1004\u103A\u1038</reset><s>\u104E\u1004\u103A\u1038</s>

Please try this query:
mysql> SELECT HEX(x),HEX(CONVERT(x USING utf8)) as utf8,HEX(WEIGHT_STRING(CONVERT(t1.x USING utf8) COLLATE utf8_myanmar_ci)) AS weight FROM (SELECT _ucs2 X'101C100A103A103810001031102C1004103A1038' AS x UNION SELECT _ucs2 X'104E1004103A1038') AS t1;

The expected result is:
+------------------------------------------+--------------------------------------------------------------+------------------------------------------+
| HEX(x)                                   | utf8                                                         | weight                                   |
+------------------------------------------+--------------------------------------------------------------+------------------------------------------+
| 101C100A103A103810001031102C1004103A1038 | E1809CE1808AE180BAE180B8E18080E180B1E180ACE18084E180BAE180B8 | 220D22483B1322593ACC21CD22483AEE22593ACC |
| 104E1004103A1038                         | E1818EE18084E180BAE180B8                                     | 220D22483B1322593ACC21CD22483AEE22593ACC |
+------------------------------------------+--------------------------------------------------------------+------------------------------------------+

Note that the "weight" value is the same in both records.
If you get something else, you're probably still using the old version of Index.xml.
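
As a side check that needs no server, the hex values in the expected result can be cross-checked offline. The sketch below is only an illustration in Python (the helper names are made up for this example and are not part of MariaDB). It decodes the two `_ucs2` literals from the query as big-endian 16-bit code units, re-encodes them as UTF-8, and lists the code points, so the "utf8" column and the code points in the `<reset>`/`<s>` rule can be compared by eye. It does not compute collation weights; those come only from the server.

```python
# Hypothetical helpers for checking the hex columns; not MariaDB code.

LONG_FORM = "101C100A103A103810001031102C1004103A1038"   # first _ucs2 literal
SHORT_FORM = "104E1004103A1038"                           # second _ucs2 literal


def decode_ucs2_hex(ucs2_hex):
    """Turn big-endian UCS-2 hex digits into a Python string.

    All characters here are in the Basic Multilingual Plane, so decoding
    as UTF-16BE gives the same result as UCS-2.
    """
    return bytes.fromhex(ucs2_hex).decode("utf-16-be")


def ucs2_hex_to_utf8_hex(ucs2_hex):
    """Return the UTF-8 encoding of the text as upper-case hex digits."""
    return decode_ucs2_hex(ucs2_hex).encode("utf-8").hex().upper()


def code_points(ucs2_hex):
    """Return the code points as a space-separated 'U+XXXX' list."""
    return " ".join("U+%04X" % ord(ch) for ch in decode_ucs2_hex(ucs2_hex))


if __name__ == "__main__":
    for value in (LONG_FORM, SHORT_FORM):
        print(value)
        print("  utf8:        " + ucs2_hex_to_utf8_hex(value))
        print("  code points: " + code_points(value))
```

If the "utf8" values printed here match the query output but the two "weight" values differ, the input data is fine and the collation definition being loaded is the likely cause.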


Sithu Thwin (Inactive) added a comment - 2013-12-17 09:29

Today I tested again. Everything works correctly. The sorting rules are
correct according to the Unicode Collation Standard.
Thanks
Sithu


Sithu Thwin (Inactive) added a comment - 2014-01-09 09:58

I hope the Myanmar collation will be included in the MariaDB 10.* final release.


Alexander Barkov added a comment - 2014-01-10 11:47

Hi Sithu.

It was included in 10.0.7.

See the full changelog here:

https://mariadb.com/kb/en/mariadb-1007-changelog/

Thanks for your help to make this happen!


Sithu Thwin (Inactive) added a comment - 2014-01-10 17:55

I'm very happy, and all Myanmar web developers will be happy
to hear this news.
Can we use MariaDB 10.0.7 on a normal WordPress production site? Or do we
need to wait for the final release?
Thank you very much,
Sithu


Sergei Golubchik made changes - 2014-01-15 19:01

Fix Version/s		10.0.7 [ 14100 ]
Resolution		Fixed [ 1 ]
Status	Open [ 1 ]	Closed [ 6 ]

Alexander Barkov added a comment - 2014-01-20 09:20

Maria-10.0 is now in beta, which means it can still have some critical
bugs in the new 10.0 code, as well as in the code merged from MySQL-5.6.

10.0 is targeted to be GA in summer 2014.
If you don't use any other new 10.0 features (other than the Myanmar collation)
then it should be safe to use 10.0 beta.


Sergei Golubchik made changes - 2014-06-13 15:06

Workflow

defaullt [ 28509 ]

MariaDB v2 [ 42673 ]

Rasmus Johansson (Inactive) made changes - 2015-05-18 17:50

Workflow

MariaDB v2 [ 42673 ]

MariaDB v3 [ 61664 ]

Sergei Golubchik made changes - 2021-12-06 21:22

Workflow

MariaDB v3 [ 61664 ]

MariaDB v4 [ 132182 ]
