Details

    • Task
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 10.0.7
    • None
    • None

    Description

      There is a long standing feature request from Myanmar users:

      http://bugs.mysql.com/bug.php?id=22008

      The users are still waiting for the collation.

      This task needs MDEV-4928 to be merged from MySQL-5.6 first.

      Attachments

        1. great_sa.jpg (1.80 MB)
        2. Index.xml.gz (7 kB)
        3. Index.xml.gz (7 kB)
        4. my100.txt (2 kB)
        5. sort.sql (777 kB)
        6. sorting.png (15 kB)
        7. sorting.zip (4.11 MB)
        8. sorting.zip (132 kB)
        9. vowel O exception.jpg (1.28 MB)

        Issue Links

          Activity

            bar Alexander Barkov created issue -
            serg Sergei Golubchik made changes -
            Field Original Value New Value
            Description There is a long standing feature request from Myanmar users:

            http://bugs.mysql.com/bug.php?id=22008

            The users are still waiting for the collation:
            https://lists.askmonty.org/pipermailp/info/2013-August/001006.html

            This task needs MDEV-4928 to be merged from MySQL-5.6 first.

            There is a long standing feature request from Myanmar users:

            http://bugs.mysql.com/bug.php?id=22008

            The users are still waiting for the collation.

            This task needs MDEV-4928 to be merged from MySQL-5.6 first.

            serg Sergei Golubchik made changes -
            bar Alexander Barkov made changes -
            Issue Type Bug [ 1 ] Task [ 3 ]

            herzcthu Sithu Thwin (Inactive) added a comment -

            I want to help, but I don't know how. I've already checked out the latest source code from Launchpad. Can anyone guide me on how and where to start?
            bar Alexander Barkov added a comment - - edited

            Hi Sithu. Sorry for a late reply.

            We're releasing 10.0.5 soon, which will include "MDEV-4928 Collation customization improvements",
            a prerequisite for the Myanmar collation.
            After the release I can give you instructions on how to configure mysqld to support Myanmar
            as a dynamically loadable collation, so you can test it. If everything is fine with it,
            we can include it in the next release (10.0.6) as a built-in collation, so it will work out
            of the box.

            Another option is not to wait for 10.0.5. You can download the 10.0.5 sources
            from Launchpad lp:~maria-captains/maria/10.0 (see https://launchpad.net/maria/10.0 for details),
            so we can start testing right now.

            Please let me know when you're ready.
            Thanks.

            bar Alexander Barkov made changes -
            Comment [ I meant "Sorry for a late reply" of course :) ]

            herzcthu Sithu Thwin (Inactive) added a comment -

            I've already downloaded the latest source code from the maria-captains Launchpad
            repo. Where and how do I start testing?
            Thanks
            Sithu

            bar Alexander Barkov added a comment -

            Index.xml with the Myanmar collation definition.
            bar Alexander Barkov made changes -
            Attachment Index.xml.gz [ 23900 ]
            bar Alexander Barkov added a comment - - edited

            Please do the following steps:

            • Download Index.xml.gz attached to this issue.
            • gunzip Index.xml.gz
            • Go to the /share/charsets directory of your MariaDB-10.0.5 installation
            • Save the existing Index.xml: mv Index.xml Index.xml.orig
            • Put the downloaded Index.xml in place of the old one
            • Restart the MariaDB server
            • Run this query:
              SHOW COLLATION LIKE 'utf8_m%';
              It should report the new collation utf8_myanmar_ci.
            • Create some tables, insert some data and check the sorting order.

            Thanks.
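
The file-handling part of the steps above (gunzip, back up, replace) could be scripted roughly as follows. This is only a sketch: the function name install_index_xml and its arguments are made up for illustration, the charsets directory must be supplied by the caller, and the server restart and the SHOW COLLATION check are still done by hand.

```python
import gzip
import shutil
from pathlib import Path


def install_index_xml(gz_path, charsets_dir):
    """Unpack gz_path over Index.xml in charsets_dir, keeping a backup.

    Hypothetical helper: it mirrors the manual steps
    `gunzip Index.xml.gz` and `mv Index.xml Index.xml.orig`.
    """
    charsets_dir = Path(charsets_dir)
    target = charsets_dir / "Index.xml"
    backup = charsets_dir / "Index.xml.orig"

    # Keep the first backup only, so a second run never overwrites
    # the original server file with an already replaced one.
    if target.exists() and not backup.exists():
        shutil.move(str(target), str(backup))

    # Equivalent of gunzip, writing directly to the target location.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

To undo the change, copy Index.xml.orig back over Index.xml and restart the server.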


            herzcthu Sithu Thwin (Inactive) added a comment -

            I downloaded the source code following the instructions at
            https://mariadb.com/kb/en/Getting_the_MariaDB_Source_Code/
            and got the maria source tree. Its VERSION file shows 10.0.4.
            I then branched with the command bzr branch lp:maria/10.0 and got a new directory, 10.0,
            whose VERSION file still shows 10.0.4.
            I built the MariaDB server from the 10.0 branch and installed it. The reported version is
            still 10.0.4.
            I replaced Index.xml, and when I run SHOW COLLATION LIKE 'utf8_m%';
            I get

            Empty set (0.00 sec)

            at the mysql prompt.
            What am I doing wrong?
            Thanks,
            Sithu

            bar Alexander Barkov added a comment -

            Please try this SQL script:

            drop table if exists t1;
            create table t1 (a varchar(10) character set utf8 collate utf8_myanmar_ci);
            show warnings;

            What does "show warnings" return?

            Thanks.

            herzcthu Sithu Thwin (Inactive) added a comment -

            Which database should I use: mysql or information_schema?
            Thanks

            bar Alexander Barkov added a comment -

            None of them. Please use "test" or some other non-system database you have access to.

            bar Alexander Barkov added a comment -

            Sorry, the "SHOW WARNINGS" output in the above comment does not look right.

            I guess something happened during copy-and-paste.
            Can you please paste it again?

            bar Alexander Barkov added a comment -

            Did you restart the server after replacing Index.xml?

            herzcthu Sithu Thwin (Inactive) added a comment -

            Sorry, the first output was not the right one. It was taken before replacing
            Index.xml.

            Here is the output after replacing Index.xml:

            MariaDB [mysql]> drop table if exists t1;
            Query OK, 0 rows affected, 1 warning (0.00 sec)

            MariaDB [mysql]> create table t1 (a varchar(10) character set utf8 collate utf8_myanmar_ci);
            Query OK, 0 rows affected (0.35 sec)

            MariaDB [mysql]> show warnings;
            Empty set (0.00 sec)

            MariaDB [mysql]>

            herzcthu Sithu Thwin (Inactive) added a comment -

            I ran killall mysqld before starting mysql.
            Here is the output on the test database.

            MariaDB [test]> drop table if exists t1;
            Query OK, 0 rows affected (0.18 sec)

            MariaDB [test]> create table t1 (a varchar(10) character set utf8 collate utf8_myanmar_ci);
            Query OK, 0 rows affected (0.38 sec)

            MariaDB [test]> show warnings;
            Empty set (0.00 sec)

            MariaDB [test]>
            herzcthu Sithu Thwin (Inactive) made changes -
            Comment [ Here is output

            MariaDB [mysql]> drop table if exists t1;
            Query OK, 0 rows affected, 1 warning (0.00 sec)

            MariaDB [mysql]> create table t1 (a varchar(10) character set utf8 collate
            utf8_myanmar_ci);
            ERROR 1273 (HY000): Unknown collation: 'utf8_myanmar_ci'
            MariaDB [mysql]> show warnings;
            +-------+------+--------------------------------------+
            +-------+------+--------------------------------------+
            +-------+------+--------------------------------------+
            1 row in set (0.00 sec)

            MariaDB [mysql]>



            ]

            bar Alexander Barkov added a comment -

            So it worked fine. The table has been created.
            You can now insert some data into it and try sorting.

            I guess "SHOW COLLATION LIKE 'utf8_m%';" will now also report utf8_myanmar_ci
            without problems.

            herzcthu Sithu Thwin (Inactive) added a comment -

            Ah, it's working now. Here is the output of SHOW COLLATION LIKE 'utf8_m%';

            MariaDB [test]> SHOW COLLATION LIKE 'utf8_m%';
            +-----------------+---------+-----+---------+----------+---------+
            | Collation       | Charset | Id  | Default | Compiled | Sortlen |
            +-----------------+---------+-----+---------+----------+---------+
            | utf8_myanmar_ci | utf8    | 220 |         |          |       8 |
            +-----------------+---------+-----+---------+----------+---------+
            1 row in set (0.00 sec)

            MariaDB [test]>

            I will start testing and checking whether sorting works, and will report the result.

            herzcthu Sithu Thwin (Inactive) added a comment -

            Some words are not sorted correctly. I will check and make changes.
            BTW, in Myanmar we have more than 4 ethnic languages that use the same characters with different sorting or encoding. I want to know whether those languages can be added to the collation.

            Thanks

            bar Alexander Barkov added a comment -

            > Some words not correctly sorted.

            Can you please give an example of what is not sorted correctly?
            Please dump your table and attach the dump to this issue,
            so I can reproduce it on my machine.

            > I will check and make change.

            Are you going to change Index.xml?
            It might be tricky. The Myanmar collation definition is quite complex.
            Can you please send the dump first?

            > BTW, in myanmar, we have more than 4 ethnic language with different sorting or encoding using same characters.
            > I want know whether those language can be added in collation.

            I need to check. What are these languages?

            herzcthu Sithu Thwin (Inactive) added a comment -

            See the attached screenshot. The top word ၎င်း (u104e + u1004 + u103a + u1038)
            should not be sorted as the first word. That word has at least 3 forms and
            is sometimes even spelled with Myanmar Digit 4. I've attached my testing script
            and an SQL dump file. The dump's content is taken from hunspell my_MM.dic.

            There might also be some other issues. I will check against the Myanmar Dictionary
            sorting guide and report anything that is sorted incorrectly.

            The other languages are Shan, Mon, Kayin, Kayah (Kayini).

            Thanks.
            herzcthu Sithu Thwin (Inactive) made changes -
            Attachment sorting.zip [ 23905 ]
            Attachment sorting.png [ 23906 ]

            bar Alexander Barkov added a comment -

            The collation engine is currently limited to 6 code points in a single collation element.
            The first word is not sorted correctly because it should be sorted near:

            u+101C u+100A u+103A u+1038 u+1000 u+1031 u+102C u+1004 u+103A u+1038

            which makes 10 code points. I'll check how to fix this.

            Is anything else sorted in a wrong way?
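
The code point counts in the comment above can be checked with a few lines of Python. This is only an illustration: the names code_points, within_limit and MAX_ELEMENT_LEN are made up here, and the limit of 6 is simply the figure quoted in the comment, not something read from the server.

```python
MAX_ELEMENT_LEN = 6  # limit quoted in the comment; an assumption of this sketch


def code_points(text):
    """List the code points of text in the u+XXXX notation used in this thread."""
    return ["u+%04X" % ord(ch) for ch in text]


def within_limit(text, limit=MAX_ELEMENT_LEN):
    """True if text has no more code points than the quoted limit."""
    return len(text) <= limit


# The word from the screenshot and the sequence it should sort near,
# written as escapes so the code points are unambiguous.
word = "\u104e\u1004\u103a\u1038"
target = "\u101c\u100a\u103a\u1038\u1000\u1031\u102c\u1004\u103a\u1038"

print(code_points(word), within_limit(word))
print(len(target), within_limit(target))
```

The word itself has 4 code points, while the sequence it should sort near has 10 and so exceeds the quoted limit.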

            herzcthu Sithu Thwin (Inactive) added a comment -

            The images in sorting.zip are a sorting guide scanned from the Myanmar
            Dictionary 5-book series printed/published around 1970. More than 30,000
            people participated in collecting words for those books. It is the best reference
            book in Myanmar.

            Great SA (ဿ u+103F) must be sorted exactly after the SA ( သ u+101E) group.
            (ဌ္ဋ u+101C u+1039 u+100B) must be sorted exactly after the ( ဌ u+101C) group.
            It is just one letter made from a combination of 3 glyphs.

            In the SQL dump file, the consonants, syllables and medial tables are written and
            ordered as in the images from sorting.zip.

            ဧ (U+1027) is sorted in two different ways depending on how the word sounds.

            For example, in the word ဧချင်း (u+1027 u+1001 u+103B u+1004 u+103A u+1038)
            it is treated as အေး (u+1021 u+1031 u+1038) ။
            In ဧရာ it is treated as အေ (u+1021 u+1031).
            I think this cannot be sorted without a wordlist dictionary.
            herzcthu Sithu Thwin (Inactive) made changes -
            Attachment sorting.zip [ 23910 ]
            Attachment great_sa.jpg [ 23911 ]
            Attachment sort.sql [ 23912 ]
            bar Alexander Barkov added a comment - - edited

            Thanks for the information.

            I have a question about the order in the table syllables:

            From my understanding, the record with sid=27 must be greater than the record with sid=108.

            The record with sid=108 is consonant u+1021 followed by vowel "u+102C u+101A u+103A".
            The record with sid=27 is consonant u+1021 followed by vowel "u+102D u+102F u+101A u+103A".

            The collation definition was taken from the Common Locale Data Repository:
            http://unicode.org/repos/cldr/tags/release-23/common/collation/my.xml
            (Please download this file for reference).

            According to this collation definition, the above two vowels
            are defined in the same big group, relative to u+1034,
            in this order:

            <reset>\u1034</reset> (Index.xml:461, my.xml:37)
            ...
            <s>\u102C\u101A\u103A</s> (Index.xml:927, my.xml:503)
            ...
            <s>\u102D\u102F\u101A\u103A</s> (Index.xml:941, my.xml:517)

            which means \u102C\u101A\u103A is smaller than \u102D\u102F\u101A\u103A,
            which means the record sid=27 is bigger than the record sid=108.

            Can you please confirm this?
            It seems "sid" is not in alphabetic order in the table "syllables".

            Also, I found that sid 28, 29,30,31, 32 are also not in the alphabetic order:

            The records should be in this ascending order:

            sid  Code points                  Where defined
            109  1021 + 101B 103A             my.xml:519
            110  1021 + 102C 101B 103A        my.xml:521
            111  1021 + 1031 101B 103A        my.xml:529
            28   1021 + 102D 102F 101B 103A   my.xml:535
            112  1021 + 101C 103A             my.xml:537
            29   1021 + 102D 102F 101C 103A   my.xml:553
            113  1021 + 101E 103A             my.xml:563
            30   1021 + 102D 102F 101E 103A   my.xml:580
            114  1021 + 101F 103A             my.xml:582
            31   1021 + 102D 102F 101F 103A   my.xml:598
            32   1021 + 102D 102F 1020 103A   my.xml:607

            Is that correct?
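
The reasoning above (a sequence listed earlier after the same reset point sorts lower) can be sketched in Python. This is a deliberately simplified model, not the server's collation code: it looks only at the relative position of the two rules quoted above, ignores strength levels, and the names RULES_AFTER_RESET, SID_VOWELS and rule_rank are invented for the example.

```python
# The two rules quoted above, in the order they follow <reset>\u1034</reset>.
RULES_AFTER_RESET = [
    "\u102c\u101a\u103a",        # my.xml:503
    "\u102d\u102f\u101a\u103a",  # my.xml:517
]

# Vowel part of each record; both share the consonant u+1021.
SID_VOWELS = {
    108: "\u102c\u101a\u103a",
    27: "\u102d\u102f\u101a\u103a",
}


def rule_rank(vowel, rules=RULES_AFTER_RESET):
    """Position of the vowel sequence in the tailoring; lower sorts first."""
    return rules.index(vowel)


ordered_sids = sorted(SID_VOWELS, key=lambda sid: rule_rank(SID_VOWELS[sid]))
print(ordered_sids)  # [108, 27]
```

Under this model sid=108 comes before sid=27, which matches the conclusion drawn from my.xml.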


            herzcthu Sithu Thwin (Inactive) added a comment -

            See the attached image. I've written the Unicode code point references in the
            image.
            Your understanding is right according to the unicode.org collation definition.
            But some parts of that collation definition might be wrong.
            For example, u+1034 is never used in the Myanmar language. It is the Mon vowel O. So
            everything in that group might be related to the Mon language.

            Myanmar language sorting is mostly based on phonetic order. Sometimes
            phonetic order and alphabetic order are not the same.

            regards,
            Sithu
            herzcthu Sithu Thwin (Inactive) made changes -
            Attachment vowel O exception.jpg [ 23918 ]

            bar Alexander Barkov added a comment -

            So are you OK with this sorting order:

            sid  Code points                  Where defined
            109  1021 + 101B 103A             my.xml:519
            110  1021 + 102C 101B 103A        my.xml:521
            111  1021 + 1031 101B 103A        my.xml:529
            28   1021 + 102D 102F 101B 103A   my.xml:535
            112  1021 + 101C 103A             my.xml:537
            29   1021 + 102D 102F 101C 103A   my.xml:553
            113  1021 + 101E 103A             my.xml:563
            30   1021 + 102D 102F 101E 103A   my.xml:580
            114  1021 + 101F 103A             my.xml:582
            31   1021 + 102D 102F 101F 103A   my.xml:598
            32   1021 + 102D 102F 1020 103A   my.xml:607

            If we fix the other problems and keep these rules for 103A,
            will such a collation work fine for you?

            herzcthu Sithu Thwin (Inactive) added a comment -

            I'm inviting language and IT professionals to join the discussion in this ticket.
            Please wait for their response, or I will reply with more detailed information about
            Myanmar sorting. I'm trying to contact Myanmar language professors or
            tutors at the university.

            regards,
            Sithu

            bar Alexander Barkov added a comment -

            select words from km_alphabet where id <100 order by id
            bar Alexander Barkov made changes -
            Attachment my100.txt [ 24101 ]
            bar Alexander Barkov added a comment - - edited

            Hello Sithu,
            Any news about the correct Myanmar order?

            In the meantime I made an experiment:
            1. I created a file with the results of this SQL query:

            select words from km_alphabet where id <100 order by id;
            (see attached).

            2. Opened this link (ICU locale explorer for the Myanmar collation):
            http://demo.icu-project.org/icu-bin/locexp?_=my_MM&d_=en&x=col

            3. Copied the file into clipboard and pasted it into the "source" section.
            4. Checked the "Hide Collation Key" (for simpler results output) checkbox.
            5. Pressed the "Sort" button
            6. Checked the results in the "Original" and "Collated" fields.

            The results in the "Collated" field appeared in this order:

            02: က
            04: ကကတိုး
            03: ကကတစ်
            05: ကကူရံ
            09: ကကြိုးတန်ဆာ
            10: ကကြိုးတန်ဆာဆင်
            11: ကခုန်
            14: ကချေသည်
            15: ကချော်ကချင်
            16: ကချော်ကချွတ်
            12: ကချင်
            13: ကချင်ပြည်နယ်
            64: ကစား
            65: ကစားဒိုင်
            66: ကစားဝိုင်း
            67: ကစားသမား
            68: ကစီ
            69: ကစီတင်
            70: ကစီရည်
            71: ကစော်
            61: ကစစ်
            63: ကစဉ်ကလျား
            62: ကစဉ့်ကလျား
            74: ကစွဲကစောင်း
            72: ကစွန်း
            73: ကစွန်းဥ
            75: ကဆုန်
            76: ကဆွဲ့ကနွဲ့
            85: ကညာ
            77: ကညင်
            78: ကညင်ဆီ
            79: ကညင်တိုင်
            80: ကညင်နီ
            81: ကညင်ပျံ
            82: ကညင်ဖြူ
            83: ကညစ်
            84: ကညစ်သွား
            86: ကညွတ်
            87: ကညှပ်
            94: ကတိ
            95: ကတိကဝတ်
            96: ကတိခံ
            97: ကတိစောင့်
            98: ကတိတည်
            99: ကတိထား
            91: ကတင်
            92: ကတည်းက
            93: ကတန်းကရမ်;
            01: ကို
            08: ကက်
            06: ကက္ကုကမည်းပွင့်
            07: ကက္ကော်တကန်
            17: ကင်
            60: ကင်္ကာ
            18: ကင်ညှပ်
            19: ကင်ပလစ်
            20: ကင်ပျစ်
            21: ကင်ပွန်း
            22: ကင်း
            23: ကင်းကစီ
            24: ကင်းကိုင်
            25: ကင်းကွာ
            26: ကင်းခိုး
            27: ကင်းချုပ်
            28: ကင်းခြေများ
            31: ကင်းခွေး
            29: ကင်းခွန်
            30: ကင်းခွန်ကင်းခ
            32: ကင်းစ
            33: ကင်းစား
            34: ကင်းစီး
            36: ကင်းစောင့်
            37: ကင်းစောင့်ထား
            35: ကင်းစုန်း
            40: ကင်းတဲ
            38: ကင်းတပ်
            39: ကင်းတပ်ဥပဒေ
            41: ကင်းထား
            42: ကင်းထိုး
            43: ကင်းထိုးမြင်းစီး
            44: ကင်းထောက်
            45: ကင်းနားသန်
            46: ကင်းပုစွန်
            47: ကင်းဖလောင်ကောင်
            48: ကင်းဘူ
            49: ကင်းမလက်မည်း
            50: ကင်းမြီးကောက်
            51: ကင်းရုံ
            53: ကင်းလုလင်
            52: ကင်းလိပ်ချော
            54: ကင်းဝန်
            55: ကင်းဝန်းကင်းပတ်လှည့်
            58: ကင်းသား
            56: ကင်းသင်း
            57: ကင်းသန်း
            59: ကင်းအုပ်
            88: ကဏ္ဍ
            89: ကဏ္ဍဇာ
            90: ကဏ္ဏမူ

            Can you please check if this order is correct? Thanks.

            We can reproduce the same order in MariaDB.
            But if this order does not look correct, then the collation definition in CLDR is wrong.
            We cannot add a collation without having a correct definition.
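
When reviewing a list like the one above, it can help to pick out the places where the collated order departs from the id order. The Python sketch below does that for the first twelve ids of the list. It assumes, purely for illustration, that the id column of km_alphabet reflects the order of the source word list, which this thread does not confirm; the function name descents is also made up.

```python
def descents(ids):
    """Return (previous, current) pairs where the id goes down."""
    return [(a, b) for a, b in zip(ids, ids[1:]) if b < a]


# First twelve ids from the "Collated" output above.
collated_ids = [2, 4, 3, 5, 9, 10, 11, 14, 15, 16, 12, 13]

print(descents(collated_ids))  # [(4, 3), (16, 12)]
```

Each reported pair marks a spot worth checking by eye, for example entry 04 being placed before entry 03.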


            Sorry for the delay. My son has been in hospital for about a week
            with dengue fever.

            The sorting order of the 100 words above is correct, because the CLDR
            definition is generally correct.
            But some exception words might still sort wrongly in some cases.
            I'm still looking for someone with more expertise in the Myanmar language.
            Thanks

            2013/10/30 Alexander Barkov (JIRA) <jira@mariadb.atlassian.net>

            herzcthu Sithu Thwin (Inactive) added a comment

            Sithu, I hope your son gets well soon.

            Please tell me when you are ready to do some more testing.
            I made a few enhancements in the collation code, and in
            the collation definition.

            bar Alexander Barkov added a comment

            Thank you, my son is fine now.

            I'm ready to test. Do I need to update the bzr source code? I will update
            the source code on my computer tomorrow.

            On Tue, Nov 5, 2013 at 3:52 PM, Alexander Barkov (JIRA) <

            herzcthu Sithu Thwin (Inactive) added a comment

            Index.xml, version 2.

            bar Alexander Barkov added a comment
            bar Alexander Barkov made changes -
            Attachment Index.xml.gz [ 24305 ]

            Hi Sithu,

            Please build MariaDB from the latest revision from:
            https://launchpad.net/maria/10.0
            and use the new version of Index.xml, which I've just attached.
            Then create your tables and test sorting order again.
            Thanks.

            bar Alexander Barkov added a comment

            Hi Sithu, did you have a chance to try the latest version? Thanks.

            bar Alexander Barkov added a comment

            The Internet connection in my country is very slow. I'm trying to update
            the bzr source code, but it never completes, so I haven't been able to test yet.
            I will report back and send the test results as soon as I've updated
            my source code. I'm trying to work out how to fix my Internet connection.

            Thanks.

            herzcthu Sithu Thwin (Inactive) added a comment

            Sorry for the long wait. I've tested the new version and I don't see any improvement. The MariaDB version is 10.0.7.
            Which parts did you change in Index.xml? ၎င်း (u104e + u1004 + u103a + u1038) is still in first place in the sort order.

            herzcthu Sithu Thwin (Inactive) added a comment

            Make sure to use the version uploaded on 2013-11-12.

            There are two differences:
            1. This line makes it use Unicode-5.2.0, which has more Myanmar characters defined for sorting.
            <collation name="utf8_myanmar_ci" id="220" shift-after-method="expand" version="5.2.0">

            2. This line is no longer commented out:
            <reset>\u101C\u100A\u103A\u1038\u1000\u1031\u102C\u1004\u103A\u1038</reset><s>\u104E\u1004\u103A\u1038</s>

            Please try this query:
            mysql> SELECT HEX(x),HEX(CONVERT(x USING utf8)) AS utf8,HEX(WEIGHT_STRING(CONVERT(t1.x USING utf8) COLLATE utf8_myanmar_ci)) AS weight FROM (SELECT _ucs2 X'101C100A103A103810001031102C1004103A1038' AS x UNION SELECT _ucs2 X'104E1004103A1038') AS t1;

            The expected result is:
            +------------------------------------------+--------------------------------------------------------------+------------------------------------------+
            | HEX(x)                                   | utf8                                                         | weight                                   |
            +------------------------------------------+--------------------------------------------------------------+------------------------------------------+
            | 101C100A103A103810001031102C1004103A1038 | E1809CE1808AE180BAE180B8E18080E180B1E180ACE18084E180BAE180B8 | 220D22483B1322593ACC21CD22483AEE22593ACC |
            | 104E1004103A1038                         | E1818EE18084E180BAE180B8                                     | 220D22483B1322593ACC21CD22483AEE22593ACC |
            +------------------------------------------+--------------------------------------------------------------+------------------------------------------+

            Note that the "weight" value is the same in both rows.
            If you get something else, you're probably still using the old version of Index.xml.
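Before comparing weights, it can help to confirm that the "utf8" column matches the input strings. The sketch below is only an illustration and is not part of MariaDB: the helper name `ucs2_hex_to_utf8_hex` is hypothetical. It converts the big-endian UCS-2 hex used in the `_ucs2 X'...'` literals into the UTF-8 hex that `HEX(CONVERT(x USING utf8))` is expected to return. It does not compute collation weights; those come only from the server.

```python
def ucs2_hex_to_utf8_hex(ucs2_hex: str) -> str:
    """Convert big-endian UCS-2 hex (as in _ucs2 X'...') to upper-case UTF-8 hex.

    Hypothetical helper for checking the "utf8" column by hand.
    All characters in this thread are in the BMP, so decoding the bytes
    as UTF-16 big-endian gives the same result as UCS-2.
    """
    text = bytes.fromhex(ucs2_hex).decode("utf-16-be")
    return text.encode("utf-8").hex().upper()


def code_points(ucs2_hex: str) -> list[str]:
    """List the code points of the string in U+XXXX notation."""
    text = bytes.fromhex(ucs2_hex).decode("utf-16-be")
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    # The two strings from the query above.
    for literal in ("101C100A103A103810001031102C1004103A1038",
                    "104E1004103A1038"):
        print(literal, ucs2_hex_to_utf8_hex(literal), code_points(literal))
```

If the UTF-8 hex printed here differs from the "utf8" column returned by the server, the problem is in the input data or connection character set rather than in Index.xml.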

            bar Alexander Barkov added a comment

            I tested again today. Everything works correctly. The sorting rules are
            correct according to the Unicode Collation standard.
            Thanks
            Sithu

            On Mon, Dec 16, 2013 at 6:11 PM, Alexander Barkov (JIRA) <

            herzcthu Sithu Thwin (Inactive) added a comment

            I hope the Myanmar collation will be included in the MariaDB 10.* final release.

            On Tue, Dec 17, 2013 at 2:00 PM, Sithu Thwin (JIRA) <

            herzcthu Sithu Thwin (Inactive) added a comment

            Hi Sithu.

            It was included in 10.0.7.

            See the full changelog here:

            https://mariadb.com/kb/en/mariadb-1007-changelog/

            Thanks for your help in making this happen!

            bar Alexander Barkov added a comment

            I'm very happy, and all Myanmar web developers will be happy
            to hear this news.
            Can we use MariaDB 10.0.7 on a normal WordPress production site? Or do we
            need to wait for the final release?
            Thank you very much,
            Sithu

            On Fri, Jan 10, 2014 at 4:17 PM, Alexander Barkov (JIRA) <

            herzcthu Sithu Thwin (Inactive) added a comment
            serg Sergei Golubchik made changes -
            Fix Version/s 10.0.7 [ 14100 ]
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]

            Maria-10.0 is now in beta, which means it can still have some critical
            bugs in the new 10.0 code, as well as in the code merged from MySQL-5.6.

            10.0 is targeted to be GA in summer 2014.
            If you don't use any other new 10.0 features (other than the Myanmar collation)
            then it should be safe to use 10.0 beta.

            bar Alexander Barkov added a comment

            People

              bar Alexander Barkov
              bar Alexander Barkov
