[MDEV-4929] Add Myanmar (Burmese) collation Created: 2013-08-20 Updated: 2014-01-20 Resolved: 2014-01-15 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Fix Version/s: | 10.0.7 |
| Type: | Task | Priority: | Minor |
| Reporter: | Alexander Barkov | Assignee: | Alexander Barkov |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None | ||
| Description |
|
There is a long-standing feature request from Myanmar users: http://bugs.mysql.com/bug.php?id=22008 The users are still waiting for the collation. This task needs |
| Comments |
| Comment by Sithu Thwin (Inactive) [ 2013-10-06 ] | ||||||||||||||||||||||||||||||||||||
|
I want to help, but I don't know how. I've already checked out the latest source code from Launchpad. Can anyone guide me on how and where to start? | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-14 ] | ||||||||||||||||||||||||||||||||||||
|
Hi Sithu. Sorry for the late reply. We're releasing 10.0.5 soon which will include "
Another option is not to wait for 10.0.5. You can download the 10.0.5 sources
Please let me know when you're ready. | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-10-14 ] | ||||||||||||||||||||||||||||||||||||
|
I've already downloaded the latest source code from maria-captains Launchpad | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-14 ] | ||||||||||||||||||||||||||||||||||||
|
Index.xml with the Myanmar collation definition. | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-14 ] | ||||||||||||||||||||||||||||||||||||
|
Please do the following steps:
Thanks. | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-10-14 ] | ||||||||||||||||||||||||||||||||||||
|
I've downloaded the source code using the instructions on this link
Empty set (0.00 sec) in mysql prompt. | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
Please try this SQL script: drop table if exists t1; What does "show warnings" return? Thanks. | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
Which database should I use: mysql or information_schema? | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
None of them. Please use "test" or some other non-system database you have access to. | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
Sorry, the "SHOW WARNINGS" output in the above comment does not look right. I guess something happened during copy-and-paste. | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
Did you restart the server after replacing Index.xml ? | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
Sorry. Here is the output after replacing Index.xml:
MariaDB [mysql]> drop table if exists t1;
MariaDB [mysql]> create table t1 (a varchar(10) character set utf8 collate
MariaDB [mysql]> show warnings;
Empty set (0.00 sec) | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
I ran killall mysqld before starting mysql.
MariaDB [test]> drop table if exists t1;
MariaDB [test]> create table t1 (a varchar(10) character set utf8 collate utf8_myanmar_ci);
MariaDB [test]> show warnings; | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
So it worked fine. The table has been created. I guess "SHOW COLLATION LIKE 'utf8_m%';" now will also report utf8_myanmar_ci | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
Ah, it's working now. Here is the output of SHOW COLLATION LIKE 'utf8_m%';
MariaDB [test]> SHOW COLLATION LIKE 'utf8_m%';
(output table not preserved)
I will start testing and check whether sorting works, then report the result. | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
Some words are not sorted correctly. I will check and make changes. Thanks. | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
> Some words not correctly sorted.
Can you please give an example of what is not sorted correctly?
> I will check and make change.
Are you going to change Index.xml?
> BTW, in myanmar, we have more than 4 ethnic language with different sorting or encoding using same characters. I need to check.
What are these languages? | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
See the attached screenshot. Top word ၎င်း (u104e + u1004 + u103a + u1038)
It might also have some other issues; I will check with the Myanmar Dictionary
Other languages are Shan, Mon, Kayin, Kayah (Kayini). Thanks. | ||||||||||||||||||||||||||||||||||||
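The code points listed for the word above can be double-checked with a few lines of Python. This is only an illustrative sketch: the helper name `to_code_points` is made up here and is not part of MariaDB or of the files attached to this ticket.

```python
import unicodedata

# The word from the comment above, spelled out as escapes so the
# code points are unambiguous: U+104E U+1004 U+103A U+1038.
WORD = "\u104E\u1004\u103A\u1038"


def to_code_points(text):
    """Return the code points of `text` in 'U+XXXX' notation (hypothetical helper)."""
    return ["U+%04X" % ord(ch) for ch in text]


if __name__ == "__main__":
    for ch in WORD:
        # unicodedata.name() returns the given default when no name is known.
        print("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
    print(to_code_points(WORD))
```

Writing the word as escapes rather than as a pasted literal avoids any doubt about which code points the editor or terminal actually stored.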
| Comment by Alexander Barkov [ 2013-10-15 ] | ||||||||||||||||||||||||||||||||||||
|
The collation engine is currently limited to 6 code points in a single collation element. u+101C u+100A u+103A u+1038 u+1000 u+1031 u+102C u+1004 u+103A u+1038 which makes 10 code points. I'll check how to fix this. Is anything else sorted in a wrong way? | ||||||||||||||||||||||||||||||||||||
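To make the arithmetic in the comment above concrete, the following sketch counts the code points in the quoted sequence and compares the count with the 6-code-point limit. The constant and function names are assumptions chosen for illustration; this does not reproduce how the collation engine itself enforces the limit.

```python
# Limit quoted in the comment above; the constant name is hypothetical.
MAX_CODE_POINTS = 6

# The sequence quoted in the comment above, written as escapes.
SEQUENCE = "\u101C\u100A\u103A\u1038\u1000\u1031\u102C\u1004\u103A\u1038"


def fits_in_one_element(text, limit=MAX_CODE_POINTS):
    """Return True if `text` has at most `limit` code points."""
    # In Python 3, len() on a str counts code points, not bytes.
    return len(text) <= limit


if __name__ == "__main__":
    print(len(SEQUENCE))                  # number of code points in the sequence
    print(fits_in_one_element(SEQUENCE))  # whether it fits within the limit
```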
| Comment by Sithu Thwin (Inactive) [ 2013-10-17 ] | ||||||||||||||||||||||||||||||||||||
|
Images in sorting.zip are a sorting guide scanned from Myanmar
Great SA (ဿ u+103F) must be sorted exactly after the SA (သ u+101E) group.
In the SQL dump file, the consonants, syllables, and medial tables are written and
ဧ U+1027 has two different ways to sort, depending on how the word sounds. For example, in the word ဧချင်း (u+1027 u+1001 u+103B u+1004 u+103A u+1038) | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-17 ] | ||||||||||||||||||||||||||||||||||||
|
Thanks for the information.
I have a question about the order in the table syllables:
From my understanding, the record with sid=27 must be greater than the record with sid=108.
Record with sid=108 is consonant u+1021 followed by vowel "u+102C u+101A u+103A".
The collation definition was taken from the Common Locale Data Repository:
According to this collation definition, the above two vowels
<reset>\u1034</reset> (Index.xml:461, my.xml:37)
which means \u102C\u101A\u103A is smaller than \u102D\u102F\u101A\u103A,
Can you please confirm this?
Also, I found that sid 28, 29, 30, 31, 32 are also not in alphabetical order:
The records should be in this ascending order:
Is that correct? | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-10-18 ] | ||||||||||||||||||||||||||||||||||||
|
See the attached image; I've written the Unicode code point reference in
Myanmar language sorting is mostly based on phonetic order. Sometime
regards, | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-18 ] | ||||||||||||||||||||||||||||||||||||
|
So are you OK with this sorting order:
If we fix the other problems and keep these rules for 103A, | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-10-21 ] | ||||||||||||||||||||||||||||||||||||
|
I'm inviting language and IT professionals to join the discussion in this ticket. So
regards, | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-30 ] | ||||||||||||||||||||||||||||||||||||
|
select words from km_alphabet where id <100 order by id | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-10-30 ] | ||||||||||||||||||||||||||||||||||||
|
Hello Sithu,
In the meantime I ran an experiment:
select words from km_alphabet where id <100 order by id;
2. Opened this link (ICU locale explorer for the Myanmar collation):
3. Copied the file into the clipboard and pasted it into the "source" section.
The results in the "Collated" field appeared in this order:
02: က
Can you please check if this order is correct? Thanks.
We can reproduce the same order in MariaDB. | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-11-02 ] | ||||||||||||||||||||||||||||||||||||
|
Sorry for the delay. My son has been in the hospital for about a week because of dengue fever.
The sorting order of the 100 words above is correct. Because CLDR definition is | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-11-05 ] | ||||||||||||||||||||||||||||||||||||
|
Sithu, I hope your son gets well soon. Please tell me when you are ready to do some more testing. | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-11-05 ] | ||||||||||||||||||||||||||||||||||||
|
Thank you, my son is fine now. I'm ready to test. Do I need to update the bzr source code? I will update | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-11-12 ] | ||||||||||||||||||||||||||||||||||||
|
Index.xml, version 2. | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-11-12 ] | ||||||||||||||||||||||||||||||||||||
|
Hi Sithu, Please build MariaDB from the latest revision from: | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-11-18 ] | ||||||||||||||||||||||||||||||||||||
|
Hi Sithu, did you have a chance to try the latest version? Thanks. | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-11-22 ] | ||||||||||||||||||||||||||||||||||||
|
The Internet connection in my country is very slow. I'm trying to update
Thanks. | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-12-03 ] | ||||||||||||||||||||||||||||||||||||
|
Sorry for the long wait. I've already tested the new version. I don't see any improvement. The MariaDB version is 10.0.7. | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2013-12-16 ] | ||||||||||||||||||||||||||||||||||||
|
Make sure to use the version uploaded on 2013-11-12. There are two differences:
2. This line is no longer commented out:
Please try this query:
The expected result is:
(result table not preserved)
Notice, the "weight" value is the same in the two records. | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2013-12-17 ] | ||||||||||||||||||||||||||||||||||||
|
Today I tested again. Everything works correctly. Sorting rules are | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2014-01-09 ] | ||||||||||||||||||||||||||||||||||||
|
I hope the Myanmar collation will be included in the MariaDB 10.* final release | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2014-01-10 ] | ||||||||||||||||||||||||||||||||||||
|
Hi Sithu. It was included in 10.0.7. See the full changelog here: https://mariadb.com/kb/en/mariadb-1007-changelog/ Thanks for your help in making this happen! | ||||||||||||||||||||||||||||||||||||
| Comment by Sithu Thwin (Inactive) [ 2014-01-10 ] | ||||||||||||||||||||||||||||||||||||
|
I'm very happy, and all Myanmar web developers will be happy | ||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2014-01-20 ] | ||||||||||||||||||||||||||||||||||||
|
Maria-10.0 is now in beta, which means it can still have some critical
10.0 is targeted to be GA in summer 2014. |