Details
- New Feature
- Status: In Review
- Major
- Resolution: Unresolved
Description
This is something added in MySQL 5.7, so perhaps it should be included in MariaDB.
"ngram and MeCab full-text parser plugins. As of MySQL 5.7.6, MySQL provides a built-in full-text ngram parser plugin that supports Chinese, Japanese, and Korean (CJK).
For more information, see Section 13.9.8, “ngram Full-Text Parser”."
http://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html
And the link referenced above is:
http://dev.mysql.com/doc/refman/5.7/en/fulltext-search-ngram.html
The MeCab task previously described here is now tracked separately as MDEV-22987.
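For readers unfamiliar with the term, an ngram parser splits text into overlapping runs of n characters, which suits CJK text where words are not separated by spaces. The following is a minimal Python sketch of that idea only; the function name `ngrams` is hypothetical, and the sketch does not reproduce the exact behaviour of the MySQL plugin (for example, its handling of spaces or stopwords).

```python
def ngrams(text: str, n: int = 2) -> list[str]:
    """Return every contiguous run of n characters in text.

    Illustration only: a real parser plugin would also deal with
    whitespace, punctuation and stopwords.
    """
    if n < 1:
        raise ValueError("n must be positive")
    # A string shorter than n yields no tokens at all.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    print(ngrams("全文検索"))     # bigrams of a four-character string
    print(ngrams("全文検索", 3))  # trigrams of the same string
```

With n = 2, a four-character string produces three overlapping tokens, so a search for any two adjacent characters can be answered from the index.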
Issue Links
- relates to
  - MDEV-22987 Implement MeCab Full-text parser (Closed)
  - MDEV-32578 row_merge_fts_doc_tokenize() handles FTS plugin parser inconsistently (Closed)
  - MDEV-10268 Add "MeCab" support to MariaDB (Open)
greenman, thank you for pointing out the existing design problem with the various innodb_ft_ global variables. It could actually be acceptable to continue on the same path, to avoid extra development at this stage. I think that we should try to allow the ngram token size to be changed freely between 2 and 10. Preferably this should be possible without any server restart (maybe with plugin uninstall/install if necessary), so that we can reasonably write a test that changes the parameter between writes and reads. (In my opinion, it is unreasonable to write a regression test that restarts the entire server many times, because a single restart can easily take 1 or 2 seconds.) We must ensure that any index inconsistency resulting from such a change cannot lead to a crash.
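To make the concern about changing the token size between writes and reads concrete, here is a rough Python sketch of a toy inverted index. It is not InnoDB code: `tokenize`, `build_index` and `search` are hypothetical names, and the 2 to 10 range is simply the one proposed in this comment. The point is that a lookup with a different token size than the one used at write time should degrade to "no match" rather than fail.

```python
from collections import defaultdict

# Range proposed in the comment; an assumption of this sketch, not a server constant.
MIN_TOKEN_SIZE, MAX_TOKEN_SIZE = 2, 10


def tokenize(text: str, token_size: int) -> list[str]:
    """Split text into overlapping tokens of token_size characters."""
    if not MIN_TOKEN_SIZE <= token_size <= MAX_TOKEN_SIZE:
        raise ValueError(f"token size must be in {MIN_TOKEN_SIZE}..{MAX_TOKEN_SIZE}")
    return [text[i:i + token_size] for i in range(len(text) - token_size + 1)]


def build_index(docs: dict[int, str], token_size: int) -> dict[str, set[int]]:
    """Map each token to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text, token_size):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str, token_size: int) -> set[int]:
    """Return ids of documents containing every token of the query."""
    tokens = tokenize(query, token_size)
    if not tokens:
        return set()
    # Unknown tokens contribute an empty set, so a mismatch yields no hits.
    hits = [index.get(token, set()) for token in tokens]
    return set.intersection(*hits)


if __name__ == "__main__":
    docs = {1: "全文検索エンジン", 2: "検索結果"}
    index = build_index(docs, token_size=2)       # written with size 2
    print(search(index, "検索", token_size=2))    # read with the same size
    print(search(index, "検索エ", token_size=3))  # read with size 3: empty result
```

A regression test along these lines would write with one token size, change the parameter, and then read, checking only that the server stays up and returns a well-defined (possibly empty) result.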
I really hope that we can replace the InnoDB FULLTEXT INDEX implementation in some not too distant major release. https://mariadb.com/resources/blog/initial-impressions-of-innodb-fulltext/ pointed out several design issues that I do not think can be addressed easily.