  1. MariaDB Server
  2. MDEV-33481

Full-text search seems to fail on utf8mb4 characters (emoji, etc.)

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 11.2.3
    • Fix Version/s: None
    • Component/s: Full-text Search
    • Labels: None
    • Environment: Docker image "mariadb:11.2.3"

    Description

      With this extra config bind-mounted into the MariaDB container:

      [mysqld]
      innodb_ft_min_token_size=1
      innodb_ft_enable_stopword = OFF
      init-connect='SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci'
      collation-server = utf8mb4_unicode_ci
      [client]
      default-character-set=utf8mb4
      [mysql]
      default-character-set=utf8mb4
      

      Then create a table with an explicit utf8mb4 character set and collation:

      create database test;
      use test;
      create table `test` (`id` bigint(20) not null auto_increment, `text` text character set utf8mb4 collate utf8mb4_unicode_ci default null, primary key (`id`), fulltext key `ftidx` (`text`)) default charset=utf8mb4 collate=utf8mb4_unicode_ci;
      

      If we insert a row like this:

      insert into `test` (`text`) values ('a 哈 😂 🐧');
      

      Then query it with LIKE and with MATCH ... AGAINST:

      MariaDB [test]> select * from test where text like "%😂%";
      +----+-----------------+
      | id | text            |
      +----+-----------------+
      |  1 | a 哈 😂 🐧        |
      +----+-----------------+
      1 row in set (0.001 sec)

      MariaDB [test]> select * from test where match text against ("a");
      +----+-----------------+
      | id | text            |
      +----+-----------------+
      |  1 | a 哈 😂 🐧        |
      +----+-----------------+
      1 row in set (0.001 sec)

      MariaDB [test]> select * from test where match text against ("哈");
      +----+-----------------+
      | id | text            |
      +----+-----------------+
      |  1 | a 哈 😂 🐧        |
      +----+-----------------+
      1 row in set (0.001 sec)

      MariaDB [test]> select * from test where match text against ("😂");
      Empty set (0.001 sec)
      

      With the minimum token size set to 1 and stopwords disabled, full-text search in MariaDB returns the correct row when searching for "a" or "哈", but searching for a single emoji character ("😂") returns nothing.

      This does not look like a character set configuration mistake, because the query with text like "%😂%" returns the row without any problem.
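
      One way to narrow this down is to check which tokens InnoDB actually stored for the row. The following is only a sketch and was not run as part of this report. It assumes the test.test table created above, a utf8mb4 connection, and an account allowed to set global variables and read the InnoDB full-text INFORMATION_SCHEMA tables.

      ```sql
      -- Byte length vs. character length of each search term. With a utf8mb4
      -- connection, the emoji is the only term that needs 4 bytes.
      SELECT LENGTH('a'), LENGTH('哈'), LENGTH('😂'), CHAR_LENGTH('😂');

      -- Point the InnoDB full-text INFORMATION_SCHEMA tables at test.test
      -- (the format is 'database/table').
      SET GLOBAL innodb_ft_aux_table = 'test/test';

      -- Tokens from recently inserted rows that are still in the in-memory cache.
      SELECT WORD, HEX(WORD), DOC_COUNT
      FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE;

      -- Tokens that have already been flushed to the on-disk index.
      SELECT WORD, HEX(WORD), DOC_COUNT
      FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;
      ```

      If "😂" appears in neither result while "a" and "哈" do, the emoji was probably dropped when the row was tokenized. If it does appear, the problem is more likely on the query side.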

          People

            Assignee: Unassigned
            Reporter: Keyu Tao (taoky)