Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6105

Emoji unicode character string search query makes mariadb performance down

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 5.5.37
    • 5.5.38, 10.0.12
    • None
    • None
    • CentOS 5.6 x86_64

    Description

      4 bytes emoji unicode character string search query on UTF8 table
      could make mariadb performance down, if there's a lot of empty string('') index key.

      Test case

      CREATE TABLE test(
        id int AUTO_INCREMENT,
        fd varchar(20),
        primary key(id),
        index ix_fd(fd)
      )engine=innodb default charset=UTF8;
       
      INSERT INTO test(id, fd) VALUES (null, ''), (null, 'matt'), (null, 'pitt'), (null, 'lee'), (null, 'kim');
      # Run 20 times below query
         INSERT INTO test SELECT * FROM test;
       
      SELECT * FROM test WHERE fd=*some_4bytes_unicode_char*;

      ref emoji unicode character from below site
      http://apps.timwhitlock.info/emoji/tables/unicode
      ------------------------------------------------------------

      4bytes emoji unicode character can't convert to utf8,
      But mariadb doesn't check string conversion function's return code. (Check stack trace below)
      And Xtradb will search empty string key from ix_fd index.
      If ix_fd index has a lot of empty string key, then innodb will read every empty string key's record and pass it to mariadb engine.
      But mariadb will check it with original 4 bytes emoji character. (evaluate_join_record()

      {... select_cond_result= test(select_cond->val_int()); ...}

      function of sql_select.cc)

      So, client can't get anything from the query, but query takes a lot of time.
      iF there's many same type of query, MariaDB server would be busy to read useless record from the table.

      Stack trace of copy of use input (4 bytes emoji string)

      Thread [18] 21629 [core: 1] (Suspended : Step)	
      	Field_varstring::store() at field.cc:6,586 0x6d989c ==> RETURN 2	
      	copy_inner() at sql_select.h:1,662 0x5cc27c	
      	copy() at sql_select.h:1,552 0x5cc27c	
      	create_ref_for_key() at sql_select.cc:8,102 0x5cc27 ==> Ignore return value on "sql_select.cc:8102  tmp.copy();"
      	get_best_combination() at sql_select.cc:7,775 0x5d820f	
      	make_join_statistics() at sql_select.cc:3,799 0x5ed818	
      	JOIN::optimize() at sql_select.cc:1,218 0x5f0522	
      	mysql_select() at sql_select.cc:3,065 0x5f943b	
      	handle_select() at sql_select.cc:319 0x5f943b	
      	execute_sqlcom_select() at sql_parse.cc:4,689 0x59dcad	
      	<...more frames...>

      ==> Suggest
      on tmp.copy() code line of sql_select.cc::create_ref_for_key(), MariaDB should check return value of copy().
      and if there's some conversion error, MariaDB should throw ERROR not WARNING. or throw ERROR when STRICT_ALL_TABLES sql_mode is on.

      Attachments

        Issue Links

          Activity

            ... Now, if we want index lookups to work in the same way as comparison works with non-indexed columns:

            • "key_col= $unrepresentable_char" is always false, because $unrepresentable_char will not be byte-by-byte equal to value of key_col (attempt to INSERT an $unrepresentable_char into the table will force its conversion into a question mark)
            • as for "key_col < $unrepresentable_char", and other comparisons: byte-by-byte comparison doesn't map into any range in UTF-8 collation.
            psergei Sergei Petrunia added a comment - ... Now, if we want index lookups to work in the same way as comparison works with non-indexed columns: "key_col= $unrepresentable_char" is always false, because $unrepresentable_char will not be byte-by-byte equal to value of key_col (attempt to INSERT an $unrepresentable_char into the table will force its conversion into a question mark) as for "key_col < $unrepresentable_char", and other comparisons: byte-by-byte comparison doesn't map into any range in UTF-8 collation.

            For non-equality comparisons, index scans are not equivalent to evaluating the WHERE (which is a separate bug):

            MariaDB [j5]> select count(*) from t1 where fd <'��';
            +----------+
            | count(*) |
            +----------+
            |    16384 |
            +----------+
            1 row in set, 1 warning (0.08 sec)
             
            MariaDB [j5]> select count(*) from t1 ignore index (ix_fd) where fd <'��';
            +----------+
            | count(*) |
            +----------+
            |    81920 |
            +----------+
            1 row in set (0.37 sec)

            psergei Sergei Petrunia added a comment - For non-equality comparisons, index scans are not equivalent to evaluating the WHERE (which is a separate bug): MariaDB [j5]> select count(*) from t1 where fd <'��'; +----------+ | count(*) | +----------+ | 16384 | +----------+ 1 row in set, 1 warning (0.08 sec)   MariaDB [j5]> select count(*) from t1 ignore index (ix_fd) where fd <'��'; +----------+ | count(*) | +----------+ | 81920 | +----------+ 1 row in set (0.37 sec)

            Possible solution:

            • "key_col=$unrepresentable_char" should produce SEL_TREE(SEL_TREE::IMPOSSIBLE)
            • all other "key_col CMP $unrepresentable_char" should produce NULL (i.e "cant use this for range access")

            the second may cause queries to run slower than they ran before, but they will not produce incorrect results (like shown in the previous comment).

            psergei Sergei Petrunia added a comment - Possible solution: "key_col=$unrepresentable_char" should produce SEL_TREE(SEL_TREE::IMPOSSIBLE) all other "key_col CMP $unrepresentable_char" should produce NULL (i.e "cant use this for range access") the second may cause queries to run slower than they ran before, but they will not produce incorrect results (like shown in the previous comment).

            Oracle's MySQL has some related fixes:

            • Bug#11752201 ( jorgen.loland@oracle.com-20120525130050-mejp6m431gyx2z0d) - it fixes a different problem, but the effect is that they don't get the wrong query results like shown two comments earlier
            • A refactoring of Item::save_in_field and co.: jorgen.loland@oracle.com-20120507082918-pex86bcedbzemtsb .
            psergei Sergei Petrunia added a comment - Oracle's MySQL has some related fixes: Bug#11752201 ( jorgen.loland@oracle.com-20120525130050-mejp6m431gyx2z0d) - it fixes a different problem, but the effect is that they don't get the wrong query results like shown two comments earlier A refactoring of Item::save_in_field and co.: jorgen.loland@oracle.com-20120507082918-pex86bcedbzemtsb .

            Pushed a "simple" fix into 5.5, without backport of 5.6's refactoring. (porting 5.6's refactoring would not fully this this bug, so it would be overdoing it. refactoring should be pushed into 10.0)

            psergei Sergei Petrunia added a comment - Pushed a "simple" fix into 5.5, without backport of 5.6's refactoring. (porting 5.6's refactoring would not fully this this bug, so it would be overdoing it. refactoring should be pushed into 10.0)

            People

              psergei Sergei Petrunia
              Matt74 Seunguck Lee
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.