[MDEV-23037] Multibyte character sets parse identifiers slow Created: 2020-06-29  Updated: 2020-08-01

Status: Open
Project: MariaDB Server
Component/s: Character Sets, Parser
Affects Version/s: 10.3, 10.4, 10.5
Fix Version/s: 10.5

Type: Bug
Priority: Major
Reporter: Alexander Barkov
Assignee: Alexander Barkov
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-23359 Lex_input_stream: buggy code duplicat... Open

Description

Note: the problem should also be reproducible in versions before 10.3, but they do not support EXECUTE IMMEDIATE.

Lex_input_stream::scan_ident_start() calls my_charlen() for every character of an identifier, including plain ASCII ones, when the character set is multi-byte:

  if (use_mb(cs))
  {
    is_8bit= true;
    while (ident_map[c= yyGet()])
    {
      int char_length= my_charlen(cs, get_ptr() - 1, get_end_of_query());
      if (char_length <= 0)
        break;
      skip_binary(char_length - 1);
    }
  }
  else
  {
    is_8bit= get_7bit_or_8bit_ident(thd, &c);
  }

I ran an SQL statement with a lot of identifiers consisting of ASCII letters. With multi-byte character sets it is noticeably slower than with latin1: about 12% slower with utf8 and about 29% slower with sjis in the runs below.

SET NAMES latin1;
EXECUTE IMMEDIATE CONCAT('SELECT ', REPEAT('1 AS bbbbbbbbbbbbbbbb,',700000),'1 LIMIT 0');

Empty set (1.580 sec)

SET NAMES utf8;
EXECUTE IMMEDIATE CONCAT('SELECT ', REPEAT('1 AS bbbbbbbbbbbbbbbb,',700000),'1 LIMIT 0');

Empty set (1.765 sec)

SET NAMES sjis;
EXECUTE IMMEDIATE CONCAT('SELECT ', REPEAT('1 AS bbbbbbbbbbbbbbbb,',700000),'1 LIMIT 0');

Empty set (2.043 sec)

We should consider adding a new virtual function in MY_CHARSET_HANDLER to scan identifiers in one shot.
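
A rough sketch of what such a one-shot scan could look like, written as standalone C++ rather than against the server sources. All names here (charlen_fn, utf8_charlen, is_ident_byte, scan_ident_chars, scan_utf8_ident) are hypothetical and are not existing MariaDB functions. The identifier byte map is a simplified assumption, and the ASCII fast path assumes an ASCII-compatible character set in which a lead byte below 0x80 is always a one-byte character. Unlike the current code, the sketch sets is_8bit only when a non-ASCII character is actually seen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the character set's length function:
// returns the byte length of the character at s, or <= 0 if invalid.
using charlen_fn = int (*)(const unsigned char *s, const unsigned char *end);

// Simplified UTF-8 (1..3 bytes) length check, for illustration only:
// it validates the lead byte and the continuation bytes, nothing more.
inline int utf8_charlen(const unsigned char *s, const unsigned char *end)
{
  if (s >= end)
    return 0;
  const unsigned char c = *s;
  const int len = c < 0x80 ? 1 :
                  (c >= 0xC2 && c <= 0xDF) ? 2 :
                  (c >= 0xE0 && c <= 0xEF) ? 3 : 0;
  if (len == 0 || end - s < len)
    return 0;
  for (int i = 1; i < len; i++)
    if ((s[i] & 0xC0) != 0x80)
      return 0;
  return len;
}

// Assumed identifier map: ASCII letters, digits, '_', '$' and any byte >= 0x80.
inline bool is_ident_byte(unsigned char c)
{
  return c >= 0x80 || c == '_' || c == '$' ||
         (c >= '0' && c <= '9') ||
         (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Scan a whole identifier in one call.
// Returns the number of bytes consumed; *is_8bit is set to true only if
// at least one multi-byte character was consumed.
inline size_t scan_ident_chars(charlen_fn charlen,
                               const unsigned char *begin,
                               const unsigned char *end,
                               bool *is_8bit)
{
  assert(begin <= end);
  const unsigned char *p = begin;
  *is_8bit = false;
  while (p < end && is_ident_byte(*p))
  {
    if (*p < 0x80)
    {
      p++;                      // fast path: no charlen() call for ASCII
      continue;
    }
    const int len = charlen(p, end);
    if (len <= 0)
      break;                    // invalid byte sequence ends the identifier
    *is_8bit = true;
    p += len;
  }
  return static_cast<size_t>(p - begin);
}

// Convenience wrapper for NUL-terminated input, used only for the examples.
inline size_t scan_utf8_ident(const char *s, bool *is_8bit)
{
  const unsigned char *b = reinterpret_cast<const unsigned char *>(s);
  return scan_ident_chars(utf8_charlen, b, b + std::strlen(s), is_8bit);
}
```

With this shape, an all-ASCII identifier such as bbbbbbbbbbbbbbbb is consumed without a single charlen() call, which is the case the timings above exercise. Whether the real function should live in MY_CHARSET_HANDLER, and what its exact signature should be, is left open here.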

