MariaDB Server: MDEV-23037

Multibyte character sets parse identifiers slow

      Description

      Note: the problem should be reproducible in versions before 10.3, but they do not support EXECUTE IMMEDIATE.

      Lex_input_stream::scan_ident_start() calls charlen() excessively for multi-byte character sets: my_charlen() runs once for every character of the identifier, even when the identifier is pure ASCII:

        if (use_mb(cs))
        {
          is_8bit= true;
          while (ident_map[c= yyGet()])
          {
            int char_length= my_charlen(cs, get_ptr() - 1, get_end_of_query());
            if (char_length <= 0)
              break;
            skip_binary(char_length - 1);
          }
        }
        else
        {
          is_8bit= get_7bit_or_8bit_ident(thd, &c);
        }
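
      To make the cost concrete, here is a minimal standalone model of the multi-byte branch above. It is a sketch, not MariaDB code: toy_charlen_utf8() and toy_is_ident_byte() are hypothetical stand-ins for my_charlen() and ident_map[], and the call counter exists only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts calls to the toy length function below (illustration only).
static std::size_t g_charlen_calls = 0;

// Hypothetical stand-in for my_charlen() on UTF-8 input (not MariaDB code).
// Returns the byte length of the character at p, or 0 if the lead byte is
// invalid, the sequence is truncated, or a continuation byte is malformed.
static int toy_charlen_utf8(const unsigned char *p, const unsigned char *end)
{
  assert(p < end);
  ++g_charlen_calls;
  int len;
  if (*p < 0x80)
    len = 1;
  else if (*p >= 0xC2 && *p <= 0xDF)
    len = 2;
  else if (*p >= 0xE0 && *p <= 0xEF)
    len = 3;
  else
    return 0;
  if (end - p < len)
    return 0;
  for (int i = 1; i < len; i++)
    if ((p[i] & 0xC0) != 0x80)
      return 0;
  return len;
}

// Toy replacement for ident_map[] (assumption): ASCII letters, digits, '_',
// '$', and every byte >= 0x80 count as identifier bytes.
static bool toy_is_ident_byte(unsigned char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_' || c == '$' || c >= 0x80;
}

// Same shape as the multi-byte branch: one length call per character.
// Returns the number of bytes consumed as an identifier.
static std::size_t scan_ident_per_char(const std::string &s)
{
  const unsigned char *start =
      reinterpret_cast<const unsigned char *>(s.data());
  const unsigned char *end = start + s.size();
  const unsigned char *p = start;
  while (p < end && toy_is_ident_byte(*p))
  {
    int len = toy_charlen_utf8(p, end);
    if (len <= 0)
      break;
    p += len;
  }
  return static_cast<std::size_t>(p - start);
}
```

      In this model, scanning a 16-letter ASCII alias costs 16 length calls. For a statement that repeats such an alias 700000 times, that is roughly 11.2 million calls that a single-byte character set never makes.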
      

      I ran an SQL statement with a large number of identifiers consisting only of ASCII letters. With multi-byte character sets it is noticeably slower than with latin1 (about 12% slower with utf8 and 29% slower with sjis in the timings below):

      SET NAMES latin1;
      EXECUTE IMMEDIATE CONCAT('SELECT ', REPEAT('1 AS bbbbbbbbbbbbbbbb,',700000),'1 LIMIT 0');
      

      Empty set (1.580 sec)
      

      SET NAMES utf8;
      EXECUTE IMMEDIATE CONCAT('SELECT ', REPEAT('1 AS bbbbbbbbbbbbbbbb,',700000),'1 LIMIT 0');
      

      Empty set (1.765 sec)
      

      SET NAMES sjis;
      EXECUTE IMMEDIATE CONCAT('SELECT ', REPEAT('1 AS bbbbbbbbbbbbbbbb,',700000),'1 LIMIT 0');
      

      Empty set (2.043 sec)
      

      We should consider adding a new virtual function to MY_CHARSET_HANDLER that scans an identifier in one shot.
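
      One possible shape for such a one-shot scan, shown for UTF-8 only, is sketched below. This is an illustration under stated assumptions, not the proposed MariaDB interface: the report does not specify a signature, so sketch_scan_ident_utf8() and sketch_is_ascii_ident_byte() are hypothetical names, and the identifier rule (ASCII letters, digits, '_', '$', plus any well-formed 2- or 3-byte sequence) is a simplification of ident_map[].

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical identifier test for ASCII bytes (a simplification of
// ident_map[], not MariaDB code).
static bool sketch_is_ascii_ident_byte(unsigned char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_' || c == '$';
}

// Hypothetical one-shot scanner: returns the byte length of the identifier
// that starts at begin. ASCII bytes take a fast path with no length lookup;
// the sequence length is computed only for bytes >= 0x80.
// Simplification: overlong and surrogate encodings are not rejected.
static std::size_t sketch_scan_ident_utf8(const char *begin, const char *end)
{
  assert(begin <= end);
  const unsigned char *p = reinterpret_cast<const unsigned char *>(begin);
  const unsigned char *e = reinterpret_cast<const unsigned char *>(end);
  const unsigned char *start = p;
  while (p < e)
  {
    unsigned char c = *p;
    if (c < 0x80)
    {
      if (!sketch_is_ascii_ident_byte(c))
        break;                       // end of identifier
      ++p;                           // fast path for ASCII
      continue;
    }
    // Multi-byte lead byte: 2-byte or 3-byte sequence, otherwise invalid.
    int len = (c >= 0xC2 && c <= 0xDF) ? 2 :
              (c >= 0xE0 && c <= 0xEF) ? 3 : 0;
    if (len == 0 || e - p < len)
      break;                         // invalid lead byte or truncated input
    bool well_formed = true;
    for (int i = 1; i < len; i++)
      if ((p[i] & 0xC0) != 0x80)
        well_formed = false;         // continuation bytes must be 10xxxxxx
    if (!well_formed)
      break;
    p += len;
  }
  return static_cast<std::size_t>(p - start);
}
```

      The caller would invoke the scanner once per identifier and then advance its read pointer by the returned length, instead of making one length call per character. Whether this removes the measured gap would need to be benchmarked.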

              People

              Assignee: Alexander Barkov (bar)
              Reporter: Alexander Barkov (bar)