  1. MariaDB Server
  2. MDEV-31748

Reuse Simple_tokenizer


Details

    • Task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • Variables
    • None

    Description

      MDEV-30164 introduced a class Simple_tokenizer. It supports loose parsing of a list of name=value pairs:

      ,,name1=value1, name2 =   value2 ,,, name3=value3,,
      

      • Any number of spaces is allowed anywhere between tokens.
      • Empty name=value pairs are allowed, i.e. several commas can appear in a row. This makes it convenient to "edit" the value of @@character_set_collations: for example, pass it to REGEXP_REPLACE() and cut out a fragment with a simple regular expression, without having to clean up the commas around it.

      SET @@character_set_collations='big5=big5_bin,latin1=latin1_bin,utf8mb4=utf8mb4_bin';
      SELECT REGEXP_REPLACE(@@character_set_collations,'big5=[a-z0-9_]*','');
      

      +-----------------------------------------------------------------+
      | REGEXP_REPLACE(@@character_set_collations,'big5=[a-z0-9_]*','') |
      +-----------------------------------------------------------------+
      |                          ,latin1=latin1_bin,utf8mb4=utf8mb4_bin |
      +-----------------------------------------------------------------+
      

      Note that REGEXP_REPLACE() left an empty pair at the beginning of the result, yet SET still accepts it:

      SET @@character_set_collations=REGEXP_REPLACE(@@character_set_collations,'big5=[a-z0-9_]*','');
      SELECT @@character_set_collations;
      

      +---------------------------------------+
      | @@character_set_collations            |
      +---------------------------------------+
      | latin1=latin1_bin,utf8mb4=utf8mb4_bin |
      +---------------------------------------+
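The loose parsing rules above can be modelled in a few lines of Python. This is only an illustrative sketch of the described behaviour, not the C++ Simple_tokenizer itself. The function names parse_pairs and format_pairs are hypothetical. Rejecting non-ASCII input is an assumption based on the statement that the class expects only ASCII data.

```python
import re


def parse_pairs(text):
    """Loosely parse a comma-separated list of name=value pairs.

    Spaces around tokens are ignored, and empty items (leading,
    trailing, or consecutive commas) are skipped.
    """
    if not text.isascii():
        # Assumption: the model simply refuses non-ASCII input.
        raise ValueError("only ASCII input is supported")
    pairs = []
    for item in text.split(","):
        item = item.strip(" ")
        if not item:
            continue  # empty pair: tolerated and skipped
        name, sep, value = item.partition("=")
        name, value = name.strip(" "), value.strip(" ")
        if not sep or not name or not value:
            raise ValueError(f"malformed pair: {item!r}")
        pairs.append((name, value))
    return pairs


def format_pairs(pairs):
    """Render the pairs back in a normalized name=value,name=value form."""
    return ",".join(f"{name}={value}" for name, value in pairs)


raw = "big5=big5_bin,latin1=latin1_bin,utf8mb4=utf8mb4_bin"
edited = re.sub(r"big5=[a-z0-9_]*", "", raw)
print(edited)                             # ,latin1=latin1_bin,utf8mb4=utf8mb4_bin
print(format_pairs(parse_pairs(edited)))  # latin1=latin1_bin,utf8mb4=utf8mb4_bin
print(parse_pairs(",,name1=value1, name2 =   value2 ,,, name3=value3,,"))
```

Because empty items are skipped, the leading comma left by the regular expression does not need special handling.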
      

      Simple_tokenizer is charset-unaware: it expects only ASCII data. It could be reused for other system variables whose values are pure ASCII data in a structured format, such as lists:

      • sql_mode
      • optimizer_switch
      • log_slow_filter
      • myisam_recover_options
      • slave_transaction_retry_errors

      Currently each variable implements its own tokenizer, so behaviour varies from one variable to another:

      -- A value consisting only of spaces is not allowed
      SET optimizer_switch=' ';
      ERROR 1231 (42000): Variable 'optimizer_switch' can't be set to the value of ' '
      

      -- Empty pairs are not allowed
      SET optimizer_switch=',';
      ERROR 1231 (42000): Variable 'optimizer_switch' can't be set to the value of ','
      

      -- Spaces before commas are allowed
      SET optimizer_switch='index_merge=on ,index_merge_union=on';
      Query OK, 0 rows affected (0.000 sec)
      

      -- However, spaces after commas are not allowed
      SET optimizer_switch='index_merge=on, index_merge_union=on';
      ERROR 1231 (42000): Variable 'optimizer_switch' can't be set to the value of ' index_merge_union=on'
      

      Let's reuse the class Simple_tokenizer to:

      • make all variables work in the same style
      • reduce duplicate/similar code
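As a rough illustration of the proposal, the sketch below puts one shared tokenizer under two different value formats. It is written in Python for brevity and does not reflect the server's C++ code. The names tokenize, parse_flag_list and parse_switch_list are hypothetical, and the switch parser accepts only on/off, which is a simplification.

```python
def tokenize(text):
    """Yield the non-empty, space-trimmed items of a comma-separated ASCII list."""
    if not text.isascii():
        # Assumption: the shared tokenizer handles ASCII data only.
        raise ValueError("only ASCII input is supported")
    for item in text.split(","):
        item = item.strip(" ")
        if item:
            yield item


def parse_flag_list(text):
    """Hypothetical parser for a flag-style list (an sql_mode-like value)."""
    return list(tokenize(text))


def parse_switch_list(text):
    """Hypothetical parser for a name=on|off list (an optimizer_switch-like value).

    Simplification: only 'on' and 'off' are accepted as values here.
    """
    switches = {}
    for item in tokenize(text):
        name, sep, value = (part.strip(" ") for part in item.partition("="))
        if not sep or not name or value not in ("on", "off"):
            raise ValueError(f"malformed switch: {item!r}")
        switches[name] = (value == "on")
    return switches


# Spaces before and after commas are treated the same way in both parsers.
print(parse_switch_list("index_merge=on ,index_merge_union=on"))
print(parse_switch_list("index_merge=on, index_merge_union=on"))
print(parse_flag_list("STRICT_TRANS_TABLES, ,NO_ZERO_DATE"))
```

With the splitting and trimming rules in one place, every parser built on top of them handles spaces and empty items identically, and only the per-variable validation differs.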

            People

              bar Alexander Barkov