[MDEV-31748] Reuse Simple_tokenizer Created: 2023-07-20  Updated: 2023-12-22

Status: Open
Project: MariaDB Server
Component/s: Variables
Fix Version/s: 11.5

Type: Task Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-30164 System variable for default collations Closed

 Description   

MDEV-30164 introduced a class Simple_tokenizer. It supports loose parsing of a list of name=value pairs:

,,name1=value1, name2 =   value2 ,,, name3=value3,,

  • Any number of spaces is allowed between tokens
  • Empty name=value pairs are allowed, i.e. multiple commas can appear in a row. This makes it convenient to "edit" the value of @@character_set_collations: for example, pass it to REGEXP_REPLACE() and cut out a fragment using a simpler regular expression.

SET @@character_set_collations='big5=big5_bin,latin1=latin1_bin,utf8mb4=utf8mb4_bin';
SELECT REGEXP_REPLACE(@@character_set_collations,'big5=[a-z0-9_]*','');

+-----------------------------------------------------------------+
| REGEXP_REPLACE(@@character_set_collations,'big5=[a-z0-9_]*','') |
+-----------------------------------------------------------------+
|                          ,latin1=latin1_bin,utf8mb4=utf8mb4_bin |
+-----------------------------------------------------------------+

Note that REGEXP_REPLACE() left an empty pair at the beginning of the result, yet SET still accepts the result:

SET @@character_set_collations=REGEXP_REPLACE(@@character_set_collations,'big5=[a-z0-9_]*','');
SELECT @@character_set_collations;

+---------------------------------------+
| @@character_set_collations            |
+---------------------------------------+
| latin1=latin1_bin,utf8mb4=utf8mb4_bin |
+---------------------------------------+
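
The loose parsing rules described above can be illustrated with a small standalone sketch. This is not the actual Simple_tokenizer code from MDEV-30164: the function name parse_pairs_loosely(), the helper names, and the choice of identifier characters (ASCII letters, digits, underscore) are assumptions made only for illustration. Only the space character is treated as whitespace here, which is also an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, std::string>;

// Hypothetical helpers, not part of the MariaDB source tree.
// ASCII-only check: the parser is deliberately charset-unaware.
static bool is_ident_char(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_';
}

static void skip_spaces(std::string_view s, std::size_t &pos)
{
  while (pos < s.size() && s[pos] == ' ')
    pos++;
}

static std::string read_ident(std::string_view s, std::size_t &pos)
{
  std::size_t start= pos;
  while (pos < s.size() && is_ident_char(s[pos]))
    pos++;
  return std::string(s.substr(start, pos - start));
}

// Parse "name=value" pairs separated by commas.
// Spaces between tokens and empty pairs (consecutive commas) are skipped.
// Returns std::nullopt on a syntax error.
std::optional<std::vector<Pair>> parse_pairs_loosely(std::string_view s)
{
  std::vector<Pair> result;
  std::size_t pos= 0;
  for (;;)
  {
    skip_spaces(s, pos);
    if (pos == s.size())
      return result;                 // end of input
    if (s[pos] == ',')
    {
      pos++;                         // separator or empty pair: skip it
      continue;
    }
    std::string name= read_ident(s, pos);
    skip_spaces(s, pos);
    if (name.empty() || pos == s.size() || s[pos] != '=')
      return std::nullopt;           // a name must be followed by '='
    pos++;
    skip_spaces(s, pos);
    std::string value= read_ident(s, pos);
    if (value.empty())
      return std::nullopt;           // '=' must be followed by a value
    result.emplace_back(std::move(name), std::move(value));
    skip_spaces(s, pos);
    if (pos < s.size() && s[pos] != ',')
      return std::nullopt;           // pairs must be separated by commas
  }
}
```

With this sketch, both ',,name1=value1, name2 =   value2 ,,, name3=value3,,' and the REGEXP_REPLACE() result ',latin1=latin1_bin,utf8mb4=utf8mb4_bin' parse into their non-empty pairs, while input such as 'a=b c=d' is rejected.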

Simple_tokenizer is charset-unaware and expects only ASCII data. It can be reused for other system variables whose values are pure ASCII data in a structured format, such as lists:

  • sql_mode
  • optimizer_switch
  • log_slow_filter
  • myisam_recover_options
  • slave_transaction_retry_errors

Currently every variable implements its own tokenizer, so behaviour varies:

-- A value consisting only of spaces is not allowed
SET optimizer_switch=' ';
ERROR 1231 (42000): Variable 'optimizer_switch' can't be set to the value of ' '

-- Empty pairs are not allowed
SET optimizer_switch=',';
ERROR 1231 (42000): Variable 'optimizer_switch' can't be set to the value of ','

-- Spaces before commas are allowed
SET optimizer_switch='index_merge=on ,index_merge_union=on';
Query OK, 0 rows affected (0.000 sec)

-- However spaces after commas are not allowed
SET optimizer_switch='index_merge=on, index_merge_union=on';
ERROR 1231 (42000): Variable 'optimizer_switch' can't be set to the value of ' index_merge_union=on'

Let's reuse the class Simple_tokenizer to:

  • make all variables work in the same style
  • reduce duplicate/similar code
