[MDEV-6643] Improve performance of string processing in the parser Created: 2014-08-26  Updated: 2023-11-13

Status: Stalled
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Alexander Barkov Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-6520 Unexpected syntax error with VIEW + g... Open
Relates
relates to MDEV-6218 Wrong result of CHAR_LENGTH(non-BMP-c... Stalled
relates to MDEV-8922 Bug#20238729 MySQL sometimes produces... Open

 Description   

There is a bottleneck in how string literals are processed in the parser.
It especially affects long BLOB/TEXT values.

1. The function get_text() in sql_lex.cc allocates an unescaped copy of every string literal (unescaping backslashes and doubled quotes, if any). Strangely, the copy is made even if the string contains no escapes at all.

2. The syntax parser in sql_yacc.yy creates Item_string using the new unescaped buffer.
Furthermore, in case of a multi-byte connection character set (e.g. utf8), the constructor for Item_string performs another loop on the unescaped buffer, to calculate length in characters, which is needed to set max_length properly.

It would be nice to create Items directly from the SQL fragment, without making a copy, even when the fragment contains escaped values.

Length in characters can also be calculated during the very first pass in get_text(), without any additional loops in the Item constructors.
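
A minimal sketch of such a single pass, under stated assumptions: the connection character set is UTF-8, the input is well-formed, and every backslash pair or doubled quote yields exactly one character. The names LiteralScan and scan_literal are hypothetical and do not exist in the server sources; the real get_text() has more rules (for example SQL modes such as NO_BACKSLASH_ESCAPES) that are ignored here.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical result of one pass over the body of a quoted literal.
struct LiteralScan {
  std::size_t byte_length;  // bytes in the raw (still escaped) fragment
  std::size_t char_length;  // characters the value has once unescaped
  bool has_escapes;         // true if a backslash or doubled quote was seen
};

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
inline bool is_utf8_continuation(char c) {
  return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Scans the text between the quotes of a literal without copying it.
// Assumption: well-formed UTF-8; each escape pair counts as one character.
inline LiteralScan scan_literal(std::string_view raw, char quote = '\'') {
  LiteralScan r{raw.size(), 0, false};
  for (std::size_t i = 0; i < raw.size();) {
    const bool has_next = i + 1 < raw.size();
    if (has_next &&
        (raw[i] == '\\' || (raw[i] == quote && raw[i + 1] == quote))) {
      r.has_escapes = true;
      ++i;  // skip the escape introducer; the next byte starts the character
    }
    ++r.char_length;  // count one character ...
    ++i;
    while (i < raw.size() && is_utf8_continuation(raw[i])) {
      ++i;  // ... and skip its continuation bytes
    }
  }
  return r;
}
```

With a result like this, an Item could take max_length from char_length and, when has_escapes is false, use the SQL fragment as its value without any copy.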

Unescaping can be done at the very end, when the value is actually needed:

  • Either in Field::store(), if the string value is used for:

    INSERT INTO t1 VALUES('string');

    Unescaping should be done directly to the Field buffer, without any intermediary temporary storage.

  • Or in val_str(), if the string value is used elsewhere (in the SELECT list, functions, operators, etc.).
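
The first case, unescaping straight into the destination buffer, could look roughly like the sketch below. It is an illustration only: unescape_char and unescape_into are hypothetical names, the escape table is a simplified subset, and the real Field::store() interface is not modeled here. The destination is just a caller-supplied char buffer standing in for the Field buffer.

```cpp
#include <cstddef>
#include <string_view>

// Maps the character following a backslash to its value.
// Simplified subset; anything not listed maps to itself.
inline char unescape_char(char c) {
  switch (c) {
    case 'n': return '\n';
    case 't': return '\t';
    case 'r': return '\r';
    case '0': return '\0';
    default:  return c;  // covers \\, \' and \"
  }
}

// Writes the unescaped form of `raw` directly into `dst` and returns the
// number of bytes written. No temporary string is allocated. Output is
// truncated if `dst_size` is too small; unescaping never grows the data,
// so `dst_size >= raw.size()` is always sufficient.
inline std::size_t unescape_into(std::string_view raw, char quote,
                                 char* dst, std::size_t dst_size) {
  std::size_t out = 0;
  for (std::size_t i = 0; i < raw.size() && out < dst_size; ++i) {
    char c = raw[i];
    if (i + 1 < raw.size()) {
      if (c == '\\') {
        c = unescape_char(raw[++i]);  // backslash escape
      } else if (c == quote && raw[i + 1] == quote) {
        ++i;  // doubled quote collapses to a single quote
      }
    }
    dst[out++] = c;
  }
  return out;
}
```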

The unescaped value should be cached, so that val_str() does not repeat the unescaping on every call (e.g. once per row), as in:

SELECT * FROM t1 WHERE a='string with backslash or quote escapes';
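
The caching idea can be sketched as follows. This is not the server's Item_string: LazyLiteral, value() and unescape_count() are hypothetical names, and the unescaping is reduced to dropping the backslash or the first quote of a doubled pair. The object only keeps a view of the SQL fragment, so the query text is assumed to outlive it.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical literal that keeps a view of the raw SQL fragment and
// unescapes it at most once, on first use.
class LazyLiteral {
 public:
  explicit LazyLiteral(std::string_view raw, char quote = '\'')
      : raw_(raw), quote_(quote) {}

  // Plays the role of val_str(): the first call unescapes and caches,
  // later calls return the cached value.
  const std::string& value() {
    if (!cache_) {
      cache_ = unescape();
    }
    return *cache_;
  }

  // How many times the unescaping loop actually ran (for illustration).
  int unescape_count() const { return unescape_count_; }

 private:
  // Simplified: an escape pair yields its second character unchanged.
  std::string unescape() {
    ++unescape_count_;
    std::string out;
    out.reserve(raw_.size());
    for (std::size_t i = 0; i < raw_.size(); ++i) {
      char c = raw_[i];
      if (i + 1 < raw_.size() &&
          (c == '\\' || (c == quote_ && raw_[i + 1] == quote_))) {
        c = raw_[++i];
      }
      out.push_back(c);
    }
    return out;
  }

  std::string_view raw_;              // must outlive this object
  char quote_;
  std::optional<std::string> cache_;  // empty until first value() call
  int unescape_count_ = 0;
};
```

In the query above, the comparison would call value() once per row, but the unescaping loop would run only for the first row.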



 Comments   
Comment by Alexander Barkov [ 2014-10-28 ]

Sent a new version for review, with the most important problems fixed:

  • removed a number of small classes (grouped them into bigger ones)
  • moved most of the new code into a separate file sql_strconv.h

Now it should be somewhat easier to review.

Generated at Thu Feb 08 07:13:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.