Details
Type: Task
Status: Stalled
Priority: Major
Resolution: Unresolved
Description
There is a bottleneck in how string literals are processed in the parser.
It especially affects long BLOB/TEXT values.
1. The function get_text() in sql_lex.cc allocates an unescaped copy of every string literal (unescaping backslashes and double quotes, if any). Strangely, the copy is made even when the string contains no escapes at all.
2. The syntax parser in sql_yacc.yy creates Item_string using the new unescaped buffer.
Furthermore, in case of a multi-byte connection character set (e.g. utf8), the constructor for Item_string performs another loop on the unescaped buffer, to calculate length in characters, which is needed to set max_length properly.
It would be nice to create Items directly from the SQL fragment, without making a copy, even when the literal contains escaped values.
Length in characters can also be calculated during the very first pass in get_text(), without any additional loops in the Item constructors.
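A minimal sketch of such a single pass, assuming a UTF-8 connection character set, well-formed input, and C++17. LiteralScan and scan_literal() are hypothetical names used only for illustration and are not part of sql_lex.cc. As a simplification, every backslash escape is counted as one resulting character, which ignores sequences such as \% and \_ that keep the backslash.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical result of scanning one literal body (not a MariaDB type).
struct LiteralScan {
  std::size_t raw_bytes = 0;    // bytes of the literal as written in the query
  std::size_t char_length = 0;  // characters the value has after unescaping
  bool has_escapes = false;     // true if a backslash or doubled quote was seen
};

// Scans the text between the quotes of a literal exactly once.
// Assumes UTF-8 and that `body` is already delimited by the lexer.
LiteralScan scan_literal(std::string_view body, char quote = '\'') {
  assert(quote == '\'' || quote == '"');
  LiteralScan r;
  r.raw_bytes = body.size();
  std::size_t i = 0;
  while (i < body.size()) {
    const char c = body[i];
    if (c == '\\' && i + 1 < body.size()) {
      r.has_escapes = true;
      ++i;  // skip the backslash; the next byte starts the resulting character
    } else if (c == quote && i + 1 < body.size() && body[i + 1] == quote) {
      r.has_escapes = true;
      ++i;  // a doubled quote collapses into a single quote character
    }
    ++i;  // consume the lead byte of the character
    // Consume UTF-8 continuation bytes (10xxxxxx) belonging to this character.
    while (i < body.size() &&
           (static_cast<unsigned char>(body[i]) & 0xC0) == 0x80) {
      ++i;
    }
    ++r.char_length;
  }
  return r;
}
```

With a result like this, the Item could take max_length from char_length and use has_escapes to decide whether any unescaping is needed later.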
Unescaping can be done at the very end, when the value is actually needed:
- Either in Field::store(), if the string value is used for:
INSERT INTO t1 VALUES('string');
Unescaping should be done directly to the Field buffer, without any intermediary temporary storage.
- Or in val_str(), if the string value is used elsewhere (in SELECT list, functions, operators, etc).
The unescaped value should be cached, to make sure that val_str() does not repeat the unescaping (e.g. once per row), like in:
SELECT * FROM t1 WHERE a='string with backslash or quote escapes';
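The two paths above could look roughly like the following sketch. LazyStringLiteral, store_to() and unescape_runs() are hypothetical stand-ins, not the real Item_string or Field interfaces. The escape table is abbreviated, and the code assumes C++17 and that the query text outlives the object.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for an Item that keeps a view into the query text.
class LazyStringLiteral {
 public:
  LazyStringLiteral(std::string_view raw, bool has_escapes, char quote = '\'')
      : raw_(raw), has_escapes_(has_escapes), quote_(quote) {}

  // Analogue of the Field::store() path: unescape straight into dst,
  // with no temporary buffer. Returns the number of bytes written.
  // raw_.size() bytes always suffice, since unescaping never grows the value.
  std::size_t store_to(char* dst, std::size_t dst_capacity) const {
    assert(dst_capacity >= raw_.size());
    std::size_t out = 0;
    for (std::size_t i = 0; i < raw_.size(); ++i) {
      char c = raw_[i];
      if (c == '\\' && i + 1 < raw_.size()) {
        c = decode_escape(raw_[++i]);
      } else if (c == quote_ && i + 1 < raw_.size() && raw_[i + 1] == quote_) {
        ++i;  // doubled quote: emit one quote, skip the second
      }
      dst[out++] = c;
    }
    return out;
  }

  // Analogue of the val_str() path: unescape at most once, then reuse.
  std::string_view val_str() const {
    if (!has_escapes_) return raw_;  // no copy at all for plain literals
    if (!cache_) {
      std::string buf(raw_.size(), '\0');
      buf.resize(store_to(buf.data(), buf.size()));
      cache_ = std::move(buf);
      ++unescape_runs_;
    }
    return *cache_;
  }

  // How many times the unescaping loop ran on behalf of val_str().
  int unescape_runs() const { return unescape_runs_; }

 private:
  // Abbreviated escape table; unknown escapes map to the character itself.
  static char decode_escape(char c) {
    switch (c) {
      case 'n': return '\n';
      case 't': return '\t';
      case 'r': return '\r';
      case '0': return '\0';
      default:  return c;
    }
  }

  std::string_view raw_;
  bool has_escapes_;
  char quote_;
  mutable std::optional<std::string> cache_;
  mutable int unescape_runs_ = 0;
};
```

In this sketch a literal without escapes is never copied, and a literal with escapes is unescaped once no matter how many rows call val_str().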