[MDEV-6882] TOKENIZE query Created: 2014-10-16  Updated: 2022-12-06

Status: Open
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: VAROQUI Stephane Assignee: Unassigned
Resolution: Unresolved Votes: 8
Labels: None

Issue Links:
Duplicate
duplicates MDEV-5596 Create a statement to check if a quer... Closed
Relates
relates to MDEV-4680 PARSER - new sql function Open
relates to MDEV-26634 Feature request: add STATEMENT_DIGEST... Closed

 Description   

Modeled on "EXPLAIN query", a "TOKENIZE query" statement would print a resultset like:

TOKEN  | TOKEN_TYPE 
select | T_SQL
c1     | T_COLUMN 
from   | T_SQL
t1     | T_TABLE
where  | T_SQL
c2     | T_COLUMN 
=      | T_SQL
3      | T_CONST

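To make the proposed output concrete, here is a minimal sketch of a classifier that produces such (token, type) pairs. It is purely illustrative: the function name, the whitespace splitting, and the crude keyword/position heuristics are assumptions for the example, not how the server's lexer works (a real implementation would reuse the lexer in sql_lex.cc).

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical classifier producing the TOKENIZE-style resultset rows.
// Splits on whitespace and applies naive heuristics; real server code
// would drive the actual SQL lexer/parser instead.
std::vector<std::pair<std::string, std::string>> tokenize(const std::string &query)
{
    static const std::unordered_set<std::string> keywords =
        {"select", "from", "where", "="};
    std::vector<std::pair<std::string, std::string>> out;
    std::istringstream in(query);
    std::string tok;
    bool after_from = false;  // next identifier after FROM is a table name
    while (in >> tok)
    {
        std::string type;
        if (keywords.count(tok))
        {
            type = "T_SQL";
            after_from = (tok == "from");
        }
        else if (isdigit((unsigned char) tok[0]))
            type = "T_CONST";
        else if (after_from)
        {
            type = "T_TABLE";
            after_from = false;
        }
        else
            type = "T_COLUMN";
        out.emplace_back(tok, type);
    }
    return out;
}
```

Running it on the query from the description, `tokenize("select c1 from t1 where c2 = 3")`, yields exactly the rows shown in the table above.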


 Comments   
Comment by Sergei Petrunia [ 2014-10-19 ]

The question is whether the server should be used to do it. MaxScale has a similar feature where it uses MariaDB's parser to parse the query and then replaces constants with '?'.

Comment by Sergei Petrunia [ 2014-10-19 ]

More details about how MaxScale does it:

See skygw_get_canonical(): it walks through thd->free_list and replaces Item::STRING_ITEM, Item::INT_ITEM, Item::DECIMAL_ITEM, etc. with '?'.

I have a doubt about how it does this, though. It calls replace_literal(). Is there a guarantee that it replaces the right occurrence of the literal?
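The concern above can be demonstrated with a stand-in for that replacement step. The function below is an assumption about how a naive replace-by-text works (it is not MaxScale's actual replace_literal()): it substitutes the first textual occurrence of each literal with '?', and the test shows it clobbering the '1' inside the table name "t1" instead of the constant.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical naive literal replacement: for each literal's text, replace
// its FIRST occurrence in the query with '?'. Because it searches by text
// rather than by source offset, it can hit the wrong bytes.
std::string replace_literals(std::string query,
                             const std::vector<std::string> &literals)
{
    for (const std::string &lit : literals)
    {
        std::string::size_type pos = query.find(lit);
        if (pos != std::string::npos)
            query.replace(pos, lit.size(), "?");
    }
    return query;
}
```

For `"select * from t1 where a=11 and b=1"` with literals `{"1", "11"}`, the search for "1" first matches inside the identifier "t1", so the table name is damaged while `b=1` is left untouched. Tracking byte offsets from the lexer (as discussed below) avoids this class of bug.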

Comment by Sergei Petrunia [ 2014-10-19 ]

The technique used by maxscale to catch constants is not applicable to
table/column names. The problem is, Item_field's db_name, table_name,
field_name point to the data in temporary buffers. They do not point into the
query string.

The copying is done in sql_lex.cc, get_token(), get_quoted_token().

So, if we want to have info about where "table.column" was located in the
original query, it needs to be saved here.

One way to save it would be to add another member to the %union, and then have
the lexer, instead of just doing assignments like

      yylval->lex_str=get_token(lip, 0, length);

also save the token's location in the source query.
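As a sketch of what that extra semantic value could carry, the struct below pairs the copied token text with its byte span in the original query. The struct and function names are illustrative assumptions, not existing server API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical extension of the lexer's semantic value: the token text
// (as get_token()/get_quoted_token() would copy it) plus its span in the
// original query string. Names are illustrative, not server API.
struct Lex_token
{
    std::string text;   // copied (possibly unescaped) token text
    size_t begin;       // byte offset of the token in the original query
    size_t length;      // byte length of the token in the original query
};

// Build the token value while remembering where it came from.
Lex_token get_token_with_pos(const std::string &query, size_t begin, size_t length)
{
    return Lex_token{query.substr(begin, length), begin, length};
}
```

With this, `query.substr(tok.begin, tok.length)` always recovers the exact bytes the token came from, which is what the literal-replacement and TOKENIZE use cases need.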

Comment by Sergei Petrunia [ 2014-10-19 ]

The above says how to get info about token locations from the lexer.

The lexer itself doesn't know whether a token is a table name, a column name,
or something else. So, we need to get this info from the parser
(sql_yacc.yy).

In the parser, when we use a token as e.g. a table name, we could record that
somewhere in THD. Then, after the parsing is complete, we would know which
bytes in the original query text were table names or column names, or something
else.
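A minimal sketch of such a per-statement recorder, assuming the grammar actions call note() whenever a token is used as a table or column name (the class, its methods, and the default role label are all hypothetical, not existing THD members):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One recorded span: which bytes of the query text played which role.
struct Token_span { size_t begin, length; std::string role; };

// Hypothetical recorder kept per statement (the comment above suggests THD).
// Grammar actions in sql_yacc.yy would call note() when a token is used as
// a table name, column name, etc.; after parsing, the spans map back onto
// the original query text.
class Name_recorder
{
public:
    void note(size_t begin, size_t length, const std::string &role)
    { spans_.push_back({begin, length, role}); }

    // Classify byte i of the query; bytes not covered by any recorded span
    // default to plain SQL text.
    std::string role_of(size_t i) const
    {
        for (const Token_span &s : spans_)
            if (i >= s.begin && i < s.begin + s.length)
                return s.role;
        return "T_SQL";
    }

private:
    std::vector<Token_span> spans_;
};
```

After parsing "select c1 from t1", the recorder would hold spans (7, 2, T_COLUMN) and (15, 2, T_TABLE), which is exactly the information the TOKENIZE resultset in the description needs.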

Generated at Thu Feb 08 07:15:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.