[MXS-558] Investigation of using SQLite as query classifier. Created: 2016-01-26  Updated: 2016-03-08  Resolved: 2016-03-08

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: None
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Johan Wikman Assignee: Johan Wikman
Resolution: Done Votes: 0
Labels: None

Epic Link: Pluggable Parser
Sprint: 1.4.0-2, 2016-3, 2016-4

 Description   

Investigate whether the tokenizer/parser of SQLite can be used as query classifier.



 Comments   
Comment by Johan Wikman [ 2016-02-09 ]

It is non-trivial to use the parser of SQLite.

SQLite is structured so that the tokenizer calls the parser, which calls the code generator, which emits code for a virtual machine; that code is then executed. These steps are closely intertwined, so using only the tokenizer and parser is not readily possible.

The general architecture of sqlite is shown here: https://www.sqlite.org/arch.html

It appears that in order to use the SQLite tokenizer and parser you would essentially have to replace the code generator part. Unfortunately there is no single point where that can be done; as the documentation on the page above says:

There are many files in the code generator: attach.c, auth.c, build.c, delete.c, expr.c, insert.c, pragma.c, select.c, trigger.c, update.c, vacuum.c and where.c. In these files is where most of the serious magic happens. expr.c handles code generation for expressions. where.c handles code generation for WHERE clauses on SELECT, UPDATE and DELETE statements. The files attach.c, delete.c, insert.c, select.c, trigger.c update.c, and vacuum.c handle the code generation for SQL statements with the same names. (Each of these files calls routines in expr.c and where.c as necessary.) All other SQL statements are coded out of build.c.

Comment by Johan Wikman [ 2016-02-18 ]

Take a second look.

Comment by Johan Wikman [ 2016-02-19 ]

The general architecture is shown here: https://www.sqlite.org/arch.html

SQLite does things backwards compared to the usual arrangement: instead of
the parser pulling tokens from the tokenizer, the SQL statement is given to
the tokenizer, which pushes tokens into the parser, which in turn calls the
code generator. The parser for SQLite is generated using Lemon, a parser
generator that, like SQLite itself, was created by Richard Hipp.

If the parsing itself succeeds, the parser calls a statement-specific
function. For instance, a simple SELECT statement ends up in a call to

int sqlite3Select(Parse* pParse, Select* p, SelectDest* pDest);

At that point p contains the parsed information about the SELECT statement
and could be used for extracting whatever is of interest.

However, what sqlite3Select then does is generate code for SQLite's
virtual machine, and at that point it also checks whether the statement
can actually be executed.

For instance, any SELECT will fail at that point unless all the tables it
refers to exist in the dummy SQLite database that has to be created in
order to be able to parse anything at all.

The Select structure is then deleted when sqlite3Select returns,
irrespective of the outcome of sqlite3Select.

So it might well be possible to use SQLite's parser by replacing
sqlite3Select and all other relevant functions (at this point I cannot
say how many there would be) with our own versions that merely record
the parsed information (in some thread-specific location) and then
report failure.
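A minimal sketch of the idea: replace sqlite3Select() with a version that records what the classifier needs and reports failure so that no VM code is generated. The struct definitions below are simplified stand-ins for sqlite's internal types (the real ones live in sqliteInt.h); the qc_info record and field names are hypothetical.

```c
#include <string.h>

/* Simplified stand-ins for sqlite's internal structures; hypothetical,
   for illustration only. */
typedef struct SrcList { int nSrc; const char *azName[8]; } SrcList;
typedef struct Select { SrcList *pSrc; } Select;
typedef struct Parse { int rc; } Parse;
typedef struct SelectDest SelectDest;

/* Thread-specific record of what was seen during parsing. */
static __thread struct {
    int  is_select;
    char table[64];
} qc_info;

/* Replacement for sqlite3Select(): record the interesting bits of the
   parse tree, then report failure so that code generation is aborted. */
int sqlite3Select(Parse *pParse, Select *p, SelectDest *pDest)
{
    (void)pDest;
    qc_info.is_select = 1;
    if (p && p->pSrc && p->pSrc->nSrc > 0 && p->pSrc->azName[0]) {
        strncpy(qc_info.table, p->pSrc->azName[0],
                sizeof qc_info.table - 1);
    }
    pParse->rc = 1; /* signal an error: abort further processing */
    return 1;
}
```

The classifier would then read qc_info after the (deliberately failed) parse, before the Select structure is deleted.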

Comment by Johan Wikman [ 2016-03-08 ]

It seems, after all, that using the parser of SQLite for query classification is an option. The next steps are to build SQLite as part of MaxScale and to create a query classifier plugin that uses SQLite.

Generated at Thu Feb 08 04:00:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.