[MDEV-5561] Query Rewrite Plugin API Created: 2014-01-25  Updated: 2023-11-30

Status: Stalled
Project: MariaDB Server
Component/s: None
Fix Version/s: 11.0

Type: New Feature Priority: Major
Reporter: VAROQUI Stephane Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 7
Labels: compat57, gsoc14, gsoc16

Issue Links:
Blocks
blocks MDEV-5562 LOR Cluster Closed
blocks MDEV-5569 Rewrite query to master Uri plugin Closed
blocks MDEV-5570 Spider to promote slave on Master Fai... Closed
Relates
relates to MDEV-19332 When to support MySQL 5.7 Query Rewri... Closed
relates to MDEV-4680 PARSER - new sql function Open

 Description   

Create a plugin API and an example plugin to do query transformations.

This API should not force plugin to parse or work with the SQL string. Instead it may provide a DOM-like representation of the query and let the plugin to manipulate the tree nodes.



 Comments   
Comment by Daniël van Eeden [ 2015-10-17 ]

Please keep it compatible with the MySQL 5.7 if possible/applicable.
https://dev.mysql.com/doc/refman/5.7/en/rewriter-query-rewrite-plugin.html
That plugin is an audit plugin.

Comment by Justin Swanhart (Inactive) [ 2015-10-27 ]

I think the interface should send the SQL as sent by the client to the plugin. There should be a function to call to turn the SQL into an AST and a function call to turn an AST into a SQL statement. The rewrite plugin would return an SQL statement.

This allows for plugins that already have their own parser to simply use the SQL statement. For those plugins that need a parser, they can get an AST representation using the function, modify that, and then convert it back to SQL to execute.

It would be really nice to make the AST conversion/manipulation functions their own open source library as there are no good generic C++ SQL parsers that I'm aware of.

Comment by Erik Cederstrand [ 2016-04-21 ]

This would be a great workaround for MDEV-9964

The syntax in the MySQL plugin is very restricted. You can only rewrite exact matches on the normalized form of a query. This means you can't even rewrite

SELECT * FROM t1 WHERE i in (1, 2, 3, [...])

because you wouldn't know the number of parameter markers to use. A regex syntax (at least as an option) would be much more powerful. Imagine renaming a table and rewriting any query that uses that table. I could find lots of other use cases.

Comment by Kentoku Shiba (Inactive) [ 2019-02-22 ]

I pushed this to bb-10.4-spider-ks. Please review.
c511453

Comment by Kentoku Shiba (Inactive) [ 2019-04-12 ]

Requirement

  • api call before parsing query
  1. for writing rewrited query to general log or not
  2. for choosing rewrited query or original query for parsing and executing
  • api call before writing statement binary log
  1. for choosing rewrited query or original query for writing to binary log
  • api call before writing slow query log
  1. for choosing rewrited query or original query for writing to slow query log
  • api call after writing binary log and slow query log
  1. for releasing memory for rewriting

Not requirement but need to discus that I think

  • api call after parsing query
  1. for changing parsing tree
  • api call after executing query (before writing binary log and slow query log)
  1. for doing something after executing
  • priority per plugin (because multiple rewrite plugins are workable for a query)
Comment by Kentoku Shiba (Inactive) [ 2019-04-12 ]

Need to discus about implementation

  • Expanding audit plugin (current commit)
  • Create new plugin

Current audit plugin is for writing logs, we need to think about it is fitable for query rewrite plugin or not.

Comment by Sergei Golubchik [ 2020-08-17 ]

this implementation from https://github.com/mariadb/server/commit/c511453f2ff22addde52e4fda3795755f52a5a77 has a couple of issues:

  • it used the audit plugin, which is conceptually wrong, audit plugins are essentially only observers by definition and cannot change anything in the query
  • it is very spider-specific, the plugin has minimal changes in the server and all the logic is done inside spider, basically it has a whole SQL parser internally now. We definitely cannot request all query rewrite plugins to implement SQL parsers, we need a much simpler solution.

Let's discuss the API first.

Comment by Bert Scalzo [ 2020-08-17 ]

If the initial version should work 100% the same as the current 5.7 MySQL plug-in - that would be ideal and would keep things very simple. The MySQL implementation essentially has a table with five main columns - id (pk), database, original sql, replacement SQL, and enabled flag. The idea is that a DBA or developer can review the slow log to find a problematic SQL, and then simply insert a row with the original vs the replacement. That allows a simple way for correcting bad SQL where you cannot access the code (i.e. third party tools). Note that the SQL strings do NOT replace constant or literal values with ? or anything. This is acceptable to match MySQL implementation for compatibility.

In the future, maybe we could discuss making this feature smarter and work more like PostgreSQL - where we could consider allowing replacement of constant and literal values with a ?. But remember that if the SQL says select * from employee where col=value, then the value really does matter. If the table has fewer rows matching the value then using the index if better, otherwise doing a full table scan is better. The point is that just using a ? rather than the value would not make such replacement easy. In fact PostgreSQL offers the ability to take the SQL string with ? such as select * from table where col=? and to run a function to return the value to facilitate doing such replacement. So we should discuss and debate before choosing this direction.

In the distant future, having the ability to work like Oracle or SQL Server and creating a named execution plan "stored profile" or "SQL Store" object where the plan is the main object could also be discussed, but there are problem with this approach as well. If I find a query where the original has N where clause conditions such as col1=1 and col2=2, etc and then the rewrite has either N+1 or N-1 conditions in the rewrite, then the optimizer in these other databases sees that difference and uses the original SQL and redoes the execution plan. So these databases will not allow for the tuning person to replace a SQL unless the numbers of conditions with values are identical. That can defaet the prupose of a SQL rewrite entirely. So again recommend that we discuss in more detail.

Note that right now there is a 3rd party product from QIKR called HyperQuery which uses the MySQL plug-in as is to take bad SQL strings found in the slow log, runs an AI engine against a knowledge base to remove all bad programming mistakes and that further tries to rewrite the SQL to be more efficient for the original intent than what was coded. That tool offers magnitudes of performance improvement. That is not because of any issue or flaw with the optimizer, but rather that many SQL coders and tools that generate SQL for many databases (BI tools) create ASCII correct SQL that often does not perform well universally. So having the ability to do a simple replacement as with the MySQL plug-in allows for tools like this to add significant value to the database platform.

Comment by james wang [ 2022-12-23 ]

Can I also suggest that the rewrite plugin able to:

rewrites OR with UNION automatically if ORed tables are different and their columns are indexed?
e.g.
select t1.col2,t2.col3 from table t1 inner join table t2 using id where t1.col1=blah_blah OR t2.col2=blah_blah;
both t1.col1 and t2.col2 are indexed.

replaced with:
(select t1.col2,t2.col3 from table t1 inner join table t2 using id where t1.col1=blah_blah)
UNION
(select t1.col2,t2.col3 from table t1 inner join table t2 using id where t2.col2=blah_blah);

Thanks

Comment by VAROQUI Stephane [ 2022-12-24 ]

@james wang , i think this case is already covered by index merge optimization or is not the case ?
But for sure having an UNION optimization for C1 IN RANGE AND C2 IN RANGE for covering index (C1,C2) could be more efficient compare to intersect or ICP , also in your case UNION does a distinct that could cost a lot more if you don't work on index but on temporary tables with big dataset.

Comment by james wang [ 2022-12-25 ]

@VAROQUI Stephane if ORs for the same table, yes "index merge optimization" covers .

If different tables, nope

Generated at Thu Feb 08 07:05:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.