[MDEV-13369] Optimization for equi-joins of derived tables with GROUP BY Created: 2017-07-21 Updated: 2024-01-17 Resolved: 2018-01-19 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Fix Version/s: | 10.3.4 |
| Type: | Task | Priority: | Major |
| Reporter: | Igor Babaev | Assignee: | Igor Babaev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||
| Sprint: | 10.2.12 | ||||||||||||||||
| Description |
|
Consider the following tables
and the query
The following re-writing could be applied to this query:
The result of this re-writing is a query with so-called lateral derived table. This query requires refilling of the temporary table created the derived table for every new value t1.a. As the size of the derived table tl usually is much smaller than the size of the derived table t this transformation will be always beneficial. Especially this transformation will be beneficial when the join operation that joins derived table is used in a multi-way join query where only a few records of t1 are selected. The difference of this task from the task |
| Comments |
| Comment by Igor Babaev [ 2017-08-04 ] |
|
A patch for this task that can be considered as a 'proof of concept' was sent for evaluation to Andrii. This patch introduced two-phase optimization for some queries. |
| Comment by Igor Babaev [ 2017-08-10 ] |
|
Alvin, |
| Comment by Igor Babaev [ 2017-11-21 ] |
|
Cost aware implementation of Design. 1. Every materialized derived is checked for being potentially splittable: If all the conditions above are satisfied then the materialized derived is considered as potentially splittable and a structure of the type SPLITTABLE_DERIVED_INFO is attached 2. The join specifying a potentially splittable derived is subject to two-phase optimization. The first phase ends when the cheapest join order is found. When the optimization of the main query asks for the optimization of the materialized derived tables only the first phase of the optimization is performed for each derived that is optimized in two phases. 3. Let v be a potentially derived used in the main query. 4. Let assume that v can be accessed via a split technique after tables t1, t2 in a partial order for the main query. Let assume that the split is performed using the equality conditions v.a=f(t1.a), v(b)=g(t2.b). When in the partial join order t1,t2,v the table v is accessed the values of f(t1.a) and g(t2.b) are fixed (considered as constants). The equalities vt.a=f(t1.a), vt.b=g(t2.b) must be added to the WHERE condition of v to get the partition of v that is determined by the conditions v.a=f(t1.a), v(b)=g(t2.b). Now the best plan is built for v with extended WHERE that actually can produce any partition p from v. At this point the cost of this plan C(p) must be evaluated . Let C(v) be the cost of producing rows of v. Let in(v) be the expected row count for the partial join t1,t2. If in(v)*C(p) < C(v) then accessing v with a splitting technique is more beneficial with this partial order. 5. Adding conditions to existing one is not so cheap. Besides it's clear that when looking for the best plan for the main query the optimizer has not only to add conditions that can be used, but also has to remove the ones that are not valid for the current partial join order. For example when the partial order t1,v is evaluated the condition vt(b)=g(t2.b) must be removed from consideration. 6. The array of extended KEYUSEs is built only once when the first usage of the split technique is evaluated. |
| Comment by Igor Babaev [ 2018-01-19 ] |
|
A full cost-base solution was pushed into the 10.3 tree |