[MDEV-350] self-tuning cost coefficients Created: 2012-06-16 Updated: 2018-11-11 |
|
| Status: | Stalled |
| Project: | MariaDB Server |
| Component/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Minor |
| Reporter: | Sergei Golubchik | Assignee: | Sergei Golubchik |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | gsoc13, gsoc14, optimizer | ||
| Issue Links: |
|
||||||||||||||||||||
| Description |
|
One of the causes of bad query plans is inadequate cost estimation of individual operations. The cost of reading a row in one engine might be a lot higher than in another, but the optimizer cannot know that. It also uses hard-coded constants, assuming, for example, that evaluating a WHERE clause is 5 times cheaper than reading a row from a table (it used to be 10 earlier; now it's 5).

Obviously, some kind of calibration procedure is needed to make these cost estimates reasonably correct. This is not easy, because the estimates depend on the actual hardware where MariaDB runs (the cost of a row read differs between HDD and SSD), and also, somewhat, on the application (the optimizer model isn't perfect, and cost factors vary for different types of queries, so it's better to calibrate on the queries that the user application will actually run).

A simple and low-maintenance solution would be self-tuning cost coefficients: they measure timings and adjust automatically to the configuration where MariaDB runs.

Assorted thoughts:
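To illustrate how much a single hard-coded coefficient can steer a plan choice, here is a hypothetical toy cost model. All names and formulas below are made up for illustration; they are not the server's actual cost functions:

```python
# Toy cost model illustrating how a hard-coded coefficient steers plan choice.
# The names and formulas here are illustrative, not MariaDB's actual model.

def full_scan_cost(total_rows, time_for_compare):
    # Read every row, then evaluate the WHERE clause on each one; WHERE
    # evaluation is assumed to be `time_for_compare` times cheaper than
    # a row read.
    return total_rows * (1.0 + 1.0 / time_for_compare)

def index_lookup_cost(matching_rows, lookup_factor=2.0):
    # Each matching row costs one index lookup, assumed `lookup_factor`
    # times more expensive than a sequential row read.
    return matching_rows * lookup_factor

total_rows, matching_rows = 10_000, 5_600

for tfc in (10.0, 5.0):
    scan = full_scan_cost(total_rows, tfc)
    idx = index_lookup_cost(matching_rows)
    plan = "full scan" if scan < idx else "index"
    print(f"TIME_FOR_COMPARE={tfc}: scan={scan:.0f} index={idx:.0f} -> {plan}")
```

With these made-up numbers, changing the WHERE-evaluation factor from 10 to 5 is enough to flip the chosen plan from a full scan to an index read, even though nothing about the data changed.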
|
| Comments |
| Comment by roberto spadim [ 2013-06-16 ] |
|
hi sergei, is there a file where these constants are declared? i could start changing the constants into variables without problem |
| Comment by Sergei Golubchik [ 2013-06-16 ] |
|
Part of this task is, exactly, finding and identifying these constants. Some of them are implicit and not easy to find. Another big part is building the framework for collecting and storing time measurements and solving equations for these constants. |
| Comment by roberto spadim [ 2013-06-17 ] |
|
ok, in what files should i start looking? sql_select.cc, sql_select.h, opt_range.cc (i found a constant inside the NOT IN to IN() rewrite; is that the kind of constant i should look for?) should update and delete be covered too? |
| Comment by Sergei Golubchik [ 2013-06-17 ] |
|
I'm afraid it's something that not even I will be able to do. Sometimes there is no constant at all; it is implicitly assumed to be 1. For example, the optimizer directly compares table->file->read_time() values of different access methods. Those places are very difficult to find. We did have a GSoC-2013 project for this task, but it was assumed that |
| Comment by roberto spadim [ 2013-06-19 ] |
|
ok, should i start a list of the constants that i find in the optimizer, or would that be a waste of time? |
| Comment by Anshu Avinash [ 2014-05-26 ] |
|
I'll be pushing the code on my github repo: https://github.com/igniting/server/tree/selfTuningOptimizer . @serg: I guess you can assign this issue to me. |
| Comment by Sergei Golubchik [ 2014-06-15 ] |
|
So, to summarize: there are two approaches to obtaining timing data:
In the first approach we only count the number of index lookups (N_i), rows read in a table scan (N_s), etc., and time the query as a whole (T_q). This gives an equation:

T_q = N_i * t_i + N_s * t_s + ...

where t_i, t_s, ... are the unknown per-operation durations. Collecting the data from many queries, we get a system of linear equations (maybe even over-determined). It can be solved to give the durations of individual operations.

In the second approach we directly time individual operations, much like PERFORMANCE_SCHEMA does. Preferably reusing P_S measurements, not duplicating them.
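The first approach might be sketched like this: each query contributes one equation T_q = N_i * t_i + N_s * t_s + ..., and the (possibly over-determined) system is solved by least squares for the unknown per-operation durations. A pure-Python sketch, with made-up variable names and measurements:

```python
# Least-squares sketch of the first approach: per-query operation counts
# plus a whole-query timing give a (possibly over-determined) linear
# system that is solved for the per-operation durations.

def solve_least_squares(counts, timings):
    """Solve min ||A x - b|| via the normal equations A^T A x = A^T b.

    counts  : list of rows, one per query, each a list of operation counts
    timings : list of whole-query timings T_q
    """
    n = len(counts[0])
    # Build A^T A and A^T b.
    ata = [[sum(r[i] * r[j] for r in counts) for j in range(n)]
           for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(counts, timings)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(ata[row][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atb[col], atb[piv] = atb[piv], atb[col]
        for row in range(col + 1, n):
            f = ata[row][col] / ata[col][col]
            for c in range(col, n):
                ata[row][c] -= f * ata[col][c]
            atb[row] -= f * atb[col]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        x[row] = (atb[row] - sum(ata[row][c] * x[c]
                                 for c in range(row + 1, n))) / ata[row][row]
    return x

# Three queries, two operation types: [index lookups, rows scanned].
counts = [[100, 0], [0, 10_000], [50, 5_000]]
timings = [0.020, 0.100, 0.060]   # seconds, made-up measurements
t_lookup, t_scan = solve_least_squares(counts, timings)
```

With these numbers the system is exactly consistent, so the solve recovers one lookup at 0.2 ms and one scanned row at 0.01 ms; with real measurements the system would be noisy and over-determined, and the least-squares fit would average the noise out.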
|
| Comment by Sergei Golubchik [ 2014-06-17 ] |
|
more comments:
second
third - combined method
Also, one index read might have a vastly different cost even for the same engine, same table, and same index: for example, depending on whether it is cached or not. It is the job of the engine to recognize this situation and adjust read_time() accordingly. The problem is that, over time, we'll see different timings for the same number of index reads, so the corresponding factor won't converge. For it to converge we need to take the engine's read_time() value into account. That is, we'll have not

    time = factor * N_rows

but

    time = factor * read_time() * N_rows / N_rows_estimated

(where N_rows is how many rows were actually read from the index, while N_rows_estimated is how many rows the optimizer expected to read, i.e. the number of rows that read_time() got as an argument) |
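A minimal sketch of the convergence problem and the proposed correction, with hypothetical numbers (a cold read assumed 10x slower than a hot one; the function names are mine):

```python
# Sketch of the convergence point above: regressing raw time against a
# plain row count makes the factor drift with cache state; dividing
# through by the engine's own read_time() estimate (scaled by rows
# actually read vs. rows estimated) gives a factor that can converge.

def naive_factor(elapsed, n_rows):
    # time = factor * N_rows  -> factor jumps around with cache state
    return elapsed / n_rows

def corrected_factor(elapsed, read_time_estimate, n_rows, n_rows_estimated):
    # time = factor * read_time() * N_rows / N_rows_estimated
    return elapsed / (read_time_estimate * n_rows / n_rows_estimated)

# Same index, 1000-row estimate each time; the engine reports a higher
# read_time() when it expects a cold (uncached) read.
read_time_cold, read_time_hot = 10.0, 1.0   # hypothetical engine estimates
print(naive_factor(12.0, 1000), naive_factor(1.2, 1000))
print(corrected_factor(12.0, read_time_cold, 1000, 1000),
      corrected_factor(1.2, read_time_hot, 1000, 1000))
```

The naive factors differ by 10x between the cold and hot runs, while the corrected factors agree, because the cache effect has been absorbed by the engine's own read_time() estimate.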
| Comment by Eric Bergen [ 2014-08-05 ] |
|
I can see this going horribly wrong both when system load changes and due to hardware anomalies. For example, when a hard drive hits a bad sector it may do an I/O that takes up to 8 seconds by default, which may throw off all of your measurements for that query. If that query ends up updating the global statistics, then all future queries may be dramatically punished because of a single isolated problem.

CPU overload can also affect the timings. For example, if you have a burst of threads running, certain parts of a query may take longer as they fight for locks or CPU time.

It would be nice to run a sample workload to gather these statistics and then be able to freeze them on an instance. This would take care of the cases where the optimizer makes assumptions about disk seek time that were true on hard drives but aren't true on flash hardware, without having to worry about hosts suddenly and wildly changing their query plans due to hardware blips. |
| Comment by Sergei Golubchik [ 2014-08-05 ] |
|
A specific value (say, "read_time") will be measured many times and the on-disk table will store the sum of all measurements and the number of them. The optimizer will use the average (sum/count). This means:
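The sum/count scheme described above can be sketched as follows, showing why a single outlier (such as an 8-second bad-sector I/O) gets diluted as the count grows. The class and field names are mine, not the actual on-disk format:

```python
# Sketch of the sum/count scheme: the on-disk table stores (sum, count)
# per coefficient and the optimizer uses sum/count. A single outlier
# shifts the average less and less as the count grows.

class CostCoefficient:
    def __init__(self):
        self.total = 0.0   # sum of all measurements (persisted)
        self.count = 0     # number of measurements (persisted)

    def record(self, measurement):
        self.total += measurement
        self.count += 1

    def average(self):
        return self.total / self.count if self.count else None

read_time = CostCoefficient()
for _ in range(999):
    read_time.record(0.010)     # 999 ordinary 10 ms reads
read_time.record(8.0)           # one 8-second bad-sector stall
print(read_time.average())      # ~18 ms, not 8 s: the outlier is diluted
```

This is why an isolated hardware blip does not flip query plans by itself: with a thousand accumulated measurements, one 8-second stall moves the average read time from 10 ms to about 18 ms rather than replacing it.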
|
| Comment by roberto spadim [ 2014-08-05 ] |
|
Eric, yes that's a problem, but... that's a nice solution to many problems |
| Comment by roberto spadim [ 2014-09-22 ] |
|
Hi guys |
| Comment by VAROQUI Stephane [ 2015-02-26 ] |
|
I would also give a +1 for a global statistics approach, rather than gathering fixed constants as in a benchmark. But I would add that a per-table set of constants would be better, because in any workload some specific tables may more often cause non-deterministic cache overflow.
Non-determinism is also driven by the workload itself, like:
It looks like such metrics are mostly per-table dependent, and they can help the query planner better estimate the join cost at each depth |
| Comment by james wang [ 2017-06-23 ] |
|
Hi All. I went to the MariaDB Road Show yesterday in London, where I suggested: 1). look at the top slow queries in the performance_schema table. Hope this makes sense |
| Comment by roberto spadim [ 2017-06-23 ] |
|
1). look at top slow queries in the performance_schema table
3). try different indexes (especially compound indexes)
4). if a new index works better, use it for the same query in the future. |