Details

    Description

      One of the reasons for bad query plans is inadequate cost estimation of individual operations. The cost of reading a row in one engine might be much higher than in another, but the optimizer cannot know that. It also uses hard-coded constants, assuming, for example, that evaluating a WHERE clause is 5 times cheaper than reading a row from a table (it used to be 10, now it's 5).

      Obviously, some kind of calibration procedure is needed to make these cost estimates reasonably correct. This is not easy, because the estimates depend on the actual hardware where MariaDB runs (the cost of a row read differs between HDD and SSD), and also, to some extent, on the application (the optimizer model isn't perfect, and cost factors vary for different types of queries, so it's better to calibrate on the queries that the user application will actually run).

      A simple and low-maintenance solution would be to use self-tuning cost coefficients that measure the timing and adjust automatically to the configuration where MariaDB is run.

      Assorted thoughts:

      • create a tuning coefficient for every low-level cost generator in the optimizer, that is, for every function or handler method that generates a cost value, and for every hard-coded constant or macro, but not for functions that derive cost values from other cost values. For a virtual method, create one coefficient per implementation, that is, one coefficient for read_time in every handler class, but not for its different parameters. For example, different tables in some engine might have different costs for reading a row, but we will still have one coefficient for read_time in that engine, not one per table; the engine is supposed to provide different costs internally if it needs to. The goal of these coefficients is to normalize the cost between different engines.
      • measure the time that the query took, split it proportionally between tables (according to the number of rows), and save the statistics per coefficient.
      • collect the statistics locally in the THD and add them to the global statistics on disconnect; this helps avoid contention on a shared resource.
      • the optimizer will use the global statistics, not thread-local ones. It shouldn't matter, as the coefficients will change very slowly.
      • when splitting the time, use the actual number of rows, not the estimate that the optimizer used.
      • per coefficient store a counter (bigint), the sum of times (double), and the sum of squared times (double); see the sketch after this list.
      • store the results persistently in the mysql database.
      • make them available via an I_S table. In this table show two more columns: the average and the standard deviation.
      • report these data via the feedback plugin; we can then adjust built-in constants or initial values of these coefficients. A very large deviation is a sign that an engine estimates the cost incorrectly (e.g. doesn't take the table into account, see above).
      • a user can update the table manually if she wishes. She can even freeze the coefficients by setting the count column to a very large value.
      • system load anomalies may introduce undesired changes to the coefficients. Is that a problem? Should we implement some countermeasures?
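
      A minimal sketch (hypothetical names, not existing server code) of what such a per-coefficient accumulator could look like, with per-THD collection merged into the global statistics on disconnect, and the average and standard deviation derived from the stored counter, sum and sum of squares:

        #include <cmath>
        #include <cstdint>
        #include <mutex>

        // One self-tuning cost coefficient (sketch only).
        struct Cost_coef_stats
        {
          uint64_t count= 0;    // number of measurements
          double   sum= 0;      // sum of measured times
          double   sum_sq= 0;   // sum of squared measured times

          void add(double t) { count++; sum+= t; sum_sq+= t*t; }

          // what the optimizer would use as the coefficient value
          double avg() const { return count ? sum / count : 1.0; }

          // extra column in the I_S table; a large value hints that an
          // engine estimates its costs incorrectly
          double stddev() const
          {
            if (count < 2) return 0.0;
            double m= avg();
            return std::sqrt(sum_sq / count - m * m);
          }
        };

        // Global statistics, updated only on disconnect to avoid contention.
        static Cost_coef_stats global_read_time_coef;
        static std::mutex global_coef_mutex;

        // Called from THD cleanup: merge thread-local statistics into globals.
        void merge_on_disconnect(const Cost_coef_stats &local)
        {
          std::lock_guard<std::mutex> lock(global_coef_mutex);
          global_read_time_coef.count+=  local.count;
          global_read_time_coef.sum+=    local.sum;
          global_read_time_coef.sum_sq+= local.sum_sq;
        }

      Freezing a coefficient, as mentioned above, would then simply amount to storing a huge count value so that new measurements barely move the average.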

          Activity

            rspadim roberto spadim added a comment -

            hi sergei, is there a file where these constants are declared? i could start changing the constants to variables without a problem

            serg Sergei Golubchik added a comment -

            Part of this task is, exactly, finding and identifying these constants. Some of them are implicit and not easy to find.

            Another big part is building the framework for collecting and storing time measurements and solving equations against these constants.

            rspadim roberto spadim added a comment -

            ok, what about the files where i should start looking? sql_select.cc, sql_select.h, opt_range.cc (i found a constant inside the NOT IN to IN() rewrite, is that the kind of constant i should look for or not?), should update and delete be checked too?
            are they CAPS_LOCK_ON or could they be CAPS_LOCK_ON_and_off?

            serg Sergei Golubchik added a comment -

            I'm afraid it's something that not even I will be able to do; one of
            our optimizer gurus should.

            Sometimes there is no constant at all, it is implicitly assumed to be 1.

            For example, the optimizer compares table->file->read_time() values for
            different tables. Here it assumes that the results of ::read_time() for
            different storage engines are directly comparable. They're not; they
            should be multiplied by a storage engine specific factor.

            Those places are very difficult to find.

            We did have a GSoC-2013 project for this task, but it was assumed that
            we would provide a list of constants to tune, and the student would only
            work on time measurements and aggregation.

            rspadim roberto spadim added a comment -

            ok, should i start a list of constants that i find in the optimizer? or will i be wasting my time?
            igniting Anshu Avinash added a comment -

            I'll be pushing the code on my github repo: https://github.com/igniting/server/tree/selfTuningOptimizer . @serg: I guess you can assign this issue to me.

            serg Sergei Golubchik added a comment - edited

            So, to summarize: there are two approaches to obtaining timing data:

            • counting operations, timing the query as a whole
            • actually timing individual operations, P_S-style

            In the first approach we only count the number of index lookups (N_i), rows read in a table scan (N_s), etc., and time the query as a whole (T_q). This gives an equation:

            T_i * N_i + T_s * N_s + ... = T_q

            Collecting the data from many queries, we get a system of linear equations (possibly even over-determined). It can be solved to give the durations of individual operations.
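
            As a rough illustration (hypothetical code, no real server structures), each query would contribute one such equation with its operation counts and total time, and the unknown per-operation durations could come out of a least-squares fit over the accumulated normal equations:

              #include <cmath>
              #include <utility>
              #include <vector>

              // Least-squares fit of per-operation durations T from
              // observations sum_j(T_j * N_j) = T_q (sketch only).
              struct Equation_system
              {
                size_t n;                 // number of coefficients
                std::vector<double> ata;  // n x n matrix A^T A
                std::vector<double> atb;  // n-vector   A^T b

                explicit Equation_system(size_t n_) : n(n_), ata(n_*n_), atb(n_) {}

                // counts[i] = times operation i was done, t_q = total query time
                void add_query(const std::vector<double> &counts, double t_q)
                {
                  for (size_t i= 0; i < n; i++)
                  {
                    for (size_t j= 0; j < n; j++)
                      ata[i*n + j]+= counts[i] * counts[j];
                    atb[i]+= counts[i] * t_q;
                  }
                }

                // Solve (A^T A) T = A^T b by Gaussian elimination with partial
                // pivoting (a real implementation must also handle singular
                // and ill-conditioned systems).
                std::vector<double> solve() const
                {
                  std::vector<double> a(ata), b(atb), t(n);
                  for (size_t k= 0; k < n; k++)
                  {
                    size_t p= k;
                    for (size_t i= k+1; i < n; i++)
                      if (std::fabs(a[i*n + k]) > std::fabs(a[p*n + k])) p= i;
                    for (size_t j= 0; j < n; j++) std::swap(a[k*n + j], a[p*n + j]);
                    std::swap(b[k], b[p]);
                    for (size_t i= k+1; i < n; i++)
                    {
                      double f= a[i*n + k] / a[k*n + k];
                      for (size_t j= k; j < n; j++) a[i*n + j]-= f * a[k*n + j];
                      b[i]-= f * b[k];
                    }
                  }
                  for (size_t k= n; k-- > 0; )
                  {
                    double s= b[k];
                    for (size_t j= k+1; j < n; j++) s-= a[k*n + j] * t[j];
                    t[k]= s / a[k*n + k];
                  }
                  return t;
                }
              };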

            In the second approach we directly time individual operations, much like PERFORMANCE_SCHEMA does. Preferably, reusing P_S measurements, not duplicating them.
            These approaches both have their benefits:

            • The first one is potentially much cheaper: it does not add timing overhead to every operation (and with constants like TIME_FOR_COMPARE, the individual operation is a comparison; timing it might be prohibitively expensive).
            • Also, it uses the complete query execution time, which is the only time users care about. If we see that the values of the coefficients don't converge over time, it means there's an unaccounted-for part in the query time, and we add another coefficient.
            • Making the second approach use P_S when P_S is disabled is not exactly trivial.

            But

            • The second needs less memory: for N coefficients it needs, per THD, at most 4N doubles (value, sum, sum squared, counter). The first one needs to store (in addition to the above) N equations, that's N² values. Although for large N this might be a sparse matrix, I don't know.
            • The second one measures only what we use in the optimizer, directly. No hidden parameters, and most coefficients should converge nicely. Still, it's meaningless from the user's point of view.
            serg Sergei Golubchik added a comment - edited

            more comments:
            first approach

            • time spent on locks, in the optimizer, etc.
              • measure only query execution time, after the optimizer; this will also cut off table locks
              • but row locks happen during execution and will distort the statistics
              • they will show up as significantly different and can be detected statistically and ignored

            second

            • P_S is disabled in MariaDB by default
            • the hypothesis: most of the P_S overhead comes from housekeeping, storing the data (particularly in the shared data structures) and calculating aggregations
            • that is, simply timing the waits is cheap
            • so even with P_S disabled we can enable the timing instrumentation; see the sketch after this list
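
            A minimal sketch of what "simply timing the waits" could look like (hypothetical, not P_S code): a scope guard reads a monotonic clock around a wait and feeds the duration into per-THD statistics, independently of whether the full P_S machinery is enabled:

              #include <chrono>
              #include <ratio>

              // per-THD accumulator for one wait class (sketch only)
              struct Wait_stats { unsigned long long count= 0; double sum= 0, sum_sq= 0; };

              // Times one wait for the lifetime of the object.
              struct Wait_timer
              {
                Wait_stats &stats;
                std::chrono::steady_clock::time_point start;

                explicit Wait_timer(Wait_stats &s)
                  : stats(s), start(std::chrono::steady_clock::now()) {}

                ~Wait_timer()
                {
                  std::chrono::duration<double, std::micro> d=
                      std::chrono::steady_clock::now() - start;
                  stats.count++;
                  stats.sum+= d.count();               // microseconds in this wait
                  stats.sum_sq+= d.count() * d.count();
                }
              };

              // usage: { Wait_timer t(per_thd_index_read_waits); /* do the wait */ }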

            third - combined method

            • measure all waits directly, using P_S instrumentation
            • TIME_FOR_COMPARE and other internal stuff - indirectly
            • this gives exact data for longer operations
            • but doesn't introduce timing overhead for short operations
            • and reduces the number of coefficients that we put into the system of equations, thus reducing the memory footprint

            Also, one index read might have a vastly different cost even for the same engine, the same table, and the same index, for example depending on whether the data is cached or not. It is the job of the engine to recognize this situation and adjust read_time() accordingly. The problem is that over time we'll see different timings for the same number of index reads, so the corresponding factor won't converge. For it to converge we need to take the engine's read_time() value into account. That is, we'll have not

            total_index_read_time = factor * Nrows

            but

            total_index_read_time = factor * Nrows * read_time() / Nrows_estimated

            (where Nrows is how many rows were actually read from the index, while Nrows_estimated is how many rows the optimizer expected to read, i.e. the number of rows that read_time() got as an argument)
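
            A small sketch of how that normalization could look (hypothetical names, no real server code): one observation of the per-engine factor is the measured time divided by the engine's own read_time() estimate scaled to the actual row count, so variation that read_time() already accounts for does not prevent convergence:

              // one observation of the index-read factor for an engine
              double observe_index_read_factor(double measured_time,     // time actually spent
                                               double engine_read_time,  // engine's read_time() estimate
                                               double n_rows_actual,     // rows really read
                                               double n_rows_estimated)  // rows read_time() was asked about
              {
                // read_time() per estimated row, scaled to the rows actually read
                double scaled_estimate= engine_read_time * n_rows_actual / n_rows_estimated;
                return measured_time / scaled_estimate;  // should converge towards a constant
              }

            Each such observation would then be fed into the per-coefficient statistics (counter, sum, sum of squares) described in the issue description.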

            ebergen Eric Bergen added a comment -

            I can see this going horribly wrong both when system load changes and due to hardware anomalies. For example, when a hard drive hits a bad sector it may do an I/O which takes up to 8 seconds by default, which may throw off all of your measurements for that query. If that query ends up updating the global statistics, then all future queries may be dramatically punished because of a single isolated problem.

            CPU overload can also affect timings. For example, if you have a burst of threads running, then certain parts of a query may take longer as they fight for locks or CPU time.

            It would be nice to run a sample workload to gather these statistics and then be able to freeze them on an instance. This would take care of the cases where the optimizer makes assumptions about disk seek time that were true on hard drives but aren't true on flash hardware, without having to worry about hosts all of a sudden wildly changing their query plans due to hardware blips.


            serg Sergei Golubchik added a comment -

            A specific value (say, "read_time") will be measured many times and the on-disk table will store the sum of all measurements and the number of them. The optimizer will use the average (sum/count). This means:

            • hopefully, short spikes won't disrupt the average too much
            • one can effectively "freeze" the average by updating the table and putting in a huge "count" value.
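
            To illustrate the "freeze" trick with made-up numbers: with the count set to 10^12 and an average of 2 ms, even a single 8-second outlier measurement moves the average by roughly (8000 ms - 2 ms) / 10^12 ≈ 0.000000008 ms, i.e. not at all in practice.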

            rspadim roberto spadim added a comment -

            Eric, yes that's a problem, but... that's a nice solution to many problems.
            The statistics are the main part here: if the stored mean (or any other usable variable) becomes too distant from the "current" measurements, we could alert the DBA about disk problems or too many queries/second. Think of it as a useful feature to alert DBAs and developers/engineers about a possible problem =)
            As for increasing the robustness of the self-tuning cost coefficients algorithm, we can develop that with more time and use cases. For the first version, a measure-only feature (no changes to cost coefficients/optimizer) is ok; with time we'll get the experience to write a good (robust/intelligent) control =]

            rspadim roberto spadim added a comment -

            Hi guys, any news here?

            stephane@skysql.com VAROQUI Stephane added a comment -

            I would also give a +1 for a global statistics approach rather than gathering fixed constants as in a benchmark.

            But I would add that a per-table set of constants would be better, because in any workload some specific tables may more often cause non-deterministic cache overflows:

            • Data Cache
            • Index Cache
            • FS Cache
            • Controller Cache
            • CPU Caches

            Non-determinism is also driven by the workload itself, like:

            • writing/reading from the latest time range,
            • number of secondary indexes in a write,
            • storage engine,
            • randomness of the data access pattern,
            • lock time,
            • size of the columns and rows fetched

            It looks like such metrics are mostly per-table dependent, and can help the query plan to better estimate the join cost at each depth.
            jwang james wang added a comment -

            Hi All,

            I went to the MariaDB Roadshow yesterday in London, where I suggested:

            1). look at top slow queries in performance_schema table
            2). record the query execution plan
            3). try different index (especially compound index)
            4). if new index works better, use it for the same query in future.

            Hope this makes sense


            rspadim roberto spadim added a comment -

            1). look at top slow queries in performance_schema table
            not only slow queries, but also stats about these queries; they should be grouped by a hash, like Percona does: constants are changed to "?" (e.g. "select * from table where column=? and column1>=? and column2>=?") and the operations (and/or/>/</=/etc...) are ordered

            3). try different index (especially compound index)
            grid-searching indexes is ok to do; it consumes resources but finds many solutions. it's important to record not only the execution time, but also resource and metadata information, to create a good metric function. for example: is an index on (int,int) "better" than an index on (double,double,string,double) with the same execution time, or not? a performance metric should be created to evaluate the best search path

            4). if new index works better, use it for the same query in future.
            at some point the problem is how to recognize the "same query" (the same as in 1) in the parser/optimizer without penalizing global performance
