One of the reasons of bad query plans is inadequate cost estimation of individual operations. A cost of reading a row in one engine might be a lot higher than in some other, but optimizer cannot know it. Also, it uses hard-coded constants, assuming, for example, that evaluating a WHERE clause is 5 times cheaper than reading a row from a table (it used to 10 earlier, now it's 5 ).
Obviously, some kind of calibration procedure is needed to get these cost estimates to be relatively correct. It is not easy, because the estimates depend on the actual hardware where MariaDB is run (a cost of a row read is different on HD and SSD), and also - somewhat - on the application (optimizer model isn't perfect, and cost factors vary for different types of queries, thus it's better to calibrate on the queries that the user application will run).
A simple and low-maintenance solution would be to use self-tuning cost coefficients. They measure the timing and adjust automatically to the configuration where MariaDB is run.
Assorted thoughts:
create a tuning coefficient for every low-level cost generator in the optimizer. That is for every function or a handler method that generates a cost value. For every hard-coded constant or a macro too. But not for functions that create cost values from other cost values. For a virtual method - for every implementation of it, that is, one coefficient for every read_time in every handler class. But not for different parameters of it. For example, different tables in some engine might have different costs for reading a row, but we will still have one coefficient for read_time in this engine, not one for each table. The engine is suppose to provide different costs internally, if it needs to. The goal of these coefficients is to normalize the cost between different engines.
measure the time that the query took, split proportionally between tables (according to the number of rows), save the statistics per coefficient.
collect the statistics locally in the THD, add to the global statistics on disconnect. it helps to avoid contention on a shared resource
optimizer will use the global statistics, not thread local. It shouldn't matter, as coefficients will change very slowly.
in splitting the time use the actual number of rows, not the estimated one, that the optimizer used.
per coefficient store - counter (bigint), sum of times (double), sum of times squared (double).
store the results persistently in mysql database
make them available via the I_S table. In this table show two more columns - the average and the standard deviation.
report these data via the feedback plugin. we can adjust built-in constants or initial values of these coefficients. and very large deviation is a sign that an engine estimates the cost incorrectly (e.g. doesn't take the table name into account, see above)
a user can update the table manually, if she wishes so. she can even freeze the coefficients by setting the count column to the very large value.
system load anomalies may introduce undesired changes in the coefficients. is it a problem? should we implement some countermeasures?
Attachments
Issue Links
is blocked by
MDEV-223Encapsulate calculations of the table access cost and make them consistent and tunable
Eric, yes that's a problem, but... that's a nice solution to many problems
The statistic is the main part here, if the standard mean (or any other usable variable) become too distance from "current" measures we could alert DBA about disk problems or too much queries/second, think about a usefull feature to alert DBA and developers/engeneer about a possible problem =)
about increasing robustness of the self-tunning cost coefficients algorithm we can develop with more time and use cases, for the first version, a useless (no changes to cost coefficients/optimizer) but measurable feature is ok, with time we get the experience to write a good (robust/inteligent) control =]
roberto spadim
added a comment - Eric, yes that's a problem, but... that's a nice solution to many problems
The statistic is the main part here, if the standard mean (or any other usable variable) become too distance from "current" measures we could alert DBA about disk problems or too much queries/second, think about a usefull feature to alert DBA and developers/engeneer about a possible problem =)
about increasing robustness of the self-tunning cost coefficients algorithm we can develop with more time and use cases, for the first version, a useless (no changes to cost coefficients/optimizer) but measurable feature is ok, with time we get the experience to write a good (robust/inteligent) control =]
I would also give +1 for a global statistic approach more that gathering fixe constants like in a benchmark
But i would add that a per table set of constant would be better because in any workload some specifics tables may cause more often non determinist cache overflow
Data Cache
Index Cache
FS Cache
Controller Cache
CPU Caches
Non determinism's is also driven by the workload itself, like :
writing reading from last time range ,
number of secondary indexes in a write,
storage engine
randomness data access pattern
lock time
column and row size fetching
looks like such metrics are mostly per table dependent , and can help Query Plan to better estimate join cost at each depth
VAROQUI Stephane
added a comment - I would also give +1 for a global statistic approach more that gathering fixe constants like in a benchmark
But i would add that a per table set of constant would be better because in any workload some specifics tables may cause more often non determinist cache overflow
Data Cache
Index Cache
FS Cache
Controller Cache
CPU Caches
Non determinism's is also driven by the workload itself, like :
writing reading from last time range ,
number of secondary indexes in a write,
storage engine
randomness data access pattern
lock time
column and row size fetching
looks like such metrics are mostly per table dependent , and can help Query Plan to better estimate join cost at each depth
Went to MariaDB road Show yesterday at London, I suggested:
1). look at top slow queries in performance_schema table
2). record the query execution plan
3). try different index (especially compound index)
4). if new index works better, use it for the same query in future.
Hope this makes sense
james wang
added a comment - Hi All,
Went to MariaDB road Show yesterday at London, I suggested:
1). look at top slow queries in performance_schema table
2). record the query execution plan
3). try different index (especially compound index)
4). if new index works better, use it for the same query in future.
Hope this makes sense
1). look at top slow queries in performance_schema table
not only slow queries, but stats about these queries, they should be grouped by hash, like percona do "select * from table where column=? and column1>=? and column2>=?" constants are changed to "?" and operations (graph and/or/>/</=/etc...) are ordered
3). try different index (especially compound index)
gridsearching index is ok to do, consume resources but find many solutions, it's import to record not only time to execute, but resources and metadata information to create a good metric function, for example: one index with (int,int) is "better" than an index with (double,double,string,double) with the same time or not? a performace metric should be created to evaluate the best search path
4). if new index works better, use it for the same query in future.
at some point the problem is how to "same query" (the same of 1) at parser/optimizer and don't penalize global performace
roberto spadim
added a comment - 1). look at top slow queries in performance_schema table
not only slow queries, but stats about these queries, they should be grouped by hash, like percona do "select * from table where column=? and column1>=? and column2>=?" constants are changed to "?" and operations (graph and/or/>/</=/etc...) are ordered
3). try different index (especially compound index)
gridsearching index is ok to do, consume resources but find many solutions, it's import to record not only time to execute, but resources and metadata information to create a good metric function, for example: one index with (int,int) is "better" than an index with (double,double,string,double) with the same time or not? a performace metric should be created to evaluate the best search path
4). if new index works better, use it for the same query in future.
at some point the problem is how to "same query" (the same of 1) at parser/optimizer and don't penalize global performace
People
Sergei Golubchik
Sergei Golubchik
Votes:
2Vote for this issue
Watchers:
12Start watching this issue
Dates
Created:
Updated:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
{"report":{"fcp":1150.0999999046326,"ttfb":461.80000019073486,"pageVisibility":"visible","entityId":12125,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":0.5,"journeyId":"3abbbde9-d50f-4d9f-8012-b3bc192d8fc7","navigationType":0,"readyForUser":1241.0999999046326,"redirectCount":0,"resourceLoadedEnd":1765.4000000953674,"resourceLoadedStart":469.69999980926514,"resourceTiming":[{"duration":77.30000019073486,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":469.69999980926514,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":469.69999980926514,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":547,"responseStart":0,"secureConnectionStart":0},{"duration":77.40000009536743,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":470,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":470,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":547.4000000953674,"responseStart":0,"secureConnectionStart":0},{"duration":155.40000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/0917945aaa57108d00c5076fea35e069-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":470.19999980926514,"connectEnd":470.19999980926514,"connectStart":470.19999980926514,"domainLookupEnd":470.19999980926514,"domainLookupStart":470.19999980926514,"fetchStart":470.19999980926514,"redirectEnd":0,"redirectStart":0,"requestStart":470.19999980926514,"responseEnd":625.5999999046326,"responseStart":625.5999999046326,"secureConnectionStart":470.19999980926514},{"duration":256.90000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/2d8175ec2fa4c816e8023260bd8c1786-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true&whisper-enabled=true","startTime":470.40000009536743,"connectEnd":470.40000009536743,"connectStart":470.40000009536743,"domainLookupEnd":470.40000009536743,"domainLookupStart":470.40000009536743,"fetchStart":470.40000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":470.40000009536743,"responseEnd":727.3000001907349,"responseStart":727.3000001907349,"secureConnectionStart":470.40000009536743},{"duration":262.09999990463257,"initiatorType":"script","name":"https://jira.mariadb.org/s/a9324d6758d385eb45c462685ad88f1d-CDN/lu2cib/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":470.59999990463257,"connectEnd":470.59999990463257,"connectStart":470.59999990463257,"domainLookupEnd":470.59999990463257,"domainLookupStart":470.59999990463257,"fetchStart":470.59999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":470.59999990463257,"responseEnd":732.6999998092651,"responseStart":732.6999998092651,"secureConnectionStart":470.59999990463257},{"duration":262.7999997138977,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":470.80000019073486,"connectEnd":470.80000019073486,"connectStart":470.80000019073486,"domainLookupEnd":470.80000019073486,"domainLookupStart":470.80000019073486,"fetchStart":470.80000019073486,"redirectEnd":0,"redirectStart":0,"requestStart":470.80000019073486,"responseEnd":733.5999999046326,"responseStart":733.5999999046326,"secureConnectionStart":470.80000019073486},{"duration":263.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":471,"connectEnd":471,"connectStart":471,"domainLookupEnd":471,"domainLookupStart":471,"fetchStart":471,"redirectEnd":0,"redirectStart":0,"requestStart":471,"responseEnd":734.5,"responseStart":734.5,"secureConnectionStart":471},{"duration":320.69999980926514,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2cib/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":471.30000019073486,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":471.30000019073486,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":792,"responseStart":0,"secureConnectionStart":0},{"duration":264.2999997138977,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":471.40000009536743,"connectEnd":471.40000009536743,"connectStart":471.40000009536743,"domainLookupEnd":471.40000009536743,"domainLookupStart":471.40000009536743,"fetchStart":471.40000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":471.40000009536743,"responseEnd":735.6999998092651,"responseStart":735.6999998092651,"secureConnectionStart":471.40000009536743},{"duration":320.59999990463257,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":471.59999990463257,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":471.59999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":792.1999998092651,"responseStart":0,"secureConnectionStart":0},{"duration":266.40000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/5d5e8fe91fbc506585e83ea3b62ccc4b-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":471.69999980926514,"connectEnd":471.69999980926514,"connectStart":471.69999980926514,"domainLookupEnd":471.69999980926514,"domainLookupStart":471.69999980926514,"fetchStart":471.69999980926514,"redirectEnd":0,"redirectStart":0,"requestStart":471.69999980926514,"responseEnd":738.0999999046326,"responseStart":738.0999999046326,"secureConnectionStart":471.69999980926514},{"duration":473.09999990463257,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":472.90000009536743,"connectEnd":472.90000009536743,"connectStart":472.90000009536743,"domainLookupEnd":472.90000009536743,"domainLookupStart":472.90000009536743,"fetchStart":472.90000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":472.90000009536743,"responseEnd":946,"responseStart":946,"secureConnectionStart":472.90000009536743},{"duration":1253.7999997138977,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":483.30000019073486,"connectEnd":483.30000019073486,"connectStart":483.30000019073486,"domainLookupEnd":483.30000019073486,"domainLookupStart":483.30000019073486,"fetchStart":483.30000019073486,"redirectEnd":0,"redirectStart":0,"requestStart":483.30000019073486,"responseEnd":1737.0999999046326,"responseStart":1737.0999999046326,"secureConnectionStart":483.30000019073486},{"duration":141.2999997138977,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":805.4000000953674,"connectEnd":805.4000000953674,"connectStart":805.4000000953674,"domainLookupEnd":805.4000000953674,"domainLookupStart":805.4000000953674,"fetchStart":805.4000000953674,"redirectEnd":0,"redirectStart":0,"requestStart":805.4000000953674,"responseEnd":946.6999998092651,"responseStart":946.6999998092651,"secureConnectionStart":805.4000000953674},{"duration":675.7000002861023,"initiatorType":"link","name":"https://jira.mariadb.org/s/d5715adaadd168a9002b108b2b039b50-CDN/lu2cib/820016/12ta74/be4b45e9cec53099498fa61c8b7acba4/_/download/contextbatch/css/jira.project.sidebar,-_super,-project.issue.navigator,-jira.general,-jira.browse.project,-jira.view.issue,-jira.global,-atl.general,-com.atlassian.jira.projects.sidebar.init/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":1089.6999998092651,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":1089.6999998092651,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1765.4000000953674,"responseStart":0,"secureConnectionStart":0},{"duration":665.6000003814697,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/e65b778d185daf5aee24936755b43da6/_/download/contextbatch/js/browser-metrics-plugin.contrib,-_super,-project.issue.navigator,-jira.view.issue,-atl.general/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":1090.6999998092651,"connectEnd":1090.6999998092651,"connectStart":1090.6999998092651,"domainLookupEnd":1090.6999998092651,"domainLookupStart":1090.6999998092651,"fetchStart":1090.6999998092651,"redirectEnd":0,"redirectStart":0,"requestStart":1090.6999998092651,"responseEnd":1756.3000001907349,"responseStart":1756.1999998092651,"secureConnectionStart":1090.6999998092651},{"duration":669.1000003814697,"initiatorType":"script","name":"https://jira.mariadb.org/s/097ae97cb8fbec7d6ea4bbb1f26955b9-CDN/lu2cib/820016/12ta74/be4b45e9cec53099498fa61c8b7acba4/_/download/contextbatch/js/jira.project.sidebar,-_super,-project.issue.navigator,-jira.general,-jira.browse.project,-jira.view.issue,-jira.global,-atl.general,-com.atlassian.jira.projects.sidebar.init/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true&whisper-enabled=true","startTime":1091.1999998092651,"connectEnd":1091.1999998092651,"connectStart":1091.1999998092651,"domainLookupEnd":1091.1999998092651,"domainLookupStart":1091.1999998092651,"fetchStart":1091.1999998092651,"redirectEnd":0,"redirectStart":0,"requestStart":1091.1999998092651,"responseEnd":1760.3000001907349,"responseStart":1760.3000001907349,"secureConnectionStart":1091.1999998092651},{"duration":691.1000003814697,"initiatorType":"script","name":"https://www.google-analytics.com/analytics.js","startTime":1143.1999998092651,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":1143.1999998092651,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1834.3000001907349,"responseStart":0,"secureConnectionStart":0}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":305,"responseStart":462,"responseEnd":467,"domLoading":467,"domInteractive":1777,"domContentLoadedEventStart":1777,"domContentLoadedEventEnd":1834,"domComplete":2638,"loadEventStart":2638,"loadEventEnd":2639,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":1739},{"name":"bigPipe.sidebar-id.end","time":1739.8000001907349},{"name":"bigPipe.activity-panel-pipe-id.start","time":1740.0999999046326},{"name":"bigPipe.activity-panel-pipe-id.end","time":1741.9000000953674},{"name":"activityTabFullyLoaded","time":1850.9000000953674}],"measures":[],"correlationId":"b0298bda7e3e76","effectiveType":"4g","downlink":10,"rtt":0,"serverDuration":101,"dbReadsTimeInMs":11,"dbConnsTimeInMs":20,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}
Eric, yes that's a problem, but... that's a nice solution to many problems
The statistic is the main part here, if the standard mean (or any other usable variable) become too distance from "current" measures we could alert DBA about disk problems or too much queries/second, think about a usefull feature to alert DBA and developers/engeneer about a possible problem =)
about increasing robustness of the self-tunning cost coefficients algorithm we can develop with more time and use cases, for the first version, a useless (no changes to cost coefficients/optimizer) but measurable feature is ok, with time we get the experience to write a good (robust/inteligent) control =]