For certain data distributions, histogram code can produce selectivity>1.0 for a column=const equality on a DOUBLE_PREC_HB histogram (I think one can achieve the same on SINGLE_PREC_HB, the point is that it's not JSON_HB histogram).
Sergei Petrunia
added a comment - - edited The problem is in Histogram::point_selectivity(), this part of code:
else
{
/*
The value 'pos' fits within one single histogram bucket.
...
*/
HERE!
In particular, we arrive at the value that's greater than 1.0 here:
/*
So:
- each bucket has the same #rows
- values are unformly distributed across the [min_value,max_value] domain.
If a bucket has value range that's N times bigger then average, than
each value will have to have N times fewer rows than average.
*/
sel= avg_sel * avg_bucket_width / current_bucket_width;
How many rows are in the bucket that have value X?
point_selectivity() has avg_sel parameter, but that's a global per-table average.
Suppose the histogram for this particular table has two buckets:
bucketY1 = {left_endp1, left_endp1 + 0.5}
bucketY2 = {left_endp2, left_endp2 + 0.0001}
both buckets hold the same number of rows.
But do they hold the same number of different values?
If we assume that the values are distributed uniformly across the 0.0 (min value) to 1.0 (max value) range, we might guess that bucketY1 has more different values in it than bucketY2.
This is what this code is trying to do. It assumes that buckets that span bigger value range have lower avg_frequency (rows have different values, plenty of values to choose from). Contrary, a bucket that span small value range has fewer different values, and so rows in the bucket will tend to have the same value.
Sergei Petrunia
added a comment - Trying to understand the logic behind the
sel= avg_sel * avg_bucket_width / current_bucket_width;
So, we are looking to get #rows estimate for a value X that fits within a single bucket. That is,
bucket_Y.left_endpoint < position(X) < bucket_Y.right_endpoint
How many rows are in the bucket that have value X?
point_selectivity() has avg_sel parameter, but that's a global per-table average.
Suppose the histogram for this particular table has two buckets:
bucketY1 = {left_endp1, left_endp1 + 0.5}
bucketY2 = {left_endp2, left_endp2 + 0.0001}
both buckets hold the same number of rows.
But do they hold the same number of different values?
If we assume that the values are distributed uniformly across the 0.0 (min value) to 1.0 (max value) range, we might guess that bucketY1 has more different values in it than bucketY2.
This is what this code is trying to do. It assumes that buckets that span bigger value range have lower avg_frequency (rows have different values, plenty of values to choose from). Contrary, a bucket that span small value range has fewer different values, and so rows in the bucket will tend to have the same value.
In the testcase for this MDEV, the bucket has a very narrow value range: its "right_endpoint - left_endpoint" is 500x smaller than what the average bucket's right_endpoint - left_endpoint has.
The code multiplies the estimate by this factor and ends up with a non-sensical estimate.
Sergei Petrunia
added a comment - In the testcase for this MDEV, the bucket has a very narrow value range: its "right_endpoint - left_endpoint" is 500x smaller than what the average bucket's right_endpoint - left_endpoint has.
The code multiplies the estimate by this factor and ends up with a non-sensical estimate.
Igor would prefer to see the "brave heuristic" removed completely. I'm hesitant as that will cause more changes in the estimates (and so changes in the query plans).
But let me implement that in a patch and see if that one would have an agreement.
Sergei Petrunia
added a comment - - edited Takeaways from the optimizer call:
Igor would prefer to see the "brave heuristic" removed completely. I'm hesitant as that will cause more changes in the estimates (and so changes in the query plans).
But let me implement that in a patch and see if that one would have an agreement.
Sergei Petrunia
added a comment - The alternative patch implementing suggestions from the above call: https://github.com/MariaDB/server/commits/bb-10.6-mdev31067-variant2 . igor please review.
psergei Sergei,
1. This is a bug of 10.0 code. Thus the fix should be applied starting from 10.4.
2. Let's call a value frequently used if its position in histogram coincides with the beginning of two adjacent buckets. Whether the probed value is frequently used can be easily checked. The number of buckets with the same starting point coinciding with the position of the value provides us with a good low bound for the selectivity of such value. For not frequently used values we should calculate the average frequency for them only and it will be less than the average frequency for all values.
Here's an example:
100 rows with 7 values and 5 buckets with 20 rows in each bucket, one value is encountered in 40 rows and this is the only frequently used value, all remaining values are encountered in 10 rows. The average frequency is 100/7 ~ 14.3, so the average selectivity is 14.3% that is greater than 10% expected for non-frequently used values.
Igor Babaev (Inactive)
added a comment - psergei Sergei,
1. This is a bug of 10.0 code. Thus the fix should be applied starting from 10.4.
2. Let's call a value frequently used if its position in histogram coincides with the beginning of two adjacent buckets. Whether the probed value is frequently used can be easily checked. The number of buckets with the same starting point coinciding with the position of the value provides us with a good low bound for the selectivity of such value. For not frequently used values we should calculate the average frequency for them only and it will be less than the average frequency for all values.
Here's an example:
100 rows with 7 values and 5 buckets with 20 rows in each bucket, one value is encountered in 40 rows and this is the only frequently used value, all remaining values are encountered in 10 rows. The average frequency is 100/7 ~ 14.3, so the average selectivity is 14.3% that is greater than 10% expected for non-frequently used values.
In my opinion the function Histogram::point_selectivity() the has patch touched still requires some improvements.
Igor Babaev (Inactive)
added a comment - In my opinion the function Histogram::point_selectivity() the has patch touched still requires some improvements.
Yet I really mind the code that calculates selectivity for popular values: it looks too complicated and your comment does not help to understand it.
How is this part of this MDEV? Nor the testcase for this MDEV, neither any of the fixes touch that part of the code.
Sergei Petrunia
added a comment - igor ,
Yet I really mind the code that calculates selectivity for popular values: it looks too complicated and your comment does not help to understand it.
How is this part of this MDEV? Nor the testcase for this MDEV, neither any of the fixes touch that part of the code.
It is a conservative upper-bound estimate, and is delivers lower estimates than a complex magic of popular values in the variant 3. Thus I see no reason to use the complex approach if the simple and safe approach produces better estimates.
Sergei Golubchik
added a comment - ok to push variant 2. commit https://github.com/MariaDB/server/commit/a87c17ab566def817da2f79fb7badb7318a1e311
It is a conservative upper-bound estimate, and is delivers lower estimates than a complex magic of popular values in the variant 3. Thus I see no reason to use the complex approach if the simple and safe approach produces better estimates.
People
Sergei Petrunia
Sergei Petrunia
Votes:
1Vote for this issue
Watchers:
7Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
{"report":{"fcp":1487,"ttfb":875,"pageVisibility":"visible","entityId":121131,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":0.5,"journeyId":"5dce40aa-617e-4ff8-a823-2b460dbe0f46","navigationType":0,"readyForUser":1594,"redirectCount":0,"resourceLoadedEnd":1641.6000000238419,"resourceLoadedStart":881.8000000715256,"resourceTiming":[{"duration":34.699999928474426,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2bu7/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":881.8000000715256,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":881.8000000715256,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":916.5,"responseStart":0,"secureConnectionStart":0},{"duration":34.800000071525574,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2bu7/820016/12ta74/8679b4946efa1a0bb029a3a22206fb5d/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true","startTime":882.1000000238419,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":882.1000000238419,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":916.9000000953674,"responseStart":0,"secureConnectionStart":0},{"duration":95,"initiatorType":"script","name":"https://jira.mariadb.org/s/fbf975c0cce4b1abf04784eeae9ba1f4-CDN/lu2bu7/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":882.2000000476837,"connectEnd":882.2000000476837,"connectStart":882.2000000476837,"domainLookupEnd":882.2000000476837,"domainLookupStart":882.2000000476837,"fetchStart":882.2000000476837,"redirectEnd":0,"redirectStart":0,"requestStart":882.2000000476837,"responseEnd":977.2000000476837,"responseStart":977.2000000476837,"secureConnectionStart":882.2000000476837},{"duration":176.19999992847443,"initiatorType":"script","name":"https://jira.mariadb.org/s/099b33461394b8015fc36c0a4b96e19f-CDN/lu2bu7/820016/12ta74/8679b4946efa1a0bb029a3a22206fb5d/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true","startTime":882.4000000953674,"connectEnd":882.4000000953674,"connectStart":882.4000000953674,"domainLookupEnd":882.4000000953674,"domainLookupStart":882.4000000953674,"fetchStart":882.4000000953674,"redirectEnd":0,"redirectStart":0,"requestStart":882.4000000953674,"responseEnd":1058.6000000238419,"responseStart":1058.6000000238419,"secureConnectionStart":882.4000000953674},{"duration":179.20000004768372,"initiatorType":"script","name":"https://jira.mariadb.org/s/94c15bff32baef80f4096a08aceae8bc-CDN/lu2bu7/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":882.7000000476837,"connectEnd":882.7000000476837,"connectStart":882.7000000476837,"domainLookupEnd":882.7000000476837,"domainLookupStart":882.7000000476837,"fetchStart":882.7000000476837,"redirectEnd":0,"redirectStart":0,"requestStart":882.7000000476837,"responseEnd":1061.9000000953674,"responseStart":1061.9000000953674,"secureConnectionStart":882.7000000476837},{"duration":179.60000002384186,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":882.8000000715256,"connectEnd":882.8000000715256,"connectStart":882.8000000715256,"domainLookupEnd":882.8000000715256,"domainLookupStart":882.8000000715256,"fetchStart":882.8000000715256,"redirectEnd":0,"redirectStart":0,"requestStart":882.8000000715256,"responseEnd":1062.4000000953674,"responseStart":1062.4000000953674,"secureConnectionStart":882.8000000715256},{"duration":179.80000007152557,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":883,"connectEnd":883,"connectStart":883,"domainLookupEnd":883,"domainLookupStart":883,"fetchStart":883,"redirectEnd":0,"redirectStart":0,"requestStart":883,"responseEnd":1062.8000000715256,"responseStart":1062.8000000715256,"secureConnectionStart":883},{"duration":237.29999995231628,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2bu7/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":883.2000000476837,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":883.2000000476837,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1120.5,"responseStart":0,"secureConnectionStart":0},{"duration":179.69999992847443,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":883.4000000953674,"connectEnd":883.4000000953674,"connectStart":883.4000000953674,"domainLookupEnd":883.4000000953674,"domainLookupStart":883.4000000953674,"fetchStart":883.4000000953674,"redirectEnd":0,"redirectStart":0,"requestStart":883.4000000953674,"responseEnd":1063.1000000238419,"responseStart":1063.1000000238419,"secureConnectionStart":883.4000000953674},{"duration":237.10000002384186,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2bu7/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":883.5,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":883.5,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1120.6000000238419,"responseStart":0,"secureConnectionStart":0},{"duration":180.10000002384186,"initiatorType":"script","name":"https://jira.mariadb.org/s/3339d87fa2538a859872f2df449bf8d0-CDN/lu2bu7/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":883.6000000238419,"connectEnd":883.6000000238419,"connectStart":883.6000000238419,"domainLookupEnd":883.6000000238419,"domainLookupStart":883.6000000238419,"fetchStart":883.6000000238419,"redirectEnd":0,"redirectStart":0,"requestStart":883.6000000238419,"responseEnd":1063.7000000476837,"responseStart":1063.7000000476837,"secureConnectionStart":883.6000000238419},{"duration":393,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":884.5,"connectEnd":884.5,"connectStart":884.5,"domainLookupEnd":884.5,"domainLookupStart":884.5,"fetchStart":884.5,"redirectEnd":0,"redirectStart":0,"requestStart":884.5,"responseEnd":1277.5,"responseStart":1277.5,"secureConnectionStart":884.5},{"duration":627.1000000238419,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":884.6000000238419,"connectEnd":884.6000000238419,"connectStart":884.6000000238419,"domainLookupEnd":884.6000000238419,"domainLookupStart":884.6000000238419,"fetchStart":884.6000000238419,"redirectEnd":0,"redirectStart":0,"requestStart":884.6000000238419,"responseEnd":1511.7000000476837,"responseStart":1511.7000000476837,"secureConnectionStart":884.6000000238419},{"duration":146.39999997615814,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":1131.8000000715256,"connectEnd":1131.8000000715256,"connectStart":1131.8000000715256,"domainLookupEnd":1131.8000000715256,"domainLookupStart":1131.8000000715256,"fetchStart":1131.8000000715256,"redirectEnd":0,"redirectStart":0,"requestStart":1131.8000000715256,"responseEnd":1278.2000000476837,"responseStart":1278.2000000476837,"secureConnectionStart":1131.8000000715256},{"duration":249.89999997615814,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/e65b778d185daf5aee24936755b43da6/_/download/contextbatch/js/browser-metrics-plugin.contrib,-_super,-atl.general/batch.js?agile_global_admin_condition=true&jag=true&slack-enabled=true","startTime":1391.7000000476837,"connectEnd":1391.7000000476837,"connectStart":1391.7000000476837,"domainLookupEnd":1391.7000000476837,"domainLookupStart":1391.7000000476837,"fetchStart":1391.7000000476837,"redirectEnd":0,"redirectStart":0,"requestStart":1391.7000000476837,"responseEnd":1641.6000000238419,"responseStart":1641.6000000238419,"secureConnectionStart":1391.7000000476837},{"duration":254,"initiatorType":"script","name":"https://www.google-analytics.com/analytics.js","startTime":1480.9000000953674,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":1480.9000000953674,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1734.9000000953674,"responseStart":0,"secureConnectionStart":0}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":98,"responseStart":875,"responseEnd":878,"domLoading":878,"domInteractive":1681,"domContentLoadedEventStart":1681,"domContentLoadedEventEnd":1734,"domComplete":2263,"loadEventStart":2263,"loadEventEnd":2264,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":1652.2000000476837},{"name":"bigPipe.sidebar-id.end","time":1653},{"name":"bigPipe.activity-panel-pipe-id.start","time":1653.1000000238419},{"name":"bigPipe.activity-panel-pipe-id.end","time":1656.6000000238419},{"name":"activityTabFullyLoaded","time":1767.5}],"measures":[],"correlationId":"67b234c5fdbaf2","effectiveType":"4g","downlink":9.6,"rtt":0,"serverDuration":686,"dbReadsTimeInMs":14,"dbConnsTimeInMs":23,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}
The problem is in Histogram::point_selectivity(), this part of code:
{
The value 'pos' fits within one single histogram bucket.
...
*/
HERE!
In particular, we arrive at the value that's greater than 1.0 here:
So:
- each bucket has the same #rows
- values are unformly distributed across the [min_value,max_value] domain.
If a bucket has value range that's N times bigger then average, than
each value will have to have N times fewer rows than average.
*/
sel= avg_sel * avg_bucket_width / current_bucket_width;