[MDEV-7383] [PATCH] engine-independent-stats column_stats has limited values for max/min values Created: 2014-12-27 Updated: 2015-11-18 Resolved: 2015-11-18 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Affects Version/s: | 10.0.15 |
| Fix Version/s: | 10.0.23 |
| Type: | Bug | Priority: | Major |
| Reporter: | Daniel Black | Assignee: | Sergei Petrunia |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | eits, optimizer | ||
| Environment: |
Debian Wheezy 10.0.15 |
||
| Attachments: |
|
||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||
| Sprint: | 10.0.20 | ||||||||||||||||||||
| Description |
|
Engine-independent stats can't parse some values into min/max_value in column stats. The result is a NON NULL empty string. I suspect query plans based on this information won't work too well.
|
| Comments |
| Comment by Daniel Black [ 2014-12-27 ] | |||||||||||||||||||||||||||
|
Is varbinary(255) a sufficient solution? | |||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2014-12-27 ] | |||||||||||||||||||||||||||
|
| |||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-01-15 ] | |||||||||||||||||||||||||||
|
assuming the move to varbinary(255) is an acceptable solution here is the patch and test case. | |||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-01-15 ] | |||||||||||||||||||||||||||
|
patch 2 of the statistics.cc to handle as my_charset_bin rather than my_charset_utf8_bin. | |||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2015-02-20 ] | |||||||||||||||||||||||||||
|
Trying to figure out what column_stats.min/max_value should store
| |||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-08-31 ] | |||||||||||||||||||||||||||
Makes sense to me. Which would mean changing to a binary type for storage (for compatibility with direct manipulation of the stats tables) (patch as attached) and changing the store/retrieve methods in sql_statistics.cc | |||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-09-01 ] | |||||||||||||||||||||||||||
|
From one of our clients that does a lot of multilingual work:
5.87% of columns couldn't set the value of min or max in the column stats. note: updated to exclude empty tables and columsn where all was null which warped results significantly | |||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-09-01 ] | |||||||||||||||||||||||||||
|
breakdown:
So it looks to need text truncation to the right length at the same time. updated to exclude empty tables and those colums with all nulls: | |||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2015-11-09 ] | |||||||||||||||||||||||||||
|
danblack, thanks for the patch. Pushed into 10.0 tree. | |||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-11-09 ] | |||||||||||||||||||||||||||
|
Thanks psergey Sorry, there's one more case that I didn't patch here. I should of included a full text case from my last comment. blob and text column end up with NULL in max/min too (without warnings).
| |||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-11-10 ] | |||||||||||||||||||||||||||
|
Field_blob::update_min and Field_blob::update_max appear to be intentionally not implemented (comment /* Never update the value of min_val for a blob field */ in field.h) and hence the result is NULL ( and not empty as per the original encoding problem). | |||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-11-10 ] | |||||||||||||||||||||||||||
|
I'm part way filling out these functions and limiting comparison/storage to 255 bytes. Please tell me if its a bad idea. | |||||||||||||||||||||||||||
| Comment by Sergei Petrunia [ 2015-11-11 ] | |||||||||||||||||||||||||||
|
danblack, two reactions: 1. I am wondering whether this is at all important to have good statistics for blob fields. My impression is that the only kind of condition on blob fields that occurs in real life is blob_field LIKE '%foo' (Does this match your observations?) , for which statistics is not helpful. It is bad that the server writes an invalid column stats record, and the simplest solution is to not collect or store any statistics for blob columns. This would make ANALYZE TABLE faster, too. 2. Capping comparison/storage looks like a correct thing to do, although #1 is a simpler solution. | |||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-11-12 ] | |||||||||||||||||||||||||||
|
psergey, agree with 1 (and arjen does too), anything searching/joining/ordering on a blob column or part there of is probably doing something wrong and implementing stats for it isn't going to add much value to the case where people have proper queries. Should just document somewhere that there is no column_stats on blob columns. I'll create a new task for ANALYZE TABLE to skip blob column retrieval for the purpose of engine independent stats. So happy to close this. For option 2, I did get part way as per attached patch (fails for varying length of blob columns) but if there is someone who can put a decent case for it in. | |||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2015-11-12 ] | |||||||||||||||||||||||||||
|
Please don't forget to watch buildbot. The push broke 10.0 tests. | |||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-11-13 ] | |||||||||||||||||||||||||||