Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 5.5 (EOL), 10.0 (EOL), 10.1 (EOL)
- Environment: WHM on CentOS 6.7 x64, XAMPP 3.2.2 on Windows 7 x64, WAMP 2.4 on Windows 7 x64
- Fix Version/s: 10.2.13, 10.3.6-1
Description
I am having a strange issue with a slow query under certain conditions:
I have made a few tests and was able to isolate the issue to one single table called `products_description` (all the tables are MyISAM).
At first I noticed that when this table is new (i.e. just imported), the query always performs fast (~0.3s).
However, if I perform any of these operations on this specific table, at any time (even immediately after importing it):
`CHECK`, `OPTIMIZE`, `ANALYZE` or `REPAIR`, the query suddenly becomes ~10x slower (takes ~4.5s) and stays slow from then on.
Note that I forced no caching when running the query, to ensure the timings are meaningful.
I am able to restore the performance only if I am then performing any of these operations on that table:
1) `DROP` the table and import it again.
or
2) `ALTER` any of the following of that table: `Collation` or `CHECKSUM` or `DELAY_KEY_WRITE`. It then runs fast with the altered value and when reverting back to the old value, the performance remains fast.
Even then, if I perform any of the `CHECK`, `OPTIMIZE`, `ANALYZE` or `REPAIR` operations on that table, the query speed drops again until I do either 1) or 2).
One more thing I tested:
Before performing any operation on the table, I backed up the table's files (`products_description.frm`, `products_description.MYD`, `products_description.MYI`) and ran the query: it ran fast. Then I performed `CHECK` on the table and ran the query: it was ~10x slower. I then restored the 3 backed-up files, overwriting the current ones, and ran the query again: still slow.
I have compressed the database into a ~5 MB zip file (~80 MB unzipped) and uploaded it to the private folder on the MariaDB FTP.
Here are the EXPLAIN EXTENDED results. It seems the optimizer works differently on the two databases, but that doesn't really explain why, as the copied database is supposedly identical.
http://i.stack.imgur.com/NQ9YQ.png
http://i.stack.imgur.com/pLyOt.png
These are the screenshots of the profiling of the query on the slow and fast database:
http://i.stack.imgur.com/JBlwP.png
http://i.stack.imgur.com/E1qf0.png
This is the SQL query that I am running and that you should test with:
SELECT DISTINCT pav.products_options_values_id,
                pav.products_options_values_name,
                pav.products_options_values_sort_order
FROM   products_stock ps,
       products_options_values pav,
       (SELECT DISTINCT pa.products_id
        FROM   products_attributes pa,
               products_options_values pov,
               (SELECT p.products_id,
                       p.products_image,
                       p.products_subimage1,
                       pd.products_name,
                       p.products_quantity,
                       p.products_model,
                       p.products_ordered,
                       p.products_price,
                       p.products_date_added,
                       p.products_weight,
                       p.products_length,
                       p.products_width,
                       p.products_height,
                       p.products_tax_class_id,
                       p.products_status,
                       IF(s.status, s.specials_new_products_price, NULL)
                         AS specials_new_products_price,
                       IF(s.status, s.specials_new_products_price, p.products_price)
                         AS final_price,
                       IF(p.clearance_price < p.products_cost * 2.25,
                          p.clearance_price,
                          p.products_cost * 2.25) AS sorting_price
                FROM   products p
                       LEFT JOIN specials s
                              ON p.products_id = s.products_id
                       LEFT JOIN products_description pd
                              ON p.products_id = pd.products_id
                WHERE  /*FASTIDS*/
                       p.products_status = '1'
                       AND Date_sub('2016-04-19', INTERVAL 7000 day) <= p.products_date_added
               ) m
        WHERE  m.products_id = pa.products_id
               AND pa.options_id = 1
               AND pa.options_values_id = pov.products_options_values_id
               AND pov.language_id = '1'
       ) q
WHERE  q.products_id = ps.products_id
       AND ps.products_stock_attributes = Concat('1-', pav.products_options_values_id)
       AND ps.products_stock_quantity > 0
ORDER  BY pav.products_options_values_sort_order ASC
Attachments
Activity
Hi, Daniel,
The issue is not with the query (although it might be possible to optimize it as you suggested and/or in another way, but that's a different topic entirely).
The issue is with the database server, as explained in my post above. I posted the query only so that you can reproduce the database issue I am seeing.
This is my client's database, thus I am not at liberty to upload this database to a public location, otherwise I would have uploaded it as attachment here.
I am going to send you a download link to your email address danielgb [at] au.ibm.com, please keep the file private. If anyone else wants the file, I will send it the same way to him/her.
Thank you
Yo, the FTP upload to the private directory isn't to a public location. It's accessible to a few MariaDB core developers only (i.e. not me).
Looking closer at the query, it does look entirely possible that the two subqueries can be removed, and indeed the DISTINCT on the first subquery isn't needed. As it's your client, I suggest you help them with that to give them a fast solution now (probably faster than the current fast version).
I'm not convinced it's a different topic, as it's a bit unfair to expect a server to always optimize a bad query correctly. It seems you have gotten lucky with some aspect of statistics sampling so far.
Can you include the "SHOW INDEX FROM {table}" output here, in text form, for both the fast and slow versions? This could explain the difference in query plan.
Thanks Daniel,
I have uploaded the file to the ftp (private folder).
As for the query, again, this is just a way to reproduce the issue. If you read my post and the steps to reproduce thoroughly, you will see that there is no reason for the optimizer to work differently; it seems to be a server bug.
I encourage you to test my data and try to reproduce the issue the same way I did, you will see what I mean.
Thanks again
I am also able to reproduce the issue easily on Windows 7 x64, with XAMPP 3.2.2 (which uses MariaDB 10.1.10) and WAMP 2.4 (which uses 5.6.12-log - MySQL Community Server (GPL)).
Note that I am using phpMyAdmin for all steps. To reproduce:
1) Import the database.
2) Run the above query (it should run fast) and time it.
3) Run OPTIMIZE (or any of the other commands I listed above) on the products_description table.
4) Run the above query again, this time it should run much much slower.
5) Change the products_description table's CHECKSUM option to 1 and back to 0.
6) Run the above query, it now should run as fast as it did in step 2.
Again, all tests were done through phpMyAdmin, I did not test using the command line.
Reproducible as described.
Instead of altering the table back and forth, ALTER TABLE products_description FORCE can be used.
As a workaround, SELECT STRAIGHT_JOIN ... seems to work.
The difference in plan between slow and fast execution:
--- slow_plan.sql	2016-04-22 16:17:43.224080338 +0300
+++ fast_plan.sql	2016-04-22 16:18:25.179596410 +0300
@@ -1,11 +1,11 @@
 id select_type table type possible_keys key key_len ref rows filtered Extra
 1 PRIMARY pav ALL NULL NULL NULL NULL 70 100.00 Using temporary; Using filesort
-1 PRIMARY <derived2> ALL NULL NULL NULL NULL 55242 100.00 Distinct; Using join buffer (flat, BNL join)
-1 PRIMARY ps eq_ref idx_products_stock_attributes idx_products_stock_attributes 771 q.products_id,func 1 100.00 Using index condition; Using where; Distinct
+1 PRIMARY ps ALL idx_products_stock_attributes NULL NULL NULL 55266 75.00 Using where; Distinct; Using join buffer (flat, BNL join)
+1 PRIMARY <derived2> ref key0 key0 4 test.ps.products_id 959 100.00 Distinct
 2 DERIVED pa ALL idx_products_attributes_products_id NULL NULL NULL 55242 100.00 Using where; Using temporary
 2 DERIVED pov eq_ref PRIMARY PRIMARY 8 test.pa.options_values_id,const 1 100.00 Using index; Distinct
 2 DERIVED p eq_ref PRIMARY,idx_products_date_added,idx_products_status PRIMARY 4 test.pa.products_id 1 100.00 Using where; Distinct
 2 DERIVED s ref idx_specials_products_id idx_specials_products_id 4 test.pa.products_id 1 100.00 Using index; Distinct
-2 DERIVED pd ref PRIMARY PRIMARY 4 test.pa.products_id 1 100.00 Using index; Distinct
+2 DERIVED pd ref PRIMARY PRIMARY 4 test.pa.products_id 960 100.00 Using index; Distinct
 Level Code Message
-Note 1003 select distinct `test`.`pav`.`products_options_values_id` AS `products_options_values_id`,`test`.`pav`.`products_options_values_name` AS `products_options_values_name`,`test`.`pav`.`products_options_values_sort_order` AS `products_options_values_sort_order` from `test`.`products_stock` `ps` join `test`.`products_options_values` `pav` join (select distinct `test`.`pa`.`products_id` AS `products_id` from `test`.`products_attributes` `pa` join `test`.`products_options_values` `pov` join `test`.`products` `p` left join `test`.`specials` `s` on((`test`.`s`.`products_id` = `test`.`pa`.`products_id`)) left join `test`.`products_description` `pd` on((`test`.`pd`.`products_id` = `test`.`pa`.`products_id`)) where ((`test`.`p`.`products_id` = `test`.`pa`.`products_id`) and (`test`.`pa`.`options_id` = 1) and (`test`.`pov`.`products_options_values_id` = `test`.`pa`.`options_values_id`) and (`test`.`pov`.`language_id` = '1') and (`test`.`p`.`products_status` = '1') and (<cache>(('2016-04-19' - interval 7000 day)) <= `test`.`p`.`products_date_added`))) `q` where ((`test`.`ps`.`products_id` = `q`.`products_id`) and (`test`.`ps`.`products_stock_attributes` = concat('1-',`test`.`pav`.`products_options_values_id`)) and (`test`.`ps`.`products_stock_quantity` > 0)) order by `test`.`pav`.`products_options_values_sort_order`
+Note 1003 select distinct `test`.`pav`.`products_options_values_id` AS `products_options_values_id`,`test`.`pav`.`products_options_values_name` AS `products_options_values_name`,`test`.`pav`.`products_options_values_sort_order` AS `products_options_values_sort_order` from `test`.`products_stock` `ps` join `test`.`products_options_values` `pav` join (select distinct `test`.`pa`.`products_id` AS `products_id` from `test`.`products_attributes` `pa` join `test`.`products_options_values` `pov` join `test`.`products` `p` left join `test`.`specials` `s` on((`test`.`s`.`products_id` = `test`.`pa`.`products_id`)) left join `test`.`products_description` `pd` on((`test`.`pd`.`products_id` = `test`.`pa`.`products_id`)) where ((`test`.`p`.`products_id` = `test`.`pa`.`products_id`) and (`test`.`pa`.`options_id` = 1) and (`test`.`pov`.`products_options_values_id` = `test`.`pa`.`options_values_id`) and (`test`.`pov`.`language_id` = '1') and (`test`.`p`.`products_status` = '1') and (<cache>(('2016-04-19' - interval 7000 day)) <= `test`.`p`.`products_date_added`))) `q` where ((`q`.`products_id` = `test`.`ps`.`products_id`) and (`test`.`ps`.`products_stock_attributes` = concat('1-',`test`.`pav`.`products_options_values_id`)) and (`test`.`ps`.`products_stock_quantity` > 0)) order by `test`.`pav`.`products_options_values_sort_order`
Also reproducible on MySQL 5.7 (execution time and maybe plans are different there, but the idea is the same - fast before ANALYZE, slow after ANALYZE).
Thanks, Elena. We need a bug fix though, not temporary workarounds that modify queries, as this bug potentially affects the entire application, and entire applications worldwide...
Discussing with varun.
- Removed ORDER BY and DISTINCT clauses from the top-level SELECT. The slowdown can still be observed.
The slow query plan without ORDER BY and DISTINCT in the top-level SELECT:
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+------------------------------------+
| id   | select_type | table      | type   | possible_keys                                       | key                           | key_len | ref                             | rows  | Extra                              |
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+------------------------------------+
|    1 | PRIMARY     | pav        | ALL    | NULL                                                | NULL                          | NULL    | NULL                            |    70 |                                    |
|    1 | PRIMARY     | <derived2> | ALL    | NULL                                                | NULL                          | NULL    | NULL                            | 55242 | Using join buffer (flat, BNL join) |
|    1 | PRIMARY     | ps         | eq_ref | idx_products_stock_attributes                       | idx_products_stock_attributes | 771     | q.products_id,func              |     1 | Using where; Using index           |
|    2 | DERIVED     | pa         | ALL    | idx_products_attributes_products_id                 | NULL                          | NULL    | NULL                            | 55242 | Using where; Using temporary       |
|    2 | DERIVED     | pov        | eq_ref | PRIMARY                                             | PRIMARY                       | 8       | test.pa.options_values_id,const |     1 | Using index; Distinct              |
|    2 | DERIVED     | p          | eq_ref | PRIMARY,idx_products_date_added,idx_products_status | PRIMARY                       | 4       | test.pa.products_id             |     1 | Using where; Distinct              |
|    2 | DERIVED     | s          | ref    | idx_specials_products_id                            | idx_specials_products_id      | 4       | test.pa.products_id             |     1 | Using index; Distinct              |
|    2 | DERIVED     | pd         | ref    | PRIMARY                                             | PRIMARY                       | 4       | test.pa.products_id             |     1 | Using index; Distinct              |
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+------------------------------------+
For the fast query:
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+--------------------------------------------------------------+
| id   | select_type | table      | type   | possible_keys                                       | key                           | key_len | ref                             | rows  | Extra                                                        |
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+--------------------------------------------------------------+
|    1 | PRIMARY     | pav        | ALL    | NULL                                                | NULL                          | NULL    | NULL                            |    70 |                                                              |
|    1 | PRIMARY     | ps         | index  | idx_products_stock_attributes                       | idx_products_stock_attributes | 771     | NULL                            | 55266 | Using where; Using index; Using join buffer (flat, BNL join) |
|    1 | PRIMARY     | <derived2> | ref    | key0                                                | key0                          | 4       | test.ps.products_id             |   959 |                                                              |
|    2 | DERIVED     | pa         | ALL    | idx_products_attributes_products_id                 | NULL                          | NULL    | NULL                            | 55242 | Using where; Using temporary                                 |
|    2 | DERIVED     | pov        | eq_ref | PRIMARY                                             | PRIMARY                       | 8       | test.pa.options_values_id,const |     1 | Using index; Distinct                                        |
|    2 | DERIVED     | p          | eq_ref | PRIMARY,idx_products_date_added,idx_products_status | PRIMARY                       | 4       | test.pa.products_id             |     1 | Using where; Distinct                                        |
|    2 | DERIVED     | s          | ref    | idx_specials_products_id                            | idx_specials_products_id      | 4       | test.pa.products_id             |     1 | Using index; Distinct                                        |
|    2 | DERIVED     | pd         | ref    | PRIMARY                                             | PRIMARY                       | 4       | test.pa.products_id             |   960 | Using index; Distinct                                        |
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+--------------------------------------------------------------+
Observations
- The derived table (id=2) uses the same query plan
- except for rows in the last table, pd. Slow plan: rows=1, Fast plan: rows=960.
- Top-level SELECT has different join orders:
- The slow one has pav, <derived2>, ps
- The fast one has pav, ps, <derived2>
Analyze format=json output for the slow query
https://gist.github.com/anonymous/ee09ae49ec9ca91105fa18b73d11ad7c
Analyze format=json output for the fast query
https://gist.github.com/anonymous/4c447813f44a0892339b569266dc9d44
Looking at the above:
Query times
- fast query: 3.7 sec
- slow query: 19.8 sec
Query plans:
- In both queries, materialized/r_total_time_ms is 800-900 msec.
- Comparing rows and r_rows, one can see that the slow query has a better rows estimate for table pd (1 vs 960).
- There is a 2x difference between filtered and r_filtered for table p (3rd in the subquery). I don't think this is the deciding factor.
The slow query plan spends most of its time (18 sec. out of 20) reading table ps:
"table": {
|
"table_name": "ps",
|
"access_type": "eq_ref",
|
"possible_keys": ["idx_products_stock_attributes"],
|
"key": "idx_products_stock_attributes",
|
"key_length": "771",
|
"used_key_parts": ["products_id", "products_stock_attributes"],
|
"ref": ["q.products_id", "func"],
|
"r_loops": 289730,
|
"rows": 1,
|
"r_rows": 0.0992,
|
"r_total_time_ms": 18315,
|
"r_loops": 289730,
|
This comes from:
#r_rows(pav) * #r_rows(<derived2>) = 70 * 4139 = 289,730
r_rows for pav matches the estimate, r_rows for <derived2> is actually lower than the estimate:
"block-nl-join": {
|
"table": {
|
"table_name": "<derived2>",
|
"access_type": "ALL",
|
"r_loops": 1,
|
"rows": 55242,
|
"r_rows": 4139,
|
So we are actually doing fewer lookups than the optimizer expected.
289730 lookups per 18 seconds is 16,096 lookups per sec. Not fast but note the key_length=771.
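The arithmetic above, spelled out as a quick sketch (the numbers are copied from the ANALYZE output quoted earlier):

```python
# Numbers from the ANALYZE FORMAT=JSON output above.
r_rows_pav = 70        # rows read from pav (matches the estimate)
r_rows_derived = 4139  # rows actually produced by <derived2>

lookups = r_rows_pav * r_rows_derived  # eq_ref lookups into ps
elapsed_seconds = 18                   # r_total_time_ms was ~18315

print(lookups)                         # 289730
print(lookups // elapsed_seconds)      # 16096 lookups per second
```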
Looking at the fast query plan, one can see that the table ps doesn't have many rows:
"block-nl-join": {
|
"table": {
|
"table_name": "ps",
|
"access_type": "index",
|
"possible_keys": ["idx_products_stock_attributes"],
|
"key": "idx_products_stock_attributes",
|
"key_length": "771",
|
"used_key_parts": ["products_id", "products_stock_attributes"],
|
"r_loops": 1,
|
"rows": 55266,
|
"r_rows": 55266,
|
"r_total_time_ms": 64.807,
|
"filtered": 75.001,
|
"r_filtered": 100,
|
"using_index": true
|
Just 55K rows in total.
Need to debug the join optimization process in the top-level query.
Observations for fast query
https://gist.github.com/anonymous/6e7d980971036b3bf1affd4dcbf9054e
Observations for slow query
https://gist.github.com/anonymous/1f29eb63bd1f70484c7acc5445628459
== Slow join order ==

1. pav
   records=70
   cost=2.47

   current_record_count=70
   current_read_time= 0 + 2.47 + 14 = 16.47

2. pav, q
   record_count=70

   s->found_records=55,242
   tmp= s->scan_time() = 55,242
   // join buffer is filled 1 time

   cost= 55,242
   records= 55,242 // NOTE: actual number is about 4K.

   current_record_count= 70 * 55,242 = 3,866,940
   current_read_time = 16.47 + 55,242 + 3.8M/5 = 828,646

3. pav, q, ps
   record_count=3,866,940
   // note: the actual number (from ANALYZE) is 290,430,
   // which is equal to 70*4,149 (which means pav-q do a cross join)

   records= 1.0 // eq_ref
   read_time= prev_record_reads(...)= 3,866,940
   // note: the table has 55,266 rows in total

   current_record_count= 1 * 3,866,940 = 3,866,940
   current_read_time= 828,646 + 3,866,940 + 3.8M/5 = 5,468,974
== Fast join order ==

1. pav

2. pav, ps
   record_count=70

   // table has 55,266 records
   read_time= table->scan_time()= 278
   // join buffer is filled 1 time
   cost= 3041 // due to (s->records - rnd_records) addition
   records= 41,450

   current_record_count= 70 * 41,450 = 2,901,500
   current_read_time = 16.47 + 3041 + 2.9M/5 = 583,358

3. pav, ps, q
   record_count=2,901,500
   // table q has 55,242 records
   // note: the actual number (from ANALYZE) is 16,667 (*)

   records=10
   // ^ this is computed as the ref access estimate for the case when no
   // statistics are available. (the real number from ANALYZE is 0.9999)

   tmp = record_count * file->read_time(1, 10 rows) = record_count * 1.5 =
         2,901,500 * 1.5 = 4,352,250

   current_record_count= 2,901,500 * 10 = 29,016,050
   current_read_time= 583,358 + 4,352,250 + 29M/5 = 10,738,818
|
ANALYZE FORMAT=JSON for the fast query plan (forced it to be chosen in debugger): https://gist.github.com/spetrunia/b3f686ca8a8451255529fa76995287eb
The slow plan, miscalculation 1
It heavily over-estimates #rows in the temp. table q (the estimate is 55K, reality is 4K). This also causes the number of reads from table ps to be >10x higher (3.8M vs 290K in reality).
The fast plan, miscalculation 1
Heavily over-estimates cardinality of pav JOIN ps.
"table": {
|
"table_name": "ps",
|
"access_type": "ALL",
|
"possible_keys": ["idx_products_stock_attributes"],
|
"r_loops": 1,
|
"rows": 55266,
|
"r_rows": 55266,
|
"r_total_time_ms": 25.906,
|
"filtered": 75.001,
|
"r_filtered": 30.176,
|
"attached_condition": "(ps.products_stock_quantity > 0)"
|
A 2x wrong estimate of the selectivity of a simple condition (okay per se, but it adds up with the others).
The fast plan, miscalculation 2
"block-nl-join": {
  "table": {
    "table_name": "ps",
    ...
  },
  "buffer_type": "flat",
  "buffer_size": "256Kb",
  "join_type": "BNL",
  "attached_condition": "(ps.products_stock_attributes = convert(concat('1-',pav.products_options_values_id) using utf8))",
  "r_filtered": 1.4286
},
Here r_filtered=1.4%.
Note that filtered is not even present as the optimizer doesn't try to account for the selectivity of this post-join condition.
The result is that for the derived table we have, from the debug log (should ANALYZE include this number?):
record_count=2,901,500
and from ANALYZE:
"r_loops": 16677,
The fast plan, miscalculation 3
There's more:
"table": {
|
"table_name": "<derived2>",
|
"access_type": "ref",
|
"possible_keys": ["key0"],
|
"key": "key0",
|
"key_length": "4",
|
"used_key_parts": ["products_id"],
|
"ref": ["j2.ps.products_id"],
|
"r_loops": 16677,
|
"rows": 10,
|
"r_rows": 0.9999,
|
Note rows vs r_rows: also a 10x over-estimation. The estimate of 10 comes from this code in best_access_path:
if (!(records= keyinfo->actual_rec_per_key(key_parts-1)))
{ /* Prefer longer keys */
  records=
    ((double) s->records / (double) rec *
     (1.0 +
      ((double) (table->s->max_key_length-keyinfo->key_length) /
       (double) table->s->max_key_length)));

  if (records < 2.0)
    records=2.0; /* Can't be as good as a unique */
}
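For illustration, here is a rough Python paraphrase of that fallback heuristic (the function name and the example input values are mine, not from the server):

```python
def fallback_ref_estimate(table_rows, rec_per_key_guess,
                          key_length, max_key_length):
    # "Prefer longer keys": the rows-per-key guess is inflated for keys
    # shorter than the table's longest key, and clamped to at least 2.0
    # because a non-unique lookup can't be as good as a unique one.
    records = (table_rows / rec_per_key_guess *
               (1.0 + (max_key_length - key_length) / max_key_length))
    return max(records, 2.0)


# A short key relative to max_key_length inflates the guess by up to 2x:
print(fallback_ref_estimate(1000, 100, 2, 4))  # 15.0
```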
"The fast plan, miscalculation 2" has the biggest impact. However, it is rather hard to make the optimizer take this into account.
The condition is
ps.products_stock_attributes = Concat('1-', pav.products_options_values_id)
The table has the column as the second part of a unique key:
Create Table: CREATE TABLE `products_stock` (
  `products_stock_id` int(11) NOT NULL AUTO_INCREMENT,
  `products_id` int(11) NOT NULL DEFAULT '0',
  `products_stock_attributes` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `products_stock_quantity` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`products_stock_id`),
  UNIQUE KEY `idx_products_stock_attributes` (`products_id`,`products_stock_attributes`)
products_stock has 55266 rows.
Most common values:
select *
from (
  select count(*) as CNT
  from products_stock
  group by products_stock_attributes
) T
order by CNT desc limit 10;

+------+
| CNT  |
+------+
| 6980 |
| 6529 |
| 6467 |
| 5095 |
| 4603 |
| 3870 |
...
So the most popular value accounts for 6980 / 55266 ≈ 0.12, i.e. 12% of the table.
We could use the statistics to produce a selectivity estimate.
"The fast plan, miscalculation 3" could have been avoided as the derived table is defined as:
(SELECT DISTINCT pa.products_id
 FROM products_attributes pa,
      products_options_values pov,
      ...
) q
Since it's SELECT DISTINCT, we know rows=1 there.
... and if I change it to be rows=1 in the debugger, the fast query plan is chosen!
More details about how to use EITS statistics to produce selectivity estimates.
MariaDB [j2]> select *, hex(histogram) from mysql.column_stats where table_name='products_stock' and column_name='products_stock_attributes'\G
*************************** 1. row ***************************
        db_name: j2
     table_name: products_stock
    column_name: products_stock_attributes
      min_value: 1-1
      max_value: 1-8
    nulls_ratio: 0.0000
     avg_length: 3.4370
  avg_frequency: 812.7353
      hist_size: 255
      hist_type: SINGLE_PREC_HB
      histogram: $$$$$$$$$$$$&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHJmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm��������������������������������������������������������������
 hex(histogram): 0000000000000000010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101242424242424242424242424262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626264848484848484848484848484848484848484848484848484848484848484848484A6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D919191919191919191919191919191919191919191919191B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6DADADADADADADADADADADADAFFFFFFFFFFFFFFFF
1 row in set (0.00 sec)
Find the longest run of buckets that have the same bound. Here, the longest run is 12% of total_buckets. This means that the selectivity of the condition
products_stock_attributes = ...
is at most 0.12.
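A sketch of that longest-run heuristic: since each histogram bucket covers an equal share of rows, a value whose bound repeats across k of n buckets holds roughly k/n of the rows. The byte string in the example is a short made-up histogram, not the real 255-bucket one above.

```python
def max_eq_selectivity(histogram: bytes) -> float:
    # Longest run of identical bucket bounds, as a fraction of all buckets.
    longest = run = 1
    for prev, cur in zip(histogram, histogram[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest / len(histogram)


# Hypothetical 10-bucket histogram: the value at bound 10 spans 4 buckets,
# so an equality condition on it selects at most ~40% of the rows.
hist = bytes([0, 0, 10, 10, 10, 10, 40, 80, 120, 255])
print(max_eq_selectivity(hist))  # 0.4
```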
varun, let's make step #1: if a derived table has SELECT DISTINCT, provide index statistics for it so that the join optimizer in the upper select knows that ref access to the table will produce one row.
Patch for step #1
http://lists.askmonty.org/pipermail/commits/2018-March/012123.html
This patch fixes the slowness issue when we run the same query above after running the alter table on products_description.
- Also for step 1 we need to take care of UNION ALL for derived tables with SELECT DISTINCT, because these can produce duplicate entries, and in such cases we should not tell the join optimizer that ref access to the derived table would produce only one row.
- INTERSECT and EXCEPT are not a problem, because they won't produce duplicate entries.
Patch 2 sent for review (addressed review comments on the 1st and added handling for UNION/INTERSECT/EXCEPT):
http://lists.askmonty.org/pipermail/commits/2018-March/012127.html
Step #2:
If any field in the select list of a derived table is also present in the GROUP BY list, then we are again guaranteed that ref access to the derived table will always produce one row per key.
Review feedback: https://lists.launchpad.net/maria-developers/msg11151.html
Patch for step 1: http://lists.askmonty.org/pipermail/commits/2018-March/012147.html
Patch for step 2: http://lists.askmonty.org/pipermail/commits/2018-March/012148.html
Also, the query in this MDEV now performs fast regardless of whether the ANALYZE command was run on the products_description table.
It's been a year since this issue's status was changed to "In Review". Has there been any progress since then?
Thank you
Wait, the review sent yesterday was incomplete and the comment there totally missed the point.
I'm working on the updated version.
Review feedback for part#1 sent. The patch will need to be changed.
Another attempt for part #1
http://lists.askmonty.org/pipermail/commits/2019-April/013664.html
Another attempt for part #2
http://lists.askmonty.org/pipermail/commits/2019-April/013674.html
As discussed with psergey, we will just push the patch for part #1; for part #2 we should wait for https://jira.mariadb.org/browse/MDEV-11588 and then use that to give correct statistics.
Can you upload your zip file to the FTP server - https://mariadb.com/kb/en/meta/ftp/
Also try moving your subqueries into JOIN expressions.