Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 5.5 (EOL), 10.0 (EOL), 10.1 (EOL)
- Environment: WHM on CentOS 6.7 x64, XAMPP 3.2.2 on Windows 7 x64, WAMP 2.4 on Windows 7 x64
- Fix Version/s: 10.2.13, 10.3.6-1
Description
I am having a strange issue with a slow query under certain conditions:
I have made a few tests and was able to isolate the issue to one single table called `products_description` (all the tables are MyISAM).
At first I noticed that when this table is new (i.e. just imported), the query always performs fast (~0.3s).
However, if I perform any of these operations on this specific table, at any time (even immediately after importing it):
`CHECK`, `OPTIMIZE`, `ANALYZE` or `REPAIR`, the query suddenly becomes ~10x slower (takes ~4.5s) and stays slow from then on.
Note that I forced no caching when running the query, to ensure the timings are meaningful.
I am able to restore the performance only if I am then performing any of these operations on that table:
1) `DROP` the table and import it again.
or
2) `ALTER` any of the following of that table: `Collation` or `CHECKSUM` or `DELAY_KEY_WRITE`. It then runs fast with the altered value and when reverting back to the old value, the performance remains fast.
Even then, if I perform any of the `CHECK`, `OPTIMIZE`, `ANALYZE` or `REPAIR` operations on that table, the query speed drops again until I do either 1) or 2).
One more thing I tested:
Before performing any operation on the table, I backed up the table's files (`products_description.frm`, `products_description.MYD`, `products_description.MYI`) and ran the query: it ran fast. Then I performed `CHECK` on the table and ran the query: it was ~10x slower. I then restored the 3 backed-up files, overwriting the current ones, and ran the query again: still slow.
I have compressed the database into a ~5 MB zip file (~80 MB unzipped) and uploaded it to the private folder on the MariaDB FTP.
Here are the EXPLAIN EXTENDED results. It seems the optimizer works differently on the two databases, but that doesn't really explain why, as the copied database is supposedly identical.
http://i.stack.imgur.com/NQ9YQ.png
http://i.stack.imgur.com/pLyOt.png
These are the screenshots of the profiling of the query on the slow and fast database:
http://i.stack.imgur.com/JBlwP.png
http://i.stack.imgur.com/E1qf0.png
This is the SQL query that I am running and that you should test with:
SELECT DISTINCT pav.products_options_values_id,
                pav.products_options_values_name,
                pav.products_options_values_sort_order
FROM   products_stock ps,
       products_options_values pav,
       (SELECT DISTINCT pa.products_id
        FROM   products_attributes pa,
               products_options_values pov,
               (SELECT p.products_id,
                       p.products_image,
                       p.products_subimage1,
                       pd.products_name,
                       p.products_quantity,
                       p.products_model,
                       p.products_ordered,
                       p.products_price,
                       p.products_date_added,
                       p.products_weight,
                       p.products_length,
                       p.products_width,
                       p.products_height,
                       p.products_tax_class_id,
                       p.products_status,
                       IF(s.status, s.specials_new_products_price, NULL)
                         AS specials_new_products_price,
                       IF(s.status, s.specials_new_products_price, p.products_price)
                         AS final_price,
                       IF(p.clearance_price < p.products_cost * 2.25,
                          p.clearance_price,
                          p.products_cost * 2.25) AS sorting_price
                FROM   products p
                       LEFT JOIN specials s
                              ON p.products_id = s.products_id
                       LEFT JOIN products_description pd
                              ON p.products_id = pd.products_id
                WHERE  /*FASTIDS*/
                       p.products_status = '1'
                       AND Date_sub('2016-04-19', INTERVAL 7000 day) <= p.products_date_added
               ) m
        WHERE  m.products_id = pa.products_id
               AND pa.options_id = 1
               AND pa.options_values_id = pov.products_options_values_id
               AND pov.language_id = '1'
       ) q
WHERE  q.products_id = ps.products_id
       AND ps.products_stock_attributes = Concat('1-', pav.products_options_values_id)
       AND ps.products_stock_quantity > 0
ORDER  BY pav.products_options_values_sort_order ASC
Attachments
Activity
Hi, Daniel,
The issue is not with the query (although it might be possible to optimize it as you suggested and/or in another way, but that's a different topic entirely).
The issue is with the database server, as explained in my post above. I posted the query only so that you can reproduce the database issue I am seeing.
This is my client's database, thus I am not at liberty to upload this database to a public location, otherwise I would have uploaded it as attachment here.
I am going to send you a download link to your email address danielgb [at] au.ibm.com, please keep the file private. If anyone else wants the file, I will send it the same way to him/her.
Thank you
Yo, the FTP upload to the private directory isn't to a public location. It's accessible to a few MariaDB core developers only (i.e. not me).
Looking closer at the query, it does look entirely possible that the two subqueries can be removed, and indeed the DISTINCT on the first subquery isn't needed. As it's your client, I suggest you help them with that to give them a fast solution now (probably faster than the current fast version).
I'm not convinced it's a different topic, as it's a bit unfair to expect a server to always optimize a bad query correctly. It seems you have gotten lucky with some aspect of statistics sampling so far.
Can you include the "SHOW INDEX FROM {table}" output here, in text form, for both the fast and slow versions? This could explain the difference in query plan.
Thanks Daniel,
I have uploaded the file to the ftp (private folder).
As for the query, again, this is just a way to reproduce the issue. If you read my post and the steps to reproduce thoroughly, you will see that there is no reason for the optimizer to work differently; it seems to be a server bug.
I encourage you to test my data and try to reproduce the issue the same way I did, you will see what I mean.
Thanks again
I am also able to reproduce the issue easily on Windows 7 x64, with XAMPP 3.2.2 (which uses MariaDB 10.1.10) and WAMP 2.4 (which uses 5.6.12-log - MySQL Community Server (GPL)).
Note that I am using phpMyAdmin for all steps. To reproduce:
1) Import the database.
2) Run the above query (it should run fast) and time it.
3) Run OPTIMIZE (or any of the other commands I listed above) on the products_description table.
4) Run the above query again, this time it should run much much slower.
5) Change the products_description table's CHECKSUM option to 1 and back to 0.
6) Run the above query, it now should run as fast as it did in step 2.
Again, all tests were done through phpMyAdmin, I did not test using the command line.
Reproducible as described.
Instead of altering the table back and forth, ALTER TABLE products_description FORCE can be used.
As a workaround, SELECT STRAIGHT_JOIN ... seems to work.
The difference in plan between slow and fast execution:
--- slow_plan.sql	2016-04-22 16:17:43.224080338 +0300
+++ fast_plan.sql	2016-04-22 16:18:25.179596410 +0300
@@ -1,11 +1,11 @@
 id select_type table type possible_keys key key_len ref rows filtered Extra
 1 PRIMARY pav ALL NULL NULL NULL NULL 70 100.00 Using temporary; Using filesort
-1 PRIMARY <derived2> ALL NULL NULL NULL NULL 55242 100.00 Distinct; Using join buffer (flat, BNL join)
-1 PRIMARY ps eq_ref idx_products_stock_attributes idx_products_stock_attributes 771 q.products_id,func 1 100.00 Using index condition; Using where; Distinct
+1 PRIMARY ps ALL idx_products_stock_attributes NULL NULL NULL 55266 75.00 Using where; Distinct; Using join buffer (flat, BNL join)
+1 PRIMARY <derived2> ref key0 key0 4 test.ps.products_id 959 100.00 Distinct
 2 DERIVED pa ALL idx_products_attributes_products_id NULL NULL NULL 55242 100.00 Using where; Using temporary
 2 DERIVED pov eq_ref PRIMARY PRIMARY 8 test.pa.options_values_id,const 1 100.00 Using index; Distinct
 2 DERIVED p eq_ref PRIMARY,idx_products_date_added,idx_products_status PRIMARY 4 test.pa.products_id 1 100.00 Using where; Distinct
 2 DERIVED s ref idx_specials_products_id idx_specials_products_id 4 test.pa.products_id 1 100.00 Using index; Distinct
-2 DERIVED pd ref PRIMARY PRIMARY 4 test.pa.products_id 1 100.00 Using index; Distinct
+2 DERIVED pd ref PRIMARY PRIMARY 4 test.pa.products_id 960 100.00 Using index; Distinct
 Level Code Message
-Note 1003 select distinct `test`.`pav`.`products_options_values_id` AS `products_options_values_id`,`test`.`pav`.`products_options_values_name` AS `products_options_values_name`,`test`.`pav`.`products_options_values_sort_order` AS `products_options_values_sort_order` from `test`.`products_stock` `ps` join `test`.`products_options_values` `pav` join (select distinct `test`.`pa`.`products_id` AS `products_id` from `test`.`products_attributes` `pa` join `test`.`products_options_values` `pov` join `test`.`products` `p` left join `test`.`specials` `s` on((`test`.`s`.`products_id` = `test`.`pa`.`products_id`)) left join `test`.`products_description` `pd` on((`test`.`pd`.`products_id` = `test`.`pa`.`products_id`)) where ((`test`.`p`.`products_id` = `test`.`pa`.`products_id`) and (`test`.`pa`.`options_id` = 1) and (`test`.`pov`.`products_options_values_id` = `test`.`pa`.`options_values_id`) and (`test`.`pov`.`language_id` = '1') and (`test`.`p`.`products_status` = '1') and (<cache>(('2016-04-19' - interval 7000 day)) <= `test`.`p`.`products_date_added`))) `q` where ((`test`.`ps`.`products_id` = `q`.`products_id`) and (`test`.`ps`.`products_stock_attributes` = concat('1-',`test`.`pav`.`products_options_values_id`)) and (`test`.`ps`.`products_stock_quantity` > 0)) order by `test`.`pav`.`products_options_values_sort_order`
+Note 1003 select distinct `test`.`pav`.`products_options_values_id` AS `products_options_values_id`,`test`.`pav`.`products_options_values_name` AS `products_options_values_name`,`test`.`pav`.`products_options_values_sort_order` AS `products_options_values_sort_order` from `test`.`products_stock` `ps` join `test`.`products_options_values` `pav` join (select distinct `test`.`pa`.`products_id` AS `products_id` from `test`.`products_attributes` `pa` join `test`.`products_options_values` `pov` join `test`.`products` `p` left join `test`.`specials` `s` on((`test`.`s`.`products_id` = `test`.`pa`.`products_id`)) left join `test`.`products_description` `pd` on((`test`.`pd`.`products_id` = `test`.`pa`.`products_id`)) where ((`test`.`p`.`products_id` = `test`.`pa`.`products_id`) and (`test`.`pa`.`options_id` = 1) and (`test`.`pov`.`products_options_values_id` = `test`.`pa`.`options_values_id`) and (`test`.`pov`.`language_id` = '1') and (`test`.`p`.`products_status` = '1') and (<cache>(('2016-04-19' - interval 7000 day)) <= `test`.`p`.`products_date_added`))) `q` where ((`q`.`products_id` = `test`.`ps`.`products_id`) and (`test`.`ps`.`products_stock_attributes` = concat('1-',`test`.`pav`.`products_options_values_id`)) and (`test`.`ps`.`products_stock_quantity` > 0)) order by `test`.`pav`.`products_options_values_sort_order`
Also reproducible on MySQL 5.7 (execution time and maybe plans are different there, but the idea is the same - fast before ANALYZE, slow after ANALYZE).
Thanks, Elena. We need a bug fix though, not temporary workarounds that modify queries, as this bug potentially affects the entire application, and entire applications worldwide...
Discussing with varun.
- Removed ORDER BY and DISTINCT clauses from the top-level SELECT. The slowdown can still be observed.
The slow query plan without ORDER BY and DISTINCT in the top-level SELECT:
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+------------------------------------+
| id   | select_type | table      | type   | possible_keys                                       | key                           | key_len | ref                             | rows  | Extra                              |
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+------------------------------------+
|    1 | PRIMARY     | pav        | ALL    | NULL                                                | NULL                          | NULL    | NULL                            |    70 |                                    |
|    1 | PRIMARY     | <derived2> | ALL    | NULL                                                | NULL                          | NULL    | NULL                            | 55242 | Using join buffer (flat, BNL join) |
|    1 | PRIMARY     | ps         | eq_ref | idx_products_stock_attributes                       | idx_products_stock_attributes | 771     | q.products_id,func              |     1 | Using where; Using index           |
|    2 | DERIVED     | pa         | ALL    | idx_products_attributes_products_id                 | NULL                          | NULL    | NULL                            | 55242 | Using where; Using temporary       |
|    2 | DERIVED     | pov        | eq_ref | PRIMARY                                             | PRIMARY                       | 8       | test.pa.options_values_id,const |     1 | Using index; Distinct              |
|    2 | DERIVED     | p          | eq_ref | PRIMARY,idx_products_date_added,idx_products_status | PRIMARY                       | 4       | test.pa.products_id             |     1 | Using where; Distinct              |
|    2 | DERIVED     | s          | ref    | idx_specials_products_id                            | idx_specials_products_id      | 4       | test.pa.products_id             |     1 | Using index; Distinct              |
|    2 | DERIVED     | pd         | ref    | PRIMARY                                             | PRIMARY                       | 4       | test.pa.products_id             |     1 | Using index; Distinct              |
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+------------------------------------+
For the fast query:
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+--------------------------------------------------------------+
| id   | select_type | table      | type   | possible_keys                                       | key                           | key_len | ref                             | rows  | Extra                                                        |
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+--------------------------------------------------------------+
|    1 | PRIMARY     | pav        | ALL    | NULL                                                | NULL                          | NULL    | NULL                            |    70 |                                                              |
|    1 | PRIMARY     | ps         | index  | idx_products_stock_attributes                       | idx_products_stock_attributes | 771     | NULL                            | 55266 | Using where; Using index; Using join buffer (flat, BNL join) |
|    1 | PRIMARY     | <derived2> | ref    | key0                                                | key0                          | 4       | test.ps.products_id             |   959 |                                                              |
|    2 | DERIVED     | pa         | ALL    | idx_products_attributes_products_id                 | NULL                          | NULL    | NULL                            | 55242 | Using where; Using temporary                                 |
|    2 | DERIVED     | pov        | eq_ref | PRIMARY                                             | PRIMARY                       | 8       | test.pa.options_values_id,const |     1 | Using index; Distinct                                        |
|    2 | DERIVED     | p          | eq_ref | PRIMARY,idx_products_date_added,idx_products_status | PRIMARY                       | 4       | test.pa.products_id             |     1 | Using where; Distinct                                        |
|    2 | DERIVED     | s          | ref    | idx_specials_products_id                            | idx_specials_products_id      | 4       | test.pa.products_id             |     1 | Using index; Distinct                                        |
|    2 | DERIVED     | pd         | ref    | PRIMARY                                             | PRIMARY                       | 4       | test.pa.products_id             |   960 | Using index; Distinct                                        |
+------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+--------------------------------------------------------------+
Observations
- The derived table (id=2) uses the same query plan
- except for rows in the last table, pd. Slow plan: rows=1, Fast plan: rows=960.
- Top-level SELECT has different join orders:
- The slow one has pav, <derived2>, ps
- The fast one has pav, ps, <derived2>
Analyze format=json output for the slow query
https://gist.github.com/anonymous/ee09ae49ec9ca91105fa18b73d11ad7c
Analyze format=json output for the fast query
https://gist.github.com/anonymous/4c447813f44a0892339b569266dc9d44
Looking at the above:
Query times
- fast query: 3.7 sec
- slow query: 19.8 sec
Query plans:
- In both queries, materialized/r_total_time_ms is 800-900 msec.
- Comparing rows and r_rows, one can see that the slow query has a better rows estimate for table pd (1 vs 960).
- There is a 2x difference between filtered and r_filtered for table p (3rd in the subquery). I don't think this is the deciding factor.
The slow query plan spends most of its time (18 sec. out of 20) reading table ps:
"table": {
|
"table_name": "ps",
|
"access_type": "eq_ref",
|
"possible_keys": ["idx_products_stock_attributes"],
|
"key": "idx_products_stock_attributes",
|
"key_length": "771",
|
"used_key_parts": ["products_id", "products_stock_attributes"],
|
"ref": ["q.products_id", "func"],
|
"r_loops": 289730,
|
"rows": 1,
|
"r_rows": 0.0992,
|
"r_total_time_ms": 18315,
|
"r_loops": 289730,
|
This comes from:
#r_rows(pav) * #r_rows(<derived2>) = 70 * 4139 = 289,730
r_rows for pav matches the estimate, r_rows for <derived2> is actually lower than the estimate:
"block-nl-join": {
|
"table": {
|
"table_name": "<derived2>",
|
"access_type": "ALL",
|
"r_loops": 1,
|
"rows": 55242,
|
"r_rows": 4139,
|
So we are actually doing fewer lookups than the optimizer expected.
289730 lookups per 18 seconds is 16,096 lookups per sec. Not fast but note the key_length=771.
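The arithmetic above, spelled out as a quick sketch (the numbers are copied from the ANALYZE output quoted earlier):

```python
# Numbers from the ANALYZE FORMAT=JSON output above.
r_rows_pav = 70        # rows read from pav (matches the estimate)
r_rows_derived = 4139  # rows actually produced by <derived2>

lookups = r_rows_pav * r_rows_derived  # eq_ref lookups into ps
elapsed_seconds = 18                   # r_total_time_ms was ~18315

print(lookups)                         # 289730
print(lookups // elapsed_seconds)      # 16096 lookups per second
```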
Looking at the fast query plan, one can see that the table ps doesn't have many rows:
"block-nl-join": {
|
"table": {
|
"table_name": "ps",
|
"access_type": "index",
|
"possible_keys": ["idx_products_stock_attributes"],
|
"key": "idx_products_stock_attributes",
|
"key_length": "771",
|
"used_key_parts": ["products_id", "products_stock_attributes"],
|
"r_loops": 1,
|
"rows": 55266,
|
"r_rows": 55266,
|
"r_total_time_ms": 64.807,
|
"filtered": 75.001,
|
"r_filtered": 100,
|
"using_index": true
|
Just 55K rows in total.
Need to debug the join optimization process in the top-level query.
Observations for fast query
https://gist.github.com/anonymous/6e7d980971036b3bf1affd4dcbf9054e
Observations for slow query
https://gist.github.com/anonymous/1f29eb63bd1f70484c7acc5445628459
== Slow join order ==

1. pav
   records=70
   cost=2.47

   current_record_count=70
   current_read_time= 0 + 2.47 + 14 = 16.47

2. pav, q
   record_count=70

   s->found_records=55,242
   tmp= s->scan_time() = 55,242
   // join buffer is filled 1 time

   cost= 55,242
   records= 55,242 // NOTE: actual number is about 4K.

   current_record_count= 70 * 55,242 = 3,866,940
   current_read_time = 16.47 + 55,242 + 3.8M/5 = 828,646

3. pav, q, ps
   record_count=3,866,940
   // note: the actual number (from ANALYZE) is 290,430,
   // which is equal to 70*4,149 (which means pav-q do a cross join)

   records= 1.0 // eq_ref
   read_time= prev_record_reads(...)= 3,866,940
   // note: the table has 55,266 rows in total

   current_record_count= 1 * 3,866,940 = 3,866,940
   current_read_time= 828,646 + 3,866,940 + 3.8M/5 = 5,468,974
== Fast join order ==

1. pav

2. pav, ps
   record_count=70

   // table has 55,266 records
   read_time= table->scan_time()= 278
   // join buffer is filled 1 time
   cost= 3041 // due to (s->records - rnd_records) addition
   records= 41,450

   current_record_count= 70 * 41,450 = 2,901,500
   current_read_time = 16.47 + 3041 + 2.9M/5 = 583,358

3. pav, ps, q
   record_count=2,901,500
   // table q has 55,242 records
   // note: the actual number (from ANALYZE) is 16,667 (*)

   records=10
   // ^ this is computed as the ref access estimate for the case when no
   // statistics are available. (the real number from ANALYZE is 0.9999)

   tmp = record_count * file->read_time(1, 10 rows) = record_count * 1.5 =
         2,901,500 * 1.5 = 4,352,250

   current_record_count= 2,901,500 * 10 = 29,016,050
   current_read_time= 583,358 + 4,352,250 + 29M/5 = 10,738,818
|
ANALYZE FORMAT=JSON for the fast query plan (forced it to be chosen in debugger): https://gist.github.com/spetrunia/b3f686ca8a8451255529fa76995287eb
The slow plan, miscalculation 1
It heavily over-estimates #rows in the temp. table q (the estimate is 55K, reality is 4K). This also causes the number of reads from table ps to be >10x higher (3.8M vs 290K in reality).
The fast plan, miscalculation 1
Heavily over-estimates cardinality of pav JOIN ps.
"table": {
|
"table_name": "ps",
|
"access_type": "ALL",
|
"possible_keys": ["idx_products_stock_attributes"],
|
"r_loops": 1,
|
"rows": 55266,
|
"r_rows": 55266,
|
"r_total_time_ms": 25.906,
|
"filtered": 75.001,
|
"r_filtered": 30.176,
|
"attached_condition": "(ps.products_stock_quantity > 0)"
|
A 2x wrong estimate of the selectivity of a simple condition (okay per se, but it adds up with the others).
The fast plan, miscalculation 2
"block-nl-join": {
  "table": {
    "table_name": "ps",
    ...
  },
  "buffer_type": "flat",
  "buffer_size": "256Kb",
  "join_type": "BNL",
  "attached_condition": "(ps.products_stock_attributes = convert(concat('1-',pav.products_options_values_id) using utf8))",
  "r_filtered": 1.4286
},
Here r_filtered=1.4%.
Note that filtered is not even present as the optimizer doesn't try to account for the selectivity of this post-join condition.
The result is that for the derived table we have, from the debug log (should ANALYZE include this number?):
record_count=2,901,500
and from ANALYZE:
"r_loops": 16677,
The fast plan, miscalculation 3
There's more:
"table": {
|
"table_name": "<derived2>",
|
"access_type": "ref",
|
"possible_keys": ["key0"],
|
"key": "key0",
|
"key_length": "4",
|
"used_key_parts": ["products_id"],
|
"ref": ["j2.ps.products_id"],
|
"r_loops": 16677,
|
"rows": 10,
|
"r_rows": 0.9999,
|
Note rows vs r_rows: also a 10x over-estimation. The estimate of 10 comes from this code in best_access_path:
if (!(records= keyinfo->actual_rec_per_key(key_parts-1)))
{ /* Prefer longer keys */
  records=
    ((double) s->records / (double) rec *
     (1.0 +
      ((double) (table->s->max_key_length-keyinfo->key_length) /
       (double) table->s->max_key_length)));

  if (records < 2.0)
    records=2.0; /* Can't be as good as a unique */
}
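For illustration, here is a rough Python paraphrase of that fallback heuristic (the function name and the example input values are mine, not from the server):

```python
def fallback_ref_estimate(table_rows, rec_per_key_guess,
                          key_length, max_key_length):
    # "Prefer longer keys": the rows-per-key guess is inflated for keys
    # shorter than the table's longest key, and clamped to at least 2.0
    # because a non-unique lookup can't be as good as a unique one.
    records = (table_rows / rec_per_key_guess *
               (1.0 + (max_key_length - key_length) / max_key_length))
    return max(records, 2.0)


# A short key relative to max_key_length inflates the guess by up to 2x:
print(fallback_ref_estimate(1000, 100, 2, 4))  # 15.0
```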
"The fast plan, miscalculation 2" has the biggest impact. However, it is rather hard to make the optimizer take this into account.
The condition is
ps.products_stock_attributes = Concat('1-', pav.products_options_values_id)
The table has the column as the second part of a unique key:
Create Table: CREATE TABLE `products_stock` (
  `products_stock_id` int(11) NOT NULL AUTO_INCREMENT,
  `products_id` int(11) NOT NULL DEFAULT '0',
  `products_stock_attributes` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `products_stock_quantity` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`products_stock_id`),
  UNIQUE KEY `idx_products_stock_attributes` (`products_id`,`products_stock_attributes`)
products_stock has 55266 rows.
Most common values:
select *
from (
  select count(*) as CNT
  from products_stock
  group by products_stock_attributes
) T
order by CNT desc limit 10;

+------+
| CNT  |
+------+
| 6980 |
| 6529 |
| 6467 |
| 5095 |
| 4603 |
| 3870 |
...
So the most popular value accounts for 6980 / 55266 ≈ 0.12, i.e. 12% of the table.
We could use the statistics to produce a selectivity estimate.
"The fast plan, miscalculation 3" could have been avoided as the derived table is defined as:
(SELECT DISTINCT pa.products_id
 FROM products_attributes pa,
      products_options_values pov,
      ...
) q
Since it's SELECT DISTINCT, we know rows=1 there.
... and if I change it to be rows=1 in the debugger, the fast query plan is chosen!
More details about how to use EITS statistics to produce selectivity estimates.
MariaDB [j2]> select *, hex(histogram) from mysql.column_stats where table_name='products_stock' and column_name='products_stock_attributes'\G
*************************** 1. row ***************************
        db_name: j2
     table_name: products_stock
    column_name: products_stock_attributes
      min_value: 1-1
      max_value: 1-8
    nulls_ratio: 0.0000
     avg_length: 3.4370
  avg_frequency: 812.7353
      hist_size: 255
      hist_type: SINGLE_PREC_HB
      histogram: $$$$$$$$$$$$&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHJmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm��������������������������������������������������������������
 hex(histogram): 0000000000000000010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101242424242424242424242424262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626264848484848484848484848484848484848484848484848484848484848484848484A6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D919191919191919191919191919191919191919191919191B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6DADADADADADADADADADADADAFFFFFFFFFFFFFFFF
1 row in set (0.00 sec)
Find the longest run of buckets that have the same bound. Here, the longest run is 12% of total_buckets. This means that the selectivity of the condition
products_stock_attributes = ...
is at most 0.12.
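A sketch of that longest-run heuristic: since each histogram bucket covers an equal share of rows, a value whose bound repeats across k of n buckets holds roughly k/n of the rows. The byte string in the example is a short made-up histogram, not the real 255-bucket one above.

```python
def max_eq_selectivity(histogram: bytes) -> float:
    # Longest run of identical bucket bounds, as a fraction of all buckets.
    longest = run = 1
    for prev, cur in zip(histogram, histogram[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest / len(histogram)


# Hypothetical 10-bucket histogram: the value at bound 10 spans 4 buckets,
# so an equality condition on it selects at most ~40% of the rows.
hist = bytes([0, 0, 10, 10, 10, 10, 40, 80, 120, 255])
print(max_eq_selectivity(hist))  # 0.4
```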
varun, let's make step #1: if a derived table has SELECT DISTINCT, provide index statistics for it so that the join optimizer in the upper select knows that ref access to the table will produce one row.
Patch for step #1
http://lists.askmonty.org/pipermail/commits/2018-March/012123.html
This patch fixes the slowness issue when we run the same query above after running the alter table on products_description.
- Also for step 1 we need to take care of UNION ALL for derived tables with SELECT DISTINCT, because these can produce duplicate entries, and in such cases we should not tell the join optimizer that ref access to the derived table would produce only one row.
- INTERSECT and EXCEPT are not a problem, because they won't produce duplicate entries.
Patch 2 sent for review (addressed review comments on the 1st and added handling for UNION/INTERSECT/EXCEPT):
http://lists.askmonty.org/pipermail/commits/2018-March/012127.html
Step #2:
If any field in the select list of a derived table is also present in the GROUP BY list, then we are again guaranteed that ref access to the derived table will always produce one row per key.
Review feedback: https://lists.launchpad.net/maria-developers/msg11151.html
Patch for step 1: http://lists.askmonty.org/pipermail/commits/2018-March/012147.html
Patch for step 2: http://lists.askmonty.org/pipermail/commits/2018-March/012148.html
Also, the query in this MDEV now performs fast regardless of whether the ANALYZE command was run on the products_description table.
It's been a year since this issue's status was changed to "In Review". Has there been any progress since then?
Thank you
Wait, the review sent yesterday was incomplete and the comment there totally missed the point.
I'm working on the updated version.
Review feedback for part#1 sent. The patch will need to be changed.
Another attempt for part #1
http://lists.askmonty.org/pipermail/commits/2019-April/013664.html
Another attempt for part #2
http://lists.askmonty.org/pipermail/commits/2019-April/013674.html
As discussed with psergey, we will just push the patch for part #1; for part #2 we should wait for https://jira.mariadb.org/browse/MDEV-11588 and then use that to give correct statistics.
Can you upload your zip file to the FTP server - https://mariadb.com/kb/en/meta/ftp/
Also try moving your subqueries into JOIN expressions.