MDEV-9959

A serious MariaDB server performance bug

Details

    • 10.2.13, 10.3.6-1

    Description

      I am having a strange issue with a slow query under certain conditions.
      I have run a few tests and was able to isolate the issue to one single table called `products_description` (all the tables are MyISAM).
      At first I noticed that when this table is new (i.e. just imported), the query always runs fast (~0.3s).
      However, if I perform any of these operations on this specific table, at any time (even immediately after importing it):
      `CHECK`, `OPTIMIZE`, `ANALYZE` or `REPAIR`, the query suddenly slows down by about 10x (takes ~4.5s) and stays slow from then on.

      Note that I forced the query to bypass the cache when running it, to make sure the timings are accurate.

      I am able to restore the performance only by then performing one of these operations on that table:

      1) `DROP` the table and import it again.

      or

      2) `ALTER` any of the following options of that table: `Collation`, `CHECKSUM` or `DELAY_KEY_WRITE`. The query then runs fast with the altered value, and when I revert back to the old value the performance remains fast.

      Even then, if I perform any of the `CHECK`, `OPTIMIZE`, `ANALYZE` or `REPAIR` operations on that table, the query speed drops again until I do either 1) or 2).
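
      A minimal SQL sketch of restore option 2), assuming the table's current setting is `CHECKSUM=0` (that current value is an assumption for illustration):

          -- Toggle any of Collation / CHECKSUM / DELAY_KEY_WRITE and revert it;
          -- the rebuild triggered by the ALTER restores the fast behaviour.
          ALTER TABLE products_description CHECKSUM = 1;
          ALTER TABLE products_description CHECKSUM = 0;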

      One more thing I tested:
      Before performing any operation on the table, I backed up the table's files (`products_description.frm`, `products_description.MYD`, `products_description.MYI`) and ran the query; it ran fast. Then I performed `CHECK` on the table and ran the query again; it was about 10x slower. I then restored the three backed-up files over the current ones and ran the query once more: still slow.

      I have compressed the database into a ~5 MB zip file (~80 MB unzipped) and uploaded it to the private folder on the MariaDB FTP server.

      Here are the EXPLAIN EXTENDED results. It seems the optimizer works differently in the two cases, but that doesn't really explain why it happens, as the copied database is supposedly identical.

      http://i.stack.imgur.com/NQ9YQ.png
      http://i.stack.imgur.com/pLyOt.png

      These are screenshots of the query profiling on the slow and the fast database:

      http://i.stack.imgur.com/JBlwP.png
      http://i.stack.imgur.com/E1qf0.png

      This is the SQL query that I am running and that you should test with:

          SELECT DISTINCT pav.products_options_values_id, 
                      pav.products_options_values_name, 
                      pav.products_options_values_sort_order 
          FROM   products_stock ps, 
                 products_options_values pav, 
                 (SELECT DISTINCT pa.products_id 
                  FROM   products_attributes pa, 
                         products_options_values pov, 
                         (SELECT p.products_id, 
                                 p.products_image, 
                                 p.products_subimage1, 
                                 pd.products_name, 
                                 p.products_quantity, 
                                 p.products_model, 
                                 p.products_ordered, 
                                 p.products_price, 
                                 p.products_date_added, 
                                 p.products_weight, 
                                 p.products_length, 
                                 p.products_width, 
                                 p.products_height, 
                                 p.products_tax_class_id, 
                                 p.products_status, 
                                 IF(s.status, s.specials_new_products_price, NULL) 
                                 AS 
                                 specials_new_products_price, 
                                 IF(s.status, s.specials_new_products_price, 
                                 p.products_price) AS 
                                 final_price, 
                                 IF(p.clearance_price < p.products_cost * 2.25, 
                                 p.clearance_price, 
                                 p.products_cost * 2.25) 
                                 AS 
                                 sorting_price 
                          FROM   products p 
                                 LEFT JOIN specials s 
                                        ON p.products_id = s.products_id 
                                 LEFT JOIN products_description pd 
                                        ON p.products_id = pd.products_id 
                          WHERE 
                           /*FASTIDS*/ 
                           p.products_status = '1' 
                           AND Date_sub('2016-04-19', INTERVAL 7000 day) <= 
                               p.products_date_added 
                         ) m 
                  WHERE  m.products_id = pa.products_id 
                         AND pa.options_id = 1 
                         AND pa.options_values_id = pov.products_options_values_id 
                         AND pov.language_id = '1') q 
          WHERE  q.products_id = ps.products_id 
                 AND ps.products_stock_attributes = 
                     Concat('1-', pav.products_options_values_id) 
                 AND ps.products_stock_quantity > 0 
          ORDER  BY pav.products_options_values_sort_order ASC
      

        Activity

          danblack Daniel Black added a comment -

          Can you upload your zip file to the ftp server - https://mariadb.com/kb/en/meta/ftp/

          Also try rewriting your subqueries as JOIN expressions.

          Yo Ami added a comment -

          Hi, Daniel,
          The issue is not with the query (although it might be possible to optimize it as you suggested, or in another way, but that's a different topic entirely).
          The issue is with the database server, as explained in my post above. I posted the query only so that you can reproduce the database issue I am seeing.
          This is my client's database, thus I am not at liberty to upload this database to a public location, otherwise I would have uploaded it as attachment here.
          I am going to send you a download link to your email address danielgb [at] au.ibm.com; please keep the file private. If anyone else wants the file, I will send it the same way to him/her.

          Thank you

          Yo Ami added a comment -

          Email sent to you, Daniel. Thanks again.

          danblack Daniel Black added a comment -

          Yo, uploading to the private directory of the FTP server isn't a public location. It's accessible only to a few MariaDB core developers (i.e. not me).

          Looking closer at the query, it does look entirely possible that the two subqueries can be removed, and indeed the DISTINCT in the first subquery isn't needed. As it's your client, I suggest you help them with that to give them a fast solution now (probably faster than the current fast version).

          I'm not convinced it's a different topic, as it's a bit unfair to expect a server to always optimize a bad query correctly. It seems you have got lucky with some aspect of statistics sampling so far.

          Can you include the "SHOW INDEX FROM {table}" output here for the fast/slow versions, in text form? That could explain the difference in query plan.

          Yo Ami added a comment - - edited

          Thanks Daniel,
          I have uploaded the file to the ftp (private folder).
          As for the query, again, this is just a way to reproduce the issue. If you read my post and the steps to reproduce thoroughly, you will see that there is no reason for the optimizer to behave differently; it seems to be a server bug.
          I encourage you to test my data and try to reproduce the issue the same way I did; you will see what I mean.

          Thanks again

          Yo Ami added a comment -

          I am also able to reproduce the issue easily on Windows 7 x64, on XAMPP 3.2.2 (which uses MariaDB 10.1.10) and on WAMP 2.4 (which uses 5.6.12-log - MySQL Community Server (GPL)).
          Note that I am using phpMyAdmin for all steps. To reproduce:
          1) Import the database.
          2) Run the above query (it should run fast) and time it.
          3) Run OPTIMIZE (or any other command I listed above) on the products_description table.
          4) Run the above query again, this time it should run much much slower.
          5) Change the products_description table's CHECKSUM option to 1 and back to 0.
          6) Run the above query; it should now run as fast as it did in step 2.

          Again, all tests were done through phpMyAdmin; I did not test using the command line.
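
          A rough SQL equivalent of steps 3-6 above (table names as in the uploaded dump; timing each run of the query is implied):

          OPTIMIZE TABLE products_description;               -- step 3: the query becomes slow after this
          -- run the query from the description and time it  -- step 4
          ALTER TABLE products_description CHECKSUM = 1;     -- step 5
          ALTER TABLE products_description CHECKSUM = 0;
          -- run the query again; it should be fast again    -- step 6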

          elenst Elena Stepanova added a comment - - edited

          Reproducible as described.
          Instead of altering the table back and forth, ALTER TABLE products_description FORCE can be used.
          As a workaround, SELECT STRAIGHT_JOIN ... seems to work.
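
          A sketch of the two workarounds (the STRAIGHT_JOIN variant only adds one keyword to the top-level SELECT of the query from the description):

          -- Rebuild the table in place without changing any table option:
          ALTER TABLE products_description FORCE;

          -- Or pin the join order by hand:
          -- SELECT STRAIGHT_JOIN DISTINCT pav.products_options_values_id, ... FROM products_stock ps, ...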

          The difference in plan between slow and fast execution:

          --- slow_plan.sql       2016-04-22 16:17:43.224080338 +0300
          +++ fast_plan.sql       2016-04-22 16:18:25.179596410 +0300
          @@ -1,11 +1,11 @@
           id     select_type     table   type    possible_keys   key     key_len ref     rows    filtered        Extra
           1      PRIMARY pav     ALL     NULL    NULL    NULL    NULL    70      100.00  Using temporary; Using filesort
          -1      PRIMARY <derived2>      ALL     NULL    NULL    NULL    NULL    55242   100.00  Distinct; Using join buffer (flat, BNL join)
          -1      PRIMARY ps      eq_ref  idx_products_stock_attributes   idx_products_stock_attributes   771     q.products_id,func      1       100.00  Using index condition; Using where; Distinct
          +1      PRIMARY ps      ALL     idx_products_stock_attributes   NULL    NULL    NULL    55266   75.00   Using where; Distinct; Using join buffer (flat, BNL join)
          +1      PRIMARY <derived2>      ref     key0    key0    4       test.ps.products_id     959     100.00  Distinct
           2      DERIVED pa      ALL     idx_products_attributes_products_id     NULL    NULL    NULL    55242   100.00  Using where; Using temporary
           2      DERIVED pov     eq_ref  PRIMARY PRIMARY 8       test.pa.options_values_id,const 1       100.00  Using index; Distinct
           2      DERIVED p       eq_ref  PRIMARY,idx_products_date_added,idx_products_status     PRIMARY 4       test.pa.products_id     1       100.00  Using where; Distinct
           2      DERIVED s       ref     idx_specials_products_id        idx_specials_products_id        4       test.pa.products_id     1       100.00  Using index; Distinct
          -2      DERIVED pd      ref     PRIMARY PRIMARY 4       test.pa.products_id     1       100.00  Using index; Distinct
          +2      DERIVED pd      ref     PRIMARY PRIMARY 4       test.pa.products_id     960     100.00  Using index; Distinct
           Level  Code    Message
          -Note   1003    select distinct `test`.`pav`.`products_options_values_id` AS `products_options_values_id`,`test`.`pav`.`products_options_values_name` AS `products_options_values_name`,`test`.`pav`.`products_options_values_sort_order` AS `products_options_values_sort_order` from `test`.`products_stock` `ps` join `test`.`products_options_values` `pav` join (select distinct `test`.`pa`.`products_id` AS `products_id` from `test`.`products_attributes` `pa` join `test`.`products_options_values` `pov` join `test`.`products` `p` left join `test`.`specials` `s` on((`test`.`s`.`products_id` = `test`.`pa`.`products_id`)) left join `test`.`products_description` `pd` on((`test`.`pd`.`products_id` = `test`.`pa`.`products_id`)) where ((`test`.`p`.`products_id` = `test`.`pa`.`products_id`) and (`test`.`pa`.`options_id` = 1) and (`test`.`pov`.`products_options_values_id` = `test`.`pa`.`options_values_id`) and (`test`.`pov`.`language_id` = '1') and (`test`.`p`.`products_status` = '1') and (<cache>(('2016-04-19' - interval 7000 day)) <= `test`.`p`.`products_date_added`))) `q` where ((`test`.`ps`.`products_id` = `q`.`products_id`) and (`test`.`ps`.`products_stock_attributes` = concat('1-',`test`.`pav`.`products_options_values_id`)) and (`test`.`ps`.`products_stock_quantity` > 0)) order by `test`.`pav`.`products_options_values_sort_order`
          +Note   1003    select distinct `test`.`pav`.`products_options_values_id` AS `products_options_values_id`,`test`.`pav`.`products_options_values_name` AS `products_options_values_name`,`test`.`pav`.`products_options_values_sort_order` AS `products_options_values_sort_order` from `test`.`products_stock` `ps` join `test`.`products_options_values` `pav` join (select distinct `test`.`pa`.`products_id` AS `products_id` from `test`.`products_attributes` `pa` join `test`.`products_options_values` `pov` join `test`.`products` `p` left join `test`.`specials` `s` on((`test`.`s`.`products_id` = `test`.`pa`.`products_id`)) left join `test`.`products_description` `pd` on((`test`.`pd`.`products_id` = `test`.`pa`.`products_id`)) where ((`test`.`p`.`products_id` = `test`.`pa`.`products_id`) and (`test`.`pa`.`options_id` = 1) and (`test`.`pov`.`products_options_values_id` = `test`.`pa`.`options_values_id`) and (`test`.`pov`.`language_id` = '1') and (`test`.`p`.`products_status` = '1') and (<cache>(('2016-04-19' - interval 7000 day)) <= `test`.`p`.`products_date_added`))) `q` where ((`q`.`products_id` = `test`.`ps`.`products_id`) and (`test`.`ps`.`products_stock_attributes` = concat('1-',`test`.`pav`.`products_options_values_id`)) and (`test`.`ps`.`products_stock_quantity` > 0)) order by `test`.`pav`.`products_options_values_sort_order`
          

          Also reproducible on MySQL 5.7 (execution time and maybe plans are different there, but the idea is the same - fast before ANALYZE, slow after ANALYZE).

          Yo Ami added a comment -

          Thanks, Elena. A bug fix is needed though, not temporary workarounds that modify queries, as this bug potentially affects the entire application, and other applications worldwide...


          psergei Sergei Petrunia added a comment -

          Discussing with varun.

          • Removed ORDER BY and DISTINCT clauses from the top-level SELECT. The slowdown can still be observed.

          varun Varun Gupta (Inactive) added a comment -

          The slow query plan without ORDER BY and DISTINCT in the top-level SELECT:

          +------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+------------------------------------+
          | id   | select_type | table      | type   | possible_keys                                       | key                           | key_len | ref                             | rows  | Extra                              |
          +------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+------------------------------------+
          |    1 | PRIMARY     | pav        | ALL    | NULL                                                | NULL                          | NULL    | NULL                            |    70 |                                    |
          |    1 | PRIMARY     | <derived2> | ALL    | NULL                                                | NULL                          | NULL    | NULL                            | 55242 | Using join buffer (flat, BNL join) |
          |    1 | PRIMARY     | ps         | eq_ref | idx_products_stock_attributes                       | idx_products_stock_attributes | 771     | q.products_id,func              |     1 | Using where; Using index           |
          |    2 | DERIVED     | pa         | ALL    | idx_products_attributes_products_id                 | NULL                          | NULL    | NULL                            | 55242 | Using where; Using temporary       |
          |    2 | DERIVED     | pov        | eq_ref | PRIMARY                                             | PRIMARY                       | 8       | test.pa.options_values_id,const |     1 | Using index; Distinct              |
          |    2 | DERIVED     | p          | eq_ref | PRIMARY,idx_products_date_added,idx_products_status | PRIMARY                       | 4       | test.pa.products_id             |     1 | Using where; Distinct              |
          |    2 | DERIVED     | s          | ref    | idx_specials_products_id                            | idx_specials_products_id      | 4       | test.pa.products_id             |     1 | Using index; Distinct              |
          |    2 | DERIVED     | pd         | ref    | PRIMARY                                             | PRIMARY                       | 4       | test.pa.products_id             |     1 | Using index; Distinct              |
          +------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+------------------------------------+
          


          varun Varun Gupta (Inactive) added a comment -

          For the fast query:

          +------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+--------------------------------------------------------------+
          | id   | select_type | table      | type   | possible_keys                                       | key                           | key_len | ref                             | rows  | Extra                                                        |
          +------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+--------------------------------------------------------------+
          |    1 | PRIMARY     | pav        | ALL    | NULL                                                | NULL                          | NULL    | NULL                            |    70 |                                                              |
          |    1 | PRIMARY     | ps         | index  | idx_products_stock_attributes                       | idx_products_stock_attributes | 771     | NULL                            | 55266 | Using where; Using index; Using join buffer (flat, BNL join) |
          |    1 | PRIMARY     | <derived2> | ref    | key0                                                | key0                          | 4       | test.ps.products_id             |   959 |                                                              |
          |    2 | DERIVED     | pa         | ALL    | idx_products_attributes_products_id                 | NULL                          | NULL    | NULL                            | 55242 | Using where; Using temporary                                 |
          |    2 | DERIVED     | pov        | eq_ref | PRIMARY                                             | PRIMARY                       | 8       | test.pa.options_values_id,const |     1 | Using index; Distinct                                        |
          |    2 | DERIVED     | p          | eq_ref | PRIMARY,idx_products_date_added,idx_products_status | PRIMARY                       | 4       | test.pa.products_id             |     1 | Using where; Distinct                                        |
          |    2 | DERIVED     | s          | ref    | idx_specials_products_id                            | idx_specials_products_id      | 4       | test.pa.products_id             |     1 | Using index; Distinct                                        |
          |    2 | DERIVED     | pd         | ref    | PRIMARY                                             | PRIMARY                       | 4       | test.pa.products_id             |   960 | Using index; Distinct                                        |
          +------+-------------+------------+--------+-----------------------------------------------------+-------------------------------+---------+---------------------------------+-------+--------------------------------------------------------------+
          

          psergei Sergei Petrunia added a comment - - edited

          Observations

          • The derived table (id=2) uses the same query plan
            • except for rows in the last table, pd. Slow plan: rows=1, Fast plan: rows=960.
          • Top-level SELECT has different join orders:
            • The slow one has pav, <derived2>, ps
            • the fast one has pav, ps, <derived2>

          varun Varun Gupta (Inactive) added a comment -

          ANALYZE FORMAT=JSON output for the slow query:
          https://gist.github.com/anonymous/ee09ae49ec9ca91105fa18b73d11ad7c

          ANALYZE FORMAT=JSON output for the fast query:
          https://gist.github.com/anonymous/4c447813f44a0892339b569266dc9d44


          psergei Sergei Petrunia added a comment -

          Looking at the above:

          Query times

          • fast query: 3.7 sec
          • slow query: 19.8 sec

          Query plans:

          • In both queries, materialized/r_total_time_ms is 800-900 msec.
          • Comparing rows and r_rows, one can see that the slow query has a better rows estimate for table pd (1 vs 960).
          • There is a 2x difference in filtered vs r_filtered for table p (3rd in the subquery). I don't think this is the deciding factor.

          The slow query plan spends most of its time (18 sec. out of 20) reading table ps:

          "table": {
                "table_name": "ps",
                "access_type": "eq_ref",
                "possible_keys": ["idx_products_stock_attributes"],
                "key": "idx_products_stock_attributes",
                "key_length": "771",
                "used_key_parts": ["products_id", "products_stock_attributes"],
                "ref": ["q.products_id", "func"],
                "r_loops": 289730,
                "rows": 1,
                "r_rows": 0.0992,
                "r_total_time_ms": 18315,
          


                "r_loops": 289730,
          

          This comes from

          #r_rows(pav) *  #r_rows(<derived2>)  = 70 * 4139 =289,730
          

          r_rows for pav matches the estimate, r_rows for <derived2> is actually lower than the estimate:

              "block-nl-join": {
                "table": {
                  "table_name": "<derived2>",
                  "access_type": "ALL",
                  "r_loops": 1,
                  "rows": 55242,
                  "r_rows": 4139,
          

          So we are actually doing fewer lookups than the optimizer expected.


          psergei Sergei Petrunia added a comment -

          289,730 lookups in 18 seconds is 16,096 lookups per second. Not fast, but note the key_length=771.

          psergei Sergei Petrunia added a comment - - edited

          Looking at the fast query plan, one can see that the table ps doesn't have many rows:

              "block-nl-join": {
                "table": {
                  "table_name": "ps",
                  "access_type": "index",
                  "possible_keys": ["idx_products_stock_attributes"],
                  "key": "idx_products_stock_attributes",
                  "key_length": "771",
                  "used_key_parts": ["products_id", "products_stock_attributes"],
                  "r_loops": 1,
                  "rows": 55266,
                  "r_rows": 55266,
                  "r_total_time_ms": 64.807,
                  "filtered": 75.001,
                  "r_filtered": 100,
                  "using_index": true
          

          Just 55K rows in total.


          psergei Sergei Petrunia added a comment -

          Need to debug the join optimization process in the top-level query.
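
          A sketch of a non-debugger way to inspect that join-order search; this uses the optimizer trace, which is only available in MariaDB 10.4+ and was not what was used in this analysis:

          SET optimizer_trace = 'enabled=on';
          -- run the top-level query from the description here
          SELECT * FROM information_schema.OPTIMIZER_TRACE\G
          SET optimizer_trace = 'enabled=off';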

          varun Varun Gupta (Inactive) added a comment - - edited

          Observations for the fast query: https://gist.github.com/anonymous/6e7d980971036b3bf1affd4dcbf9054e
          Observations for the slow query: https://gist.github.com/anonymous/1f29eb63bd1f70484c7acc5445628459
          psergei Sergei Petrunia added a comment - - edited

          == Slow join order ==
          1. pav
             records=70
             cost=2.47
           
             current_record_count=70
             current_read_time= 0 + 2.47 + 14= 16.47
           
          2. pav, q
             record_count=70
             
             s->found_records=55,242
             tmp= s->scan_time() = 55,242
             // join buffer is filled 1 time
             
             cost= 55,242
             records=55,242  // NOTE: actual number is about 4K. 
           
             current_record_count= 70 * 55242= 3,866,940
             current_read_time = 16.47 + 55,242 + 3.8M/5 = 828,646
           
          3. pav, q, ps
             record_count=3,866,940
            //note: the actual number (from ANALYZE) is  290,430
            // which is equal to 70*4149 (which means pav-q do a cross join)
             
             records= 1.0 // eq_ref
             read_time= prev_record_reads(...)= 3,866,940
             // note: the table has 55,266 rows in total
           
             current_record_count= 1*3.8M =  3,866,940
             current_read_time= 828,646 + 3.8M + 3.8M/5 = 5,468,974
          

          == Fast join order ==
          1. pav
           
          2. pav, ps
             record_count=70
             
             // table has 55,266 records
             read_time= table->scan_time()=278.
             // join buffer is filled 1 time
             cost=3041 // due to (s->records - rnd_records) addition
             records= 41,450
           
             current_record_count= 70 * 41450= 2,901,500
             current_read_time = 16.47 + 3041 + 2.9M/5 = 583,358
           
          3. pav, ps, q
             record_count=2,901,500
             // table q has 55242 records 
            // note: the actual number (from ANALYZE) is 16,667 (*)
             
             records=10 
             // ^ this is computed as ref access estimate for the case when no
             //statistics is available.  (the real number from ANALYZE is 0.9999)
             
             tmp = record_count* file->read_time(1, 10 rows)= record_count * 1.5 =
              2,901,500 * 1.5 = 4,352,250
           
             current_record_count= 2,901,500 * 10 = 29,016,050
             current_read_time= 583,358 + 4,352,250 + 29M/5 = 10,738,818
          

          psergei Sergei Petrunia added a comment - - edited

          ANALYZE FORMAT=JSON for the fast query plan (forced it to be chosen in debugger): https://gist.github.com/spetrunia/b3f686ca8a8451255529fa76995287eb

          The slow plan, miscalculation 1

          It heavily over-estimates the number of rows in the temporary table q (the estimate is 55K, reality is 4K). This also causes the estimated number of reads from table ps to be >10x higher than reality (3.8M vs 290K).

          The fast plan, miscalculation 1

          Heavily over-estimates cardinality of pav JOIN ps.

                "table": {
                  "table_name": "ps",
                  "access_type": "ALL",
                  "possible_keys": ["idx_products_stock_attributes"],
                  "r_loops": 1,
                  "rows": 55266,
                  "r_rows": 55266,
                  "r_total_time_ms": 25.906,
                  "filtered": 75.001,
                  "r_filtered": 30.176,
                  "attached_condition": "(ps.products_stock_quantity > 0)"
          

          A 2x wrong estimate of the selectivity of a simple condition (okay per se, but it adds up with the others).

          The fast plan, miscalculation 2

          The second:

              "block-nl-join": {
                "table": {
                  "table_name": "ps",
          ...
                },
                "buffer_type": "flat",
                "buffer_size": "256Kb",
                "join_type": "BNL",
                "attached_condition": "(ps.products_stock_attributes = convert(concat('1-',pav.products_options_values_id) using utf8))",
                "r_filtered": 1.4286
              },
          

          Here r_filtered=1.4%.
          Note that filtered is not even present as the optimizer doesn't try to account for the selectivity of this post-join condition.

          The result is that for the derived table we have (from the debug log; should ANALYZE include this number?):

           record_count=2,901,500
          

          from ANALYZE:

          "r_loops": 16677,
          

          The fast plan, miscalculation 3

          There's more:

              "table": {
                "table_name": "<derived2>",
                "access_type": "ref",
                "possible_keys": ["key0"],
                "key": "key0",
                "key_length": "4",
                "used_key_parts": ["products_id"],
                "ref": ["j2.ps.products_id"],
                "r_loops": 16677,
                "rows": 10,
                "r_rows": 0.9999,
          

          Note rows vs r_rows: also a 10x over-estimation. The estimate of 10 comes from this code in best_access_path:

                        if (!(records= keyinfo->actual_rec_per_key(key_parts-1)))
                        {                                   /* Prefer longer keys */
                          records=
                            ((double) s->records / (double) rec *
                             (1.0 +
                              ((double) (table->s->max_key_length-keyinfo->key_length) /
                               (double) table->s->max_key_length)));
                          if (records < 2.0)
                            records=2.0;               /* Can't be as good as a unique */
                        }
          

          psergei Sergei Petrunia added a comment - - edited

          "The fast plan, miscalculation 2" has the biggest impact. However it is rather hard to make the optimizer take this into account.

          The condition is

           ps.products_stock_attributes = Concat('1-', pav.products_options_values_id) 
          

          The table has this column as the second part of a unique key:

          Create Table: CREATE TABLE `products_stock` (
            `products_stock_id` int(11) NOT NULL AUTO_INCREMENT,
            `products_id` int(11) NOT NULL DEFAULT '0',
            `products_stock_attributes` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
            `products_stock_quantity` int(11) NOT NULL DEFAULT '0',
            PRIMARY KEY (`products_stock_id`),
            UNIQUE KEY `idx_products_stock_attributes` (`products_id`,`products_stock_attributes`)
          

          products_stock has 55266 rows.

          Most common values:

          select *
          from (
             select count(*) as CNT
             from products_stock
             group by products_stock_attributes
            ) T
          order by CNT desc limit 10;
          

          +------+
          | CNT  |
          +------+
          | 6980 |
          | 6529 |
          | 6467 |
          | 5095 |
          | 4603 |
          | 3870 |
          ...
          

          So the most popular value accounts for 6980 / 55266 = 0.12, i.e. 12% of the table.
          We could use the statistics to produce a selectivity estimate.

          psergei Sergei Petrunia added a comment - - edited

          "The fast plan, miscalculation 3" could have been avoided as the derived table is defined as:

                 (SELECT DISTINCT pa.products_id 
                  FROM   products_attributes pa, 
                         products_options_values pov, 
          ...
                 ) q 
          

          Since it's SELECT DISTINCT, we know rows=1 there.
          ... and if I change it to be rows=1 in the debugger, the fast query plan is chosen!


          psergei Sergei Petrunia added a comment -

          More details about how to use EITS statistics to produce selectivity estimates.

          MariaDB [j2]> select *,hex(histogram) from mysql.column_stats where table_name='products_stock' and column_name='products_stock_attributes'\G;
          *************************** 1. row ***************************
                 db_name: j2
              table_name: products_stock
             column_name: products_stock_attributes
               min_value: 1-1
               max_value: 1-8
             nulls_ratio: 0.0000
              avg_length: 3.4370
           avg_frequency: 812.7353
               hist_size: 255
               hist_type: SINGLE_PREC_HB
               histogram:         $$$$$$$$$$$$&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHJmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm��������������������������������������������������������������
          hex(histogram): 0000000000000000010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101242424242424242424242424262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626262626264848484848484848484848484848484848484848484848484848484848484848484A6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D6D919191919191919191919191919191919191919191919191B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6DADADADADADADADADADADADAFFFFFFFFFFFFFFFF
          1 row in set (0.00 sec)
          

          Find the longest run of buckets that have the same bound.
          Here, the longest run is 12% of the total buckets.
          This means that the selectivity of the condition

          products_stock_attributes = ...
          

          is at most 0.12.
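
          A sketch of how such statistics can be collected and enabled (generic EITS usage, shown here for context; these settings were not part of the analysis above):

          -- Collect engine-independent statistics, including a histogram, for the column:
          SET SESSION histogram_size = 255;
          ANALYZE TABLE products_stock PERSISTENT FOR COLUMNS (products_stock_attributes) INDEXES ();

          -- Make the optimizer read mysql.column_stats and use histogram-based selectivity:
          SET SESSION use_stat_tables = 'preferably';
          SET SESSION optimizer_use_condition_selectivity = 4;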


          psergei Sergei Petrunia added a comment -

          varun, let's make step #1: if a derived table has SELECT DISTINCT, provide index statistics for it so that the join optimizer in the upper select knows that ref access to the table will produce one row.
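
          A minimal sketch of the kind of case step #1 targets (hypothetical tables t1 and t2, not from the uploaded database):

          CREATE TABLE t1 (a INT, b INT);
          INSERT INTO t1 VALUES (1,10),(1,11),(2,20),(2,21),(3,30);
          CREATE TABLE t2 (a INT);
          INSERT INTO t2 VALUES (1),(2),(3);

          -- dt is a SELECT DISTINCT derived table, so a ref lookup on dt.a can match
          -- at most one row; step #1 is about telling the outer join optimizer that,
          -- instead of the no-statistics heuristic that produced rows=10 above.
          EXPLAIN
          SELECT * FROM t2, (SELECT DISTINCT a FROM t1) dt WHERE dt.a = t2.a;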


          varun Varun Gupta (Inactive) added a comment -

          Patch for step #1:
          http://lists.askmonty.org/pipermail/commits/2018-March/012123.html

          This patch fixes the slowness issue when we run the same query above after running ANALYZE on products_description.

          varun Varun Gupta (Inactive) added a comment -

          • Also for step 1 we need to take care of UNION ALL in derived tables with SELECT DISTINCT, because these can give duplicate entries, and for such cases we should not tell the join optimizer that ref access to the derived table would produce only one row (see the sketch below).
          • INTERSECT and EXCEPT are not a problem, because they won't give duplicate entries.
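
          A small illustration of the UNION ALL caveat, using the hypothetical t1 and t2 from the earlier sketch:

          -- Each branch is DISTINCT on its own, but a value present in both t1 and t2
          -- appears twice in dt, so dt.a is no longer unique.
          SELECT * FROM (
            SELECT DISTINCT a FROM t1
            UNION ALL
            SELECT DISTINCT a FROM t2
          ) dt;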

          varun Varun Gupta (Inactive) added a comment -

          Patch 2 sent for review (it addresses the review comments on the 1st patch and adds handling for UNION/INTERSECT/EXCEPT):
          http://lists.askmonty.org/pipermail/commits/2018-March/012127.html


          varun Varun Gupta (Inactive) added a comment -

          Step #2:
          If a field in the select list of the derived table is also present in its GROUP BY list, then we are again guaranteed that ref access to the derived table would always produce one row per key.
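
          A sketch of the step #2 case, again with the hypothetical t1 and t2 (an illustration, not a test case from the patch):

          -- dt is grouped by a and a is in the select list, so each value of dt.a
          -- occurs exactly once and a ref lookup on it returns one row per key.
          EXPLAIN
          SELECT * FROM t2, (SELECT a, COUNT(*) AS cnt FROM t1 GROUP BY a) dt
          WHERE dt.a = t2.a;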

          psergei Sergei Petrunia added a comment -

          Review feedback: https://lists.launchpad.net/maria-developers/msg11151.html
          varun Varun Gupta (Inactive) added a comment -

          Patch for step 1: http://lists.askmonty.org/pipermail/commits/2018-March/012147.html
          Patch for step 2: http://lists.askmonty.org/pipermail/commits/2018-March/012148.html

          varun Varun Gupta (Inactive) added a comment -

          Also, the query in this MDEV now performs fast irrespective of whether the ANALYZE command has been run on the products_description table.

          Yo Ami added a comment -

          It's been a year since this issue's status was changed to "In Review". Has there been any progress since then?

          Thank you


          psergei Sergei Petrunia added a comment -

          Let me look at the patches again.


          The "step1" patch will need adjustment, review email sent.


          psergei Sergei Petrunia added a comment -

          Wait, the review sent yesterday was incomplete and the comment there totally missed the point.

          I'm working on the updated version.


          psergei Sergei Petrunia added a comment -

          Review feedback for part #1 sent. The patch will need to be changed.

          varun Varun Gupta (Inactive) added a comment -

          Another attempt for part #1: http://lists.askmonty.org/pipermail/commits/2019-April/013664.html
          varun Varun Gupta (Inactive) added a comment -

          Another attempt for part #2: http://lists.askmonty.org/pipermail/commits/2019-April/013674.html

          psergei Sergei Petrunia added a comment -

          Review input provided for step 2.


          varun Varun Gupta (Inactive) added a comment -

          As discussed with psergey, we will just push the patch for part #1; for part #2 we should wait for https://jira.mariadb.org/browse/MDEV-11588 and then use that to give correct statistics.


          People

            Assignee: varun Varun Gupta (Inactive)
            Reporter: Yo Ami
            Votes: 0
            Watchers: 7
