[MDEV-10452] Poor performance after upgrade from 10.1.14 to 10.1.16 Created: 2016-07-27  Updated: 2016-07-28  Resolved: 2016-07-28

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1.16
Fix Version/s: 10.1.16

Type: Bug Priority: Major
Reporter: Josep Sanz Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: 10.1.16, performance, query
Environment:

This problem was detected in a CentOS 7 and Fedora 23 environment


Attachments: Text File explain-10.1.14.log     Text File explain-10.1.16-v2.log     Text File explain-10.1.16.log     File query.sql     Text File saltos-10.1.14.log     Text File saltos-10.1.16-v2.log     Text File saltos-10.1.16.log     File schema.sql     File used-size.pdf    

 Description   

After the upgrade from the 10.1.14 to 10.1.16, I have detected that some queries need a lot of time to be executed.

When I have downgraded to the 10.1.14, I have checked that the problem has dissapeared. I have executed the mysql_upgrade after each upgrade.

I attach some files that I expect that help you to understand the problem and fix it.

  • saltos-10.1.14.log => contains the log of all queries executed by SaltOS to get the list page - using a MariaDB 10.1.14.
  • saltos-10.1.16.log => contains the log of all queries executed by SaltOS to get the list page using a MariaDB 10.1.16.
  • query.sql => contains the conflictive query!!!
  • explain-10.1.14.log => contains the explain extended command for the previous query using a MariaDB 10.1.14.
  • explain-10.1.16.log => contains the explain extended command for the same query using a MariaDB 10.1.16.
  • schema.sql => contains the schema used by the SaltOS's database.
  • used-size.pdf => contains the real used size to understand the problem.


 Comments   
Comment by Elena Stepanova [ 2016-07-27 ]

Thanks for the report.

Comment by Sergei Petrunia [ 2016-07-27 ]

Looking at the attached files...

The difference between query plans is only the first two tables.
In 10.1.14, the order was "a,e", in 10.16 it is "e,a".

The query has "ORDER BY a.id desc LIMIT 0,50"

10.1.14:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: index
possible_keys: PRIMARY,correos6
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: e
         type: ref
possible_keys: id_usuario,registros,registros3
          key: registros3
      key_len: 12
          ref: const,saltos.a.id,const
         rows: 12
     filtered: 100.00
        Extra: Using index

the table a has type=index, but rows=4, i.e. the query plan relies on the fact that it is producing rows in LIMIT order and so will be able to terminate early.

In 10.1.16:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: e
         type: range
possible_keys: id_usuario,registros,registros3
          key: registros3
      key_len: 4
          ref: NULL
         rows: 108250
     filtered: 99.93
        Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: eq_ref
possible_keys: PRIMARY,correos6
          key: PRIMARY
      key_len: 4
          ref: saltos.e.id_registro
         rows: 1
     filtered: 100.00
        Extra: Using where

The first table is 'e'. "Using temporary" and "Using filesort" are used to produce the required ordering (this means execution can't finish early after producing #LIMIT rows).

Comment by Sergei Petrunia [ 2016-07-27 ]

looking at schema.sql one can see the tables are using MyISAM, so apparently any fixes in 10.1.16 that relate to "Extended Keys" are not at fault.

Comment by Sergei Petrunia [ 2016-07-27 ]

Looking at the tables in the problematic join:

  FROM tbl_correo a
          LEFT JOIN tbl_registros_i e ON e.id_aplicacion='27' AND e.id_registro=a.id
          LEFT JOIN tbl_usuarios d ON e.id_usuario=d.id

table 'd' is not present in either good or bad query plan, it seems it was
removed by the Table Elimination feature.

"a left join e" is converted into an inner join. I see it in EXPLAIN EXTENDED (tried both 10.1.14 and 10.1.16) when trying to reproduce with dummy data:

... from tbl_correo a join tbl_registros_i e ...

and also one can see it manually. The big select has e.id_usuario id_usuario
and then the top-level WHERE has id_usuario='38'.

Comment by Sergei Petrunia [ 2016-07-27 ]

I'm testing with a dummy dataset that I have generated ( id_usuarios_c - 10 rows, tbl_usuarios - 1 row, all other tables: 1K rows each).

And the interesting part is that I'm getting the same difference in EXPLAINs:

  • a,e in 10.1.14
  • e,a in 10.1.16
Comment by Sergei Petrunia [ 2016-07-27 ]

Trying to debug where it comes from
== 10.1.14 ==

(gdb) p join->join_tab[0].table->alias.Ptr
  $6 = 0x7f85492fc4f0 "a"
(gdb) p join->join_tab[1].table->alias.Ptr
  $7 = 0x7f85492fc170 "e"
(gdb) p join->join_tab[2].table->alias.Ptr
  $8 = 0x7f85492fc5f0 "d"
(gdb)  
(gdb) p join->join_tab[0].found_records
  $9 = 1000
(gdb) p join->join_tab[1].found_records
  $10 = 1
(gdb) p join->join_tab[2].found_records
  $11 = 1
(gdb)  
(gdb) p join->join_tab[0].dependent
  $12 = 0
(gdb) p join->join_tab[1].dependent
  $13 = 1
(gdb) p join->join_tab[2].dependent
  $14 = 3

== 10.1.16 ==

(gdb) p join->join_tab[0].table->alias.Ptr
  $16 = 0x7fac66c20f70 "a"
(gdb) p join->join_tab[1].table->alias.Ptr
  $17 = 0x7fac66c20770 "e"
(gdb) p join->join_tab[2].table->alias.Ptr
  $18 = 0x7fac66c21270 "d"
(gdb)  
(gdb) p join->join_tab[0].found_records
  $19 = 1000
(gdb) p join->join_tab[1].found_records
  $20 = 1
(gdb) p join->join_tab[2].found_records
  $21 = 1
(gdb)  
(gdb) p join->join_tab[0].dependent
  $22 = 0
(gdb) p join->join_tab[1].dependent
  $23 = 0
(gdb) p join->join_tab[2].dependent
  $24 = 3

Apparently, JOIN_TAB::dependent is different. In 10.1.14 e->dependent=

{a}

, in 10.1.16 e->dependent=0.

Comment by Sergei Petrunia [ 2016-07-27 ]

I would say that the value in 10.1.16 is the correct one. e->dependent=0 makes more sense when the query is a INNER JOIN e.
e->dependent=(a) is what would be used if the query was a LEFT JOIN e.

Comment by Sergei Petrunia [ 2016-07-27 ]

Ok, the change in behavior that I'm observing was caused by the fix for MDEV-10006.
It also matches the behavior that sanz is reporting:

  • In 10.1.14, the join order of {a, e}

    was the only possible.

    • This was not correct ( {e,a}

      should be possible also, this is what MDEV-10006 is about)

    • However, this mistake forced the choice of a good query plan.
  • In 10.1.16, the fix for MDEV-10006 made the {e, a}

    order possible

    • And the optimizer picked it
    • However, when doing so, it did not take into account the ORDER BY LIMIT clause (this is a long-known limitation of both MariaDB and MySQL)
    • And as a result, query plan changed for the worse.

(now, looking if there is anything we could do here)

Comment by Sergei Petrunia [ 2016-07-27 ]

Actually, the query plan reported for 10.1.16 should be able to avoid doing "Using filesort" with improved ORDER BY optimizations.

sanz, would it be possible for you to

  • take 10.1.16
  • Add this into my1.cnf:

    optimizer_switch='orderby_uses_equalities=on'
    

  • start the server
  • run the EXPLAIN for the problematic query
  • run the query itself and check if it is fast or slow
  • post the results here?
Comment by Josep Sanz [ 2016-07-28 ]

Hi Sergei.

With the optimizer_switch option works as expected as you can see in the attached files.

I explain-10.1.16-v2.log saltos-10.1.16-v2.log

Thanks by your help.

Comment by Sergei Petrunia [ 2016-07-28 ]

sanz, you're welcome. It should be safe to just run with 'orderby_uses_equalities=on' while on MariaDB 10.1. In the next MariaDB release (10.2), it will be ON by default.

Comment by Sergei Petrunia [ 2016-07-28 ]

Closing as there is a workaround (MDEV-8989).

Generated at Thu Feb 08 07:42:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.