[MDEV-6239] Partition pruning is not working as expected in an inner query Created: 2014-05-14  Updated: 2014-06-16  Resolved: 2014-05-29

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.37, 10.0.11
Fix Version/s: 5.5.38, 10.0.12

Type: Bug Priority: Major
Reporter: Sivaram Dandibhotla Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None
Environment:

Linux, MariaDB


Attachments: Microsoft Word Explain Plan for the query in MySQL 5.6.16.csv     Microsoft Word Explain Plan for the query.csv     File Extended Plan.rar     File Partition Pruning issue data setup.sql    
Issue Links:
Relates
relates to MDEV-6279 Possible performance issue with semi-... Closed

 Description   

We noticed an issue with the partition pruning on Maria DB with Toku DB engine. Partition pruning doesn’t seem to work as expected for an inner query, this is illustrated with a simple example below.

Let us consider two tables employee and dept with company_id being the column on which partition is defined. When we try to fetch the list of departments for a given company_id in an inner query then all the partitions are being accessed instead of narrowing down to the partition specified in the criteria. When we run the same query as an outer query it is working fine, it is working fine on MySQL 5.6.16. Please let us know if this is a defect.

DDL and DML statements are attached in "Partition Pruning issue data setup.sql". This file also has an explain plan for a simple select statement. Please note that for the inner query all 3 partitions are being listed. Explain plan is attached in "Explain Plan for the query.csv". We are running into this issue with our complex queries, the one listed here is just for explaining the problem.

This issue is causing a huge performance problems for our system and right now it is a blocker for us to pursue MaraDB further.

Thanks for your time, please let us know if you need any more details.

Updated the ticket with the "Explain Plan for query in MySQL5.6.16.csv". Please note that in MySQL inner query access only the relevant partition.



 Comments   
Comment by Elena Stepanova [ 2014-05-16 ]

Hi,

Could you please try to set
optimizer_switch='semijoin=off' and run the query on real big tables, to see if it helps you to resolve the performance problem?

Thanks.

Comment by Sivaram Dandibhotla [ 2014-05-22 ]

Hi Elena,

Thanks for your quick response. We tried the optimizer switch you mentioned, it seemed to work as expected. But we noticed a performance issue on MariaDB with TokuDB storage engine when compared against MySQL on MyISAM storage engine.

In our query we have two tables in the outer query joining with a tuple condition with the inner query which is fetching data only from one table. Tables in outer query has very less data, 15 rows and 75 rows respectively. Inner table is quite big, it has 800K records. Find below the response time on different configurations.

MySQL with MyISAM storage engine - response time is ~1 second.
MariaDB with TokuDB storage engine - response time is ~10 seconds when the optimizer switch is turned off, ~5 seconds when the optimizer switch is on.
MariaDB with MyISAM - ~23 seconds when the optimizer is on, ~1 second when it is turned off.

We are running into table level lock issues with MyISAM and hence would like to use MariaDB with TokuDB storage engine.

Note that this is a single user test and we are yet to verify this on heavy load. Can you please comment on the performance degradation when the optimizer switch is turned off. Though we are seeing better performance when the optimizer switch is on we are concerned with it scanning across all partitions.

Regards,
Siva

Comment by Elena Stepanova [ 2014-05-23 ]

Hi Siva,

Could you please do the following on each of MySQL with MyISAM, MariaDB with MyISAM optimizer switch ON, MariaDB with MyISAM optimizer_switch OFF instances, and paste the output (you can skip the result set of the SELECT query itself).

EXPLAIN EXTENDED <your SELECT query>;
SHOW WARNINGS;
FLUSH STATUS;
<your SELECT query>;
SHOW STATUS LIKE 'Handler%';

Thanks.

Comment by Sivaram Dandibhotla [ 2014-05-23 ]

Captured the extended plan and the status variables after query execution.

Comment by Sivaram Dandibhotla [ 2014-05-23 ]

Hi Elena,

Please find attached "Extended Plan.rar" which has the extended plan and the status variables after query execution. Along with the configurations that you have asked for stats for Toku DB is also captured. Let me know if you need any further details.

Regards,

Siva

Comment by Elena Stepanova [ 2014-05-23 ]

psergey,

This is the issue we briefly discussed earlier. The problem seems to be twofold:

1) Partition pruning: with semijoin=on, partition pruning does not work as expected. It can be reproduced by running Partition Pruning issue data setup.sql from attachments.

2) Performance of the actual query:

  • with semijoin=on and with MyISAM, the query is much slower in MariaDB, possibly due to this very bug with partition pruning;
  • with semijoin=off and MyISAM, the query performs as in MySQL, this is OK;
  • but with TokuDB, it is slower both with and without semijoin, this is probably something to look into. The data (explains and status) can be found in the attached rar file
Comment by Sergei Petrunia [ 2014-05-28 ]

== Investigation ==

I am debugging this query (the tables are InnoDB):

explain partitions select * from employee where company_id = 1000 and dept_id in (select dept_id from dept where COMPANY_ID = 1000);

MariaDB 5.5 shows

MariaDB [j5]> explain partitions select * from employee where company_id = 1000 and dept_id in (select dept_id from dept where COMPANY_ID = 1000);
+------+-------------+----------+----------------------+--------+---------------+---------+---------+---------------------------+------+-------------+
| id   | select_type | table    | partitions           | type   | possible_keys | key     | key_len | ref                       | rows | Extra       |
+------+-------------+----------+----------------------+--------+---------------+---------+---------+---------------------------+------+-------------+
|    1 | PRIMARY     | employee | p_1000               | ref    | PRIMARY       | PRIMARY | 8       | const                     |    6 | Using where |
|    1 | PRIMARY     | dept     | p_1000,p_2000,p_3000 | eq_ref | PRIMARY       | PRIMARY | 16      | const,j5.employee.dept_id |    1 | Using index |
+------+-------------+----------+----------------------+--------+---------------+---------+---------+---------------------------+------+-------------+

MySQL 5.6 shows

+----+-------------+----------+------------+--------+---------------+---------+---------+----------------------------+------+-------------+
| id | select_type | table    | partitions | type   | possible_keys | key     | key_len | ref                        | rows | Extra       |
+----+-------------+----------+------------+--------+---------------+---------+---------+----------------------------+------+-------------+
|  1 | SIMPLE      | employee | p_1000     | ref    | PRIMARY       | PRIMARY | 8       | const                      |    6 | Using where |
|  1 | SIMPLE      | dept     | p_1000     | eq_ref | PRIMARY       | PRIMARY | 16      | const,j52.employee.dept_id |    1 | Using index |
+----+-------------+----------+------------+--------+---------------+---------+---------+----------------------------+------+-------------+

table dept has this condition inside the subquery: COMPANY_ID=1000. Apparently mysql 5.6 was able to use it for partition pruning, while mariadb 5.5 was not.

This should not affect the speed of this particular query (except for table opening). This is because table dept is accessed through eq_ref access through

primary key (company_id, dept_id)

Every time the executor is about to make an index lookup, it knows a specific value of company_id, and so it will access only one partition. This is called "partition selection" internally. It is not visible in EXPLAIN but it works.

However, lack of partition pruning can cause problems in other kinds of queries, and also it will affect query optimization.

Comment by Sergei Petrunia [ 2014-05-28 ]

Debugging...

=== MariaDB 5.5 ===
in JOIN::optimize(), prune_partitions() is called only for table `employee`.
Debugging JOIN::optimize, one can see this:

    List_iterator_fast<TABLE_LIST> li(select_lex->leaf_tables);
    while ((tbl= li++))
    {
      /* 
        If tbl->embedding!=NULL that means that this table is in the inner
        part of the nested outer join, and we can't do partition pruning
        (TODO: check if this limitation can be lifted)
      */
      if (!tbl->embedding)
      {
        Item *prune_cond= tbl->on_expr? tbl->on_expr : conds;
        tbl->table->no_partitions_used= prune_partitions(thd, tbl->table,
	                                                 prune_cond);

leaf_tables includes table `dept`, but dept has tbl->embedding!=NULL, so
prune_partitions() is not called.

== MySQL 5.6 ==
JOIN::optimize()/JOIN::prune_table_partitions() behave like in MariaDB (dept is
not pruned).

But, 5.6 also has "partition lock pruning", which is called from JOIN::prepare,
which is before subquery is converted into semi-join. partition lock pruning
is invoked for both tables, and it prunes away partitions in table 'dept'.

Comment by Sergei Petrunia [ 2014-05-29 ]

The fix for the partition pruning problem has been pushed into the 5.5 tree.

Comment by Sergei Petrunia [ 2014-05-29 ]

As for possible semi-join optimization problems, lets discuss those in another MDEV. I've created MDEV-6279.

Comment by Sergei Petrunia [ 2014-05-29 ]

Closing this issue (since there was a specific problem in partition pruning which was fixed and pushed). If that's not the only issue, let's discuss in MDEV-6279, and also feel free to open more issues.

Comment by Sivaram Dandibhotla [ 2014-06-16 ]

Can you please confirm that the fix for MDEV-6239 is available 10.0.12, couldn't find this in the change log of the release (https://mariadb.com/kb/en/mariadb-10012-changelog/).

Generated at Thu Feb 08 07:10:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.