[MDEV-771] LP:806894 - SJ-Materialization picks scan-strategy which is 2x slower than SJ-Materialization lookup Created: 2011-07-07  Updated: 2017-02-17

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.3.12, 5.5
Fix Version/s: 10.2

Type: Bug Priority: Minor
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: Launchpad

Attachments: XML File LPexportBug806894.xml    

 Description   

See testcase and details in MDEV-2608

I'm not sure if this is actually a bug (i.e. here we have a situation where the optimizer had sufficient info to make the right decision but didn't make it). This needs to be investigated.



 Comments   
Comment by Dan Vande More [ 2012-01-29 ]

Re: SJ-Materialization picks scan-strategy which is 2x slower than SJ-Materialization lookup
I too am seeing a lot of issues where semijoin is detrimental to use. For my application, subqueries are absolutely essential so I have been running maria 5.3 in production since at least september.

I occasionally run into areas where maria attempts to semijoin and it will cause the query to be 1 or more orders of magnitude slower. Unfortunately, about 1/10 as often I find that semijoin makes some painful queries wickedly fast, so I am forced to leave it on in some cases.

You have this marked as low priority but I would really like to see this fixed for maria 5.3. I do consider this a major bug but semijoin is too optimistic about it's chances to handle some things and can wreak havoc on queries even mysql 5.5 can do faster.

Comment by Sergei Petrunia [ 2012-01-30 ]

Re: SJ-Materialization picks scan-strategy which is 2x slower than SJ-Materialization lookup
Yes. This bug was marked as low priority at some point because we were focusing on wrong query result and crashing bugs. Now, these bugs were fixed and wrong-query-plan bugs get priority.

The slowdowns you're seeing may be different from the problem that caused this bug. We have also found and are working on BUG#913030, BUG#914569.

Any chance you could provide us (=mariadb team) with EXPLAIN EXTENDED outputs for cases where semi-join execution was slower than non-semijoin execution?

Ideally, we would like to have the datasets, too. If the dataset cannot be made public, we have ftp.askmonty.org specifically set up to allow data uploads that can be visible only by Monty Program Ab staff.

Comment by Dan Vande More [ 2012-01-30 ]

Re: SJ-Materialization picks scan-strategy which is 2x slower than SJ-Materialization lookup
Hi thanks for the response. The two mentioned bugs do mention myisam so I wasn't able to determine whether they are limited to just those. I am using innodb.

I have been given approval to spend some time on reproducing this, you will hear from me over the next few days whether or not I can provide you with the needed info.

Comment by Dan Vande More [ 2012-02-02 ]

Re: SJ-Materialization picks scan-strategy which is 2x slower than SJ-Materialization lookup
Hi,

Unfortunately when I had this problem I failed to record the exact query & optimizer_switch, so I am unable to reproduce the issue at this time. I know I am definitely having it, but the 3:30am calls to fix a server that's being I/O starved typically mean I'm in a rush to just fix it. I will definitely post back when I have full details, but as of now you shouldn't wait on me to provide any meaningful data.

Comment by Sergei Petrunia [ 2012-03-02 ]

Re: SJ-Materialization picks scan-strategy which is 2x slower than SJ-Materialization lookup
A slowdown experienced by @fimbulvetr has been filed here: https://bugs.launchpad.net/maria/+bug/929732.

Comment by Sergei Petrunia [ 2012-03-02 ]

Re: SJ-Materialization picks scan-strategy which is 2x slower than SJ-Materialization lookup
Within this bug, I'll try to figure out what was the cause of the slowdown with the bug report.

A slightly simplified query from the report (converted LEFT JOIN to inner join since it was converted internally anyway):

select * from CountryLanguage where (Language, Country) IN (SELECT City.Name, Country.Code FROM City, Country WHERE Country = Code and Code > 'LLL');

With semijoin=on, EXPLAIN is different on current 5.3:
-------------------------------------------------------------------------------------------------------------------------------------------

id select_type table type possible_keys key key_len ref rows filtered Extra

-------------------------------------------------------------------------------------------------------------------------------------------

1 PRIMARY Country range PRIMARY PRIMARY 3 NULL 114 100.00 Using where; Using index
1 PRIMARY City ref Country Country 3 w1.Country.Code 18 100.00 Start temporary
1 PRIMARY CountryLanguage eq_ref PRIMARY PRIMARY 33 w1.Country.Code,w1.City.Name 1 100.00 Using index condition; End temporary

-------------------------------------------------------------------------------------------------------------------------------------------

with semijoin=off, I get:
----------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

----------------------------------------------------------------------------------------------------------+

1 PRIMARY CountryLanguage ALL NULL NULL NULL NULL 984 Using where
2 MATERIALIZED Country range PRIMARY PRIMARY 3 NULL 114 Using where; Using index
2 MATERIALIZED City ref Country Country 3 w1.Country.Code 18  

----------------------------------------------------------------------------------------------------------+

Query execution times:
semijoin=off: 0.08 sec
semijoin=on: 0.18 sec

Comment by Sergei Petrunia [ 2012-03-07 ]

Re: SJ-Materialization picks scan-strategy which is 2x slower than SJ-Materialization lookup
== Semi-join plan analysis ==

MariaDB [bug806894]> show table_statistics;
---------------------------------

Table_schema Table_name Rows_read

---------------------------------

bug806894 City 1676
bug806894 Country 114

---------------------------------

MariaDB [bug806894]> show index_statistics;;
-------------------------------------------+

Table_schema Table_name Index_name Rows_read

-------------------------------------------+

bug806894 City Country 1676
bug806894 Country PRIMARY 114

-------------------------------------------+

This means that
Country: 114 rows expected, 114 rows read
City: 18*114=2052 rows expected, 1676 rows read
CountryLanguage: 18*114*1=2052 rows expected, 0 rows actually read

Per-table counters count sucessful read ops. Key lookups that found nothing are
not counted. Let's also check status increments:

---------------------------------+

Handler_read_key 1791
Handler_read_next 1790
Handler_tmp_update 0
Handler_tmp_write 0

---------------------------------+

.. which are caused by:

table | Handler_read_key | Handler_read_next
Country | 1 | 113
City | 114 lookups | min. 1676
CountryLanguage | =1790-115=1676 | 0, its eq_ref

Comment by Sergei Petrunia [ 2012-03-07 ]

Re: SJ-Materialization picks scan-strategy which is 2x slower than SJ-Materialization lookup
Indeed, let's take another look at the query:

select *
from
CountryLanguage
where (Language, Country) IN (SELECT City.Name, Country.Code FROM ...)

Apparently, one will never get matches when they look for City.Name= CountryLanguage.Language.

On the other hand, in the subquery's join

SELECT City.Name, Country.Code FROM City, Country WHERE Country = Code and Code > 'LLL'

either table will have matches. Condition "City.Country=Country.Code" is the join intended by the dataset, each city lies within some existing country, and each country has some cities.

Comment by Sergei Petrunia [ 2012-03-07 ]

Re: SJ-Materialization picks scan-strategy which is 2x slower than SJ-Materialization lookup
Due to the above, I'm making the conclusion:
this particular slowdown is expected. The fact that it worked before is pure luck, because previous plan has put table that has no matches in other tables into the front of the query plan, while now it is at the end of it.

Comment by Elena Stepanova [ 2012-03-21 ]

Re: SJ-Materialization picks scan-strategy which is 2x slower than SJ-Materialization lookup
Also filed in JIRA as MDEV-194

Comment by Rasmus Johansson (Inactive) [ 2012-03-29 ]

Launchpad bug id: 806894

Generated at Thu Feb 08 06:31:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.