[MDEV-20871] Queries with entity_id IN ('1', '2', ..... , '70000') run much slower in MariaDB 10.3.18 than on MariaDB 10.1 Created: 2019-10-21  Updated: 2019-12-10  Resolved: 2019-12-10

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.3.18
Fix Version/s: 10.3.22, 10.4.12, 10.5.1

Type: Bug Priority: Critical
Reporter: Alexander Menk Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 2
Labels: None
Environment:

Magento 2.3 Shop, MariaDB 10.3.18


Attachments: File query-select-problem.sql     File testscript-not-reproducible.sql    
Issue Links:
Relates
relates to MDEV-20900 IN predicate to IN subquery conversio... Closed

 Description   

We discovered, that queries with a long IN-list for an INT column take much longer than on 10.1, when the list entries are supplied as strings.

Is that intended or a bug?

References: https://github.com/magento/magento2/issues/25199
https://magento.stackexchange.com/questions/292867/which-server-settings-can-cause-magento-reindex-to-run-really-slow/293345#293345



 Comments   
Comment by Alexander Menk [ 2019-10-23 ]

Interesting catch: Performance can be improved by removing the quotes.

Comment by Wilson Hauck [ 2019-10-23 ]

Query completion time - with quotes, 8 minutes.
Query completion time - without quotes, 3 seconds per Alexander Menk on 10/21/2019

Comment by Wilson Hauck [ 2019-10-27 ]

Magento, details to consider for your corrective action. The Magento query code should not be using quoted IN values when datatype is INT. Your code correction will benefit your World Wide Base of customers with 3-second query completion vs 8 minutes when dealing with 70,000 items in the catalog. 480 / 3 = 160 times faster completion of this query.

Comment by Alexander Menk [ 2019-11-22 ]

What is the status of this?
I believed I was able to patch this in Magento 2 (https://github.com/magento/magento2/pull/25212) by just converting all numeric strings to integers, but it turns out that approach seems to cause another odd behavior, because

SELECT * WHERE x IN (1, 2) also finds x = "1-1", x = "1-2" (http://sqlfiddle.com/#!9/1c01c9/1/0)

So it would be much harder to fix all the possible occurrences of the problematic pattern in Magento.

What is the course of action.

Is it planned to fix in MariaDB or is it intended behavior of the optimizer?

Comment by Wilson Hauck [ 2019-11-23 ]

Alexander,

SELECT * is not how your application is written.

Your 
Queries with entity_id IN ('1', '2', ..... , '70000') 
that runs so slow because of the quotes has an object to match on entity_id is a DATATYPE of INTeger.

The sqlfiddle test is using VARCHAR data - a whole different animal. than INTEGER data type.

The decision to either use or not use quotes should be made before QUERY BUILD time by KNOWING the DATATYPE of the object of the IN list.

It would seem, INT - BIGINT - MEDINT would all have to be considered earlier rather than later in the QUERY BUILD decisions.

Please see my Skype chat entry a few hours ago.

Thank you,
Wilson Hauck
www.mysqlservertuning.com

From: "Alexander Menk (Jira)" <jira@mariadb.org>Date: Fri, Nov 22, 2019 12:36 pm
To: wilson.hauck@mysqlservertuning.com
Subject: [JIRA] (MDEV-20871) Queries with entity_id IN ('1', '2', ..... , '70000') run much slower in MariaDB 10.3.18 than on MariaDB 10.1

      [ https://jira.mariadb.org/browse/MDEV-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId8484#comment-138484 ]

Alexander Menk commented on MDEV-20871:
---------------------------------------

What is the status of this?
I believed I was able to patch this in Magento 2 (https://github.com/magento/magento2/pull/25212) by just converting all numeric strings to integers, but it turns out that approach seems to cause another odd behavior, because

SELECT * WHERE x IN (1, 2) also finds x = "1-1", x = "1-2" (http://sqlfiddle.com/#!9/1c01c9/1/0)

So it would be much harder to fix all the possible occurrences of the problematic pattern in Magento.

What is the course of action.

Is it planned to fix in MariaDB or is it intended behavior of the optimizer?

&g! t;                       Reporter: Alexander Menk


This message was sent by Atlassian Jira
(v7.13.1#713001)

Comment by Alexander Menk [ 2019-11-23 ]

I have to clarify my last posting:

1) The problem in MariaDB 10.3 is, that WHERE n IN ('1', '2' .. '70000') runs lot slower than in MariaDB 10.1 where n is an integer column. This can be mitigated like a workaround by using WHERE n IN (1, 2 .. 70000)
2) I am facing this problem in the linked Magento 2.3 issue so I tried to fix the problem, by converting ALL integer-looking values to actual integer, no matter of the column type, application wide, because it would take a lot time to identify each and every query of that form.
3) Now after I patched Magento that way, I am facing data inconsistency problems on other parts of Magento -> I assume that is because doing WHERE x IN (1, 2...) may lead to different results than WHERE x in (1, 2...) in the case of other queries where x is VARCHAR instead

So Wilson is right:

> The decision to either use or not use quotes should be made before QUERY BUILD time by KNOWING the DATATYPE of the object of the IN list.

But this would take much more diving deep into Magento and basically check all the queries that are build.

I know would like to find out, whether

A) the behavior in MariaDB reported here (running the WHERE on INT columns, with quotes being much slower than without quotes) is intended
or
B) is a bug in MariaDB 10.3

If it is a bug (B), then a fix would be great (but I cannot do this, I am a PHP programmer)
if (A) we have to dive deeper into Magento code and fix all that kind of queries at the source (and not at the escaping function for all values, as I tried to)

Comment by Alexander Menk [ 2019-11-23 ]

query-select-problem.sql is the problematic query, which runs on 10.1 in less than a secound; on 10.3 it takes minutes.

I tried to make a minimal reproducible example in testscript-not-reproducible.sql - but I could not reproduce the problem that way (it runs equally fast / faster on 10.3, as compared to 10.1)

Comment by Sergei Golubchik [ 2019-11-23 ]

What does EXPLAIN show in 10.3 and 10.2? (unfortunately, I cannot check it myself without a reproducible test case)

Comment by Alexander Menk [ 2019-11-23 ]

mariadb:10.3.20

root@80693dc9640f:/mnt# mysql perftest < /mnt/explain-query-select-problem.sql
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 73584 Start temporary; Using temporary; Using filesort
1 PRIMARY cp eq_ref PRIMARY PRIMARY 4 tvc_0._col_1 1 Using where; Using index; End temporary
1 PRIMARY cpw range PRIMARY,CATALOG_PRODUCT_WEBSITE_WEBSITE_ID CATALOG_PRODUCT_WEBSITE_WEBSITE_ID 2 NULL 41230 Using where; Using index; Using join buffer (flat, BNL join)
1 PRIMARY cpsd eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cp.entity_id,const,const 1
1 PRIMARY cpss eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cp.entity_id,const,const 1 Using where
1 PRIMARY cpvd eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cp.entity_id,const,const 1
1 PRIMARY cpvs eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cp.entity_id,const,const 1 Using where
1 PRIMARY ccp ref CATALOG_CATEGORY_PRODUCT_PRODUCT_ID CATALOG_CATEGORY_PRODUCT_PRODUCT_ID 4 perftest.cp.entity_id 1
3 DERIVED NULL NULL NULL NULL NULL NULL NULL No tables used

mariadb:10.2.29

root@5c5d3f20ec38:/# mysql perftest < /mnt/explain-query-select-problem.sql
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE cpw ref PRIMARY,CATALOG_PRODUCT_WEBSITE_WEBSITE_ID CATALOG_PRODUCT_WEBSITE_WEBSITE_ID 2 const 41230 Using where; Using index
1 SIMPLE cpsd eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cpw.product_id,const,const 1
1 SIMPLE cp eq_ref PRIMARY PRIMARY 4 perftest.cpw.product_id 1 Using where; Using index
1 SIMPLE cpss eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cpw.product_id,const,const 1 Using where
1 SIMPLE cpvd eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cpw.product_id,const,const 1
1 SIMPLE cpvs eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cpw.product_id,const,const 1 Using where
1 SIMPLE ccp ref CATALOG_CATEGORY_PRODUCT_PRODUCT_ID CATALOG_CATEGORY_PRODUCT_PRODUCT_ID 4 perftest.cpw.product_id 1

Comment by Alexander Menk [ 2019-11-23 ]

10.2 also works fine, problem starts to appear in 10.3 .. I am also testing with 10.4 right now and I will try to provide a example database extract on GitHub

Comment by Sergei Golubchik [ 2019-11-23 ]

Try to set in_predicate_conversion_threshold to 0 as a workaround

Comment by Alexander Menk [ 2019-11-23 ]

@Sergei https://github.com/iMi-digital/MDEV-20871-Example

Comment by Alexander Menk [ 2019-11-23 ]

MariaDB [(none)]> set in_predicate_conversion_threshold = 0;
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> set global in_predicate_conversion_threshold = 0;
Query OK, 0 rows affected (0.000 sec)

Does not seem to help — query takes still very long

Comment by Alexander Menk [ 2019-11-23 ]

10.3.20 is running now for more than 25 mins that query, after I set global in_predicate_conversion_threshold = 0;

Comment by Alexander Menk [ 2019-11-23 ]

on 10.3.20 with global in_predicate_conversion_threshold = 0;

real 27m21.634s
user 0m0.043s
sys 0m0.141s
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 73584 Start temporary; Using temporary; Using filesort
1 PRIMARY cp eq_ref PRIMARY PRIMARY 4 tvc_0._col_1 1 Using where; Using index; End temporary
1 PRIMARY cpw range PRIMARY,CATALOG_PRODUCT_WEBSITE_WEBSITE_ID CATALOG_PRODUCT_WEBSITE_WEBSITE_ID 2 NULL 40195 Using where; Using index; Using join buffer (flat, BNL join)
1 PRIMARY cpsd eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cp.entity_id,const,const 1
1 PRIMARY cpss eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cp.entity_id,const,const 1 Using where
1 PRIMARY cpvd eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cp.entity_id,const,const 1
1 PRIMARY cpvs eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cp.entity_id,const,const 1 Using where
1 PRIMARY ccp ref CATALOG_CATEGORY_PRODUCT_PRODUCT_ID CATALOG_CATEGORY_PRODUCT_PRODUCT_ID 4 perftest.cp.entity_id 2
3 DERIVED NULL NULL NULL NULL NULL NULL NULL No tables used

Comment by Alexander Menk [ 2019-11-23 ]

This might be related: https://jira.mariadb.org/browse/MDEV-12176?focusedCommentId=119660&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-119660
I am trying
set in_predicate_conversion_threshold = 4294967295;
set global in_predicate_conversion_threshold = 4294967295;

And yeah, it works faster on 10.3.20

real 0m1.819s
user 0m0.061s
sys 0m0.110s
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE cpw ref PRIMARY,CATALOG_PRODUCT_WEBSITE_WEBSITE_ID CATALOG_PRODUCT_WEBSITE_WEBSITE_ID 2 const 1 Using where; Using index
1 SIMPLE cpsd eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cpw.product_id,const,const 1
1 SIMPLE cp eq_ref PRIMARY PRIMARY 4 perftest.cpw.product_id 1 Using where; Using index
1 SIMPLE cpss eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cpw.product_id,const,const 1 Using where
1 SIMPLE cpvd eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cpw.product_id,const,const 1
1 SIMPLE cpvs eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 perftest.cpw.product_id,const,const 1 Using where
1 SIMPLE ccp ref CATALOG_CATEGORY_PRODUCT_PRODUCT_ID CATALOG_CATEGORY_PRODUCT_PRODUCT_ID 4 perftest.cpw.product_id 2

Comment by Igor Babaev [ 2019-11-23 ]

Alexander,
Now the conversion of the IN Predicate into IN subquery is not done if the arguments of the left and right parts of the predicate are of the different types.
Wait for 10.3.21.

Comment by Varun Gupta (Inactive) [ 2019-12-10 ]

This is related to MDEV-20900 and should be fixed by that. The fix would be available in 10.3.22

Generated at Thu Feb 08 09:02:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.