Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36048

Query execution time drops significantly when adding more values in the `IN` clause.

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.6.18
    • 10.6
    • Optimizer
    • None
    • Ubuntu 22.04

    Description

      *Environment*:

      • *MariaDB Version*: 10.6.18
      • *Operating System*: Ubuntu 22.04
      • *Server Version*: `10.6.18-MariaDB-0ubuntu0.22.04.1-log`

      *Description*

      I am experiencing a counterintuitive performance issue in MariaDB 10.6.18. When running a query with an `IN` clause on `category_id`, adding more values makes the query significantly faster, despite retrieving more data.

      *Query 1 (Slow)*
      Takes *8+ seconds*.

      SELECT DISTINCT `e`.`entity_id`
      FROM `catalog_product_entity` AS `e`
      INNER JOIN `catalog_product_index_price` AS `price_index`
                 ON price_index.entity_id = e.entity_id 
                   AND price_index.customer_group_id = 0 
                   AND price_index.website_id = '1'
      INNER JOIN `catalog_product_website` AS `product_website`
                 ON product_website.product_id = e.entity_id 
                   AND product_website.website_id = 1
      WHERE (((IFNULL(`e`.`entity_id`, 0) IN (
                SELECT `catalog_category_product`.`product_id`
                FROM `catalog_category_product`
                WHERE (category_id IN (3, 174, 175, 176))
             ))));
      

      *Query 2 (Fast)*
      Adding one more `category_id` (`177`), execution time drops to *100ms*.

      SELECT DISTINCT `e`.`entity_id`
      FROM `catalog_product_entity` AS `e`
      INNER JOIN `catalog_product_index_price` AS `price_index`
                 ON price_index.entity_id = e.entity_id 
                   AND price_index.customer_group_id = 0 
                   AND price_index.website_id = '1'
      INNER JOIN `catalog_product_website` AS `product_website`
                 ON product_website.product_id = e.entity_id 
                   AND product_website.website_id = 1
      WHERE (((IFNULL(`e`.`entity_id`, 0) IN (
                SELECT `catalog_category_product`.`product_id`
                FROM `catalog_category_product`
                WHERE (category_id IN (3, 174, 175, 176, 177))
             ))));
      


      *Expected Behavior*
      The query execution time should not drop drastically by merely adding one more value to the `IN` clause.

      *Observed Behavior*

      • When `category_id IN (3, 174, 175, 176)`, the query takes *8+ seconds*.
      • When `category_id IN (3, 174, 175, 176, 177)`, the query completes in *100ms*.

      *Schema Information*
      `catalog_product_entity`

      CREATE TABLE `catalog_product_entity` (
        `entity_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
        PRIMARY KEY (`entity_id`)
      ) ENGINE=InnoDB;
      

      `catalog_product_index_price`

      CREATE TABLE `catalog_product_index_price` (
        `entity_id` int(10) unsigned NOT NULL,
        `customer_group_id` int(10) unsigned NOT NULL,
        `website_id` smallint(5) unsigned NOT NULL,
        PRIMARY KEY (`entity_id`, `customer_group_id`, `website_id`)
      ) ENGINE=InnoDB;
      

      `catalog_product_website`

      CREATE TABLE `catalog_product_website` (
        `product_id` int(10) unsigned NOT NULL,
        `website_id` smallint(5) unsigned NOT NULL,
        PRIMARY KEY (`product_id`, `website_id`)
      ) ENGINE=InnoDB;
      

      `catalog_category_product`

      CREATE TABLE `catalog_category_product` (
        `entity_id` int(11) NOT NULL AUTO_INCREMENT,
        `category_id` int(10) unsigned NOT NULL DEFAULT 0,
        `product_id` int(10) unsigned NOT NULL DEFAULT 0,
        PRIMARY KEY (`entity_id`, `category_id`, `product_id`),
        UNIQUE KEY `CATALOG_CATEGORY_PRODUCT_CATEGORY_ID_PRODUCT_ID` (`category_id`, `product_id`),
        KEY `CATALOG_CATEGORY_PRODUCT_PRODUCT_ID` (`product_id`),
        KEY `idx_category_product_category_id_product_id` (`category_id`, `product_id`)
      ) ENGINE=InnoDB;
      


      *Execution Plan Analysis*
      *Slow Query Execution Plan*
      Execution time: *10+ seconds*

      {
        "query_block": {
          "select_id": 1,
          "r_loops": 1,
          "r_total_time_ms": 10737.67,
          "temporary_table": {
            "table": {
              "table_name": "product_website",
              "access_type": "ref",
              "possible_keys": ["PRIMARY", "CATALOG_PRODUCT_WEBSITE_WEBSITE_ID"],
              "key": "CATALOG_PRODUCT_WEBSITE_WEBSITE_ID",
              "key_length": "2",
              "used_key_parts": ["website_id"],
              "ref": ["const"],
              "r_loops": 1,
              "rows": 1811020,
              "filtered": 100,
              "using_index": true
            },
            "table": {
              "table_name": "catalog_category_product",
              "access_type": "ref",
              "possible_keys": [
                "CATALOG_CATEGORY_PRODUCT_CATEGORY_ID_PRODUCT_ID",
                "CATALOG_CATEGORY_PRODUCT_PRODUCT_ID",
                "idx_category_product_category_id_product_id"
              ],
              "key": "CATALOG_CATEGORY_PRODUCT_PRODUCT_ID",
              "key_length": "4",
              "used_key_parts": ["product_id"],
              "ref": ["func"],
              "r_loops": 3986533,
              "rows": 1,
              "r_filtered": 0.0039,
              "using_index": true
            }
          }
        }
      }
      

      *Fast Query Execution Plan*
      Execution time: *100ms*

      {
        "query_block": {
          "select_id": 1,
          "r_loops": 1,
          "r_total_time_ms": 10465.68,
          "temporary_table": {
            "table": {
              "table_name": "catalog_category_product",
              "access_type": "ref",
              "possible_keys": [
                "CATALOG_CATEGORY_PRODUCT_CATEGORY_ID_PRODUCT_ID",
                "CATALOG_CATEGORY_PRODUCT_PRODUCT_ID",
                "idx_category_product_category_id_product_id"
              ],
              "key": "CATALOG_CATEGORY_PRODUCT_PRODUCT_ID",
              "key_length": "4",
              "used_key_parts": ["product_id"],
              "ref": ["func"],
              "r_loops": 3986533,
              "rows": 1,
              "r_filtered": 0.0091,
              "using_index": true
            }
          }
        }
      }
      

      *Additional Notes*

      • The query optimizer might be making a poor choice in index selection based on the estimated number of rows.
      • The drastic improvement in performance suggests an issue with query optimization in `IN` subqueries.

      *Steps to Reproduce*
      1. Use the provided schema and indexes.
      2. Execute both queries and observe the significant time difference with a very big dataset (in our case 4million products)

      Would appreciate any insights or guidance on why this behavior occurs and if it’s a known issue or a bug.

      PS: I've noticed that when the results are more than 8000 records, the query runs in around 1 second. When the results are around 10 records, it can take a minute.

      You can see some further info here.
      https://github.com/magento/magento2/issues/39577

      Attachments

        Activity

          There are no comments yet on this issue.

          People

            psergei Sergei Petrunia
            ioweb.gr Gabriel Tziotzis
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.