Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25008

Single-table UPDATE/DELETE: make cost-based choice between subquery strategies

Details

    Description

      There is a query in my application to delete duplicate records (specific fields), this query used to work fine on MySQL (v 8.0.21) on Windows 10 and takes about a minute,

      When I migrated to MariaDB 10.5.9 on Ubuntu 20.04 , this query stucks, I left it for +6 hours and it didn't come back.

      The table has about 4 million records, table is:

      CREATE TABLE `item_variant_price` (
        `seller_variant_id` int(11) NOT NULL AUTO_INCREMENT,
        `item_id` varchar(100) DEFAULT NULL,
        `price` float DEFAULT NULL,
        `seller_name` varchar(400) DEFAULT NULL,
        `variant` varchar(400) DEFAULT NULL,
        `is_fulfilled` int(11) NOT NULL DEFAULT 0,
        `insertion_date` timestamp NOT NULL DEFAULT current_timestamp(),
        `modification_date` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
        `is_main_page` int(11) DEFAULT 0,
        `is_cheapest` int(11) NOT NULL DEFAULT 0,
        `variant_url` varchar(400) DEFAULT NULL,
        PRIMARY KEY (`variant_id`),
        FULLTEXT KEY `item_variant_price_serial_IDX` (`item_id`,`seller_name`,`variant`)
      ) ENGINE=InnoDB AUTO_INCREMENT=4309337 DEFAULT CHARSET=utf8mb4;
      

      and this is the query:

      delete from item_variant_price where seller_variant_id not in (select m from (select max(seller_variant_id) m from item_variant_price group by item_id, seller_name, variant) as innerTable);
      

      the inner select statement takes 27 seconds, so there has to be an issue with delete ?

      Thanks for your support

      Attachments

        Issue Links

          Activity

            As for 11.0 changing how this query is optimized... Yes it does change it.

            DELETE FROM t1 
            WHERE id NOT IN  (SELECT m  -- select #2
                              FROM (SELECT max(id) m  -- select #3
                                    FROM t1 
                                    GROUP BY item_id, seller_name, variant
                                   ) AS innertable)
            

            Before 11.0:

            select #3 was materialized into a temporary table which didn't have a useful index.
            Then, NOT IN subquery used IN-to-EXISTS which had to scan the temptable many times.

            After 11:0:

            select #3 is materialized into a table with a potential index on innertable.m
            The NOT IN still uses IN-to-EXISTS.
            IN-to-EXISTS pushes the "t1.id=innertable.m" into subquery#2.
            potential index on innertable.m becomes a real index, and that index is now used to perform
            [non-unique] index lookups.
            This way, the query plan is now similar to Materialization strategy - have a temptable with a suitable index, make lookups there when computing the subquery.
            (Also EXPLAIN is wrong as shown above but this doesn't seem to have much effect).

            psergei Sergei Petrunia added a comment - As for 11.0 changing how this query is optimized... Yes it does change it. DELETE FROM t1 WHERE id NOT IN ( SELECT m -- select #2 FROM ( SELECT max (id) m -- select #3 FROM t1 GROUP BY item_id, seller_name, variant ) AS innertable) Before 11.0: select #3 was materialized into a temporary table which didn't have a useful index. Then, NOT IN subquery used IN-to-EXISTS which had to scan the temptable many times. After 11:0: select #3 is materialized into a table with a potential index on innertable.m The NOT IN still uses IN-to-EXISTS. IN-to-EXISTS pushes the "t1.id=innertable.m" into subquery#2. potential index on innertable.m becomes a real index, and that index is now used to perform [non-unique] index lookups. This way, the query plan is now similar to Materialization strategy - have a temptable with a suitable index, make lookups there when computing the subquery. (Also EXPLAIN is wrong as shown above but this doesn't seem to have much effect).

            Take away: need to fix EXPLAIN for the above case but this is not a blocker for this MDEV.

            psergei Sergei Petrunia added a comment - Take away: need to fix EXPLAIN for the above case but this is not a blocker for this MDEV.

            Filed MDEV-35231 for fixing the EXPLAIN. It's a different problem from this MDEV, this MDEV doesn't need to wait on it.

            psergei Sergei Petrunia added a comment - Filed MDEV-35231 for fixing the EXPLAIN. It's a different problem from this MDEV, this MDEV doesn't need to wait on it.

            Testing done. Ok to push. The problems discussed above will be fixed in MDEV-35231

            lstartseva Lena Startseva added a comment - Testing done. Ok to push. The problems discussed above will be fixed in MDEV-35231
            ycp Yuchen Pei added a comment -

            pushed 4b6922a315fa5411665ac99c0b40fd7238093403 to 11.7/main

            ycp Yuchen Pei added a comment - pushed 4b6922a315fa5411665ac99c0b40fd7238093403 to 11.7/main

            People

              ycp Yuchen Pei
              ahmedadel47 Ahmed Wahba
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.