Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-19793

Optimizer to prune partitions with ORDER-BY & LIMIT (Feature request)

    XMLWordPrintable

Details

    • Task
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None

    Description

      Given the following table listed below, I'm always inserting in the last partition and most SELECTs are ORDER BY my_id DESC LIMIT 100 (in English: "give me latest 100 items of..."). For the inserts partitioning is working very well, but for SELECTs the optimizer seems not smart enough.

      CREATE TABLE `mytable` (
        `user_id` bigint(20) unsigned NOT NULL,
        `my_id` bigint(20) unsigned NOT NULL,                -- monotonically increasing for inserts
        `data` varbinary(123) DEFAULT NULL,
        PRIMARY KEY (`user_id`,`my_id`),
        UNIQUE KEY `my_id_idx` (`my_id`)
      ) ENGINE=InnoDB ROW_FORMAT=COMPACT
       PARTITION BY RANGE (`my_id`)
      (PARTITION `p0` VALUES LESS THAN (10000000) ENGINE = InnoDB,
       PARTITION `p10M` VALUES LESS THAN (20000000) ENGINE = InnoDB,
       PARTITION `p20M` VALUES LESS THAN (30000000) ENGINE = InnoDB,
       PARTITION `p30M` VALUES LESS THAN (40000000) ENGINE = InnoDB,
       [...]
      ) 
      

      I found that MariaDB's optimizer is not clever enough to:

      • iterate over the partitions in reverse order; start reading the most recent partition (info via partitioning layout: highest values of my_id)
      • stop reading more partitions after having round LIMIT number of rows.

      IOW: "If there is a LIMIT and ORDER BY k DESC where k is the entire partition key, then scan the partitions in reverse order and be prepared to stop at LIMIT". (by Rick James)

      How did I test this?

      • MariaDB server 10.4.5.
      • InnoDB file per table.
      • 'Old' partitions on slow spinning disks, latest partition on NVMe SSD.
      • SELECT with either ORDER BY my_id DESC LIMIT 100 & ... ASC ... are both slow, showing disk I/O from all partitions, as measured from the OS.

      Specifying partitions manually boosts performance as expected, but that is quite cumbersome, and also it does not work through a SPIDER instance, so I can't really do that in practice.

      To give an English analogy that describes this feature request best:

      Suppose you are archiving all mail over the years and you sort every year by sender. Then if you get the question to get the latest 10 letters from sender X, would you start at the 1970 folder or try 2019 first? And would you stop at finding 10 items in the 2019 folder? MariaDB starts at 1970 and always opens every folder...

      More context in my Q on DBA.SE.

      Related issues I've found:

      Attachments

        Activity

          People

            Unassigned Unassigned
            gertvdijk Gert van Dijk
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.