Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28444

Wrong query plan using partition by range of date

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.7.1
    • None
    • None
    • None

    Description

      For a table partitioned by day with PK( date , point) to respect primary key ordering

      CREATE TABLE `monde_gfs` (
        `date_heure` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
        `num_point` int(11) NOT NULL DEFAULT 0,
        PRIMARY KEY (`date_heure`,`num_point`),
        KEY `num_point` (`num_point`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1
       PARTITION BY RANGE (to_days(`date_heure`))
      (PARTITION `p2022_04_27` VALUES LESS THAN (738638) ENGINE = InnoDB,
       PARTITION `p2022_04_28` VALUES LESS THAN (738639) ENGINE = InnoDB,
       PARTITION `p2022_04_29` VALUES LESS THAN (738640) ENGINE = InnoDB
      ...
      

      The optimizer wrongly estimate the cost of the range ref access 2 vs 24M where as the 2 partions contains around 50M rows

       analyze format=json SELECT * FROM  monde_gfs m0_ WHERE m0_.num_point = 237609 AND m0_.date_heure >=
       '2022-04-29 00:00:00' AND m0_.date_heure <= '2022-04-30 00:00:00'; 
      {
       "query_block": {
        "select_id": 1,
        "r_loops": 1,
        "r_total_time_ms": 46326.7725,
        "table": {
         "table_name": "m0_",
         "partitions": ["p2022_04_29", "p2022_04_30"],
         "access_type": "range",
         "possible_keys": ["PRIMARY", "num_point"],
         "key": "PRIMARY",
         "key_length": "9",
         "used_key_parts": ["date_heure", "num_point"],
         "r_loops": 1,
         "rows": 2,
         "r_rows": 24917761,
         "filtered": 100,
         "r_filtered": 1.0033e-4,
         "attached_condition": "m0_.num_point = 237609 and m0_.date_heure >= '2022-04-29 00:00:00' and m0_.date_heure <= '2022-04-30 00:00:00'"
        }
      

      At the same time the partition pruning should already filtering the date_heure range and a score of 2 is under estimate and should be more estimated to the number records of each pruned partition because it's a range full index scan

      As the result of this miss estimate the correct plan is ignored

      analyze format=json SELECT * FROM  monde_gfs m0_ force index (num_point) WHERE m0_.num_point = 237609 AND m0_.date_heure >=
       '2022-04-29 00:00:00' AND m0_.date_heure <= '2022-04-30 00:00:00'; 
      {
        "query_block": {
          "select_id": 1,
          "r_loops": 1,
          "r_total_time_ms": 48.77394453,
          "table": {
            "table_name": "m0_",
            "partitions": ["p2022_04_29", "p2022_04_30"],
            "access_type": "ref",
            "possible_keys": ["num_point"],
            "key": "num_point",
            "key_length": "4",
            "used_key_parts": ["num_point"],
            "ref": ["const"],
            "r_loops": 1,
            "rows": 2,
            "r_rows": 48,
            "r_table_time_ms": 48.6800073,
            "r_other_time_ms": 0.056539477,
            "filtered": 100,
            "r_filtered": 52.08333333,
            "attached_condition": "m0_.date_heure >= '2022-04-29 00:00:00' and m0_.date_heure <= '2022-04-30 00:00:00'"
          }
        }
      } |
      

      We have observed this regression on pre production while production in 10.1 not affected and also the issue looks like beeing trigger at some point hitting 100% of the CPU with such queries

      Attachments

        Activity

          People

            Unassigned Unassigned
            stephane@skysql.com VAROQUI Stephane
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.