Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35300

EXPLAIN shows inconsistent #rows for LIMIT queries

    XMLWordPrintable

Details

    Description

      In EXPLAIN output, rows is "#rows to be examined". What if the optimizer does not plan to run the scan until all rows are read?

      create table t1 ( 
        a int,
        b int,
        key(a)
      );
      insert into t1 select seq, seq from seq_1_to_10000;
      analyze table t1;
      

      explain select * from t1 order by a limit 10;
      

      +------+-------------+-------+-------+---------------+------+---------+------+------+-------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra |
      +------+-------------+-------+-------+---------------+------+---------+------+------+-------+
      |    1 | SIMPLE      | t1    | index | NULL          | a    | 5       | NULL | 10   |       |
      +------+-------------+-------+-------+---------------+------+---------+------+------+-------+
      

      rows=10 which is how many we expect to read before producing #LIMIT rows. Ok.

      Now, with a quick select:

      explain select * from t1 
      where a between 1 and 1000 
      order by a limit 10;
      

      +------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
      +------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
      |    1 | SIMPLE      | t1    | range | a             | a    | 5       | NULL | 1000 | Using index condition |
      +------+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
      

      rows=1000. It seems, LIMIT is not taken into account here.

      This query might be somewhat exceptional as we first decide to use range on index a and then test_if_skip_sort_order() figures that this also produces rows in the desired order.

      Another example with type=index:

      alter table t1 add key(b);
      explain select * from t1 
      where a between 1 and 1000 order by b limit 20;
      

      +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
      +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
      |    1 | SIMPLE      | t1    | index | a             | b    | 5       | NULL | 203  | Using where |
      +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
      

      Here, the number 203 is the # of rows to scan computed by test_if_skip_sort_order().

      (We need to get LIMIT 20 rows. range access provides selectivity of WHERE clause: (quick->rows=1000)/(table_rows=10000 )=0.1. To get 20 output rows, we scan 20/0.1=200 rows).

      explain 
      select *from t1 
      where a between 1 and 1000 and b between 1 and 5000 
      order by b limit 20;
      

      +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
      +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
      |    1 | SIMPLE      | t1    | range | a,b           | b    | 5       | NULL | 5000 | Using where |
      +------+-------------+-------+-------+---------------+------+---------+------+------+-------------+
      

      Note that we see rows=5000. This is the estimate of #rows in range scan over index b, without taking LIMIT into account.
      Internally, the optimizer has a different idea:

                      "reconsidering_access_paths_for_index_ordering": {
                        "clause": "ORDER BY",
                        "fanout": 1,
                        "read_time": 1232.379546,
                        "table": "t1",
                        "rows_estimation": 1000,
                        "possible_keys": [
                          {
                            "index": "a",
                            "can_resolve_order": false,
                            "cause": "not usable index for the query"
                          },
                          {
                            "index": "b",
                            "can_resolve_order": true,
                            "updated_limit": 412,
                            "range_scan_time": 243.5337535,
                            "index_scan_time": 243.5337535,
                            "records": 5000,
                            "chosen": true
                          }
                        ]
                      }
                    },
      

      note the updated_limit=412 .
      This comes from 20 * (table_records / refkey_rows_estimate)

      refkey_rows_estimate= 492 which comes from table->cond_selectivity=0.048 which is a product of quick select's selectivities.

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            psergei Sergei Petrunia
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.