Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20693

ha_rocksdb::records_in_range() vastly overestimates the number of rows in certain ranges

    XMLWordPrintable

Details

    Description

      Consider this testcase (not necessarily minimal):

      CREATE TABLE obj2 (
        part_id smallint(5) unsigned NOT NULL,
        oid bigint(20) unsigned NOT NULL,
        tid bigint(20) unsigned NOT NULL,
        filler char(32),
        KEY tid (part_id,tid,oid)
      ) ENGINE=ROCKSDB;
      

      create table ten(a int primary key);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int primary key);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table hundred(a int primary key);
      insert into hundred select A.a + B.a* 10 from ten A, ten B;
       
      set rocksdb_bulk_load_size=10000, rocksdb_commit_in_the_middle=1;
       
      insert into obj2
      select 
        0, 
        A.a + 1000*B.a + 1000000*C.a,
        A.a + 1000*B.a + 1000000*C.a,
        'filler-data'
      from one_k A, one_k B, hundred C;
       
      insert into obj2
      select 
        1,
        A.a + 1000*B.a + 1000000*C.a,
        A.a + 1000*B.a + 1000000*C.a,
        'filler-data'
      from one_k A, one_k B, hundred C;
      set global rocksdb_force_flush_memtable_and_lzero_now =1;
      

      Ok, now table obj2 has these rows:

      part_id=0, tid=0
      part_id=0, tid=1
      ...
      part_id=0, tid=100M
      part_id=1, tid=0
      part_id=1, tid=1
      ...
      part_id=1, tid=100M
      

      Reading the rows at the very end of the group with part_id=0:

      analyze format=json select * from obj2 force index (tid) where part_id=0 and tid>100000000\G
      *************************** 1. row ***************************
      ANALYZE: {
        "query_block": {
          "select_id": 1,
          "r_loops": 1,
          "r_total_time_ms": 0.2885,
          "table": {
            "table_name": "obj2",
            "access_type": "range",
            "possible_keys": ["tid"],
            "key": "tid",
            "key_length": "10",
            "used_key_parts": ["part_id", "tid"],
            "r_loops": 1,
            "rows": 104100545,
            "r_rows": 0,
            "filtered": 52.05,
            "r_filtered": 100,
            "index_condition": "obj2.part_id = 0 and obj2.tid > 100000000"
          }
        }
      }
      

      Note this part:

            "rows": 104100545,
            "r_rows": 0,
      

      There are 0 rows in the range, but the estimate is 100M rows! Without FORCE INDEX, index tid will not be used, and full table scan will be done, which will kill the performance.

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.