Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20693

ha_rocksdb::records_in_range() vastly overestimates the number of rows in certain ranges

    XMLWordPrintable

    Details

      Description

      Consider this testcase (not necessarily minimal):

      CREATE TABLE obj2 (
        part_id smallint(5) unsigned NOT NULL,
        oid bigint(20) unsigned NOT NULL,
        tid bigint(20) unsigned NOT NULL,
        filler char(32),
        KEY tid (part_id,tid,oid)
      ) ENGINE=ROCKSDB;
      

      create table ten(a int primary key);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int primary key);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table hundred(a int primary key);
      insert into hundred select A.a + B.a* 10 from ten A, ten B;
       
      set rocksdb_bulk_load_size=10000, rocksdb_commit_in_the_middle=1;
       
      insert into obj2
      select 
        0, 
        A.a + 1000*B.a + 1000000*C.a,
        A.a + 1000*B.a + 1000000*C.a,
        'filler-data'
      from one_k A, one_k B, hundred C;
       
      insert into obj2
      select 
        1,
        A.a + 1000*B.a + 1000000*C.a,
        A.a + 1000*B.a + 1000000*C.a,
        'filler-data'
      from one_k A, one_k B, hundred C;
      set global rocksdb_force_flush_memtable_and_lzero_now =1;
      

      Ok, now table obj2 has these rows:

      part_id=0, tid=0
      part_id=0, tid=1
      ...
      part_id=0, tid=100M
      part_id=1, tid=0
      part_id=1, tid=1
      ...
      part_id=1, tid=100M
      

      Reading the rows at the very end of the group with part_id=0:

      analyze format=json select * from obj2 force index (tid) where part_id=0 and tid>100000000\G
      *************************** 1. row ***************************
      ANALYZE: {
        "query_block": {
          "select_id": 1,
          "r_loops": 1,
          "r_total_time_ms": 0.2885,
          "table": {
            "table_name": "obj2",
            "access_type": "range",
            "possible_keys": ["tid"],
            "key": "tid",
            "key_length": "10",
            "used_key_parts": ["part_id", "tid"],
            "r_loops": 1,
            "rows": 104100545,
            "r_rows": 0,
            "filtered": 52.05,
            "r_filtered": 100,
            "index_condition": "obj2.part_id = 0 and obj2.tid > 100000000"
          }
        }
      }
      

      Note this part:

            "rows": 104100545,
            "r_rows": 0,
      

      There are 0 rows in the range, but the estimate is 100M rows! Without FORCE INDEX, index tid will not be used, and full table scan will be done, which will kill the performance.

        Attachments

          Activity

            People

            Assignee:
            psergey Sergei Petrunia
            Reporter:
            psergey Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Dates

              Created:
              Updated: