[MDEV-20693] ha_rocksdb::records_in_range() vastly overestimates the number of rows in certain ranges - Jira

XML

Word

Printable

Details

Type: Bug
Status: Confirmed (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.2(EOL), 10.3(EOL), 10.4(EOL)
Fix Version/s: 10.4(EOL)
Component/s: Storage Engine - RocksDB
Labels:
- upstream
- upstream-fixed

Description

Consider this testcase (not necessarily minimal):

CREATE TABLE obj2 (

  part_id smallint(5) unsigned NOT NULL,

  oid bigint(20) unsigned NOT NULL,

  tid bigint(20) unsigned NOT NULL,

  filler char(32),

  KEY tid (part_id,tid,oid)

) ENGINE=ROCKSDB;

create table ten(a int primary key);

insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k(a int primary key);

insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

create table hundred(a int primary key);

insert into hundred select A.a + B.a* 10 from ten A, ten B;

set rocksdb_bulk_load_size=10000, rocksdb_commit_in_the_middle=1;

insert into obj2

select

0,

  A.a + 1000*B.a + 1000000*C.a,

  A.a + 1000*B.a + 1000000*C.a,

  'filler-data'

from one_k A, one_k B, hundred C;

insert into obj2

select

1,

  A.a + 1000*B.a + 1000000*C.a,

  A.a + 1000*B.a + 1000000*C.a,

  'filler-data'

from one_k A, one_k B, hundred C;

set global rocksdb_force_flush_memtable_and_lzero_now =1;

Ok, now table obj2 has these rows:

part_id=0, tid=0

part_id=0, tid=1

...

part_id=0, tid=100M

part_id=1, tid=0

part_id=1, tid=1

...

part_id=1, tid=100M

Reading the rows at the very end of the group with part_id=0:

analyze format=json select * from obj2 force index (tid) where part_id=0 and tid>100000000\G

*************************** 1. row ***************************

ANALYZE: {

  "query_block": {

    "select_id": 1,

    "r_loops": 1,

    "r_total_time_ms": 0.2885,

    "table": {

      "table_name": "obj2",

      "access_type": "range",

      "possible_keys": ["tid"],

      "key": "tid",

      "key_length": "10",

      "used_key_parts": ["part_id", "tid"],

      "r_loops": 1,

      "rows": 104100545,

      "r_rows": 0,

      "filtered": 52.05,

      "r_filtered": 100,

      "index_condition": "obj2.part_id = 0 and obj2.tid > 100000000"

Note this part:

      "rows": 104100545,

      "r_rows": 0,

There are 0 rows in the range, but the estimate is 100M rows! Without FORCE INDEX, index tid will not be used, and full table scan will be done, which will kill the performance.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

psergey-mdev20693.diff
6 kB
2019-09-29 13:47

Issue Links

is blocked by

MDEV-30610 Update RocksDB to the latest upstream version

Open

Activity

People

Assignee:: Sergei Petrunia

Reporter:: Sergei Petrunia

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2019-09-27 21:21

Updated:: 2023-06-22 14:06

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.