[MDEV-20568] Tokudb, very strange, substantial performance drop Created: 2019-09-12  Updated: 2019-09-12

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - TokuDB
Affects Version/s: 10.3.12, 10.4.7
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Slawomir Pryczek Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

FC 30,
Debian 9 Vmware



 Description   

Hi Guys, during some testing I found a very strange performance issue with toku. Basically when doing a select with WHERE date IN (....) - tokudb is about 4 times as fast to run the select for multiple days than for single day.

Checked this on innodb as well and the EXPLAIN format=json is same as for toku, but execution times are exactly the same. So this problem seems to be limited to tokudb only.

To reproduce it you'll need 2 queries
A. SELECT /SQL_NO_CACHE/ sum(impressions) FROM stats_tr_creative_2001 where date IN ('2019-08-01', '2030-08-02')

B. SELECT /SQL_NO_CACHE/ sum(impressions) FROM stats_tr_creative_2001 where date IN ('2019-08-01')

Always run query A first and A is always faster for us than query B, even with cold keycache on A and hot keycache on B ... A would be ~2 times faster. When keycache status is same we have around x4 difference (eg. 0.6s for A and 2.8s on B)

Reporting as minor because workaround is just to include some non-existing date in the past or in the future, but maybe fixing it would improve toku performance a lot in other scenarios.

For query A.

{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "stats_tr_creative_2001",
      "partitions": ["p1"],
      "access_type": "ref",
      "possible_keys": ["raw_stats_lookup_id", "lookup_date"],
      "key": "raw_stats_lookup_id",
      "key_length": "3",
      "used_key_parts": ["date"],
      "ref": ["const"],
      "rows": 771332,
      "filtered": 100
    }
  }
}

For query B.

{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "stats_tr_creative_2001",
      "partitions": ["p1", "p2"],
      "access_type": "range",
      "possible_keys": ["raw_stats_lookup_id", "lookup_date"],
      "key": "raw_stats_lookup_id",
      "key_length": "3",
      "used_key_parts": ["date"],
      "rows": 771335,
      "filtered": 100,
      "attached_condition": "stats_tr_creative_2001.`date` in ('2019-08-01','2030-08-02')"
    }
  }
}

Table struct:

CREATE TABLE `stats_tr_creative_2001` (
  `network_id` smallint(6) NOT NULL,
  `creative_id` int(11) NOT NULL,
  `stats_tr_value_id_domain` int(11) NOT NULL,
  `stats_tr_value_id` int(11) NOT NULL,
  `date` date NOT NULL,
  `impressions` int(11) NOT NULL,
  UNIQUE KEY `raw_stats_lookup_id` (`date`,`network_id`,`creative_id`,`stats_tr_value_id_domain`,`stats_tr_value_id`) USING BTREE,
  KEY `lookup_date` (`date`,`stats_tr_value_id_domain`)
) ENGINE=TokuDB DEFAULT CHARSET=ascii
 PARTITION BY HASH (to_days(`date`) DIV 20)
PARTITIONS 5;



 Comments   
Comment by Slawomir Pryczek [ 2019-09-12 ]

SQL to populate the table:

INSERT INTO `stats_tr_creative_2001` SELECT seq%10, seq%800000, seq, seq, date_add('2019-08-01', INTERVAL seq/100000 DAY), (seq%50)+(seq/100000)
	FROM seq_1_to_20000000;

 

INSERT INTO `stats_tr_creative_2001` SELECT seq%10, seq%800000, seq, seq, date_add('2019-08-01', INTERVAL seq/100000 DAY), (seq%50)+(seq/100000)
FROM seq_1_to_20000000;
Affected rows: 20000000
Time: 96.678s

[SQL]

SELECT sum(impressions) FROM stats_tr_creative_2001 where date IN ('2019-08-01', '2030-08-02');
Affected rows: 0
Time: 0.035s

[SQL]

SELECT sum(impressions) FROM stats_tr_creative_2001 where date IN ('2019-08-01');
Affected rows: 0
Time: 0.115s

{/code}
Generated at Thu Feb 08 09:00:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.