[MDEV-5595] Bug in Index on deletion in TokuDB Created: 2014-01-31  Updated: 2014-05-06  Resolved: 2014-05-06

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.7
Fix Version/s: 10.0.11

Type: Bug Priority: Major
Reporter: Alexey Zilber Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: tokudb, upstream
Environment:

CentOS, TokuDB 7.1.0



 Description   

Hi All,

We are running into a very weird issue with slow and hanging queries on TokuDB 7.1.0. Here's some background and configs:
The DB server has 264GB of RAM, 1.7TB of RAID-10 SSD storage for data, and 876GB of RAID-1 SSD storage for binary logs.

Since beginning to write this post, we've identified the exact issue: it seems to be a bug in how TokuDB handles indexes when an entry is deleted. Here's some background information from the original message I was writing, as well as our temporary workaround:

Previously we were running MariaDB 5.3/XtraDB with no issues. The reason we switched to TokuDB is space savings. We are now experiencing terrible performance issues on an email table, forcing us to truncate the table every few months. We have a cleanup script that deletes everything from the table that's older than 50 days. There are around 180M rows in the table, consisting of emails.
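The cleanup job amounts to something like the following sketch (illustrative only; the real script differs, and `cc1` is the obfuscated insert-timestamp column from the schema later in this report):

```sql
-- Hedged sketch of the 50-day cleanup: batch-delete old rows.
-- The actual batch size and scheduling are assumptions.
DELETE FROM `InBox`
WHERE `cc1` < NOW() - INTERVAL 50 DAY
LIMIT 10000;
```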

General config:

##Encodings/Collation 
default-storage-engine=TokuDB 
character-set-server=utf8 
collation_server=utf8_unicode_ci
max_connections = 5000 
thread_cache_size = 900
query_cache_size= 0 
query_cache_type = 0
table_cache = 4096

TokuDB specific config:

##TokuDB
tokudb_cache_size = 180G
tokudb_lock_timeout = 20000
tokudb_row_format = tokudb_small
tokudb_directio = ON
tokudb_read_buf_size=1048576
tokudb_load_save_space=on
tokudb_commit_sync=off

Simple queries of the following type run perfectly fine: select * from `InBox` where `id`=12345 limit 1

Here's an explain (from the slave DB, prior to our latest truncate on the master); please note that the slave server is identical to the master:

explain select * from `InBox` where `id`=752371516 limit 1;
+------+-------------+------------+-------+---------------+---------+---------+-------+------+-------+
| id   | select_type | table      | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+------+-------------+------------+-------+---------------+---------+---------+-------+------+-------+
|    1 | SIMPLE      | InBox      | const | PRIMARY       | PRIMARY | 8       | const |    1 |       |
+------+-------------+------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

The select also returns in (0.00 sec).

Now here's the problem: if you run a query that references an item that used to exist but has been deleted, the query will scan the whole table. This can quickly kill the database:

If you do an explain:

 explain select * from `InBox` where `id`=209685691 limit 1;
+------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
+------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
|    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
+------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
1 row in set (35 min 10.07 sec)

or do a select:

select * from `InBox` where `id`=209685691 limit 1;
Empty set (1 min 32.46 sec)

But if you query an id that has never existed, this is not a problem:

select * from `InBox` where `id`=99752371516 limit 1;
Empty set (0.00 sec)

explain select * from `InBox` where `id`=99209685691 limit 1;
+------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
+------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
|    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
+------+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
1 row in set (0.00 sec)

select * from `InBox` where `id`=99752371516 limit 1;
Empty set (0.00 sec)

As you can see from the execution times, the difference is huge, and any accidental access to an old, non-existent item is pretty catastrophic.

Our current workaround is never to delete. We will have to wait until the table fills up, then truncate it.

Some additional worries from the previous message:

Additionally, on InnoDB we would run EXPLAIN statements and use the estimated number of rows, which was very close to the number of rows actually returned. For example, for a statement like EXPLAIN SELECT * FROM BLAH USE INDEX (`sometimeindex`) WHERE `sometimeindex` > NOW() - INTERVAL 1 DAY, the estimated row count in the EXPLAIN would be nearly equal to what the query itself returned. But now, if I do that, the row count in the EXPLAIN doesn't actually change.

We're worried that with the constant deletes, the indexes are getting seriously messed up.

This is the schema for the (obfuscated) InBox table:

CREATE TABLE `InBox` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `cc1` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `cc2` timestamp NULL DEFAULT NULL,
  `cc3` tinyint(4) NOT NULL,
  `cc4` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
  `cc5` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
  `cc6` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
  `cc7` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
  `cc8` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
  `cc9` longtext COLLATE utf8_unicode_ci NOT NULL,
  `cc10` longtext COLLATE utf8_unicode_ci NOT NULL,
  `cc11` timestamp NULL DEFAULT NULL,
  `cc12` tinyint(4) NOT NULL,
  `cc13` tinyint(4) NOT NULL,
  `cc14` tinyint(4) NOT NULL,
  `cc15` tinyint(4) NOT NULL,
  `cc16` tinyint(4) NOT NULL,
  `cc17` tinyint(4) NOT NULL,
  `cc18` varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
  `cc19` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `cc20` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `cc21` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `cc18` (`cc18`,`cc19`,`cc1`),
  KEY `cc11` (`cc11`,`cc1`),
  KEY `cc3` (`cc3`,`id`),
  KEY `cc4` (`cc4`,`cc1`),
  KEY `cc5` (`cc5`,`cc1`),
  KEY `cc20` (`cc20`),
  KEY `cc1` (`cc1`),
  KEY `cc16` (`cc16`,`cc1`),
  KEY `cc18_2` (`cc18`,`cc19`,`cc16`,`cc1`),
  KEY `cc21` (`cc21`),
  KEY `cc3_2` (`cc3`,`cc21`,`id`)
) ENGINE=TokuDB AUTO_INCREMENT=934472644 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=TOKUDB_SMALL;

Here's the DELETE sql:

DELETE FROM `InBox` where `id` IN (1,2,3,4,5) limit 5;

Thank you!
Alex



 Comments   
Comment by Alexey Zilber [ 2014-01-31 ]

To add some more info: this event is observable when we begin to delete data after approximately 50 days of inserts. At that point, there are approx. 170+ million rows and 400GB of compressed data.

Comment by Alexey Zilber [ 2014-03-10 ]

Could we assign this to someone? This is a very critical bug, possibly even blocker level, as deletes cannot be performed on a huge TokuDB table without rebuilding indexes.

Comment by Elena Stepanova [ 2014-03-10 ]

Hi Alexey,

The issue has been discussed with the TokuDB people, but none of us has been able to reproduce the problem so far, so we are eagerly awaiting new information from you. You wrote in the Google group:

I'm currently testing it on a little server of mine.

and (about OPTIMIZE)

I can try this on the slave server in about two months and see if it works. But, even if it does, it won't solve the problem on the master.

But you haven't given an update on either of the experiments.
We are particularly interested in the "little server" test, as it might be our only chance to reproduce the problem. Tokutek, on the other hand, is apparently interested in the OPTIMIZE TABLE result: even though it won't solve the problem, it will probably give tmcallaghan more information to analyze the issue further.

Comment by Tim Callaghan (Inactive) [ 2014-03-10 ]

The current workaround is to optimize the table.

Comment by Alexey Zilber [ 2014-03-10 ]

OPTIMIZE TABLE is not workable on a multi-terabyte database after every delete. Not deleting at all seems to be the only solution for now.

I will try doing further tests as per the MariaDB team's suggestions.

Comment by Tim Callaghan (Inactive) [ 2014-03-10 ]

Alexey, partitioning might also be a workable solution for your use-case.
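For the record, a hedged sketch of what that suggestion might look like. Note that this exact DDL would be rejected as written: MySQL/MariaDB range partitioning requires the partitioning column to be part of every unique key, so the primary key would first have to be extended to (`id`, `cc1`). Partition names and boundaries are illustrative.

```sql
-- Hypothetical layout, assuming the PK has been extended to (id, cc1).
-- Old data is then removed with DROP PARTITION (a fast metadata
-- operation) instead of row-by-row DELETEs that leave deleted keys
-- behind in the indexes.
ALTER TABLE `InBox`
PARTITION BY RANGE (TO_DAYS(`cc1`)) (
  PARTITION p201401 VALUES LESS THAN (TO_DAYS('2014-02-01')),
  PARTITION p201402 VALUES LESS THAN (TO_DAYS('2014-03-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Instead of DELETE ... WHERE cc1 < NOW() - INTERVAL 50 DAY:
ALTER TABLE `InBox` DROP PARTITION p201401;
```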

Comment by Alexey Zilber [ 2014-03-22 ]

We may have run into a different scenario with this bug where it's more testable, but the implications are potentially worse.
We have another table with 90M rows on two different database servers (both are TokuDB 7.1.0; one has 256GB of RAM, the other 128GB of RAM and non-SSD storage).

We noticed the same symptoms as we described initially. The table status:

MariaDB [masked_db]> show table status like 'MaskedTable1'\G;
*************************** 1. row ***************************
           Name: MaskedTable1
         Engine: TokuDB
        Version: 10
     Row_format: tokudb_small
           Rows: 38975004
 Avg_row_length: 70
    Data_length: 2751257149
Max_data_length: 9223372036854775807
   Index_length: 3215417738
      Data_free: 170606592
 Auto_increment: NULL
    Create_time: 2013-08-19 07:50:23
    Update_time: 2014-03-20 17:07:13
     Check_time: NULL
      Collation: utf8_unicode_ci
       Checksum: NULL
 Create_options: row_format=TOKUDB_SMALL
        Comment:
1 row in set (0.00 sec)

This is the original query we were running:

SELECT id, field1, field2
FROM MaskedTable1
FORCE INDEX ( `active` )
WHERE `active` =1
AND field2 <= NOW( )
ORDER BY field2 DESC
LIMIT 8000;

We realized, when we compared the explain of this same query on the other DB (which at that point was not having this issue, but which started having the same issue a few hours later), that the query type went from a range type to a ref type.

We ran ANALYZE TABLE on 'MaskedTable1', which did nothing. We then ran OPTIMIZE TABLE on 'MaskedTable1', and this fixed the issue. Unfortunately, it fixed the issue for only approximately 8 hours. It seems the query optimiser is perhaps also having issues. We're attempting another workaround right now that seems to be working.

It's best shown via these explain statements. These were done on the same db, seconds after each other, with no optimise table run between them:

MariaDB [masked_db]> explain SELECT id, field1, field2 FROM MaskedTable1 FORCE INDEX ( `active` ) WHERE `active` =1 AND field2 <= NOW( ) ORDER BY field2 DESC LIMIT 8000; 
+------+-------------+--------------+------+---------------+--------+---------+-------+-----------+--------------------------+
| id   | select_type | table        | type | possible_keys | key    | key_len | ref   | rows      | Extra                    |
+------+-------------+--------------+------+---------------+--------+---------+-------+-----------+--------------------------+
|    1 | SIMPLE      | MaskedTable1 | ref  | active        | active | 1       | const | 114653697 | Using where; Using index |
+------+-------------+--------------+------+---------------+--------+---------+-------+-----------+--------------------------+
1 row in set (0.00 sec)
 
MariaDB [masked_db]> explain SELECT id, field1, field2 FROM MaskedTable1 FORCE INDEX ( `active` ) WHERE `active` =0 AND field2 <= NOW( ) ORDER BY field2 DESC LIMIT 8000; 
+------+-------------+--------------+-------+---------------+--------+---------+------+----------+--------------------------+
| id   | select_type | table        | type  | possible_keys | key    | key_len | ref  | rows     | Extra                    |
+------+-------------+--------------+-------+---------------+--------+---------+------+----------+--------------------------+
|    1 | SIMPLE      | MaskedTable1 | range | active        | active | 5       | NULL | 45986026 | Using where; Using index |
+------+-------------+--------------+-------+---------------+--------+---------+------+----------+--------------------------+
1 row in set (0.00 sec)

Based on the explains above, the workaround is to set active=0 right before we issue a DELETE. This is what led us to believe this issue is related to the original bug filed.
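In SQL terms, the workaround looks roughly like this (a sketch using the masked names above; one reading of why it helps is that the deleted keys then sit in the active=0 portion of the index, which the production active=1 query never scans):

```sql
-- Flip the flag first, then delete. This is an interpretation of the
-- workaround described above, not the exact production statements.
UPDATE MaskedTable1 SET `active` = 0 WHERE `id` IN (1, 2, 3);
DELETE FROM MaskedTable1 WHERE `id` IN (1, 2, 3);
```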

Since this issue is becoming more critical, and is reproducible on even smaller tables now, I believe the bug needs to be escalated. Either it's an index issue or a big problem with the query optimiser.

Comment by Sergei Golubchik [ 2014-04-07 ]

This is a TokuDB issue. When it is fixed upstream, we will merge the new TokuDB with the fix.

But as of TokuDB 7.1.5 it is not fixed yet.

Comment by Rich Prohaska [ 2014-04-28 ]

TokuDB 7.1.6 fixed the long point-query times for deleted keys. See https://github.com/Tokutek/ft-index/issues/218.

Generated at Thu Feb 08 07:05:34 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.