[MDEV-9384] Key File Corruption for TokuDB Created: 2016-01-09  Updated: 2023-09-24  Resolved: 2023-09-24

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - TokuDB
Affects Version/s: 10.1.9, 10.1.10
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: John Barratt Assignee: Unassigned
Resolution: Won't Fix Votes: 1
Labels: tokudb
Environment:

Ubuntu 14.04.3 LTS



 Description   

We have started to see almost daily corruption of one of our core tables using TokuDB with FAST compression enabled. These tables are manually partitioned by day, and have 180-200 million rows added each day. A small percentage of these rows are deleted (< 5%), and another portion (< 10%) are updated one or more times through the day.

Here is an example table :

*************************** 8. row ***************************
           Name: XXXX_20160107
         Engine: TokuDB
        Version: 10
     Row_format: Dynamic
           Rows: 188682009
 Avg_row_length: 224
    Data_length: 42396852041
Max_data_length: 9223372036854775807
   Index_length: 13764087186
      Data_free: 19841155072
 Auto_increment: 228709329
    Create_time: 2016-01-06 00:00:01
    Update_time: 2016-01-08 23:46:54
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: `COMPRESSION`=TOKUDB_FAST
        Comment:

We find that at some point the table becomes corrupt, and that trying to select from a certain sequence of rows fails with the following error:

ERROR 1034 (HY000): Incorrect key file for table 'XXXXX_20160108'; try to repair it

As I understand it, it isn't possible to repair a TokuDB table; the only option is to restore from backups. This server isn't currently running with a slave, so I can't verify whether a slave would have been corrupted in the same way.

The database is running on a pair of mirrored SSDs, and only holds up to 8 days worth of these tables, and some views pointing to these tables. Also of note we have had this problem on 10.1.9 and 10.1.10, and on two different servers running different hardware, with different models of CPU and SSDs, so it doesn't appear to be a hardware issue. One of the servers had ECC memory, and one didn't.

I am going to turn off deletions on this table and just flag rows as deleted, to see if that changes the rate of occurrence, or fixes it completely and points to deletion of rows as the problem. My only other anecdotal evidence is that this seems to have started only after we began doing relatively heavy updates of the data; I don't think it ever occurred when we were only doing inserts, selects and deletes.

Given it is happening only relatively infrequently (we have had 5 tables corrupted this way over the last week or so), randomly, and on such a large dataset, I am not sure how else to debug this further, but please let me know what other details may be of assistance in resolving this issue.



 Comments   
Comment by John Barratt [ 2016-01-09 ]

One other note that may be relevant with regards to the update: it is a CASE/WHEN/THEN statement that updates 1,000 rows at once, each with different values on two integer fields. This is done to be much, much more efficient than 1,000 individual queries (1,000 seemed to be about the sweet spot for this query/table), but it is, I would expect, a much less commonly used syntax (first time I've ever used it), and hence potentially more likely to contain an undiscovered edge case.
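The batched pattern described above can be sketched as follows. Since a standalone example can't assume a running MySQL server, this uses SQLite via Python's sqlite3 module; the miniature table `t` and counter column `update_1` are hypothetical stand-ins, and SQLite's two-argument MAX() stands in for the MySQL IF(a < b, b, a) idiom.

```python
import sqlite3

# Hypothetical miniature of the production table: an id plus one counter column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, update_1 INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO t (id, update_1) VALUES (?, ?)",
                 [(100, 500), (101, 5), (102, 7)])

# One statement updates many rows to different per-row values, instead of
# one UPDATE per row; each branch keeps the larger of the old and new value
# (SQLite's MAX(x, y) stands in for MySQL's IF(x < y, y, x)).
conn.execute("""
    UPDATE t SET
      update_1 = CASE id
        WHEN 100 THEN MAX(update_1, 327)
        WHEN 101 THEN MAX(update_1, 14)
      END
    WHERE id IN (100, 101)
""")

rows = dict(conn.execute("SELECT id, update_1 FROM t ORDER BY id"))
print(rows)  # {100: 500, 101: 14, 102: 7}
```

The WHERE clause matters: without it, rows not named in the CASE would fall through to NULL.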

Comment by Elena Stepanova [ 2016-01-11 ]

johnbarratt

Does it always happen on the same table, or on tables with identical/similar structure?
Could you please SHOW CREATE TABLE output? You can mask/obfuscate column names and anything else you consider confidential, I am mainly interested in column data types and index structure.

Please also paste the template of the update that you identified as a cause of the trouble, and attach your cnf file(s).

Thanks.

Comment by John Barratt [ 2016-01-12 ]

Thanks for the followup Elena.

  • It happens on different tables with the same structure. We had one table from a few days ago where two blocks of data were corrupted, but typically it was only impacting us once every 1-2 days.
  • I have since modified our update code, about 24hrs ago, to do a simple update of one row per query, and so far it hasn't become corrupt again. While this needs perhaps another 24hrs to be more certain, it does indicate that the problem is likely related to the CASE statement use.
  • Sample table structure :

 
CREATE TABLE `XXXX_20160108` (
  `id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `X` varchar(255) CHARACTER SET utf8mb4 NOT NULL,
  `X` varchar(15) NOT NULL,
  `X` bigint(20) NOT NULL,
  `X` varchar(30) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `X` float NOT NULL DEFAULT '0',
  `X` float NOT NULL DEFAULT '0',
  `X` mediumint(9) NOT NULL DEFAULT '0',
  `X` tinyint(4) NOT NULL DEFAULT '-1',
  `sequence_id` int(11) NOT NULL AUTO_INCREMENT,
  `X` tinyint(4) NOT NULL DEFAULT '0',
  `X` mediumint(4) unsigned NOT NULL DEFAULT '0',
  `X` tinyint(4) unsigned NOT NULL DEFAULT '0',
  `X` tinyint(4) unsigned NOT NULL DEFAULT '0',
  `X` bigint(20) unsigned NOT NULL DEFAULT '0',
  `X` bigint(20) unsigned NOT NULL DEFAULT '0',
  `added_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `X` bigint(20) unsigned NOT NULL DEFAULT '0',
  `UPDATE_1` int(10) unsigned NOT NULL DEFAULT '0',
  `UPDATE_2` int(10) unsigned NOT NULL DEFAULT '0',
  `X` int(10) unsigned NOT NULL DEFAULT '0',
  `X` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`sequence_id`),
  UNIQUE KEY `id` (`id`),
  KEY `index_tweets_on_created_at` (`created_at`),
  KEY `index_tweets_on_updated_at` (`updated_at`),
  KEY `index_tweets_on_X` (`INT`),
  KEY `index_tweets_on_X` (`INT`,`sequence_id`),
  KEY `index_tweets_on_added_at` (`added_at`),
  KEY `index_tweets_on_X` (`X`)
) ENGINE=TokuDB AUTO_INCREMENT=224285456 DEFAULT CHARSET=utf8 `COMPRESSION`=TOKUDB_FAST;

  • Example update statement; this one updates 2 rows, while the production version updated 1,000:

update XXXX_20151201 set
       UPDATE_1 = CASE id
         when 100 then if(UPDATE_1 < 327, 327, UPDATE_1)
         when 101 then if(UPDATE_1 < 14, 14, UPDATE_1)
       end,
       UPDATE_2 = CASE id
         when 100 then if(UPDATE_2 < 582, 582, UPDATE_2)
         when 101 then if(UPDATE_2 < 13, 13, UPDATE_2)
       end
where id in (100,101);
 

  • my.cnf

     
    [client]
    port            = 3306
    socket          = /var/run/mysqld/mysqld.sock
     
    [mysqld_safe]
    socket          = /var/run/mysqld/mysqld.sock
    nice            = 0
     
    [mysqld]
    default-storage-engine = tokudb
    user            = mysql
    pid-file        = /var/run/mysqld/mysqld.pid
    socket          = /var/run/mysqld/mysqld.sock
    port            = 3306
    basedir         = /usr
    datadir         = /db/mysql
    tmpdir          = /db/tmp
    lc-messages-dir = /usr/share/mysql
    skip-external-locking
    bind-address            = 0.0.0.0
    key_buffer              = 128M
    max_allowed_packet      = 16M
    thread_stack            = 192K
    thread_cache_size       = 800
    myisam-recover          = BACKUP
    max_connections         = 2500
    query_cache_type        = 0 ## Off
    query_cache_limit       = 8M
    query_cache_size        = 256M
    log_error               = /var/log/mysql/error.log
    expire_logs_days        = 10
    max_binlog_size         = 100M
     
    innodb_thread_concurrency = 0
    innodb_flush_log_at_trx_commit = 2
    innodb_doublewrite = false
    innodb_io_capacity = 2000
    innodb_support_xa = false
    innodb_file_per_table = true
    innodb_buffer_pool_size = 128M
    innodb_read_io_threads = 64
    innodb_write_io_threads = 64
    innodb_log_buffer_size = 128M
    innodb_log_file_size = 512M
    innodb_log_files_in_group = 2
    innodb_max_dirty_pages_pct = 90
    innodb_flush_method = O_DIRECT
    innodb_lock_wait_timeout=100
     
    open_files_limit = 65535
    wait_timeout=3600
     
    [mysqldump]
    quick
    quote-names
    max_allowed_packet      = 16M
     
    [mysql]
     
    [isamchk]
    key_buffer              = 16M
     
    !includedir /etc/mysql/conf.d/

  • tokudb.cnf

     
    [mariadb]
    plugin-load-add=ha_tokudb.so
    tokudb_commit_sync = 0
    tokudb_fsync_log_period = 1000
    tokudb_data_dir=/db/mysql/tokudb
    tokudb_fs_reserve_percent=0

  • One other thing to confirm, the disk the db was on had plenty of free space when these problems have occurred.

Please let me know if there are any other details I can supply.

Thanks,

JB.

Comment by John Barratt [ 2016-01-12 ]

EDIT : This actually isn't confirmed, there was still some code running CASE based updates on the table.

One further followup: we have just experienced a corruption of today's table (a bit over 4hrs old, started from UTC 00Z). This time though it was on a table that has never had the CASE-based update run on it, so that appears to be a red herring. It hence just seems to be the mix of single-row updates alongside inserts/deletes that is causing the problem. We had been running just pure inserts and deletes against a 10.1.9 server for many weeks with this volume of data, and never saw any corruption; it was only after adding updates to the mix that we started seeing issues.

Please let me know if there is any other info I can provide.

Thanks,

JB.


Comment by John Barratt [ 2016-01-12 ]

Sorry, one more thing that may further help pinpoint the corruption: a mysqldump on a 'corrupt' table succeeds OK with no errors reported. However, re-importing this data into another database shows that it has fewer rows than the original corrupt table does, by around 20k rows out of nearly 200 million.

Comment by Elena Stepanova [ 2016-01-13 ]

johnbarratt,
Regarding your last comment – but does the resulting dump file actually contain 20k rows, or 200m rows?
And if it contains 200m rows, then how is this file created – does it contain INSERT, or REPLACE, or INSERT IGNORE statements, or maybe it produces separate data files suitable for LOAD DATA? And how do you load the dump?

Thanks.

Comment by John Barratt [ 2016-01-14 ]

The resulting dump contains around 200m rows, but 20k fewer than are in the original, corrupted table. The dump file is created just by running mysqldump on the table, so it contains only the multi-row INSERT statements mysqldump creates by default. It is loaded into another database by piping the dump file directly into the mysql command.
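The dump-and-reload check described above can be sketched in miniature. This is not the production workflow (that was `mysqldump db table | mysql other_db`); it is a SQLite stand-in using Python's sqlite3 that replays a SQL dump into a fresh database and compares row counts, which is the comparison that exposed the ~20k-row shortfall.

```python
import sqlite3

# Source database standing in for the production table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
src.executemany("INSERT INTO t VALUES (?, ?)", [(i, "x") for i in range(1000)])
src.commit()

# Stand-in for piping a mysqldump file into mysql: replay the SQL dump
# statement-by-statement into a fresh database.
dst = sqlite3.connect(":memory:")
dst.executescript("\n".join(src.iterdump()))

count = lambda c: c.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count(src), count(dst))  # equal on a healthy table; the corrupt one came back short
```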

Thanks,

JB.

Comment by John Barratt [ 2016-01-18 ]

So it seems very likely now that the CASE statement is what is inducing the corruption here. I (actually) stopped all CASE based updates nearly 5 days ago now, and since then we haven't had a single corruption of the database. Doing single updates rather than using the CASE statement though is adding additional load and additional delays in updating the data.

Please let me know if I can provide any more details.

Thanks,

JB.

Comment by John Barratt [ 2016-01-18 ]

We just had a further corruption whilst using single updates after nearly 6 days of no corruption. So while the use of the CASE statement may not be solely responsible, it appears given the relatively long time with no corruption with just simple updates, that it does induce the issue more regularly. I will have to see if we can swap back to no compression, or even perhaps to innodb to see if that avoids further corruption next.

Comment by Elena Stepanova [ 2016-01-21 ]

John Barratt, There is a little problem with your column obfuscation: since it's a key corruption, we need to know on which columns the table has indexes, and now three indexes are indistinguishable. Could you please somehow identify them? Thanks.

Comment by John Barratt [ 2016-01-21 ]

Ahh, sorry, this one should be better. If you need me to send the actual table and some sample data directly to whoever may be investigating, please let me know.

CREATE TABLE XXX_20160118 (
  id bigint(20) unsigned NOT NULL DEFAULT '0',
  X varchar(255) CHARACTER SET utf8mb4 NOT NULL,
  CB varchar(15) NOT NULL,
  CA bigint(20) NOT NULL,
  X varchar(30) NOT NULL,
  created_at timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  X float NOT NULL DEFAULT '0',
  X float NOT NULL DEFAULT '0',
  X mediumint(9) NOT NULL DEFAULT '0',
  X tinyint(4) NOT NULL DEFAULT '-1',
  sequence_id int(11) NOT NULL AUTO_INCREMENT,
  CC tinyint(4) NOT NULL DEFAULT '0',
  X mediumint(4) unsigned NOT NULL DEFAULT '0',
  CD tinyint(4) unsigned NOT NULL DEFAULT '0',
  X tinyint(4) unsigned NOT NULL DEFAULT '0',
  X bigint(20) unsigned NOT NULL DEFAULT '0',
  X bigint(20) unsigned NOT NULL DEFAULT '0',
  added_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  X bigint(20) unsigned NOT NULL DEFAULT '0',
  UPDATE_1 int(10) unsigned NOT NULL DEFAULT '0',
  UPDATE_2 int(10) unsigned NOT NULL DEFAULT '0',
  X int(10) unsigned NOT NULL DEFAULT '0',
  X int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (sequence_id),
  UNIQUE KEY id (id),
  KEY index_XXX_on_created_at (created_at),
  KEY index_XXX_on_updated_at (updated_at),
  KEY index_XXX_on_CB (CB),
  KEY index_XXX_on_CC_seq (CC,sequence_id),
  KEY index_XXX_on_added_at (added_at),
  KEY index_XXX_on_CD (CD)
) ENGINE=TokuDB AUTO_INCREMENT=219849167 DEFAULT CHARSET=utf8 COMPRESSION=TOKUDB_FAST;
 

Comment by John Barratt [ 2016-01-21 ]

As an aside, I am still only doing single updates to the table, and have changed from TOKUDB_FAST to TOKUDB_LZMA compression to see if that makes any difference. So far (since my update a few days ago) there is no corruption, but it is only early days. I can't try no compression, or innodb with the same setup at the moment due to disk space limitations on the server it is running.

Thanks,

JB.

Comment by Elena Stepanova [ 2016-01-22 ]

johnbarratt,

I've been running some test flow with UPDATEs of the same pattern, we'll see how it goes.
Meanwhile, two more questions.
1) Are updates (or any other queries) happening concurrently, or do you have a single-threaded flow?
2) In the initial description you mentioned that you were going to disable DELETEs, but I didn't find any notes about results. Did you do it, or do you still have those DELETEs?

Thanks.

Comment by John Barratt [ 2016-01-22 ]

Hi Elena,

  • There are two separate processes running updates that appear to be causing the problem, so two updates can occur at the same time (of two different fields). It is also possible (though rare) that these update the same row at the same time.
  • The deletes were disabled completely, and this appeared to make no difference. Deletion has now been changed into an update of a flag field that marks the row as effectively 'deleted'. This is the last 'tinyint' field in the table, and is not indexed. It is updated by doing a bitwise update on the field like this: FIELD=FIELD|32.
  • If you are trying to replicate, it appears use of a CASE statement induced the problem more often. We were updating 1,000 rows with different values per query using this method.
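The bitwise soft-delete mentioned in the second point can be sketched as follows, again with SQLite via Python's sqlite3 rather than MySQL; the table and `flags` column are hypothetical, while the `|32` bit follows the comment above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, flags INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO t (id, flags) VALUES (?, ?)", [(1, 0), (2, 4)])

# Instead of DELETE, set the "deleted" bit; any existing flag bits are preserved.
conn.execute("UPDATE t SET flags = flags | 32 WHERE id = ?", (2,))

# "Live" rows are those without the deleted bit set.
live = [row[0] for row in conn.execute("SELECT id FROM t WHERE flags & 32 = 0")]
print(live)  # [1]
```

Because the flag column is not indexed, this turns a delete into a plain in-place update of a non-key field.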

Please let me know if you need any more info.

Thanks,

JB

Comment by John Barratt [ 2016-01-25 ]

Further quick update to this: we've now had 3 further corruptions in just under 2 days whilst using LZMA compression and single updates. Two of the corruptions occurred within an hour of one another.

Comment by John Barratt [ 2016-05-12 ]

Just wanted to check in on this to see if there had been any progress, and give an update. We are still experiencing this issue every few days, and it is causing more and more problems for us.

Some new information :

  • We have never seen it happen on an identical system that processes about half the volume of data (around 120 million rows added per day), though that system doesn't see the regular updates either.
  • I have converted tables to use TOKUDB_FAST instead of TOKUDB_LZMA, but the corruption continues, so it appears not to be specific to the compression used, though it could of course still be related to the fact that compression is enabled. Unfortunately, due to disk space on the SSDs, we are unable to easily test with compression disabled.

Thanks,

JB.

Comment by Hayden Clark [ 2017-02-01 ]

Hmm. This one has been "open" for a year.
I'm here as I've just been bitten by the same bug. A 178m row tokudb table gets added to, updated, and aged. I now can't make queries on it because of
1034, "Incorrect key file for table 'messages'; try to repair it"
It seems to be on a small section of the table/index, as the queries seem to barf in the same date range.

Comment by Elena Stepanova [ 2017-02-01 ]

hayden,

Yes, unfortunately the bug has had no action for over half a year because we couldn't create a repeatable test case. Can you provide one?
If we have it, we can check it against upstream TokuDB and if the problem is repeatable there, re-file the bug for them. If it turns out to be MariaDB specific, our development will take care of it.

Comment by Hayden Clark [ 2017-02-02 ]

It's a bit tricky. The problem happens occasionally, under heavy load, with very large datasets. Even if scripting a test case was possible, I have no idea what the sensitivity of the bug is.

This is really one of those hard bugs to spot. The actual error may lie in the table or its indexes for days before a query hits the bad patch.

How can we proceed with this? What logs can I turn on to diagnose this in the future?
Without a fix, I'll have to downgrade to Innodb, and that will reduce performance.

Comment by John Barratt [ 2017-02-02 ]

Sorry to hear you've got the same problem Hayden, though I am selfishly pleased that there is someone else out there who may be able to help get to the bottom of it. Our situation seems absolutely identical to yours. It is still happening to us; we have just put in place some workarounds to minimise the impact, as we aren't in a position to downgrade to innodb due to the size of the data and the hardware available. Dumping and recreating the table 'fixes' it with no lost data, it seems (it is only the index that is breaking), but that isn't a practical solution.

We have tried to keep up with the latest releases of mariadb, but it is still happening to us every few days, even on v10.1.20.

One clarification on this: the table this corruption is occurring on is currently write-once-and-forget, i.e. it is effectively just logging data. The rows are never modified or deleted after insertion; a new table gets created each day and receives some 250 million rows. Also as before, an identical system with less data (about half the rows, same structure) has never seen this problem, though it has less read load on the data as well.

Comment by dennis [ 2018-03-13 ]

I got the same bug in Percona 5.7, and reported it:
https://jira.percona.com/browse/PS-3773

But there is no solution yet.
It seems that this bug is related only to Percona 5.7 or the corresponding MariaDB versions (10.1 or higher).

Comment by Elena Stepanova [ 2023-09-24 ]

TokuDB engine is no longer maintained in MariaDB, and as of 10.5, no longer released.

Generated at Thu Feb 08 07:34:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.