[MDEV-6207] Strange behaviour of OPTIMIZE TABLE with TokuDB [2x table size increase after usage] Created: 2014-05-04 Updated: 2014-08-03 Resolved: 2014-08-03 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.0.10 |
| Fix Version/s: | 5.5.39, 10.0.13 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexander Loginov | Assignee: | Rich Prohaska |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | tokudb, upstream | ||
| Environment: |
Debian Wheezy 7.5 64-bit |
||
| Issue Links: |
|
||||||||
| Description |
|
Hi. I am using MariaDB for 2 years. I moved to MariaDB from Percona MySQL edition. Currently i am using it on 100+ production servers, and many of them are highloaded servers with 50M rows tables or more. Optimization is very important to me. At the moment i am using TokuDB as the primary database type. For test i am using 10.0.10-MariaDB-1~wheezy-log version of MySQL on Debian Wheezy 7.5 64-bit. MySQL settings is not important here, it's near default. TokuDB settings is here:
For the test i've copy one of my production databases to new database "tester_testdb" with one table called "test_table" inside this database. This table has near 1 million rows and 2 keys. 1 - primary (id). 2 - pub_key(pub_key). It's MyISAM table. An initial file size of this table is 110 megabytes.
Structure:
Just to be sure, i did OPTIMIZE TABLE on this MyISAM table. Size did not changed. Then, let's convert this table to TokuDB. We'll use default zlib compression.
All seems to be OK here. We have near the same table size - summary ~93 mbytes. Now, let's do optimize. OPTIMIZE TABLE test_table Look at the file system again.
Hmm? Table size is now increased to ~125 mbytes. Why??? Let's do optimize again. OPTIMIZE TABLE test_table
Wow... we got another increase in table size. Why??? Let's do optimize again. OPTIMIZE TABLE test_table
Ok. We reach the final destination. Database size should be stable now. Let's then convert our secondary index to clustering index. ALTER TABLE test_table DROP INDEX pub_key
Ok. Second step. ALTER TABLE test_table ADD INDEX pub_key(pub_key) CLUSTERING=Yes
All seems to be OK here. We have clustering index. Let's do optimize. OPTIMIZE TABLE test_table
Now it returns to higher value once again. So now the total size of our database and table is ~140 mbytes. But...with MyISAM we have 110 mbytes. That's the problem? I think, that here's something wrong with OPTIMIZE TABLE. I think so, because i can have more compacted table with just one more command. ALTER TABLE test_table ENGINE=TokuDB COMPRESSION=TokuDB_zlib
Here i've just altered table once again. And table size dropped ~2x. That's really strange. This (~66 mbytes) should be the final and correct table size. But after table optimize i got 2x increase in table size and it's very very bad. Please, try to repeat my experiment yourself and review is there any problem with OPTIMIZE TABLE command on TokuDB tables or not. Thanks in advance for your time, Alex. |
| Comments |
| Comment by Elena Stepanova [ 2014-05-04 ] |
|
I could reproduce the described behavior, at least the part about the first OPTIMIZE. I've never seen further growth on the 2nd OPTIMIZE, but sometimes TokuDB files are updated slowly, maybe TokuDB was still writing the file when you measured the size after the first OPTIMIZE; or, maybe in some cases it does happen that the 2nd OPTIMIZE causes this effect. Anyway, for the 1st OPTIMIZE, it's also reproducible on TokuDB builds, both with MariaDB and MySQL (I tried 5.5.36 for both), and with all compression algorithms. I hope Tokutek people will be able to shed some light on it. |
| Comment by Alexander Loginov [ 2014-05-05 ] |
|
Will you forward this report to Tokutek or i should contact them myself? |
| Comment by Elena Stepanova [ 2014-05-05 ] |
|
We already have. The report is assigned to Rich Prohaska who is our Tokutek contact here in JIRA, let's see if he confirms it is a bug. |
| Comment by Leif Walsh [ 2014-05-05 ] |
|
This is expected behavior for small tables, and it's because OPTIMIZE TABLE has a different meaning in TokuDB. TokuDB stores data in fractal trees, which are like B-trees but with buffers on the internal nodes that can store messages (inserts, updates, deletes, etc) destined for leaves below them. The OPTIMIZE TABLE command flushes all of those buffers in depth-first left-to-right order, to flatten the tree. This dirties all the nodes in the tree over time, so they'll all get written out again. Since our block allocator is copy-on-write, checkpoint makes new copies of those nodes that were changed between each checkpoint, and the old versions of those nodes are freed after the checkpoint completes. There is an awkward interaction between checkpoints, the block allocator, and OPTIMIZE TABLE on small trees: if the OPTIMIZE runs fast enough, it can dirty the entire tree in between checkpoints, and then on the next checkpoint, the block allocator essentially needs to duplicate the entire tree. If the tree was already fairly compact, this will nearly double the file size, and when the checkpoint completes, most of the beginning half of the file will be freed. After this, you'll have to wait until all the nodes get dirtied again and get written out to fill in all those holes early in the file, before the file can be truncated and start to shrink. So my advice is, don't run OPTIMIZE TABLE on small TokuDB tables, it generally isn't necessary and will just make strange things happen. |
| Comment by Rich Prohaska [ 2014-05-05 ] |
|
The final 'alter table engine=tokudb' operation rebuilds the table by coping all of the rows from the original table to a new table. At the end, mysql renames the new table and deletes the old table. Since the new table is created from scratch, its size is expected to be smaller than the original table since there is only one copy of the fractal tree blocks. |
| Comment by Alexander Loginov [ 2014-05-05 ] |
|
Big thanks for the explanation, but can you clarity what "small" means? For example table with 100M rows. It's about 10 GB of data. It's "small" for TokuDB, right? |
| Comment by Rich Prohaska [ 2014-05-05 ] |
|
'select * from information_schema.tokudb_fractal_tree_info' returns bytes used and bytes allocated for each fractal tree. 'allocated-used' is the amount of free space in the fractal tree file that will be used by new blocks as data gets added to the table. someday, we will implement algorithms that return the free space to the file system. |
| Comment by Alexander Loginov [ 2014-05-05 ] |
|
Thanks. BTW, i guess it can be a good idea to invent a session variable tokudb_optimizetable (integer 0 or 1). If 0 - "optimize table" action will just do nothing, but returns success. Or 1 - it will act just like now. Defaults to 1. As far as it session variable, in my.cnf it will set default value, but it can be changed in the session. It can be useful for people, who have many small table (let's say <100M rows), but a few big tables |
| Comment by Rich Prohaska [ 2014-05-06 ] |
|
I don't understand the proposal. Why not just decide to not execute an optimize statement? |
| Comment by Alexander Loginov [ 2014-05-06 ] |
|
Well, there are sometimes no way to control an application or user(s) which use database. Application and users can do Optimize table on tables and increase database size this way. Most of Content Management Systems use MySQL database optimize every day. It's also can be accidently caused by yourself and then it will be a pain to revert the process, it'll cause a lot of downtime. Also the bad thing at the moment, is that there are no Hot optimize table operation at MariaDB. So it can be better to prevent it with MySQL server config. From what i tested, with 500M rows, i still have 2x increase in database size. It can really break predictions of drive space needed for database If you think that it is not needed, nothing wrong here. Thanks for your time. |
| Comment by Rich Prohaska [ 2014-05-07 ] |
|
TokuDB (at least Tokutek's version) supports hot optimize table. hot in the sense that that concurrent reads and writes can occur while optimize is running. Would this feature be useful? |
| Comment by Alexander Loginov [ 2014-05-07 ] |
|
Sure it will. I am using MariaDB native version at the moment. But Optimize table is still can take hours to be done. |
| Comment by Rich Prohaska [ 2014-06-13 ] |
|
TokuDB 7.1.7 has online optimize table for MariaDB 10.0 which will allow concurrent table optimize + table reads + table writes. |
| Comment by Alexander Loginov [ 2014-06-13 ] |
|
Yeah, i know, thanks. |
| Comment by Sergei Golubchik [ 2014-08-03 ] |
|
tokudb 7.17. is merged into mariadb 5.5.39 (and will be in 10.0.13) |