[MDEV-22839] ROW_FORMAT=COMPRESSED vs PAGE_COMPRESSION=1 - size comparison Created: 2020-06-08 Updated: 2024-01-02 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Server |
| Fix Version/s: | 10.6 |
| Type: | Task | Priority: | Major |
| Reporter: | Nuno | Assignee: | Marko Mäkelä |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
I have the following tables: Table1: … Table2: … After the conversion, Table1 became like this: … and Table2 became like this: …

I understand that the actual file sizes don't matter so much (they can be sparse files).

Table1 – from no compression at all to being compressed, the data grew by 1GB and the indexes doubled.
Table2 – from ROW_FORMAT=COMPRESSED to the maximum level of PAGE_COMPRESSION, both data and indexes grew considerably.

Does it make sense that PAGE_COMPRESSION, even at level 9, compresses less than ROW_FORMAT=COMPRESSED? |
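For reference, the two compression options being compared look like this as DDL. This is a hedged sketch: the table names t1/t2 are illustrative, since the original statements were not included in the report.

```shell
# Hedged sketch of the two options compared in this report; t1/t2 are
# illustrative names. The statements are printed here rather than executed,
# since no MariaDB server is assumed to be running.
sql_row_compressed='ALTER TABLE t1 ROW_FORMAT=COMPRESSED;'
sql_page_compressed='ALTER TABLE t2 PAGE_COMPRESSED=1 PAGE_COMPRESSION_LEVEL=9;'
echo "$sql_row_compressed"
echo "$sql_page_compressed"
```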
| Comments |
| Comment by Marko Mäkelä [ 2020-06-09 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thank you for the input. If your test includes long string columns that could be stored off-page by InnoDB, then ROW_FORMAT=COMPRESSED would gain some advantage by compressing the singly-linked lists of pages of such columns as one zlib stream. That is similar to what … does.

The hole-punching of page_compressed tables could significantly reduce file system performance. It was originally developed for an SSD manufacturer's file system, NVMFS. You are not the first one to question its usefulness: "Wanted: a file system on which InnoDB transparent page compression works" was posted in 2015.

Note: the logical file size will not change as a result of the hole-punching. You should monitor the physical file size (excluding any 'sparse' blocks), as reported by ls -s.
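The logical-vs-physical distinction above can be demonstrated with an ordinary sparse file. This is a minimal sketch using a throwaway file name, not a real InnoDB tablespace:

```shell
# Demonstrates logical vs allocated size on a sparse file, which is the same
# distinction that matters for hole-punched page_compressed tablespaces.
# 'sparse.demo' is a throwaway example file, not real InnoDB data.
truncate -s 1M sparse.demo                       # 1 MiB logical size, no data written
logical=$(stat -c %s sparse.demo)                # logical size in bytes
allocated=$(( $(stat -c %b sparse.demo) * 512 )) # allocated size in bytes
echo "logical=$logical allocated=$allocated"
rm -f sparse.demo
```

On most Linux filesystems the allocated size of such a file stays near zero even though the logical size is 1 MiB, which is why `ls -l` (logical) and `ls -s` (allocated) can disagree so much for hole-punched files.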
Disclosure: I did the low-level design and the implementation of ROW_FORMAT=COMPRESSED based on a high-level idea of Heikki Tuuri, between 2005 and 2007. I believe that even that format is past its 'best before' date. The implementation also complicates the operation of the buffer pool and crash recovery. Because Heikki wanted to minimize the changes to page headers, we are wasting close to 100 bytes per page. And because the page footer format that I designed uses 14-bit offsets, we can only support innodb_page_size≤16k. I did not even consider extending the format for instant column changes (…).

I think that we should gradually deprecate support for ROW_FORMAT=COMPRESSED, retaining read-only support to allow upgrades. This would simplify the buffer pool code further. When it comes to page_compressed, in my opinion we should consider removing the option altogether, maybe retaining read-only support for a few major releases. Existing data files could be converted with an external tool. |
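The page-size limit mentioned above follows directly from the offset width: 14-bit offsets can address at most 2^14 bytes within a page.

```shell
# A 14-bit record offset can address at most 2^14 bytes, which is why the
# ROW_FORMAT=COMPRESSED page footer format caps innodb_page_size at 16k.
max_offset=$((1 << 14))
echo "$max_offset"   # prints 16384
```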
| Comment by Marko Mäkelä [ 2020-06-09 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
In MariaDB 10.5, the write performance of ROW_FORMAT=COMPRESSED tables may degrade due to … |
| Comment by Nuno [ 2020-06-09 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello, thank you very much for your detailed response, which I appreciate a lot.

I see you want to deprecate ROW_FORMAT=COMPRESSED, which I'm OK with, as long as the alternative/successor becomes a better option in terms of storage and performance. What I don't want is to have my hands tied because: 1) my database is too large, which makes it impracticable to convert to the new option without my website going offline or tables being 100% locked; 2) I cannot upgrade MariaDB because performance would degrade due to the option I'm using.

There is a similar example where something that I thought was implemented as an improvement and an online operation actually locked all tables while it was running – see MDEV-18385 for the innodb_defragment option. But if it will be possible to convert my large ROW_FORMAT=COMPRESSED tables to the new option online and smoothly, without impacting uptime, that's all good for me.

> When it comes to page_compressed, in my opinion we should consider removing the option altogether

I'm not sure what you meant here. Will PAGE_COMPRESSED be deprecated too? Or do you mean it will become the default and required for all tables? Thank you very much. |
| Comment by Marko Mäkelä [ 2020-06-10 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have filed … I do not see much future for PAGE_COMPRESSED, unless the tests that should be done in the scope of …

I realize that forcing users to use an offline tool to convert data files would not allow a smooth upgrade path. Maybe we should start by throwing deprecation warnings for the page_compressed attribute, or just warn that the attribute will be ignored and create a normal uncompressed table. Then, after several major releases, we could remove the code altogether.

RocksDB was developed with data compression in mind. I would expect you to get much better results with it. |
| Comment by Nuno [ 2020-06-10 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Marko, OK, I see. So, in summary, both ROW_FORMAT=COMPRESSED and PAGE_COMPRESSED should not be used. Fair enough.

However, my feedback is that it's a VERY bad idea to make tables read-only in a major release (10.6) like this without first warning users in a previous major version (10.5). This will make many people very upset, especially those who are not able to test on a test server and expect a smooth upgrade directly in production.

Anyway, my understanding then is that I should keep ROW_FORMAT=COMPRESSED where it is already on.

I would like to take this opportunity, now that I'm working on an upgrade, to try this on my test server: https://mariadb.com/kb/en/getting-started-with-myrocks/

I'm not familiar at all with RocksDB, but I assume none of the "innodb_*" variables will apply to RocksDB and that I will have to look into the below, am I right? https://mariadb.com/kb/en/myrocks-system-variables/

And does it play well with InnoDB tables? This feels like forced learning of an engine I'm not familiar with (and risky to put in production, even if it works in the test env), when InnoDB should have its own stable compression. Thank you very much. |
| Comment by Marko Mäkelä [ 2020-07-22 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi nunop, thank you for the feedback. We have not made any decision regarding compression in InnoDB yet. I do not disagree that PAGE_COMPRESSED could be a better option than ROW_FORMAT=COMPRESSED; the InnoDB code changes related to it are much simpler. The only question is whether sparse files work efficiently on your file system.

Your questions on RocksDB would better be asked in a support ticket. I do not have much experience with storage engines other than InnoDB. I believe that compression and the ability to efficiently perform searched updates are conflicting goals. |
| Comment by Ian Gilfillan [ 2021-01-28 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko, are the … |
| Comment by Marko Mäkelä [ 2021-01-28 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
greenman, I am not aware of anyone having developed or run benchmarks. Alex/AT mentioned in … |
| Comment by Alex/AT [ 2021-01-28 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello Marko, my worried comment in 19780 was about "I do not see much future for PAGE_COMPRESSED". It actually works for us in place of TokuDB; it's a good deal larger and slower, but still much better than uncompressed.

Our configuration is pretty much straightforward (although not 100% typical for database servers): a Dell Compellent SC3020 SAN (SSD only) served over multiple 10G iSCSI links, connected to XenServer pool hosts on which the MariaDB 10.5 servers run on CentOS 8. The underlying filesystem is basic ext4 without any special tuning applied. MariaDB server tuning covers only the buffer pool size, flushing at transaction commit (we disable the per-transaction flush and rely on the once-per-second one), and log size & buffering. No special tuning otherwise.

The load patterns differ. One is a heavy Zabbix installation with lots of history and hosts. Another is mobile-usage pre-billing, which processes and stores an incoming data flow in short amounts of time. Another is heavy logging storage (400Gb of compressed …).

From what I see, I can tell TokuDB was the best storage for almost everything mentioned above (except for Confluence, where InnoDB is a must due to system requirements). Uncompressed InnoDB performs very badly, especially during read queries, saturating the SAN links. Compressed InnoDB survives well but still behaves worse – although it is worth mentioning that our links are never saturated with either TokuDB or page-compressed InnoDB.

One of the concerns with hole punching is slow file replacement on DDL. Well, this kicks in, but it is totally tolerable (it took under a minute for a 400G mail logging file). |
| Comment by Alex/AT [ 2021-01-28 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Measuring exact performance may be pretty tough, because running synthetic benchmarks won't tell much. If you can suggest some general guidelines for measuring, though, I may be able to run them on copies of the running systems. |
| Comment by Alex/AT [ 2021-01-28 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
An example: a mail logging system (~366G of data, compressed to 88G).
TokuDB size was around 32G. The workload is write-mostly with a multitude of transactions per second, but reads (searches) may come in huge bulks and are not always properly indexed.
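For a back-of-envelope comparison, the sizes quoted above work out to roughly these compression ratios:

```shell
# Ratios for the sizes quoted above: ~366G raw data, 88G page-compressed
# InnoDB, ~32G TokuDB. Pure arithmetic, no database access needed.
innodb_ratio=$(awk 'BEGIN { printf "%.1f", 366/88 }')
tokudb_ratio=$(awk 'BEGIN { printf "%.1f", 366/32 }')
echo "page_compressed InnoDB: ${innodb_ratio}x, TokuDB: ${tokudb_ratio}x"
```

So page-compressed InnoDB achieves a bit over 4x here, while TokuDB managed more than 11x on the same data.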
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alex/AT [ 2021-01-28 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
To add to this: I tried to apply RocksDB to the task, but it consumes 20G of RAM and dies. Literally: judging by its logs, it keeps doing something with files, moving data between levels, never finishes in time, and starts over again and again. I did not test with more RAM; the current InnoDB instance lives perfectly within 8G in total (4G for the buffer pool), just as TokuDB fit OK into its 4G cache. That is to stress that RocksDB is not a replacement for 'traditional engine' compression in any way. |
| Comment by Nuno [ 2021-01-29 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Alex/AT, it seems TokuDB has been doing well for you. But since it's obsolete and MariaDB has suggested MyRocks as the long-term migration path, have you tried MyRocks? (I honestly had never heard of any of these three before, so I have no idea how similar or how good they are...) |
| Comment by Marko Mäkelä [ 2021-08-02 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
nunop, I just checked your numbers, and it seems to me that what you quote as "Actual file size" is in fact the logical file size, because each number is a multiple of the InnoDB page size. I do not think that we have any SQL interface for reporting the allocated file size. On GNU/Linux, you can get it reported by the following: …
On my system, the first reported number is the allocated size, in units of 1024 bytes. It appears to include the file system overhead (indirect blocks). axel recently conducted some tests for … nunop, can you repeat the experiment on MariaDB 10.5 or 10.6 and get the data for the actual file size? Would using COMPRESSED columns (…) …

One thing that may be an inherent problem with the page_compressed approach is file system fragmentation, which was mentioned years ago in a blog post. It might be less of a problem nowadays, depending on the file system and operating system kernel version, of course. |
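The command elided from the comment above is not preserved, but the description ("first reported number is the allocated size, in units of 1024 bytes") matches GNU `ls -s`. A hedged sketch, using a stand-in file rather than a real tablespace path:

```shell
# Hedged reconstruction: with GNU coreutils, the first column of 'ls -ls'
# is the allocated size in 1024-byte units, alongside the logical size.
# 'demo.ibd' is a stand-in; a real check would target the server's .ibd files.
truncate -s 64K demo.ibd           # sparse stand-in file, 64K logical size
ls -ls demo.ibd                    # first column: allocated KiB
allocated_kib=$(ls -s demo.ibd | awk '{print $1}')
echo "allocated: ${allocated_kib} KiB"
rm -f demo.ibd
```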
| Comment by Alex/AT [ 2021-08-02 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
As this thread is still flowing and I see LZ4 tests, I want to ask: is there any chance of adding LZMA (XZ) or ZSTD support to PAGE_COMPRESSED tables in the foreseeable future? |
| Comment by Alex/AT [ 2021-08-02 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I know LZMA support is already included, but the option is not enabled in the RPM builds, so using LZMA now requires a custom build of MariaDB, which is less than convenient. |
| Comment by Nuno [ 2021-08-02 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I can no longer test, as I have already migrated the database to bigger storage (for a significantly higher monthly expense), one of the reasons being that MariaDB no longer supports ROW_FORMAT=COMPRESSED, so we no longer use that row format. I did test using COMPRESSED columns while I was testing/rehearsing the migration, but I gave up on the idea because of the error reported below, and no one replied to me at the time: … Thank you. |
| Comment by Marko Mäkelä [ 2021-08-17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
nunop, thank you. MariaDB still supports ROW_FORMAT=COMPRESSED.

Alex/AT, I agree that it would be useful to also cover LZMA and ZSTD in … |
| Comment by Marko Mäkelä [ 2021-09-01 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
https://lwn.net/Articles/864363/ mentions that Linux 5.15 might finally include fixes for race conditions around hole-punching, which is what page_compressed tables use. I can imagine that when innodb_flush_method=O_DIRECT is not used in both the server and any backup programs, that race condition could result in data corruption. The patch set author comments that even kernel crashes are possible. |
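Per the comment above, using O_DIRECT in the server is what avoids the page-cache side of the race. A minimal sketch of persisting that setting; `innodb.cnf` here is a local example file, since the real config path depends on the distribution:

```shell
# Hedged sketch: pinning innodb_flush_method=O_DIRECT, which the comment
# above implies avoids the hole-punching race on the server side.
# 'innodb.cnf' is written locally for illustration; a real deployment would
# place this under the distribution's MariaDB config directory.
cat > innodb.cnf <<'EOF'
[mysqld]
innodb_flush_method=O_DIRECT
EOF
flush_line=$(grep 'innodb_flush_method' innodb.cnf)
echo "$flush_line"
rm -f innodb.cnf
```

Note that backup programs reading the files would still need O_DIRECT as well, as the comment points out.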
| Comment by Marko Mäkelä [ 2021-09-22 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
As noted in … |