[MDEV-11696] Page Compression Has No Effect on Table Data File Size Created: 2016-12-31 Updated: 2018-03-09 Resolved: 2017-03-30 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Documentation |
| Affects Version/s: | 10.2.2, 10.2.3 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shawn Weeks | Assignee: | Ian Gilfillan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Windows 10 Pro 1607 x64, 128GB RAM, i7 6800k MariaDB 10.2.2 Beta |
||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Description |
|
Testing zlib page compression and it appears to have no effect on the size of the table data file. To run this test you'll have set tmp_table_size=1G and max_heap_table_size=1G in your my.ini. |
| Comments |
| Comment by Vladislav Vaintroub [ 2017-01-03 ] |
|
The page compression does not have any effect on file size, even if it worked |
| Comment by Jan Lindström (Inactive) [ 2017-01-03 ] |
| Comment by Elena Stepanova [ 2017-01-03 ] |
|
I suggest to convert it into a documentation issue. The KB page should clearly explain what are benefits of using compression without trimming (if any) and what should be expected with such configuration. jplindst, could you please provide necessary first-hand information to greenman? (Or if you want to edit the page yourself, you can obviously do it). |
| Comment by Shawn Weeks [ 2017-01-03 ] |
|
It is rather confusing expecting compression to reduce the size of the table. Coming from Oracle and searching on Table Compression for MariaDB https://mariadb.com/kb/en/mariadb/compression/ is the first thing that comes up. In Oracle compression a table reduces the size of that table and I'm really unclear what benefit Page Compression actually has other than reducing writes on SSD which still sounds like it reduces the size of the data written. The documentaiton compares it to InnoDB Compressed at https://mariadb.com/kb/en/mariadb/xtradbinnodb-storage-formats/#compressed which says "The compressed format results in the smallest data size in most cases." Also coming from Hadoop world where zlib and snappy are data compression formats for hive that do actually reduce the table data file size can be confusing as well. |
| Comment by Shawn Weeks [ 2017-01-03 ] |
|
Also in reference to Vladislav's comment wouldn't storing 3k of data be less than storing 16k of data? In my case there is no reduction in the size of the table no matter how arbitrarily large I create the table nor how compressible the data may be so how is it writing smaller blocks with compression and yet still writing the same amount of data to the tablespace. |
| Comment by Shawn Weeks [ 2017-01-03 ] |
|
Another note the blog at https://mariadb.org/significant-performance-boost-with-new-mariadb-page-compression-on-fusionio/ says "The compression means less data is transferred between disk and memory, and takes up less space on disk and in memory." specifiaclly says it takes up less space on disk. |
| Comment by Vladislav Vaintroub [ 2017-01-03 ] |
|
I was commenting about writing 3K instead of 16K , but it will also be written at the very start of the page . File size does not change , a page would still look like (3K compressed data, unused 13K). |
| Comment by Shawn Weeks [ 2017-01-03 ] |
|
So does that mean compressed table files are mostly empty space? Or would you be able to write more in a page? |
| Comment by Vladislav Vaintroub [ 2017-01-03 ] |
|
It should be a sparse file, on Windows NTFS specifically, with the holes being just marked as zeroed space, and not taking physical disk. There is at least one obvious bug that would prevent it (wrong DeviceIoControl call, synchronous on a file opened for async IO), but may be more, I did not try to debug this yet. If you're up to experiment, you can set NTFS compressed attribute on a newly created .ibd files, this would give you sparse file, that works about the same as innodb envisioned it should have worked. |
| Comment by Jan Lindström (Inactive) [ 2017-01-04 ] |
|
Firstly, following does not work for 10.2, do not use 10.2 on production. You should separate the two issues, is tablespace compressed or not and the actual file system storage usage. In page compression tablespace pages are compressed. What actually is written to the file system depends on innodb-use-trim parameter (we could not allow to use page compression if trim is not enabled and if file system does not support sparse files)
Does it take less space on device depends on device and file system. For tools it could look like we have reserved some space for file but some of that space is not used (i.e. ls -l <file> vs ls -ls --block-size=1 <file>). Note in the blog at https://mariadb.org/significant-performance-boost-with-new-mariadb-page-compression-on-fusionio/ says "The compression means less data is transferred between disk and memory, and takes up less space on disk and in memory.", Here memory means that on InnoDB buffer pool we have only uncompressed data compared to row_compressed where you could have both. Naturally, page compression does not save memory compared to uncompressed. Less data is transferred and we physically write only the payload, punch_hole operation in my understanding only operates on file system metadata. |
| Comment by Ian Gilfillan [ 2017-03-30 ] |
|
Expanded docs to describe this |
| Comment by Marko Mäkelä [ 2018-03-09 ] |
|
The page_compressed could still be improved further:
|