[MDEV-23325] InnoDB silently writes uncompressed data on compression error or if compressed length = 0 Created: 2020-07-29 Updated: 2020-09-28 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5, 10.6 |
| Fix Version/s: | 10.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Kartik Soneji | Assignee: | Marko Mäkelä |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | compression, innodb | ||
| Environment: | All |
||
| Description |
|
On further investigation, I found that InnoDB will silently write uncompressed data if compression fails (returns a non-zero status) or if the compressed length is 0. See storage/innobase/fil/fil0pagecompress.cc#L211-L220.
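The control flow being reported can be paraphrased as follows. This is a simplified Python sketch, not InnoDB's actual C++ code; zlib stands in for whichever algorithm innodb_compression_algorithm selects, and the function name is invented for illustration:

```python
import zlib

def compress_page(page: bytes) -> bytes:
    """Sketch of the reported fallback: if compression errors out,
    or yields nothing / no space saving, the raw page is written
    instead -- silently, with no warning to the user."""
    try:
        compressed = zlib.compress(page)
    except zlib.error:
        return page  # compression error -> write uncompressed
    if len(compressed) == 0 or len(compressed) >= len(page):
        return page  # zero-length or no saving -> write uncompressed
    return compressed
```

The point of contention is not the fallback itself (which is needed for correctness, see the comments below) but that it happens without any diagnostic.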
To reproduce:
|
| Comments |
| Comment by Marko Mäkelä [ 2020-07-29 ] |
|
Keep in mind that the compression algorithm could in fact slightly expand the data instead of compressing it. For a string of n bits, there are 2^n possible contents. If a compression algorithm transforms some combinations of the n bits into fewer than n bits, then it must correspondingly expand other combinations to more than n bits. Because of this, page_compressed compression can fail.

The call that you pointed to is returning 0 when the compression would not succeed. It is not an error to have uncompressed pages in a page_compressed data file. The file would be valid even if all pages are in uncompressed format.

Can you reproduce this problem on a main branch of the server, in a way that shows some user impact? You may want to use a command like ls -ls to show the physical size in addition to the logical size of the data file. And you should shut down the server before checking the file size, to ensure that the buffer pool will have been written out to the data file. |
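The pigeonhole argument above is easy to see in practice: random bytes are essentially incompressible, so DEFLATE must fall back to stored blocks and the framing overhead makes the output larger than the input. A quick demonstration (zlib used here only as an example algorithm; the page size mimics InnoDB's default 16 KiB):

```python
import os
import zlib

# 16 KiB of random data, the size of a default InnoDB page.
random_page = os.urandom(16384)

# Even at maximum effort, "compressing" incompressible data expands it:
# zlib adds a 2-byte header, a 4-byte checksum, and per-block framing.
compressed = zlib.compress(random_page, 9)
print(len(random_page), len(compressed))  # compressed length exceeds 16384
```

This is exactly why a page_compressed file must be allowed to contain uncompressed pages.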
| Comment by Marko Mäkelä [ 2020-07-29 ] |
|
I think that it was an unfortunate choice to introduce page_compressed in the first place, because it only seems to have been useful on one proprietary file system (NVMFS for FusionIO) that is no longer available.

If we know that the required compression algorithm is not available in the server, we should block access to the table. Ideally, perhaps we should abort the server startup, so that the DBA has a chance to correct the situation without causing any permanent damage. If we simply pretended that the table does not exist, then the purge of history in the background would discard any undo log records that referred to the table, without actually removing any garbage from the table. But maybe this is an acceptable limitation. Users could always invoke OPTIMIZE TABLE or similar afterwards.

I believe that the correct place for such a check might be the function dict_load_table_one(). At least for tables that were created with innodb_checksum_algorithm=full_crc32 we will know the compression algorithm based on the FSP_SPACE_FLAGS in the first page of the file, or fil_space_t::flags. |
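The proposed load-time check could look roughly like this. All names here are invented for illustration (the real check would be C++ code in dict_load_table_one(), decoding the algorithm from FSP_SPACE_FLAGS); the sketch only shows the intended fail-fast behavior:

```python
# Hypothetical sketch of the proposed check: refuse to load a table whose
# pages require a compression algorithm this server build does not have.
AVAILABLE_ALGORITHMS = {"none", "zlib"}  # assumed build-time capability set

def check_table_compression(fsp_flags_algorithm: str) -> None:
    """Block access instead of silently losing the table behind
    background purge -- the DBA can then fix the build or the table."""
    if fsp_flags_algorithm not in AVAILABLE_ALGORITHMS:
        raise RuntimeError(
            f"compression algorithm '{fsp_flags_algorithm}' is not "
            "available in this server; refusing access to the table")
```

Raising an error at load time (rather than pretending the table does not exist) preserves the undo log records referring to the table, as described above.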
| Comment by Kartik Soneji [ 2020-07-29 ] |
|
Can you reproduce this problem on a main branch of the server, in a way that shows some user impact? If we know that the required compression algorithm is not available in the server, we should block access to the table. |