[MDEV-25734] mbstream breaks page compression on XFS Created: 2021-05-19 Updated: 2023-11-03 Resolved: 2023-10-17 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | mariabackup, Storage Engine - InnoDB |
| Affects Version/s: | 10.2, 10.3, 10.4, 10.5, 10.6 |
| Fix Version/s: | 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Muhammad Irfan | Assignee: | Sergei Golubchik |
| Resolution: | Fixed | Votes: | 6 |
| Labels: | mariabackup, page-compression | ||
| Description |

InnoDB page compression relies on “hole punching”, and that seems to be ignored somehow. The mariabackup SST method leaves data that was compressed on the donor uncompressed on the joiner node. Because mariabackup SST does not preserve the compressed (sparse) files, page compression brings no benefit in this case.
Page compression is enabled, and the size of the tablespace file is:
Now, let's try mariabackup:
Let's verify the size of the t1 tablespace file in the backup again:
As per https://dev.mysql.com/doc/refman/5.7/en/innodb-page-compression.html, I think this is related to the XtraBackup report https://jira.percona.com/browse/PXB-1557
| Comments |
| Comment by Christian [ 2021-07-05 ] |

Also, mariabackup in 10.5 can't handle sparse files.
| Comment by Marko Mäkelä [ 2021-07-23 ] |

thiru, you fixed something similar in
| Comment by Marko Mäkelä [ 2021-07-23 ] |

In 10.6,
| Comment by Thirunarayanan Balathandayuthapani [ 2021-09-08 ] |

I don't see any uncompressed table in the target directory.
Please verify which mariabackup command the user ran. Is this issue specific to CentOS? Please find out which file system they were using.
| Comment by Eugene [ 2021-09-21 ] |

Got a similar issue.
and compression settings like:
The issue was noticed after 'optimize' and several migrations were performed on the node while it was running standalone: the dataset size was reduced significantly. Then the data was copied with SST to the other nodes, and SST took much longer than expected given the connection speed and the dataset size on the source node. The reason is that the dataset size changed dramatically with SST: the streamed volume turned out to match the uncompressed data size.
On comparing tablespace file sizes, it was found that every .ibd file is bigger on Node3 and Node2 than on Node1, while the data size in information_schema is identical for the same table on the source server and the replicas. This means the tablespaces are not compressed on the replicas, while they were on the source.
The mariabackup command used to make a local backup is:
| Comment by Muhammad Irfan [ 2021-09-22 ] |

thiru, it appears on CentOS 7/8, and it happens on the XFS file system during mariabackup SST.
| Comment by Thirunarayanan Balathandayuthapani [ 2021-09-22 ] |

I wrote a sample program that performs the create, write, and seek system calls.
On the ext4 file system, the output file `f1.txt` has a different block size:
On the XFS file system:
AFAIK, mariabackup doesn't do anything wrong logically. Mariabackup preserves punched holes for the compressed format (
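Thiru's program itself is not included in this export. The following is a minimal C sketch of the same kind of experiment, not the original code: it uses only traditional POSIX calls (open(), write(), lseek() past the current end of file, write()) and then prints the apparent file size (st_size) next to the physically allocated size (st_blocks * 512, the figure behind `ls -s`). The helper names and the 60 KiB gap are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create PATH as a sparse file the traditional
 * POSIX way: write 4 KiB, lseek() GAP bytes past the current end of
 * file, write another 4 KiB. Returns 0 on success, -1 on error. */
int make_sparse_file(const char *path, off_t gap)
{
    char buf[4096];
    memset(buf, 'x', sizeof buf);

    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, buf, sizeof buf) != (ssize_t) sizeof buf)
        goto fail;
    /* Nothing is written in [4096, 4096 + gap): this range should be a hole. */
    if (lseek(fd, gap, SEEK_CUR) < 0)
        goto fail;
    /* Writing past the old end of file extends it and leaves the hole behind. */
    if (write(fd, buf, sizeof buf) != (ssize_t) sizeof buf)
        goto fail;
    return close(fd);

fail:
    close(fd);
    return -1;
}

/* Print the apparent size and the physically allocated size of PATH
 * (st_blocks is counted in 512-byte units). Returns 0 on success. */
int report(const char *path, struct stat *st)
{
    if (stat(path, st) != 0)
        return -1;
    printf("%s: st_size=%lld allocated=%lld\n", path,
           (long long) st->st_size, (long long) st->st_blocks * 512);
    return 0;
}
```

Running the same sketch in a directory on ext4 and in one on XFS gives the kind of side-by-side comparison described above; only st_size is guaranteed to be identical on both.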
| Comment by Eugene [ 2021-09-22 ] |

Sorry, but this doesn't explain the same issue happening when streaming data to stdout and compressing it on the fly with `mariabackup --stream=xbstream | zstd -o...`
| Comment by Thirunarayanan Balathandayuthapani [ 2021-09-24 ] |

euglorg yes,
| Comment by Marko Mäkelä [ 2021-10-01 ] |

As far as I understand, the problem is specifically with the Galera State Snapshot Transfer (SST). Either something in the SST script or the mariabackup options that it invokes needs to be changed.
| Comment by Eugene [ 2021-10-01 ] |

Marko, sorry, but the problem does not seem to be limited to SST: it also happens when mariabackup is used to create a backup with `mariabackup --stream=xbstream | zstd -o...` (a complete command line with all the arguments is given above). Again, compression of the archived data sent through the pipe as xbstream is also broken. Since SST relies on mariabackup, the problem was first noticed on SST, but investigating the backups made it clear that the problem lies in mariabackup itself.
| Comment by Vladislav Vaintroub [ 2021-10-07 ] |

euglorg, why do you uncompress the stream? What does that prove? That the uncompressed stream does not have holes, yes, we know about that, and that is why the suggestion is to compress it. Lightly, so that long runs of binary zeros are omitted. If you want to point out that mbstream -x does not create holes on some file system, I would think there is something wrong with that file system. mariabackup makes an effort to create holes portably, with the original Unix APIs from the 1970s: seek() past the end of the file, and write() there. This popular answer on Stack Overflow describes how to create holes programmatically, and this is exactly what mariabackup does.
| Comment by Eugene [ 2021-10-08 ] |

Vladislav, sorry for the misunderstanding; let me explain the problem step by step. First, your question is a bit unclear, as there is no mention of uncompressing the stream in the command I provided:
It is expected that data received from the pipe will have its pages compressed, isn't it?
But the second command always produces a dataset with uncompressed tablespace data. We currently use XFS. If you are sure this is a file system problem, would you expect the data to have the same size if I extract the archived backup data to ext4? If the above logic is wrong, please let me know.
| Comment by Marko Mäkelä [ 2021-10-12 ] |

euglorg, did you try the test program that thiru posted? It produces different results on ext4 and XFS. As far as I can tell, the function write_compressed() in extra/mariabackup/ds_local.cc creates sparse files in the traditional way: by writing and seeking past the current end of the file. We might try to convince XFS harder to create sparse files by invoking the Linux-specific fallocate() system call, to make it closer to how writes from the InnoDB buffer pool work.
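For comparison with the seek-and-write approach, here is a minimal sketch, assuming Linux with glibc (`_GNU_SOURCE`), of the fallocate() alternative mentioned above: write the compressed payload of a page, then explicitly deallocate the rest of the page with FALLOC_FL_PUNCH_HOLE, which the kernel requires to be combined with FALLOC_FL_KEEP_SIZE. The function name and its parameters are hypothetical; this is not the code of write_compressed() or of the eventual fix.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* fallocate(), FALLOC_FL_* with _GNU_SOURCE */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store one page of PAGE_SIZE bytes at OFFSET, of
 * which only the first LEN bytes carry compressed payload. The rest of
 * the page must read back as zeros; we ask the kernel to deallocate it.
 * Returns 0 on success, -1 on error (errno set by the failing call). */
int write_page_punch_hole(int fd, off_t offset, const void *payload,
                          size_t len, size_t page_size)
{
    if (len > page_size)
        return -1;
    if (pwrite(fd, payload, len, offset) != (ssize_t) len)
        return -1;

    size_t tail = page_size - len;
    if (tail == 0)
        return 0;

    /* Make sure the file covers the whole page: PUNCH_HOLE with KEEP_SIZE
     * never extends a file, but ftruncate() does, without writing data. */
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    if (st.st_size < offset + (off_t) page_size &&
        ftruncate(fd, offset + (off_t) page_size) != 0)
        return -1;

    /* Explicitly deallocate the unused tail of the page. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset + (off_t) len, (off_t) tail) == 0)
        return 0;
    if (errno != EOPNOTSUPP)
        return -1;

    /* File system cannot punch holes: write real zeros so the page is
     * still correct, just not sparse. */
    char zeros[4096] = {0};
    off_t pos = offset + (off_t) len;
    while (tail > 0) {
        size_t n = tail < sizeof zeros ? tail : sizeof zeros;
        if (pwrite(fd, zeros, n, pos) != (ssize_t) n)
            return -1;
        pos += (off_t) n;
        tail -= n;
    }
    return 0;
}
```

The fallback to writing real zeros keeps the page contents correct on file systems that reject hole punching, at the cost of the space savings.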
| Comment by Marko Mäkelä [ 2021-10-12 ] |

I found a question that suggests that this is an XFS design feature: This one may be relevant too: If XFS is deliberately so reluctant to create sparse files with tiny blocks when they are created in the traditional POSIX way (seeking and writing), as opposed to FALLOC_FL_PUNCH_HOLE, maybe it is not an ideal match for page_compressed tables.
| Comment by Vladislav Vaintroub [ 2021-10-12 ] |

Here is another link, http://web.archive.org/web/20171010154045/http://oss.sgi.com/archives/xfs/2012-02/msg00413.html, with a discussion of XFS mount options. Disclaimer: I did not try that.
| Comment by Vladislav Vaintroub [ 2021-10-12 ] |

It would probably help to punch holes on XFS explicitly, with FALLOC_FL_PUNCH_HOLE, after determining that the file system is XFS.
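One way to implement the "determine that the file system is XFS" step on Linux is statfs() and its f_type magic number. A minimal sketch; the helper name is hypothetical, and the fallback constant 0x58465342 is the XFS_SUPER_MAGIC value from <linux/magic.h>, defined here only so the sketch does not depend on kernel headers.

```c
#define _GNU_SOURCE
#include <sys/vfs.h>    /* statfs() and struct statfs on Linux */

#ifndef XFS_SUPER_MAGIC
#define XFS_SUPER_MAGIC 0x58465342  /* same value as in <linux/magic.h> */
#endif

/* Hypothetical helper: returns 1 if PATH resides on XFS, 0 if it resides
 * on some other file system, and -1 if statfs() fails. */
int is_xfs(const char *path)
{
    struct statfs sfs;

    if (statfs(path, &sfs) != 0)
        return -1;
    return sfs.f_type == XFS_SUPER_MAGIC;
}
```

A backup tool could run such a check once per target directory and only then switch to explicit hole punching, which is the shape of the suggestion above.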
| Comment by Eugene [ 2021-10-12 ] |

Marko, I haven't tried that test. Instead, I tried extracting the backup archive to an ext4 partition, and I can confirm that the issue happens on XFS only.
The issue is that XFS has better performance for multithreaded I/O, which was the reason for using it. Probably Vladislav is right, and XFS requires specific handling if page compression is in use.
| Comment by Vladislav Vaintroub [ 2021-10-12 ] |

Perhaps running fallocate --dig-holes on the backup files after decompression could help, as a workaround.
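fallocate --dig-holes (from util-linux) detects zero-filled blocks in existing files and turns them into holes. The following C sketch only approximates that idea and is not util-linux's implementation; it assumes Linux, a fixed 4 KiB granularity rather than the file system's real block size, and a hypothetical dig_holes() helper.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* fallocate(), FALLOC_FL_* with _GNU_SOURCE */
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096      /* assumed granularity, not queried from the file system */

/* Returns 1 if all N bytes at P are zero. */
static int all_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Hypothetical helper: punch a hole over every full, CHUNK-aligned block
 * of FD that contains only zero bytes, keeping the file size unchanged.
 * Existing holes read back as zeros, so they are punched again (harmless).
 * Returns the number of blocks punched, or -1 on error. */
long dig_holes(int fd)
{
    unsigned char buf[CHUNK];
    off_t off = 0;
    long punched = 0;
    ssize_t n;

    while ((n = pread(fd, buf, CHUNK, off)) > 0) {
        if (n == CHUNK && all_zero(buf, CHUNK)) {
            if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                          off, CHUNK) != 0)
                return -1;
            punched++;
        }
        off += n;
    }
    return n < 0 ? -1 : punched;
}
```

Because it reads every block of every file, this is an extra full pass over the data after extraction, which is why it is a workaround rather than something to run on every SST.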
| Comment by Eugene [ 2021-10-12 ] |

But this workaround is probably far from a good option for SST, as it would make the transfer much slower.
| Comment by Vladislav Vaintroub [ 2021-10-12 ] |

The xbstream/mbstream format is currently not optimized for holes. Hence the suggestion to use lightweight stream compression. Lightweight means that it can detect and compress long runs of 0 bytes, yet does not try too hard to compress already compressed data. zlib with level 1 might be a good candidate.
| Comment by Guillaume Lefranc [ 2021-10-27 ] |

Just as a side note, punching holes with fallocate on XFS is slow:
| Comment by Vladislav Vaintroub [ 2021-10-27 ] |

We do not call fallocate in backup, tanj. We use the traditional, not specifically Linux, solution: lseek() past the end of the file, and then write(). The distance between the old end of the file and the place we write is supposed to become the hole. The most modern thing in the current backup solution is that we set the sparse file attribute on Windows, which has worked since around the year 2000. The server does call fallocate, and maybe it does so slowly, but there is nothing else we do.
| Comment by Guillaume Lefranc [ 2021-11-05 ] |

@wlad I was referring to Marko's comment: "We might try to convince XFS harder to create sparse files, by invoking a Linux specific fallocate() system call, to make it closer to the way how writes from the InnoDB buffer pool work."
| Comment by Vladislav Vaintroub [ 2021-11-05 ] |

It is not entirely up to me, but I do not welcome a workaround for every bug of the day on every possible Linux file system. Maybe we should eventually just document the behavior of this or that file system, and leave it as is.
| Comment by Eugene [ 2021-11-05 ] |

Indeed, a workaround for every 'bug of the day' is a bad idea; I strongly agree with Vladislav!
| Comment by Guillaume Lefranc [ 2021-11-05 ] |

I'd also agree that proper documentation is preferable to a workaround, in this case stating that ext4 is the preferable file system "in certain cases". Just as a note: I tried it on XFS, but sadly fallocate --dig-holes does not reclaim space.
| Comment by Eugene [ 2021-11-08 ] |

I tried fallocate --dig-holes on some files from my dataset (on XFS). There is some effect, but not the difference seen on ext4. It probably depends on the actual data size, the hole sizes, and the extents allocated in each particular case.
| Comment by Marko Mäkelä [ 2022-03-18 ] |

It looks like there is not much that can be improved outside the file system code in the operating system kernel. euglorg, the traditional UNIX way to create sparse files is to seek beyond the end of the file and write. The Linux fallocate() flag FALLOC_FL_PUNCH_HOLE is much newer; I do not think that it existed 20 years ago. I mostly agree with the legendary blog post https://smalldatum.blogspot.com/2015/10/wanted-file-system-on-which-innodb.html, which is a criticism of the page_compressed format. By outsourcing some complexity to the Linux kernel, we are essentially at the mercy of the kernel. Either we get small annoyances like this one, or potential data corruption like https://lwn.net/Articles/864363/. I have not seen that in the wild, but I remember kernel crashes or file system corruption when Oracle tested the MySQL 5.7 response to the MariaDB 10.1 page_compressed format on a FusionIO device. We do test page_compressed internally, but mostly on the ext4 file system, and I do not think that we have tests that monitor the actual space allocation. To my surprise, there still are users who prefer my filesystem-agnostic and overly complex implementation of ROW_FORMAT=COMPRESSED. I wanted to deprecate and remove write support for it, but due to community feedback, I reverted that decision in
| Comment by Eugene [ 2022-03-18 ] |

Thank you for your explanation, marko.
| Comment by Marko Mäkelä [ 2022-03-21 ] |

euglorg, thank you. Let’s address this in the documentation. I would expect InnoDB page_compressed tables to be able to save space on a traditional file system, such as Linux ext4 or Microsoft NTFS, at the cost of increased fragmentation. The documentation might also include an example of how to check the physically allocated blocks. On GNU/Linux, that could be ls -ls. On file systems that support snapshots, block allocation can be quite different, and questions like ‘how much space does a file consume’ become more complicated. We have observed that enabling O_DIRECT (
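Besides `ls -ls` (which reports st_blocks), Linux also exposes where a file's data regions and holes lie through lseek() with SEEK_DATA and SEEK_HOLE. A minimal sketch, assuming Linux with glibc; the helper names are hypothetical. The two figures may differ, for example when space has been preallocated but not yet written.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>     /* lseek(), SEEK_DATA and SEEK_HOLE with _GNU_SOURCE */

/* Hypothetical helper: total length of the data regions of FD according
 * to lseek(SEEK_DATA / SEEK_HOLE). A file system without hole tracking
 * reports the whole file as a single data region. Returns -1 on error. */
long long data_bytes(int fd)
{
    long long total = 0;
    off_t pos = 0;

    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0)
            return errno == ENXIO ? total : -1;  /* ENXIO: no data at or after pos */
        off_t hole = lseek(fd, data, SEEK_HOLE);  /* end of this data region */
        if (hole < 0)
            return -1;
        total += (long long) (hole - data);
        pos = hole;
    }
}

/* Physically allocated bytes: st_blocks counts 512-byte units,
 * which is what `ls -s` is derived from. */
long long allocated_bytes(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 ? (long long) st.st_blocks * 512 : -1;
}
```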
| Comment by Guillaume Lefranc [ 2022-09-22 ] |

It seems that Percona has fixed this by adding fallocate support to xbstream; it would be nice if MariaDB backported this to mbstream. Regarding my previous comments, fallocate --dig-holes does work on XFS if speculative preallocation is disabled (allocsize=64k in the mount options).
| Comment by Valerii Kravchuk [ 2023-04-27 ] |

So, do we plan to fix anything in the mariabackup implementation for XFS as part of this bug report?
| Comment by Marko Mäkelä [ 2023-05-11 ] |

I think that tanj’s suggestion is reasonable. This should be relatively simple to implement, but a bit tricky to test.
| Comment by Marko Mäkelä [ 2023-10-16 ] |

This change looks OK to me. It will add some fallocate() system calls, but only when page_compressed tables are being used.