[MDEV-25734] mbstream breaks page compression on XFS Created: 2021-05-19  Updated: 2023-11-03  Resolved: 2023-10-17

Status: Closed
Project: MariaDB Server
Component/s: mariabackup, Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3

Type: Bug Priority: Critical
Reporter: Muhammad Irfan Assignee: Sergei Golubchik
Resolution: Fixed Votes: 6
Labels: mariabackup, page-compression

Issue Links:
Relates
relates to MDEV-16129 InnoDB page compression not working a... Open
relates to MDEV-22839 ROW_FORMAT=COMPRESSED vs PAGE_COMPRES... Open
relates to MDEV-15527 page_compressed compressed page parti... Closed

 Description   

InnoDB page compression relies on “hole punching” (sparse files), but this appears to be lost somewhere along the way: the mariabackup SST method takes page-compressed data from the donor and leaves it decompressed on the joiner node. Because mariabackup SST does not copy the files as sparse, page compression provides no benefit in this case.

MariaDB [tt]> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `msg` varchar(100) DEFAULT NULL,
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=34209256 DEFAULT CHARSET=latin1 `PAGE_COMPRESSED`='ON'
1 row in set (0.005 sec)
 
+--------------+------------+------------+
| TABLE_SCHEMA | TABLE_NAME | ROW_FORMAT |
+--------------+------------+------------+
| tt           | t1         | Dynamic    |
+--------------+------------+------------+
 
MariaDB [tt]> show global variables like '%compression%';
+------------------------------------------+------------------+
| Variable_name                            | Value            |
+------------------------------------------+------------------+
| column_compression_threshold             | 100              |
| column_compression_zlib_level            | 6                |
| column_compression_zlib_strategy         | DEFAULT_STRATEGY |
| column_compression_zlib_wrap             | OFF              |
| innodb_compression_algorithm             | zlib             |
| innodb_compression_default               | ON               |
| innodb_compression_failure_threshold_pct | 5                |
| innodb_compression_level                 | 1                |
| innodb_compression_pad_pct_max           | 50               |
+------------------------------------------+------------------+
9 rows in set (0.001 sec)

Page compression is enabled. The size of the tablespace file is:

$ ls -lrth /var/lib/mysql/tt/t1.ibd
2.1G /var/lib/mysql/tt/t1.ibd
 
$ du -hs --block-size=1 /var/lib/mysql/tt/t1.ibd
1098792960	/var/lib/mysql/tt/t1.ibd
 
$ du -hs /var/lib/mysql/tt/t1.ibd
1.1G	/var/lib/mysql/tt/t1.ibd

Now, let's try mariabackup:

mariabackup --backup --target-dir=/home/vagrant/backup/ -u root -p'xxxx'
.
[00] 2021-05-19 20:08:13 completed OK!

Let's check the size of table t1's tablespace file in the backup:

$ ls -la /home/vagrant/backup/tt/t1.ibd
-rw-r-----. 1 root root 2231369728 May 19 20:08 /home/vagrant/backup/tt/t1.ibd
 
$ du -hs --block-size=1 /home/vagrant/backup/tt/t1.ibd
2214154240	/home/vagrant/backup/tt/t1.ibd
 
$ du -hs /home/vagrant/backup/tt/t1.ibd
2.1G	/home/vagrant/backup/tt/t1.ibd

As per https://dev.mysql.com/doc/refman/5.7/en/innodb-page-compression.html:
“Preserving page compression when moving a page-compressed tablespace file from one host to another requires a utility that preserves sparse files.”

I think this is related to the xtrabackup report https://jira.percona.com/browse/PXB-1557



 Comments   
Comment by Christian [ 2021-07-05 ]

mariabackup in 10.5 also can't handle sparse files.
I didn't test 10.6.

Comment by Marko Mäkelä [ 2021-07-23 ]

thiru, you fixed something similar in MDEV-15527 in IMPORT TABLESPACE.

Comment by Marko Mäkelä [ 2021-07-23 ]

In 10.6, MDEV-26029 disables sparse files on purpose on thinly provisioned storage (computational storage device), specifically ScaleFlux. We must not break that logic when merging this fix to 10.6.

Comment by Thirunarayanan Balathandayuthapani [ 2021-09-08 ]

MariaDB [(none)]> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
MariaDB [test]> show create table t1;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                           |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `msg` varchar(100) DEFAULT NULL,
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=96860716 DEFAULT CHARSET=latin1 `page_compressed`=1 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 sec)
 
MariaDB [test]> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 67108864 |
+----------+
1 row in set (43.579 sec)
 
MariaDB [test]>  show global variables like '%compression%';
+------------------------------------------+------------------+
| Variable_name                            | Value            |
+------------------------------------------+------------------+
| column_compression_threshold             | 100              |
| column_compression_zlib_level            | 6                |
| column_compression_zlib_strategy         | DEFAULT_STRATEGY |
| column_compression_zlib_wrap             | OFF              |
| innodb_compression_algorithm             | zlib             |
| innodb_compression_default               | ON               |
| innodb_compression_failure_threshold_pct | 5                |
| innodb_compression_level                 | 1                |
| innodb_compression_pad_pct_max           | 50               |
+------------------------------------------+------------------+
 
Current file size:
===================
 
thiru@rr:~/server/10.3/brl/data/test$ ls -lsrth ./t1.ibd 
3.4G -rw-rw---- 1 thiru dev 6.0G Sep  8 12:30 ./t1.ibd
 
thiru@rr:~/server/10.3/brl/data/test$ du -hs ./t1.ibd 
3.4G	./t1.ibd
 
 
Running the mariabackup with the following command:
===================================================
./mariabackup --target-dir=$PWD/backup --backup --socket=$PWD/../../data/mysql.sock --user=root
 
 
Checking the file size in target directory:
===============================================
 
thiru@rr:~/server/10.3/brl/extra/mariabackup/backup/test$ ls -lsrth ./t1.ibd 
3.4G -rw-r----- 1 thiru dev 6.0G Sep  8 12:34 ./t1.ibd
 
thiru@rr:~/server/10.3/brl/extra/mariabackup/backup/test$ du -hs ./t1.ibd 
3.4G	./t1.ibd

I don't see any uncompressed table in the target directory.

thiru@rr:~/server/10.3/brl/data/test$ cat /proc/version
Linux version 5.4.0-62-generic (buildd@lgw01-amd64-045) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #70-Ubuntu SMP Tue Jan 12 12:45:47 UTC 2021

Please verify what mariabackup command the user ran. Is this issue specific to CentOS? Please also find out what filesystem they were using.

Comment by Eugene [ 2021-09-21 ]

We hit a similar issue.
Our setup is Linux 5.10.67-gentoo: 3 identical servers with exactly the same MariaDB configuration, version mariadb-10.4.21.
We have similar definitions on all the tables with ENGINE=InnoDB ... 'PAGE_COMPRESSED'='ON'.
We also have compression enabled by default for new tables:

innodb_compression_algorithm		= snappy
innodb_compression_default		= ON

and compression setting like

MariaDB [(none)]> show global variables like '%compression%';
+------------------------------------------+------------------+
| Variable_name                            | Value            |
+------------------------------------------+------------------+
| column_compression_threshold             | 100              |
| column_compression_zlib_level            | 6                |
| column_compression_zlib_strategy         | DEFAULT_STRATEGY |
| column_compression_zlib_wrap             | OFF              |
| innodb_compression_algorithm             | snappy           |
| innodb_compression_default               | ON               |
| innodb_compression_failure_threshold_pct | 5                |
| innodb_compression_level                 | 6                |
| innodb_compression_pad_pct_max           | 50               |
+------------------------------------------+------------------+

The issue was noticed when OPTIMIZE and several migrations were performed on a node running standalone: the dataset size shrank significantly. The data was then copied with SST to the other nodes, and SST took much longer than expected given the connection speed and the dataset size on the source node. The reason is that the dataset size changed dramatically during SST: the streamed volume matched the uncompressed data size.
The result is:

Node3
---------------
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8      3.5T  2.5T  1.1T  71% /var/lib/mysql
---------------
Node2
---------------
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8      3.5T  2.5T  1.1T  71% /var/lib/mysql
---------------
Node1
---------------
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8      3.5T  1.7T  1.9T  49% /var/lib/mysql

On comparing tablespace file sizes, we found that every .ibd file is bigger on Node2 and Node3 than on Node1; however, the data size in information_schema is identical for the same table on the source server and the replicas, meaning the tablespaces are not compressed on the replicas while they are on the source.
Also, on examining the backup taken on Node1 with mariabackup, we found that the tablespaces inside the backup have the very same size as on the replicating servers (Node2 and Node3): a table that is 25GB on the source node is 40GB on both replicas and in the backup copy, while the source node still operates on the compressed 25GB table.
SST section in mysql configuration file has no relation to this:

[SST]
compressor='/usr/bin/zstd --fast=3'
decompressor='/usr/bin/zstd -d'

Mariabackup command to make local backup is:
mariabackup -u root -p --backup --galera-info --stream=xbstream --parallel 16 --use-memory=16G --socket=/var/run/mysqld/mysqld.sock --datadir=/var/lib/mysql | /usr/bin/zstd --fast -T16 -q -o /mysql_backup.zst
Again, on uncompressing the archive mysql_backup.zst, we noticed that the data it contains is 30% bigger than in the backup source directory.
This all means that the tablespace data is streamed uncompressed even when performing a local backup, and thus the same happens when mariabackup is used for SST, with no compression surviving on the replica node.

Comment by Muhammad Irfan [ 2021-09-22 ]

thiru, it appears on CentOS 7/8, on the XFS filesystem, during mariabackup SST.

Comment by Thirunarayanan Balathandayuthapani [ 2021-09-22 ]

I wrote a sample program that exercises the create, write, and seek system calls.

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
 
#define DATA_SIZE 16384
 
int main()
{
  /* Buffer holding the payload to write */
  char data[DATA_SIZE];
  int data_length[8]= {16384, 109, 244, 249, 7346, 7078, 7146, 7839};
  /* O_CREAT requires a mode argument; omitting it yields undefined permissions */
  int fd = open("./f1.txt", O_RDWR | O_CREAT, 0644);
 
  memset(data, 'a', DATA_SIZE);
  for (int i= 0; i < 8; i++)
  {
    /* Write the payload, then seek over the remainder of the 16 KiB page,
       leaving a hole up to the next page boundary */
    int punch_hole_len= DATA_SIZE - data_length[i];
    write(fd, data, data_length[i]);
    if (punch_hole_len)
        lseek(fd, punch_hole_len, SEEK_CUR);
  }
 
  close(fd);
  printf("File created and saved successfully\n");
  return 0;
}

On an ext4 filesystem, the output file `f1.txt` is allocated the following number of blocks:

[~/mariarepo/server/10.7/10.7-sample]$df -T ./
Filesystem     Type 1K-blocks      Used Available Use% Mounted on
/dev/nvme0n1p7 ext4 364903200 326044404  20253024  95% /home
 
[~/mariarepo/server/10.7/10.7-sample]$ls -ls f1.txt 
64 ----r-x--T 1 thiru thiru 122527 Sep 22 17:46 f1.txt
 
[~/mariarepo/server/10.7/10.7-sample]$du -hs f1.txt 
64K	f1.txt

On an XFS filesystem:

[/data/volumes/xfs32m]$df -T ./
Filesystem     Type 1K-blocks  Used Available Use% Mounted on
/dev/loop3     xfs      29348  1904     27444   7% /data/volumes/xfs32m
 
[/data/volumes/xfs32m]$ls -ls ./f1.txt 
76 -rwx--s--T 1 root root 122527 Sep 22 17:47 ./f1.txt
 
[/data/volumes/xfs32m]$du -hs ./f1.txt 
76K	./f1.txt

AFAIK, mariabackup doesn't do anything wrong logically. Mariabackup preserves punched holes for the page-compressed format (MDEV-13023).
I think the problem lies within the filesystem.

Comment by Eugene [ 2021-09-22 ]

Sorry, but this doesn't explain the case of the same issue happening while streaming data to stdout and compressing it on the fly with mariabackup --stream=xbstream | zstd -o...
Files inside the archive (which is filesystem-independent) are bigger than they are in the source /var/lib/mysql directory. Thus, mariabackup handles them incorrectly while reading and streaming them anywhere, regardless of the filesystem they will eventually be written to.

Comment by Thirunarayanan Balathandayuthapani [ 2021-09-24 ]

euglorg, yes: MDEV-13023 doesn't deal with compressed files while streaming. Thanks for your analysis. I will treat this issue as mariabackup not punching holes in the file during streaming.

Comment by Marko Mäkelä [ 2021-10-01 ]

As far as I understand, the problem is specifically with the Galera snapshot transfer (SST). Either something in the SST script or the mariabackup options that it invokes needs to be changed.

Comment by Eugene [ 2021-10-01 ]

Marko, sorry, but the problem seems to be not just with SST, as it also happens when mariabackup is used to create a backup with mariabackup --stream=xbstream | zstd -o...; there is an example of the complete command line with all the arguments above. Again, page compression in the archived data sent through the pipe as xbstream is also broken. As SST relies on mariabackup, the problem was first noticed during SST, but upon investigating backups it became clear that it happens with mariabackup itself.

Comment by Vladislav Vaintroub [ 2021-10-07 ]

euglorg - why do you uncompress the stream? What does that prove?
Compressing the stream, lightly, lets you save on the binary zeros, which are actually, and undeniably, in the stream.
Uncompressing the compressed stream can be combined, via a pipe, with mbstream -x to actually create a data directory, and that step creates the files with holes. So you have a compressed stream, and a compressed (with holes) data directory.

Yes, we know that the uncompressed stream does not have holes, and thus the suggestion to compress it. Lightly, so that larger runs of binary zeros are omitted.

If you would like to point out that mbstream -x does not create holes on some filesystem, I would think there is something wrong with that filesystem. mariabackup makes an effort to create holes portably, with the original Unix APIs from the 1970s: seek() behind the end of the file, and write() there. A popular answer on Stack Overflow describes how to programmatically create holes, and this is exactly what mariabackup does.

Comment by Eugene [ 2021-10-08 ]

Vladislav, sorry for the misunderstanding; let me explain the problem step by step.

First, your question is a bit unclear, as there is no mention of stream decompression in the command I provided:

mariabackup -u root -p --backup --galera-info --stream=xbstream --parallel 16 --use-memory=16G --socket=/var/run/mysqld/mysqld.sock --datadir=/var/lib/mysql | ...

It is expected that the data received from the pipe will have its pages compressed, isn't it?
Here the commands to the right of the pipe are omitted: an uncompressed stream is sent through the pipe, yet there is no explicit mention of decompressing tablespace data in either the command or the related help page (this has no relation to compressing and decompressing the stream sent through socat while performing SST, for sure).
To make the case clear, the full backup-restore command set would be:

#create backup archive
mariabackup -u root -p --backup --galera-info --stream=xbstream --parallel 16 --use-memory=16G --socket=/var/run/mysqld/mysqld.sock --datadir=/var/lib/mysql | /usr/bin/zstd --fast -T16 -q -o /mysql_backup.zst
#uncompress the archive
zstd -d -c  /mysql_backup.zst | mbstream -x -C /mnt/mariabackuptmp
#prepare the backup
mariabackup --prepare --innodb-doublewrite --target-dir /mnt/mariabackuptmp

but the second command always produces a dataset with uncompressed tablespace data. We currently use XFS. If you are sure this is a filesystem problem, would you expect the data to have the same size if I extract the archived backup to ext4?
Please give me a couple of days; I'll try to extract the archived backup to ext4... If it has the same size as the original data, then the problem may be within the FS, and there should be an explicit note in the documentation that MariaDB can't preserve tablespace compression when mariabackup is used for SST and the data resides on an XFS partition.
However, if zstd -d -c /mysql_backup.zst | mbstream -x -C /mnt/mariabackuptmp (see above how it was created) also produces a dataset with uncompressed tablespaces on ext4 (meaning the command chain mariabackup | mbstream -x creates a dataset that is 30% bigger than expected), this would mean the data sent through the pipe already has broken compression, as it has not touched any filesystem at that point.

If the above logic is wrong, please let me know.

Comment by Marko Mäkelä [ 2021-10-12 ]

euglorg, did you try the test program that thiru posted? It is producing different results on ext4 and XFS.

As far as I can tell, the function write_compressed() in extra/mariabackup/ds_local.cc is creating sparse files in the traditional way: by writing and seeking past the current end of the file.

We might try to convince XFS harder to create sparse files, by invoking the Linux-specific fallocate() system call, to make it closer to the way writes from the InnoDB buffer pool work.

Comment by Marko Mäkelä [ 2021-10-12 ]

I found a question that suggests that this is an XFS design feature:
https://stackoverflow.com/questions/6940516/create-sparse-file-with-alternate-data-and-hole-on-ext3-and-xfs

This one may be relevant too:
https://serverfault.com/questions/406069/why-are-my-xfs-filesystems-suddenly-consuming-more-space-and-full-of-sparse-file

If XFS is on purpose so reluctant to create sparse files with tiny blocks when they are created in the traditional POSIX way (seeking and writing) as opposed to FALLOC_FL_PUNCH_HOLE, maybe it is not an ideal match for the page_compressed tables.

Comment by Vladislav Vaintroub [ 2021-10-12 ]

Here is another link http://web.archive.org/web/20171010154045/http://oss.sgi.com/archives/xfs/2012-02/msg00413.html, with a discussion of xfs mount options. Disclaimer: I did not try that.

Comment by Vladislav Vaintroub [ 2021-10-12 ]

It is probably helpful to punch holes on XFS explicitly, with FALLOC_FL_PUNCH_HOLE, after determining that the filesystem is XFS.

Comment by Eugene [ 2021-10-12 ]

Marko, I haven't tried that test. Instead, I tried to extract the backup archive to an ext4 partition, and I can confirm that the issue happens on XFS only.
The case looks as follows:

  • Original dataset with page compression has size of 1.7T
  • After running `mariabackup --stream=xbstream | /usr/bin/zstd -o /mysql_backup.zst && zstd -d -c /mysql_backup.zst | mbstream -x -C /mnt/mariabackuptmp` where /mnt/mariabackuptmp is XFS partition, dataset inside /mnt/mariabackuptmp has size of 2.5T
  • After running `mariabackup --stream=xbstream | /usr/bin/zstd -o /mysql_backup.zst && zstd -d -c /mysql_backup.zst | mbstream -x -C /mnt/mariabackuptmp` where /mnt/mariabackuptmp is ext4fs partition, dataset inside /mnt/mariabackuptmp has size of 1.7T

The thing is that XFS has better performance for multithreaded IO, which was the reason for using it. Vladislav is probably right that XFS requires specific handling when page compression is in use.

Comment by Vladislav Vaintroub [ 2021-10-12 ]

Perhaps fallocate --dig-holes on the backup files, after decompression, could help as a workaround.

Comment by Eugene [ 2021-10-12 ]

But this workaround is probably far from a good option for SST transfers; it would make them much slower.

Comment by Vladislav Vaintroub [ 2021-10-12 ]

The xbstream/mbstream format is currently not optimized for holes; thus the suggestion to use lightweight stream compression. Lightweight means it is capable of detecting and compressing large runs of 0 bytes, yet does not try too hard to compress already-compressed data. zlib at level 1 might be a good candidate.

Comment by Guillaume Lefranc [ 2021-10-27 ]

Just as a side note, punching holes with fallocate on XFS is slow:
https://issues.apache.org/jira/browse/KUDU-2052
So one possible solution could be to patch mariabackup so that, if XFS is detected, XFS_IOC_UNRESVSP64 is used to punch holes in files.

Comment by Vladislav Vaintroub [ 2021-10-27 ]

We do not do fallocate in backup, tanj. We use the traditional, not specifically Linuxy, solution: lseek() behind the end of the file, and then write(). The distance between the old end of the file and the place we write is supposed to become the hole. The most modern part of the current backup solution is that we set the sparse-file attribute on Windows, which has worked since the year 2000 or so.

The server does fallocate, and maybe it does it slowly, but there is nothing else we do.

Comment by Guillaume Lefranc [ 2021-11-05 ]

@wlad I was referring to Marko's comment:

We might try to convince XFS harder to create sparse files, by invoking a Linux specific fallocate() system call, to make it closer to the way how writes from the InnoDB buffer pool work.

Comment by Vladislav Vaintroub [ 2021-11-05 ]

It is not entirely up to me, but I do not welcome a workaround for every bug-of-the-day on every possible Linux filesystem. Maybe we eventually just document the behavior of this and that filesystem, and leave it as-is.

Comment by Eugene [ 2021-11-05 ]

Indeed, a workaround for every 'bug-of-the-day' is a bad idea; I strongly agree with Vladislav!
But looking at this case, it is not even clear which part should be treated as a bug. xfs behaves correctly from the filesystem's point of view because (a) the correct way to create a sparse file is fallocate(), not seek-beyond-EOF, and (b) xfs preallocates blocks to minimize fragmentation, so it legitimately uses some space that would be a hole on other filesystems. Thus, there is no reason to report this as an xfs bug.
The point is that MariaDB does nothing wrong (it just uses a generic method of punching holes), and xfs does nothing wrong either, yet together they cause an unwanted effect: unpredictable disk space utilization after SST.
For now I'd ask for

  • adding a note to the documentation about this (known) behavior that is specific to SST with xfs;
  • leaving the behavior unchanged but keeping the bug open (yes, this sounds like "known bug is a feature").
    The reason for keeping the bug open is the question of whether seek-beyond-EOF is in fact the correct method of hole punching: it should perhaps be reevaluated and replaced with other (fallocate?) filesystem-specific methods, and not just for xfs but for other filesystems too... not sure, in fact.
Comment by Guillaume Lefranc [ 2021-11-05 ]

I'd also agree that proper documentation is preferable to a workaround; in this case, noting that ext4 is preferable as a filesystem "in certain cases".

Just as a note: I tried on XFS, but fallocate --dig-holes sadly does not reclaim space.

Comment by Eugene [ 2021-11-08 ]

I tried fallocate --dig-holes on some files from my dataset (on xfs); there is some effect, but not the difference expected from what is seen on ext4. It probably depends on the real data size, the hole sizes, and the extents allocated in each particular case.
By the way, it would be good to add another note on dealing with holes (regardless of filesystem, probably). If you have sparse files containing tablespaces on an SSD drive and issue the 'fstrim' command to trim unused blocks, fstrim locks the files that have holes while trimming them (that's expected), freezing IO on them and thus freezing mariadbd. Once the space is trimmed, the files are "released" and mariadbd resumes normal operation (while fstrim may still be handling unoccupied space on the same drive). On a galera cluster this triggers flow control and slows down the whole cluster. There is no reason to report this as a separate bug (as this is also expected behavior, caused by the same compression scheme storing data in sparse files), but a warning about this effect might be helpful.

Comment by Marko Mäkelä [ 2022-03-18 ]

It looks like there is not much that can be improved outside the file system code in the operating system kernel.

euglorg, the traditional UNIX way to create sparse files is to seek beyond the end of the file and write. The Linux fallocate() flag FALLOC_FL_PUNCH_HOLE is much newer; I do not think that it existed 20 years ago.

I mostly agree with the legendary blog post https://smalldatum.blogspot.com/2015/10/wanted-file-system-on-which-innodb.html which is a criticism of the page_compressed format. By outsourcing some complexity to the Linux kernel, we essentially are at the mercy of the kernel. Either we get small annoyances like this one, or potential data corruption like https://lwn.net/Articles/864363/. I have not seen that in the wild, but I remember kernel crashes or file system corruption when Oracle tested the MySQL 5.7 response to the MariaDB 10.1 page_compressed format on a FusionIO device. We do test page_compressed internally, but mostly on the ext4 file system, and I do not think that we have tests that would monitor the actual space allocation.

To my surprise, there still are users who prefer my filesystem-agnostic and overly complex implementation of ROW_FORMAT=COMPRESSED. I wanted to deprecate and remove write support for it, but due to community feedback, I reverted that decision in MDEV-27736.

Comment by Eugene [ 2022-03-18 ]

Thank you for your explanation, marko.
It seems that there's really nothing to be done here (except adding warnings in documentation about the case), at least at the moment.

Comment by Marko Mäkelä [ 2022-03-21 ]

euglorg, thank you. Let’s address this in the documentation.

I would expect the InnoDB page_compressed tables to be able to save space in a traditional file system, such as Linux ext4 or Microsoft NTFS, at the cost of increased fragmentation.

The documentation might also include an example how to check the actually allocated physical blocks. That could be ls -ls on GNU/Linux.

On file systems that support snapshots, the allocation of blocks could be quite different, and questions like ‘how much space does a file consume’ become more complicated. We have observed that enabling O_DIRECT (MDEV-24854) caused problems for several Linux file systems.

Comment by Guillaume Lefranc [ 2022-09-22 ]

It seems that Percona has fixed this by adding fallocate support to xbstream; it would be nice if MariaDB backported this to mbstream.
https://jira.percona.com/browse/PXB-2614

Regarding my previous comments: fallocate --dig-holes does work with XFS if speculative preallocation is disabled (allocsize=64k in the mount options).

Comment by Valerii Kravchuk [ 2023-04-27 ]

So, do we plan to fix anything in the mariabackup implementation for XFS within the scope of this bug report?

Comment by Marko Mäkelä [ 2023-05-11 ]

I think that tanj’s suggestion is reasonable. This should be relatively simple to implement, but a bit tricky to test.

Comment by Marko Mäkelä [ 2023-10-16 ]

This change looks OK to me. This will add some fallocate() system calls, but only when page_compressed tables are being used.

Generated at Thu Feb 08 09:39:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.