MariaDB Server / MDEV-11068

Review which innodb_compression_algorithm to support in binary packages

Details

    Description

      It seems that the RPMs for 10.1.x are built with only LZMA out of all the possible compression algorithms. It is suggested to build from source when other algorithms are needed (see https://mariadb.com/kb/en/mariadb/compression/).

      On some systems that we provide packages for (like RHEL), some of these algorithms (like LZ4) are available as packages. I think it makes sense (maybe only for Enterprise binaries?) to build with them enabled and then add dependencies on the related packages.
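      As a concrete sketch of what enabling one of these algorithms at build time could look like (WITH_INNODB_LZ4 is an existing CMake option in the source tree; the RHEL package names and the Requires line are assumptions):

          # configure the server with LZ4 page-compression support
          sudo yum install lz4-devel
          cmake . -DWITH_INNODB_LZ4=ON
          # the resulting server RPM would then need a runtime dependency such as:
          #   Requires: lz4-libs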

      ----
      But we cannot add new dependencies to rpm packages after GA. And we should not introduce new file formats lightly. If we add a compression library to our distributed packages, there will be a significant additional cost for removing the code later. Users who enabled an algorithm would have to execute additional steps on an upgrade to a later version where we might want to remove that form of compression. And we would have to provide an upgrade tool for converting affected files. To save us from such trouble, we should run some benchmarks beforehand and determine which library provides the best ratio between CPU usage and compression savings.


          Activity

            valerii Valerii Kravchuk created issue -
            aadant Arnaud Adant added a comment -

            Please note that zlib is also available by default but it would be nice to have the other ones to compare.


            serg Sergei Golubchik added a comment -

            We cannot add new dependencies to rpm packages after GA.
            serg Sergei Golubchik made changes -
            Fix Version/s: 10.1 [ 16100 ]
            julien.fritsch Julien Fritsch made changes -
            Comment: [~valerii]: the customer issue 12544 has been resolved, should we remove the customer id from this bug then?

            marko Marko Mäkelä added a comment -

            I do not think that it makes sense to enable bzip2 at all. It has a very large memory footprint, and it is designed for compressing much larger input than the innodb_page_size blocks (default 16384 bytes).
            marko Marko Mäkelä made changes -
            NRE Projects: RM_105_CANDIDATE
            Labels: compression packaging → compression packaging performance

            marko Marko Mäkelä added a comment -

            I think that we will need some benchmarks to check not only which file systems work well with page_compressed tables, but also the efficiency of different compression algorithms (in terms of CPU usage and saved storage space).

            The benchmark effort may have to wait for MDEV-11916 and MDEV-8139 to be fixed. In MDEV-15528 we are already enabling additional savings when entire pages are being freed.
            marko Marko Mäkelä made changes -
            Assignee: Axel Schwenke [ axel ]
            marko Marko Mäkelä made changes -
            Summary: Build packages with both bzip2, lz4, lzma and lzo support for compressed tables → Review which innodb_compression_algorithm to support in binary packages
            marko Marko Mäkelä made changes -
            Labels: compression packaging performance → Compatibility compression packaging performance

            marko Marko Mäkelä added a comment -

            I think that of all the innodb_compression_algorithm values that are implemented in the source code, bzip2 is the worst match for InnoDB. The reason is that the ‘input files’ are individual InnoDB pages, with at most innodb_page_size bytes of payload to compress. That is, 4KiB, 8KiB, 16KiB, 32KiB, or 64KiB. But the bzip2 input block size is up to 900KB (roughly 1MiB). The encoder of the compressed data stream could waste some code space for representing longer lengths that would never occur in our use case. Also, the memory usage of bzip2 can be huge: even with the bzip2 --small option, decompression consumes up to 2.5 bytes per input byte.

            Maybe we could at least agree to remove bzip2 support in some version? 10.6?

            If zlib is soon to support SIMD-based optimizations for modern CPUs, maybe the practical advantage of other implementations of the Lempel-Ziv 1977 and 1978 algorithms will be reduced?

            Note: If we ever enable some compression algorithm in our distributed binary packages, then I am afraid that it will be very hard to remove those algorithms later, because users would complain that their data is inaccessible after an upgrade. This could be addressed by creating an external tool that would convert data files, but then that tool would have to depend on all those compression libraries ‘forever’.
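            To make the page-size argument concrete, here is a rough sketch that compresses a single 16 KiB page with the stock command-line compressors and prints the output sizes. It is a sketch only: it assumes the gzip, bzip2, xz and lz4 CLI tools are installed, and sample.ibd is a hypothetical data file (random data would be a meaningless worst case, so real table data should be used):

                # grab one 16 KiB page from a data file (skipping the first few pages)
                dd if=sample.ibd of=page.bin bs=16384 count=1 skip=4 2>/dev/null
                for tool in "gzip -9" "bzip2 -9" "xz -9" "lz4 -9"; do
                  size=$($tool -c page.bin | wc -c)
                  echo "$tool -> $size bytes"
                done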

            otto Otto Kekäläinen added a comment -

            I did not spot any compression changes yet in recent 10.5 commits, but the autopkgtest at https://salsa.debian.org/mariadb-team/mariadb-server/-/jobs/692724 started failing with `ERROR 1231 (42000) at line 2: Variable 'innodb_compression_algorithm' can't be set to the value of 'snappy'`. Did you decide to remove snappy already, or shall I investigate this as a regression?
            otto Otto Kekäläinen added a comment (edited) -

            Just a reminder that the regression on the 10.5 branch still exists (see above): the value 'snappy' for the variable `innodb_compression_algorithm` is no longer recognized.

            It is rather annoying in my otherwise pretty CI pipeline at https://salsa.debian.org/mariadb-team/mariadb-server/pipelines/136485
            otto Otto Kekäläinen made changes -
            Attachment: image-2020-05-14-18-12-01-528.png [ 51726 ]

            otto Otto Kekäläinen added a comment -

            Regarding the two comments above, I figured out the snappy and RocksDB failures in debian/test/smoke and will soon submit a PR about them.

            Note that in addition to the options listed at https://mariadb.com/kb/en/innodb-page-compression/#configuring-the-innodb-page-compression-algorithm there is also zstd, as it is used by RocksDB.

            For binaries published officially in Debian/Ubuntu, we have only zlib in Ubuntu Bionic (MariaDB 10.1) and zlib, lz4, and snappy in Ubuntu Focal (MariaDB 10.3).

            Snappy was enabled in https://salsa.debian.org/mariadb-team/mariadb-10.3/-/commit/278531a7dfa7d60a60b067d089860c92a4e1221b - was this an OK decision?

            If somebody wants to test this, this can be quickly copy-pasted:

            mariadb --version
            mariadb -e 'SET GLOBAL innodb_compression_algorithm=none;'
            mariadb -e 'SET GLOBAL innodb_compression_algorithm=zlib;'
            mariadb -e 'SET GLOBAL innodb_compression_algorithm=lz4;'
            mariadb -e 'SET GLOBAL innodb_compression_algorithm=lzo;'
            mariadb -e 'SET GLOBAL innodb_compression_algorithm=lzma;'
            mariadb -e 'SET GLOBAL innodb_compression_algorithm=bzip2;'
            mariadb -e 'SET GLOBAL innodb_compression_algorithm=snappy;'
            

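            A complementary check, to see which algorithm is currently active (the SHOW statements are standard; the status-counter pattern is an assumption, as the exact counter names differ between versions):

                mariadb -e "SHOW GLOBAL VARIABLES LIKE 'innodb_compression_algorithm';"
                # page-compression counters; verify the exact names on your version
                mariadb -e "SHOW GLOBAL STATUS LIKE 'Innodb_%compress%';"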

            otto Otto Kekäläinen added a comment -

            For the record, RocksDB compression status on the current 10.5 branch:

            # grep -E "(Compression)? supported:" /var/lib/mysql/#rocksdb/LOG
            2020/05/19-13:15:14.987332 7fa080c1d800 Compression algorithms supported:
            2020/05/19-13:15:14.987333 7fa080c1d800     kZSTDNotFinalCompression supported: 0
            2020/05/19-13:15:14.987334 7fa080c1d800     kZSTD supported: 0
            2020/05/19-13:15:14.987335 7fa080c1d800     kXpressCompression supported: 0
            2020/05/19-13:15:14.987336 7fa080c1d800     kLZ4HCCompression supported: 1
            2020/05/19-13:15:14.987337 7fa080c1d800     kLZ4Compression supported: 1
            2020/05/19-13:15:14.987338 7fa080c1d800     kBZip2Compression supported: 0
            2020/05/19-13:15:14.987338 7fa080c1d800     kZlibCompression supported: 1
            2020/05/19-13:15:14.987339 7fa080c1d800     kSnappyCompression supported: 1
            


            marko Marko Mäkelä added a comment -

            otto, thank you for investigating this.

            I think that we should try to remove the ‘useless’ algorithms (determined by benchmarking) and extend innochecksum (or create a separate tool) to allow re-encoding data files from any previously supported page_compressed algorithm into the supported ones.

            otto Otto Kekäläinen added a comment -

            Related commit https://github.com/MariaDB/server/commit/6af37ba881fee7e6f651d5e0730c9374337ad1b4 by serg

            It did not seem to have had an effect on this. Outputs from the latest 10.5 at the time of writing:

            root@e1cbb08df912:/etc/mysql# mariadb --version
            mariadb  Ver 15.1 Distrib 10.5.4-MariaDB, for debian-linux-gnu (x86_64) using  EditLine wrapper
            root@e1cbb08df912:/etc/mysql# mariadb -e 'SET GLOBAL innodb_compression_algorithm=none;'
            root@e1cbb08df912:/etc/mysql# mariadb -e 'SET GLOBAL innodb_compression_algorithm=zlib;'
            root@e1cbb08df912:/etc/mysql# mariadb -e 'SET GLOBAL innodb_compression_algorithm=lz4;'
            root@e1cbb08df912:/etc/mysql# mariadb -e 'SET GLOBAL innodb_compression_algorithm=lzo;'
            ERROR 1231 (42000) at line 1: Variable 'innodb_compression_algorithm' can't be set to the value of 'lzo'
            root@e1cbb08df912:/etc/mysql# mariadb -e 'SET GLOBAL innodb_compression_algorithm=lzma;'
            ERROR 1231 (42000) at line 1: Variable 'innodb_compression_algorithm' can't be set to the value of 'lzma'
            root@e1cbb08df912:/etc/mysql# mariadb -e 'SET GLOBAL innodb_compression_algorithm=bzip2;'
            ERROR 1231 (42000) at line 1: Variable 'innodb_compression_algorithm' can't be set to the value of 'bzip2'
            root@e1cbb08df912:/etc/mysql# mariadb -e 'SET GLOBAL innodb_compression_algorithm=snappy;'
            ERROR 1231 (42000) at line 1: Variable 'innodb_compression_algorithm' can't be set to the value of 'snappy'
            root@e1cbb08df912:/etc/mysql# grep -E "(Compression)? supported:" /var/lib/mysql/#rocksdb/LOG
            2020/06/06-08:50:16.909747 7f68390b5800 Compression algorithms supported:
            2020/06/06-08:50:16.909748 7f68390b5800 	kZSTDNotFinalCompression supported: 0
            2020/06/06-08:50:16.909749 7f68390b5800 	kZSTD supported: 0
            2020/06/06-08:50:16.909750 7f68390b5800 	kXpressCompression supported: 0
            2020/06/06-08:50:16.909751 7f68390b5800 	kLZ4HCCompression supported: 1
            2020/06/06-08:50:16.909752 7f68390b5800 	kLZ4Compression supported: 1
            2020/06/06-08:50:16.909753 7f68390b5800 	kBZip2Compression supported: 0
            2020/06/06-08:50:16.909753 7f68390b5800 	kZlibCompression supported: 1
            2020/06/06-08:50:16.909754 7f68390b5800 	kSnappyCompression supported: 1
            2020/06/06-08:50:16.909756 7f68390b5800 Fast CRC32 supported: Supported on x86
            

            Autopkgtest at https://salsa.debian.org/mariadb-team/mariadb-server/-/jobs/787208 also still failing.


            otto Otto Kekäläinen added a comment -

            OK, I managed to solve the above-mentioned issues. PR available at https://github.com/MariaDB/server/pull/1582

            marko Marko Mäkelä added a comment -

            I reiterate what I said on 2020-05-19: I do not think that we should introduce new file formats lightly. If we add a compression library to our distributed packages, there will be a significant additional cost for removing the code later. Users who enabled an algorithm would have to execute additional steps on an upgrade to a later version where we might want to remove that form of compression. And we would have to provide an upgrade tool for converting affected files.

            To save us from such trouble, we should run some benchmarks beforehand and determine which library provides the best ratio between CPU usage and compression savings. I think that we will need two types of I/O bound benchmarks: MDEV-23399 style (large redo log, and the data does not completely fit in the buffer pool), and MDEV-23855 style (tiny redo log, frequent checkpoint flushing, while all data fits in the buffer pool). The former should involve both page reads and writes, and the latter should basically be write-only.
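            As a rough illustration of the two setups (the option names are real; the sizes are placeholder assumptions, not recommendations):

                # MDEV-23399-style run: large redo log, dataset larger than the buffer pool
                mysqld --innodb-log-file-size=16G --innodb-buffer-pool-size=8G
                # MDEV-23855-style run: tiny redo log (frequent checkpoint flushing),
                # while all data fits in the buffer pool
                mysqld --innodb-log-file-size=64M --innodb-buffer-pool-size=32G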

            Later in 2020, I learned about thinly provisioned smart SSDs that would compress data on the fly. They present themselves as larger-than-real capacity. I think that with such storage, and with a configuration option that disables the hole-punching in InnoDB, the page_compressed tables could become a viable option. In that case, the files would be completely regular (not sparse) on the file system level.
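            For reference, whether hole-punching is actually saving space can be seen by comparing a file's apparent size with its allocated blocks (a sketch; the path is hypothetical):

                ls -l /var/lib/mysql/db1/t1.ibd     # apparent size
                du -h /var/lib/mysql/db1/t1.ibd     # space actually allocated
                stat -c '%s bytes apparent, %b blocks allocated' /var/lib/mysql/db1/t1.ibd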


            marko Marko Mäkelä added a comment -

            Perhaps we should limit our offering to ZLIB and ZSTD. ZSTD is currently being used by RocksDB, but there is no InnoDB interface for it yet.

            marko Marko Mäkelä added a comment -

            I noticed that MDEV-22310 has been filed for implementing ZSTD support, presumably in InnoDB. I think that we would definitely need a prototype of that before proceeding with benchmarks.

            In my opinion, ideally we should not support more than zlib and ZSTD. Only if benchmarks indicate that some other implementation offers a significantly better compression ratio at comparable CPU overhead could we enable it.
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) added a comment (edited) -

            In my opinion, ideally we should not support more than zlib and ZSTD. Only if benchmarks indicate that some other implementation offers a significantly better compression ratio at comparable CPU overhead could we enable it.

            There is a strong argument for LZ4 as a faster algorithm:
            https://www.percona.com/blog/2016/04/13/evaluating-database-compression-methods-update/

            Ideally we would provide ZSTD and zlib for users prioritizing compression ratio, while LZ4 provides an option for getting roughly 50% of that ratio with a much smaller performance hit. The real key is LZ4's massive advantage in decompression performance: for customers whose writes happen mostly up front while reads dominate the overall workload, the performance hit can be minimal while still reducing disk space usage by about 3x. LZ4's compression performance is also much better than that of ZSTD and zlib, so for the other use case, customers who enable compression to buy time while they perform a major SAN upgrade or similar, LZ4 is the more viable option as well.

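            A quick way to sanity-check the decompression-speed claim (a sketch, assuming the lz4 and xz CLI tools and a representative, hypothetical sample.bin):

                lz4 -9 -c sample.bin > sample.lz4
                xz -9 -c sample.bin > sample.xz
                time lz4 -d -c sample.lz4 > /dev/null
                time xz -d -c sample.xz > /dev/null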

            marko Marko Mäkelä added a comment -

            I see that wlad expressed some skepticism towards ZSTD in MDEV-22310 (which was actually filed for the client/server communication protocol). It is true that we can enable support for LZ4 with trivial effort in our distributed executables, because the support is already present in the source code. We might implement and enable support for ZSTD in InnoDB later, if it turns out to be significantly better than other alternatives.
            axel Axel Schwenke made changes -
            Fix Version/s: N/A [ 14700 ]
            axel Axel Schwenke made changes -
            Status: Open [ 1 ] → In Progress [ 3 ]
            serg Sergei Golubchik made changes -
            Workflow: MariaDB v3 [ 77852 ] → MariaDB v4 [ 131818 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s: N/A [ 14700 ]
            julien.fritsch Julien Fritsch made changes -
            Issue Type: Task [ 3 ] → New Feature [ 2 ]
            julien.fritsch Julien Fritsch made changes -
            Status: In Progress [ 3 ] → Stalled [ 10000 ]

            marko Marko Mäkelä added a comment -

            This was sort-of addressed in MDEV-12933.
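            MDEV-12933 moved the optional algorithms into separate compression provider plugins (starting with MariaDB 10.7). A minimal usage sketch, assuming the provider_lz4 plugin is available; plugin and package names may differ per distribution:

                # load the LZ4 provider, then select the algorithm
                mariadb -e "INSTALL SONAME 'provider_lz4';"
                mariadb -e "SET GLOBAL innodb_compression_algorithm=lz4;"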
            serg Sergei Golubchik made changes -
            Fix Version/s: N/A [ 14700 ]
            Assignee: Axel Schwenke [ axel ] → Sergei Golubchik [ serg ]
            Resolution: Fixed [ 1 ]
            Status: Stalled [ 10000 ] → Closed [ 6 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets: 193884, 139416

            People

              Assignee: serg Sergei Golubchik
              Reporter: valerii Valerii Kravchuk
              Votes: 4
              Watchers: 14
