[MDEV-12933] sort out the compression library chaos Created: 2017-05-27 Updated: 2023-12-07 Resolved: 2021-10-27 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Compiling, Packaging |
| Fix Version/s: | 10.7.1 |
| Type: | Task | Priority: | Critical |
| Reporter: | Sergei Golubchik | Assignee: | Sergei Golubchik |
| Resolution: | Fixed | Votes: | 5 |
| Labels: | Preview_10.7, gsoc19, gsoc20 | ||
| Issue Links: |
|
| Description |
|
As MariaDB gains more storage engines, and as those engines gain more features, MariaDB can optionally use more and more compression libraries for various purposes. InnoDB, TokuDB, and RocksDB can each use a different set of compression libraries. Compiling them all in would create many run-time/rpm/deb dependencies, most of which most users would never need. Not compiling them in would lead to requests to compile them in. While most users don't use all of these libraries, many users use some of them. A solution could be to load these libraries on request, without creating a packaging dependency. There are different ways to do it
|
| Comments |
| Comment by Robert Bindar [ 2021-03-03 ] | |||||||||||||||||
|
This was a GSoC project in 2020, in which Kartik Soneji made significant progress. There is a PR against the server which is currently in review. | |||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-24 ] | |||||||||||||||||
|
I think that before we can advance on this, we must demonstrate the usefulness of each compression implementation that we intend to support in the packages that we distribute. Enabling support for a compression library will in practice extend our file format. If we later remove that support, there will be complaints from those who would have to convert their files to a supported format. Enabling support for every conceivable compression library would unnecessarily bloat the code. Because InnoDB is a widely used storage engine, I think that this task is blocked by | |||||||||||||||||
| Comment by Robert Bindar [ 2021-03-24 ] | |||||||||||||||||
|
Hey marko! I love that you still keep an eye on this, thanks a lot! I'm probably looking at this from the wrong perspective, because I still don't understand how this refactoring job enables support for compression libraries other than the ones that we already support. In my understanding, assuming we merge this project into the server, it's hard to say that the project will enable support for a new compression library XYZ (one that we don't currently support). To claim that we support such a library, we would first have to implement a service for XYZ, then add code to a storage engine where we want to use that compression method. Only after these two steps are done can users install libXYZ on their system, launch the server with --use-compression=XYZ, and configure whatever variables are needed to make a storage engine compress with XYZ. Let me know if I'm wrong. If you can explain in a bit more detail so that my silly brain understands it too, I would appreciate it a lot. I do agree though that this task should be blocked by | |||||||||||||||||
| Comment by Sergei Golubchik [ 2021-03-24 ] | |||||||||||||||||
|
No, I don't think that you should wait for | |||||||||||||||||
| Comment by Marko Mäkelä [ 2021-08-04 ] | |||||||||||||||||
|
The preliminary results that I have seen for | |||||||||||||||||
| Comment by Sergei Golubchik [ 2021-08-28 ] | |||||||||||||||||
|
to do:
| |||||||||||||||||
| Comment by Daniel Black [ 2021-08-31 ] | |||||||||||||||||
|
Pushed bb-10.7-danielblack-mdev-12933-fixup (tested locally on clang-12) as a fix for the compile error:
| |||||||||||||||||
| Comment by Daniel Black [ 2021-08-31 ] | |||||||||||||||||
|
Second embedded compilation fix added to same branch. Please review/cherry-pick as needed. | |||||||||||||||||
| Comment by Sergei Golubchik [ 2021-08-31 ] | |||||||||||||||||
|
marko, could you please review InnoDB changes in this patch? | |||||||||||||||||
| Comment by Marko Mäkelä [ 2021-08-31 ] | |||||||||||||||||
|
I am sorry, but In fil_node_open_file_low() I would suggest invoking sql_print_error() directly and following the common formatting rules (no TABs, and { on a separate line). In innodb_init_params() and innodb_compression_algorithm_validate() we could use switch and page_compression_algorithms to avoid code duplication. Otherwise the InnoDB changes look fine to me. | |||||||||||||||||
| Comment by Sergei Golubchik [ 2021-08-31 ] | |||||||||||||||||
|
rebased. fixed formatting. removed code duplication. I didn't change fil_node_open_file_low() to use sql_print_error() because this function uses ib:: just a few lines above, so it would look rather inconsistent. | |||||||||||||||||
| Comment by Sergei Golubchik [ 2021-09-01 ] | |||||||||||||||||
|
A description of what was done:

bzip2/lz4/lzma/lzo/snappy compression is now provided via services. They are almost like normal services, but they live in include/providers/ and are supposed to provide exactly the same interface as the original compression libraries (not all of it, only enough of it for the code to compile). The services are implemented via dummy functions that return the corresponding error values (LZMA_PROG_ERROR, LZO_E_INTERNAL_ERROR, etc.). The actual compression libraries are linked into the corresponding provider plugins. Providers are daemon plugins that, when loaded, replace the service pointers so that they point to the actual compression functions. That is, the run-time dependency on compression libraries now lies with the plugins: the server doesn't need any compression libraries to run, but automatically supports a compression algorithm once its plugin is loaded.

InnoDB and Mroonga use compression plugins now. RocksDB doesn't, because it comes with standalone utility binaries that cannot load plugins. In other words, InnoDB (and Mroonga) support all compression algorithms now. There is no need for a special build to support, for example, snappy. One only needs to install the corresponding plugin.

The server package (RPM or DEB) itself no longer depends on any compression libraries (besides zlib, and except libraries that other libraries might need indirectly). There is one DEB/RPM package per provider plugin, and it depends on the corresponding compression library. When it is installed, the server gets the ability to use the compression. If it is not installed, using the compression will result in an error. | |||||||||||||||||
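The pointer-swapping mechanism described in this comment can be illustrated with a small sketch. This is not the server's code, which is C/C++ and lives in include/providers/. It is a simplified Python model under stated assumptions: the names CompressionService, load_lzma_provider and PROVIDER_NOT_LOADED are hypothetical, and Python's standard lzma module stands in for the real library that a provider plugin would link against.

```python
import lzma

# Hypothetical error code, standing in for values such as LZMA_PROG_ERROR.
PROVIDER_NOT_LOADED = -1


def _stub_compress(data):
    """Dummy service function: reports an error and compresses nothing."""
    return PROVIDER_NOT_LOADED, b""


def _stub_decompress(data):
    """Dummy service function: reports an error and decompresses nothing."""
    return PROVIDER_NOT_LOADED, b""


class CompressionService:
    """Table of function pointers that callers (e.g. an engine) go through."""

    def __init__(self):
        # Until a provider is loaded, every call hits a stub.
        self.compress = _stub_compress
        self.decompress = _stub_decompress


def load_lzma_provider(service):
    """Models loading a provider plugin: replace stubs with real functions."""

    def real_compress(data):
        return 0, lzma.compress(data)

    def real_decompress(data):
        return 0, lzma.decompress(data)

    service.compress = real_compress
    service.decompress = real_decompress
```

The caller never changes: it always calls service.compress(). Whether that call succeeds depends only on whether a provider has replaced the stub, which is why the server binary itself needs no link-time dependency on the library.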
| Comment by Elena Stepanova [ 2021-09-05 ] | |||||||||||||||||
|
There is an obvious backward compatibility problem here. I assume it is expected, as it is inevitable with this solution, and I don't see a way around it. Due to the very library chaos that we have had so far, current releases have different subsets of compression algorithms on different systems. For example, deb-based ones have lz4 enabled, while rpms have lzma and more. Now, when an upgrade to 10.7 is performed, all of them will become disabled, and to enable them again users need to install extra packages. And even if they think to check whether extra packages may be needed, it looks like there isn't an easy way to determine which compression algorithms are de facto used in a given instance. Or, if there is, I couldn't find it so far. | |||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-05 ] | |||||||||||||||||
|
elenst, you are spot on. Before the introduction of innodb_checksum_algorithm=full_crc32 and | |||||||||||||||||
| Comment by Elena Stepanova [ 2021-09-05 ] | |||||||||||||||||
|
Other intermediate notes:
| |||||||||||||||||
| Comment by Vladislav Vaintroub [ 2021-09-06 ] | |||||||||||||||||
|
This is, btw, a good way to identify whether anyone uses InnoDB compression with any non-standard algorithm. Maybe the number of those people is 0, and we will never hear a complaint about backward compatibility. I have not noticed too many bugs from snappy users | |||||||||||||||||
| Comment by Ian Gilfillan [ 2021-09-21 ] | |||||||||||||||||
|
For those wanting to try it out, this is now available as a preview release. See https://mariadb.org/10-7-preview-feature-provider-plugins/ and https://mariadb.com/kb/en/compression-plugins/. | |||||||||||||||||
| Comment by Elena Stepanova [ 2021-10-26 ] | |||||||||||||||||
|
As far as I can tell for now, the functionality works as planned and thus can be pushed into 10.7 and released with 10.7.1. "As planned" also implies that we are knowingly breaking compatibility to some extent, hopefully for the greater good in the future. We can't know how many users will be affected; it depends on how much non-zlib compression is currently used.

General server considerations

If the server upgrade is performed in the usual manner (by replacing existing packages with new ones), all tables compressed with non-zlib compression algorithms will inevitably become unreadable.
or
for each affected table. The user needs to pay attention to the algorithms mentioned and install all corresponding provider_xxxx packages. Uninstalling providers at runtime should be done with caution. The algorithms remain available until the server restarts, which can create the false impression that the tables remain functional (not just for users, but also for tools like MariaBackup or mysqldump).

Config considerations

If the server config has a non-default value of innodb_compression_algorithm, the corresponding provider needs to be installed, preferably simultaneously. Otherwise the upgrade will happen, but the server won't start afterwards.
It seems to be harmless, the upgrade still continues and eventually succeeds.

MariaBackup

MariaBackup in general is only expected to work with a matching version of the server. This is particularly important in this case, because old versions of mariabackup won't be able to deal with the provider libraries.

Replication

With upgrade through replication, when the replica is upgraded first, it is important not to enable replication until the compression libraries are sorted out, so that binlog events aren't attempted on currently inaccessible tables. Other than that, since the compression algorithm isn't passed through the binary log, the considerations are the same as for the general server: only tables which already exist on the replica matter.

Galera

Rolling upgrade or adding a 10.7 node to an older-version-based cluster can be tricky with physical methods of SST. | |||||||||||||||||
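The config consideration in this comment (a non-default innodb_compression_algorithm needs its provider installed before the server restarts) lends itself to a pre-upgrade check. The following is only a sketch under several assumptions: the function name required_provider is hypothetical, zlib is assumed to be the default, the returned name simply follows the provider_xxxx pattern mentioned above (actual package names may differ per distribution), and the parser ignores option-file sections, include directives and numeric algorithm values.

```python
# Assumption: "none" and "zlib" need no provider, since zlib stays built in.
BUILT_IN = {"none", "zlib"}
# Algorithms named in this issue as being moved into provider plugins.
PROVIDED = {"bzip2", "lz4", "lzma", "lzo", "snappy"}


def required_provider(cnf_text):
    """Return the provider needed by the given option-file text, or None."""
    algorithm = "zlib"  # assumed default when the option is absent
    for raw in cnf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue  # section headers, flags without values, blank lines
        key, value = (part.strip() for part in line.split("=", 1))
        # Option files accept both dashes and underscores in option names.
        if key.replace("-", "_").lower() == "innodb_compression_algorithm":
            algorithm = value.strip("'\"").lower()  # last setting wins
    if algorithm in BUILT_IN:
        return None
    if algorithm in PROVIDED:
        return "provider_" + algorithm
    raise ValueError("unrecognized algorithm: " + algorithm)
```

For example, required_provider("[mysqld]\ninnodb_compression_algorithm = lz4\n") yields "provider_lz4". Note that this only covers the configured default algorithm; it says nothing about which algorithms existing tables were actually compressed with, which is the harder problem raised earlier in the thread.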
| Comment by MG [ 2021-10-27 ] | |||||||||||||||||
|
serg, While this bug did mention the worrisome ZSTDNotFinal stub name used in RocksDB, the zstd Github says that it is "used continuously to compress large amounts of data in multiple formats and use cases. Zstandard is considered safe for production environments." I was hoping we would see zstd as an additional MariaDB compression library via this now closed bug. Would it make sense to have this feature request in a new MDEV? | |||||||||||||||||
| Comment by Sergei Golubchik [ 2021-10-28 ] | |||||||||||||||||
|
mg, this issue didn't touch RocksDB at all, only InnoDB and Mroonga, because currently it can only remove dependencies from server plugins, not from external utility executables, and RocksDB has two of them. But anyway, what do you mean by "see zstd as an additional MariaDB compression library"? See where, in RocksDB? In InnoDB? In the protocol? | |||||||||||||||||
| Comment by MG [ 2021-10-28 ] | |||||||||||||||||
|
serg, I did mean for InnoDB and meant to use the phrasing from the blog post accompanying this feature in release notes. |