  1. MariaDB Server
  2. MDEV-12933

sort out the compression library chaos

Details

    Description

      As MariaDB is getting more storage engines and as they're getting more features, MariaDB can optionally use more and more compression libraries for various purposes.

      InnoDB, TokuDB, and RocksDB can all use different sets of compression libraries. Compiling them all in would result in a lot of run-time/rpm/deb dependencies, most of which will never be used by most users. Not compiling them in would result in requests to compile them in. While most users don't use all these libraries, many users use some of them.

      A solution could be to load these libraries on request, without creating a packaging dependency. There are different ways to do it:

      • hide all compression libraries behind a single unified compression API, either developing our own or using something like Squash. This would require changing all engines to use this API.
      • use the same approach as in server services: create a service per compression library, where a service implementation just returns an error code for any function invocation if the corresponding library is not installed. This way we could perhaps avoid modifying all affected storage engines.
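
The second option can be illustrated with a small sketch. This is not MariaDB's service API (which is C); it is only a Python illustration of the idea, and the names `load_provider`, `MissingProvider`, `LoadedProvider` and the error code value are all hypothetical. A provider is looked up on request; if the library is absent, the caller gets a stub whose every call returns an error code instead of failing at link or load time.

```python
import importlib

# Hypothetical error code; the real services define their own values.
ERR_PROVIDER_NOT_LOADED = -1


class MissingProvider:
    """Stub used when the library is not installed: every call reports an error."""

    def __init__(self, name):
        self.name = name
        self.loaded = False

    def compress(self, data):
        return ERR_PROVIDER_NOT_LOADED, None

    def decompress(self, data):
        return ERR_PROVIDER_NOT_LOADED, None


class LoadedProvider:
    """Thin wrapper that forwards calls to the real library."""

    def __init__(self, name, module):
        self.name = name
        self.loaded = True
        self._module = module

    def compress(self, data):
        return 0, self._module.compress(data)

    def decompress(self, data):
        return 0, self._module.decompress(data)


def load_provider(name):
    """Try to load a compression library on request; fall back to the stub."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return MissingProvider(name)
    return LoadedProvider(name, module)
```

Callers use the same interface in both cases and only need to check the returned code, which is why the engines themselves might not need to change.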

          Activity

            greenman Ian Gilfillan added a comment -

            For those wanting to try it out, this is now available as a preview release. See https://mariadb.org/10-7-preview-feature-provider-plugins/ and https://mariadb.com/kb/en/compression-plugins/.

            elenst Elena Stepanova added a comment - - edited

            As far as I can tell for now, the functionality works as planned and thus can be pushed into 10.7 and released with 10.7.1.

            "As planned" also implies that we are knowingly breaking compatibility to some extent, hopefully for the greater good in future. We can't know how many users will be affected; it depends on how much non-zlib compression is currently used.
            Thus every effort should be made to document it and make it clear and visible to users.
            Here are some notes; they are far from complete. To be updated (serg, marko, feel free to edit as you deem fit).

            General server considerations

            If the server upgrade is performed in the usual manner (by replacing existing packages with new ones), all tables compressed with non-zlib compression algorithms will inevitably become unreadable.
            If the user knows in advance which algorithms are in use, the corresponding provider_xxxx packages should be installed right away.
            In any case, after the upgrade is performed, mysql_upgrade must be run. That is always required, but this time it is highly recommended to run it manually (adding the --force option if it claims the upgrade has already been done) and to inspect the output, because the exit code cannot be relied upon. Alternatively, mysqlcheck --all-databases can be run.
            If there is a problem with compression algorithms, it will show up as something like

            Warning  : MariaDB tried to use the LZMA compression, but its provider plugin is not loaded
            Error    : Table 'test.t' doesn't exist in engine
            status   : Operation failed
            

            or

            Error    : Table test/t is compressed with lzma, which is not currently loaded. Please load the lzma provider plugin to open the table
            error    : Corrupt
            

            for each affected table. The user needs to pay attention to the mentioned algorithms and install all corresponding provider_xxxx packages.
            After plugin installation, the server will need to be restarted.
            Naturally, until the tables are brought back to order, all incoming traffic must be disabled.

            Uninstallation of providers at runtime should be done with caution. Algorithms remain available until the server is restarted, which can create a false impression that the tables remain functional (not just for users, but also for tools like MariaBackup or mysqldump).
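
Since the exit code cannot be relied upon, the saved output of mysql_upgrade or mysqlcheck has to be inspected for the messages shown above. A minimal sketch of such a scan follows, assuming the output was captured as text. The function names are hypothetical, the patterns are taken only from the two sample messages above (other wordings may exist), and the `provider_` prefix simply follows the provider_xxxx pattern mentioned here; actual package names depend on the distribution.

```python
import re

# Patterns derived from the two sample messages; other variants may exist.
PATTERNS = [
    re.compile(r"tried to use the (\w+) compression", re.IGNORECASE),
    re.compile(r"is compressed with (\w+), which is not currently loaded",
               re.IGNORECASE),
]


def missing_algorithms(output):
    """Return the set of algorithm names mentioned in provider-related errors."""
    found = set()
    for line in output.splitlines():
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                found.add(match.group(1).lower())
    return found


def provider_names(algorithms):
    """Map algorithm names to provider_xxxx names (naming is an assumption)."""
    return sorted("provider_" + name for name in algorithms)


if __name__ == "__main__":
    sample = (
        "Warning  : MariaDB tried to use the LZMA compression, "
        "but its provider plugin is not loaded\n"
        "Error    : Table 'test.t' doesn't exist in engine\n"
        "status   : Operation failed\n"
    )
    print(provider_names(missing_algorithms(sample)))
```

An empty result only means that none of these known messages appeared; it does not prove that every table is readable.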

            Config considerations

            If the server config has a non-default value of innodb_compression_algorithm, the corresponding provider needs to be installed, preferably simultaneously. Otherwise the upgrade will happen, but the server won't start afterwards.
            Even if the corresponding provider is installed simultaneously with the new server package, the installation can throw intermediate errors, particularly with deb packages:

            Installing new version of config file /etc/mysql/mariadb.conf.d/50-server.cnf ...
            mariadb-extra.socket is a disabled or a static unit, not starting it.
            mariadb-extra.socket is a disabled or a static unit, not starting it.
            Job for mariadb.service failed because the control process exited with error code.
            See "systemctl status mariadb.service" and "journalctl -xe" for details.
            

            It seems to be harmless: the upgrade still continues and eventually succeeds.
            Alternatively, the innodb_compression_algorithm setting can be (at least temporarily) disabled before the upgrade.
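
Before upgrading, the configuration can be checked for such a setting. The following is a rough sketch, not an official tool: the function names are hypothetical, it assumes the config file content has already been read into a string, it does not follow !include or !includedir directives, and it assumes that zlib (the default) and none are the only values that need no provider. Numeric forms of the value are not recognized and are therefore reported as needing attention.

```python
# Assumption: these values work without any provider plugin.
NO_PROVIDER_NEEDED = {"zlib", "none"}


def configured_algorithm(config_text):
    """Return the last innodb_compression_algorithm value found, or None."""
    value = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue  # section headers, !includedir lines, bare flags
        key, val = line.split("=", 1)
        # Dashes and underscores are interchangeable in option names.
        key = key.strip().lower().replace("-", "_")
        if key.startswith("loose_"):
            key = key[len("loose_"):]
        if key == "innodb_compression_algorithm":
            value = val.strip().strip("'\"").lower()
    return value


def needs_provider(config_text):
    """True if the configured algorithm likely requires a provider plugin."""
    value = configured_algorithm(config_text)
    return value is not None and value not in NO_PROVIDER_NEEDED
```

If `needs_provider` returns True, either install the matching provider together with the new server package or comment the setting out until the upgrade is done.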

            MariaBackup

            MariaBackup in general is only expected to work with a matching version of the server. It is particularly important in this case, because old versions of mariabackup won't be able to deal with the provider libraries.
            With the latest commits in the feature tree I couldn't come up with specific faulty scenarios involving MariaBackup, but I expect them to be possible, particularly involving runtime uninstallation of providers.

            Replication

            With upgrade through replication, when the replica is upgraded first, it is important not to enable replication until the compression libraries are sorted out, so that binlog events aren't attempted on currently inaccessible tables. Other than that, since the compression algorithm isn't passed over through the binary log, considerations are the same as for the general server – only tables which already exist in the replica are important.

            Galera

            Rolling upgrade or adding a 10.7 node to an older-version-based cluster can be tricky with physical methods of SST.
            Judging by the new node alone, it is impossible to say in advance which algorithms may be needed, so it is likely that not all necessary providers will be installed in advance.
            When the SST is performed, e.g. via MariaBackup, and the libraries are missing, it will throw some errors, but the SST will still succeed. This means that the node will join the cluster and start processing queries, including queries against tables which it cannot yet handle. At best (if the new nodes are a minority) this will make the node leave the cluster; if too many nodes are upgraded or added at once, it can probably cause the entire cluster to fail. Maybe Galera experts can offer advice on how this should best be handled on the user side.

            mg MG added a comment - - edited

            serg, while this bug did mention the worrisome ZSTDNotFinal stub name used in RocksDB, the zstd GitHub page says that it is "used continuously to compress large amounts of data in multiple formats and use cases. Zstandard is considered safe for production environments."

            I was hoping we would see zstd as an additional MariaDB compression library via this now closed bug. Would it make sense to have this feature request in a new MDEV?


            serg Sergei Golubchik added a comment -

            mg, this issue didn't touch RocksDB at all, only InnoDB and Mroonga, because currently it can only remove dependencies from server plugins, not from external utility executables, and RocksDB has two of them.

            But anyway, what do you mean by "see zstd as an additional MariaDB compression library"? See where, in RocksDB? In InnoDB? In the protocol?

            mg MG added a comment - - edited

            serg, I did mean for InnoDB and meant to use the phrasing from the blog post accompanying this feature in release notes.


            People

              serg Sergei Golubchik