  1. MariaDB Server
  2. MDEV-22895

Implement server support for making compression library dependencies optional

Details

    Description

      Squash is off the table for now; there are two strong arguments against it:

      1. Using Squash requires changing code in every storage engine that uses compression libraries, including third-party storage engines. This is problematic because it is very unlikely that we will be able to propagate these changes upstream.
      2. The unified API that Squash provides does not cover everything our current storage engines use from the compression libraries. Tweaking the storage engine code to get rid of API calls that Squash doesn't support is too complicated and amplifies the problem stated in point 1.

      Given the above, we should implement this with MariaDB services using an approach like the one below:

      1. Add a new `--use-compression=snappy,bzip2` option to the server.
      Code place: sql/sys_vars.cc

      2. Choose one compression library and create a new MariaDB service that contains dummy functions for all corresponding functions within the compression library (only the functions used by the storage engines we support). Make sure the service is built correctly: test it by calling one of its functions from a dynamic plugin’s init function and check that it prints something.
      For creating services, see the instructions in the libservices/HOWTO file.

      3. Alter the service so that it works with static plugins as well (we need this for storage engines such as InnoDB). Quickly test that it works with a dummy print as well, from InnoDB for example.
      The `#ifdef MYSQL_DYNAMIC_PLUGIN` used within a service is the key to giving static plugins access to the service structure. See how the `debug_sync` service does it to understand what the changes in our service should look like.

      4. We need to be able to trick storage engines into calling our custom service API instead of the compression library code. Thus, we need to tweak the include paths so that storage engines using this library include our own header, with our own functions, without any change to the storage engine source code.
      After this step is done, we should easily see in a debugger that calls from a storage engine to the compression library we’ve picked go through this service.
      Code place: cmake/plugin.cmake

      5. The server should try to load the compression libraries based on the values passed within the `use-compression` option. If a library is loaded correctly, then the function pointers within the corresponding compression service should be set to point within that compression library.
      Once this is done, we can safely move on to writing some proper tests for this service and then to the next compression library.
      Code place: To be debated when we get there, but a new file ‘sql/compression_libs.cc’ that contains 1 function (called from mysqld.cc) for dynamically binding compression functions within the service structure and all the dummy functions returning

      6. Tests should be written for each service that takes care of one compression library.
      As per Sergei G's suggestion, one test `/mysql-test/suite/compression_libs/bzip2.test` might look something like:
      `set global innodb_compression_algorithm = ABC;
      create table innodb_compressed(c1 int, b char(20)) engine=innodb row_format=compressed key_block_size=8;
      `
      This test should be run both with the `--use-compression=bzip2` flag and without it.
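Steps 2 to 5 above can be sketched roughly as follows. This is a minimal illustration, not the actual service code: the structure name `compression_service_bzip2_st`, the return value `COMPRESSION_LIB_NOT_LOADED`, the `bind_bzip2_service` function and the `symbol_resolver` callback are all assumptions made for this sketch. Only the name and signature of `BZ2_bzBuffToBuffCompress` come from libbzip2. The resolver stands in for whatever the server would use to look up symbols in a loaded library (for example `dlsym()`), and `demo_compress` / `demo_resolver` are stand-ins that exist only so the sketch can be exercised without the real library.

```c
#include <assert.h>
#include <string.h>

/* Signature of the libbzip2 entry point that storage engines call. */
typedef int (*bz2_compress_fn)(char *dest, unsigned int *dest_len,
                               char *source, unsigned int source_len,
                               int block_size_100k, int verbosity,
                               int work_factor);

/* Hypothetical service structure: one pointer per wrapped function. */
struct compression_service_bzip2_st {
  bz2_compress_fn BZ2_bzBuffToBuffCompress_ptr;
};

/* Hypothetical value returned by the dummy when the library is absent. */
#define COMPRESSION_LIB_NOT_LOADED (-1)

/* Step 2: dummy that is used until the real library is bound. */
static int dummy_BZ2_bzBuffToBuffCompress(char *dest, unsigned int *dest_len,
                                          char *source, unsigned int source_len,
                                          int block_size_100k, int verbosity,
                                          int work_factor)
{
  (void) dest; (void) source; (void) source_len;
  (void) block_size_100k; (void) verbosity; (void) work_factor;
  if (dest_len != NULL)
    *dest_len = 0;
  return COMPRESSION_LIB_NOT_LOADED;
}

static struct compression_service_bzip2_st compression_service_bzip2 = {
  dummy_BZ2_bzBuffToBuffCompress
};

/* Step 4: a header found first on the include path maps the library name
   onto the service, so engine sources compile unchanged. */
#define BZ2_bzBuffToBuffCompress(...) \
  compression_service_bzip2.BZ2_bzBuffToBuffCompress_ptr(__VA_ARGS__)

/* Step 5: the resolver abstracts the platform symbol lookup (assumption). */
typedef void (*generic_fn)(void);
typedef generic_fn (*symbol_resolver)(void *handle, const char *name);

/* Returns 1 if the real function was bound, 0 if the dummy stays in place. */
static int bind_bzip2_service(void *handle, symbol_resolver resolve)
{
  generic_fn sym;
  assert(resolve != NULL);
  sym = resolve(handle, "BZ2_bzBuffToBuffCompress");
  if (sym == NULL)
    return 0;
  compression_service_bzip2.BZ2_bzBuffToBuffCompress_ptr = (bz2_compress_fn) sym;
  return 1;
}

/* Demo stand-in for the library function: copies instead of compressing. */
static int demo_compress(char *dest, unsigned int *dest_len,
                         char *source, unsigned int source_len,
                         int block_size_100k, int verbosity, int work_factor)
{
  (void) block_size_100k; (void) verbosity; (void) work_factor;
  if (*dest_len < source_len)
    return 1;                       /* nonzero: output buffer too small */
  memcpy(dest, source, source_len);
  *dest_len = source_len;
  return 0;
}

/* Demo stand-in for symbol lookup: a NULL handle means "library not loaded". */
static generic_fn demo_resolver(void *handle, const char *name)
{
  if (handle == NULL || strcmp(name, "BZ2_bzBuffToBuffCompress") != 0)
    return NULL;
  return (generic_fn) demo_compress;
}
```

Before binding, a call written as `BZ2_bzBuffToBuffCompress(...)` in engine code lands in the dummy and reports `COMPRESSION_LIB_NOT_LOADED`; after a successful `bind_bzip2_service()` the same call site reaches the bound function. Keeping the lookup behind a callback is only a convenience for this sketch; the real binding code may look quite different.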

          Activity

            KartikSoneji Kartik Soneji added a comment -

            We should add global status variables to the server so that the user can easily view which libraries are loaded.
            Additionally, we should integrate the status variables with the storage engines as well, so that the engines are aware of what libraries are loaded. This will prevent the engines from getting into invalid states.
            For example, with the current implementation of the lzma service, running

            set global innodb_compression_algorithm = "lzma";
            create table t(a int, b char(20)) engine=innodb PAGE_COMPRESSED=1;
            

            will not cause an error even if lzma is not loaded.
            This is because InnoDB has additional error checking, and it will silently write uncompressed data to disk if the compressed length is 0.
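One way an engine could use such "loaded libraries" information is to reject an algorithm whose library is missing instead of silently writing uncompressed data. The sketch below only illustrates that idea; the flag names, the `check_compression_algorithm` function and the small algorithm table are assumptions made for this example and are not InnoDB or server APIs. `zlib` and `none` are treated as always available.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit flags, one per optional compression library. */
enum optional_lib {
  LIB_BZIP2  = 1 << 0,
  LIB_LZMA   = 1 << 1,
  LIB_SNAPPY = 1 << 2
};

enum algorithm_check {
  ALGORITHM_OK = 0,
  ALGORITHM_UNKNOWN,
  ALGORITHM_LIB_NOT_LOADED
};

/* Checks whether the named algorithm can be used, given a bitmask of the
   libraries that were actually loaded at startup. */
static enum algorithm_check
check_compression_algorithm(const char *name, unsigned loaded_libs)
{
  static const struct { const char *name; unsigned required_lib; } algorithms[] = {
    { "none",   0 },            /* no library needed */
    { "zlib",   0 },            /* assumed to be always linked in */
    { "bzip2",  LIB_BZIP2 },
    { "lzma",   LIB_LZMA },
    { "snappy", LIB_SNAPPY }
  };
  size_t i;

  assert(name != NULL);
  for (i = 0; i < sizeof algorithms / sizeof algorithms[0]; i++)
  {
    if (strcmp(name, algorithms[i].name) != 0)
      continue;
    if ((algorithms[i].required_lib & ~loaded_libs) != 0)
      return ALGORITHM_LIB_NOT_LOADED;
    return ALGORITHM_OK;
  }
  return ALGORITHM_UNKNOWN;
}
```

With a check of this kind, `set global innodb_compression_algorithm = "lzma"` could fail with an error when lzma is not loaded, rather than succeeding and producing uncompressed pages.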

            KartikSoneji Kartik Soneji added a comment -

            Maybe it is better to change `all` to `auto` for the `--use-compression=` switch.
            `auto` will try to load all libraries, but will silently ignore the ones it cannot load.
            Specifying a library by name makes it mandatory: the server will exit with an error if that library cannot be loaded.
            It also solves the issue of `--use-compression=lzma, all` being a valid configuration.

            --use-compression=  | Behavior
            <not specified>     | Defaults to auto
            ""                  | Don't load any libraries
            auto                | Try to load all libraries, silently skip ones that cannot be loaded
            lzma                | Try to load only lzma, exit with an error if lzma cannot be loaded
            lzma, auto          | Try to load all libraries, exit with an error if lzma cannot be loaded
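The proposed semantics can be sketched as a small parser that turns the option value into two sets: libraries the server should try to load, and libraries whose absence is fatal. This is only an illustration of the table above; the library list, the flag names, `struct use_compression` and `parse_use_compression` are assumptions for this example, not existing server code. A `NULL` value models the "not specified" case.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical bit flags; the actual set of libraries is an assumption. */
enum {
  LIB_BZIP2  = 1 << 0,
  LIB_LZ4    = 1 << 1,
  LIB_LZMA   = 1 << 2,
  LIB_LZO    = 1 << 3,
  LIB_SNAPPY = 1 << 4,
  ALL_LIBS   = (1 << 5) - 1
};

struct use_compression {
  unsigned wanted;    /* libraries the server should try to load */
  unsigned required;  /* subset whose load failure aborts startup */
};

static const struct { const char *name; unsigned bit; } known_libs[] = {
  { "bzip2", LIB_BZIP2 }, { "lz4", LIB_LZ4 }, { "lzma", LIB_LZMA },
  { "lzo", LIB_LZO }, { "snappy", LIB_SNAPPY }
};

/* Parses a comma-separated value; returns 0 on success, -1 on unknown name. */
static int parse_use_compression(const char *value, struct use_compression *out)
{
  const char *p = value;

  assert(out != NULL);
  out->wanted = 0;
  out->required = 0;
  if (value == NULL)              /* option not specified: same as "auto" */
  {
    out->wanted = ALL_LIBS;
    return 0;
  }
  for (;;)
  {
    const char *start;
    size_t len, i;
    unsigned bit = 0;

    /* Skip separators and blanks, then take the next token. */
    while (*p == ',' || isspace((unsigned char) *p))
      p++;
    start = p;
    while (*p != '\0' && *p != ',' && !isspace((unsigned char) *p))
      p++;
    len = (size_t) (p - start);
    if (len == 0)
      return 0;                   /* end of input */

    if (len == 4 && strncmp(start, "auto", 4) == 0)
    {
      out->wanted |= ALL_LIBS;    /* try everything, require nothing extra */
      continue;
    }
    for (i = 0; i < sizeof known_libs / sizeof known_libs[0]; i++)
      if (strlen(known_libs[i].name) == len &&
          strncmp(start, known_libs[i].name, len) == 0)
        bit = known_libs[i].bit;
    if (bit == 0)
      return -1;                  /* unknown library name */
    out->wanted |= bit;           /* a named library is loaded ... */
    out->required |= bit;         /* ... and must load successfully */
  }
}
```

For `lzma, auto` this yields `wanted == ALL_LIBS` and `required == LIB_LZMA`, which matches the last row of the table: every library is attempted, but only a failure to load lzma would stop the server.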

            marko Marko Mäkelä added a comment -

            I think that before this task can be meaningfully worked on, we need a resolution on MDEV-11068. Which compression algorithms are actually useful in practice? Do we really need more than zlib and ZSTD (MDEV-22310)?

            I would strongly advise against enabling support for any compression algorithms, unless we will commit to that support for the indefinite future. I think that we have an implied promise of seamless upgrades.

            If we enabled support of (say) innodb_compression_algorithm=bzip2 in some package that we distribute, some user could complain when we disabled that support later, after deciding that the memory and CPU overhead during compression and decompression is not worth the possible savings compared to innodb_compression_algorithm=zlib (which we can commit to always supporting, because zlib is a core dependency of the server).

            A minimum requirement for enabling new compression algorithms should be the development of a tool that can convert data files from one innodb_compression_algorithm to another in-place, to facilitate upgrades.

            KartikSoneji Kartik Soneji added a comment -

            Which compression algorithms are actually useful in practice?

            Is there any telemetry/user poll data on this?


            serg Sergei Golubchik added a comment -

            part of MDEV-12933

            People

              serg Sergei Golubchik
              robertbindar Robert Bindar