MariaDB ColumnStore / MCOL-987

LZ4 compression for on-disk columnar data

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.1.1
    • Fix Version/s: 6.1.1
    • Component/s: writeengine
    • Labels: None

    Description

      According to various comparisons, e.g. https://doordash.engineering/2019/01/02/speeding-up-redis-with-compression/, LZ4 may offer, compared with Snappy:

      • a better compression ratio
      • better decompression speed
      • almost the same compression speed

      MCS uses Snappy by default both for columnar files and for the CompressedInetStreamSocket (TCP socket) implementation.
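      The three claims above (ratio, decompression speed, compression speed) can be measured with a small benchmark harness over one input buffer. This is a minimal sketch, not ColumnStore code: LZ4 and Snappy are not in the Python standard library, so zlib at levels 1 and 6 stands in for them, and bench_codec, CODECS and the synthetic integer column are hypothetical. Assuming the third-party lz4 and python-snappy packages are installed, lz4.frame.compress/lz4.frame.decompress and snappy.compress/snappy.decompress could be added to the CODECS table instead.

```python
import time
import zlib
from array import array
from dataclasses import dataclass
from typing import Callable

# Hypothetical harness; zlib is only a stand-in for LZ4/Snappy.
Codec = Callable[[bytes], bytes]


@dataclass
class CodecResult:
    ratio: float            # uncompressed size / compressed size (higher is better)
    compress_mb_s: float    # compression throughput in MiB/s
    decompress_mb_s: float  # decompression throughput in MiB/s


def bench_codec(compress: Codec, decompress: Codec, data: bytes,
                repeats: int = 5) -> CodecResult:
    """Time `repeats` compress and decompress passes over `data`."""
    if repeats < 1 or not data:
        raise ValueError("need repeats >= 1 and non-empty data")
    t0 = time.perf_counter()
    for _ in range(repeats):
        packed = compress(data)
    t1 = time.perf_counter()
    for _ in range(repeats):
        unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError("round trip did not reproduce the input")
    mib = len(data) * repeats / (1024 * 1024)
    # max() guards against a zero timer delta on tiny inputs.
    return CodecResult(
        ratio=len(data) / len(packed),
        compress_mb_s=mib / max(t1 - t0, 1e-9),
        decompress_mb_s=mib / max(t2 - t1, 1e-9),
    )


# Stand-in codecs: zlib level 1 (fast) and level 6 (zlib's default).
CODECS = {
    "zlib-1": (lambda d: zlib.compress(d, 1), zlib.decompress),
    "zlib-6": (lambda d: zlib.compress(d, 6), zlib.decompress),
}

if __name__ == "__main__":
    # Synthetic low-cardinality integer column, roughly 4 MB on typical platforms.
    column = array("i", (i % 1000 for i in range(1_000_000))).tobytes()
    for name, (comp, decomp) in CODECS.items():
        r = bench_codec(comp, decomp, column)
        print(f"{name}: ratio={r.ratio:.2f} "
              f"compress={r.compress_mb_s:.0f} MiB/s "
              f"decompress={r.decompress_mb_s:.0f} MiB/s")
```

      Ratios and speeds depend heavily on the data, so a meaningful comparison should run on representative column files rather than on the synthetic column used here.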

      The chunk size is an important parameter: it defines how much data is compressed in one go before being stored in the compressed columnar file. It is currently set to 4MB, which may be less appropriate for LZ4, so different compressed chunk sizes should be compared.
      In the end, MCS must offer an additional compression method, selected via the session variable columnstore_compression_type. There will be no separate knob to control the compression used by CompressedInetStreamSocket.
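      To compare chunk sizes, a column buffer can be compressed in independent fixed-size chunks and the compressed sizes totalled per candidate size. This is a minimal sketch under stated assumptions: it does not reproduce the write engine's actual file layout, zlib again stands in for LZ4/Snappy, and compress_chunks, read_range and compare_chunk_sizes are hypothetical names. The codec is passed in as a parameter, loosely mirroring how a single setting such as columnstore_compression_type selects the method.

```python
import zlib
from typing import Callable, Dict, Iterable, List

Codec = Callable[[bytes], bytes]


def compress_chunks(data: bytes, chunk_size: int, compress: Codec) -> List[bytes]:
    """Split `data` into fixed-size chunks and compress each one independently."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [compress(data[off:off + chunk_size])
            for off in range(0, len(data), chunk_size)]


def read_range(chunks: List[bytes], chunk_size: int, offset: int, length: int,
               decompress: Codec) -> bytes:
    """Return data[offset:offset+length], decompressing only the chunks it spans."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = min((offset + length - 1) // chunk_size, len(chunks) - 1)
    buf = b"".join(decompress(chunks[i]) for i in range(first, last + 1))
    start = offset - first * chunk_size
    return buf[start:start + length]


def compare_chunk_sizes(data: bytes, sizes: Iterable[int],
                        compress: Codec) -> Dict[int, int]:
    """Map each candidate chunk size to the total compressed size in bytes."""
    return {size: sum(len(c) for c in compress_chunks(data, size, compress))
            for size in sizes}


if __name__ == "__main__":
    column = bytes(range(256)) * 32 * 1024  # 8 MiB of synthetic data
    sizes = [64 * 1024, 512 * 1024, 4 * 1024 * 1024]
    for size, total in compare_chunk_sizes(column, sizes, zlib.compress).items():
        print(f"chunk={size // 1024} KiB -> {total} compressed bytes")
```

      The trade-off being measured: smaller chunks mean a point read decompresses less data, while larger chunks amortize per-chunk overhead and can improve the ratio.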

      If LZ4 performs as well as expected (faster decompression, better compression, compression speed parity), it will become our default.
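      The condition for switching the default is a conjunction of three checks on benchmark results. A minimal sketch, assuming the measurements for LZ4 (candidate) and Snappy (baseline) are already available from some benchmark; the Measurement type, the function name and the 10% parity tolerance are hypothetical, since the ticket does not say how "parity" is quantified.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    ratio: float            # uncompressed / compressed size, higher is better
    compress_mb_s: float    # compression throughput, higher is better
    decompress_mb_s: float  # decompression throughput, higher is better


def meets_default_criteria(candidate: Measurement, baseline: Measurement,
                           parity_tolerance: float = 0.10) -> bool:
    """Apply the ticket's three conditions to benchmark results.

    parity_tolerance is a hypothetical threshold: compression speed counts
    as "at parity" if it is at most this fraction below the baseline.
    """
    faster_decompression = candidate.decompress_mb_s > baseline.decompress_mb_s
    better_compression = candidate.ratio > baseline.ratio
    compression_parity = (candidate.compress_mb_s
                          >= baseline.compress_mb_s * (1.0 - parity_tolerance))
    return faster_decompression and better_compression and compression_parity
```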

        Activity

          LinuxJedi Andrew Hutchings (Inactive) created issue -
          toddstoffel Todd Stoffel (Inactive) made changes -
          Field Original Value New Value
          Assignee Todd Stoffel [ toddstoffel ]
          toddstoffel Todd Stoffel (Inactive) made changes -
          Fix Version/s N/A [ 22302 ]
          toddstoffel Todd Stoffel (Inactive) made changes -
          Rank Ranked higher
          toddstoffel Todd Stoffel (Inactive) made changes -
          Fix Version/s 1.7 [ 23713 ]
          Fix Version/s N/A [ 22302 ]
          toddstoffel Todd Stoffel (Inactive) made changes -
          Fix Version/s 1.5 [ 22800 ]
          Fix Version/s 1.7 [ 23713 ]
          toddstoffel Todd Stoffel (Inactive) made changes -
          Rank Ranked higher
          toddstoffel Todd Stoffel (Inactive) made changes -
          Epic Link MCOL-3351 [ 76533 ]
          toddstoffel Todd Stoffel (Inactive) made changes -
          Rank Ranked higher
          toddstoffel Todd Stoffel (Inactive) made changes -
          Fix Version/s 1.6 [ 23712 ]
          Fix Version/s 1.5 [ 22800 ]
          toddstoffel Todd Stoffel (Inactive) made changes -
          Fix Version/s Icebox [ 22302 ]
          Fix Version/s 1.6 [ 23712 ]
          toddstoffel Todd Stoffel (Inactive) made changes -
          Rank Ranked higher
          toddstoffel Todd Stoffel (Inactive) made changes -
          Rank Ranked higher
          toddstoffel Todd Stoffel (Inactive) made changes -
          Rank Ranked higher
          toddstoffel Todd Stoffel (Inactive) made changes -
          Rank Ranked higher
          toddstoffel Todd Stoffel (Inactive) made changes -
          Rank Ranked higher
          toddstoffel Todd Stoffel (Inactive) made changes -
          Rank Ranked lower
          toddstoffel Todd Stoffel (Inactive) made changes -
          Rank Ranked lower
          drrtuy Roman made changes -
          Component/s writeengine [ 13510 ]
          drrtuy Roman made changes -
          Affects Version/s 6.1.1 [ 25600 ]
          drrtuy Roman made changes -
          Fix Version/s 6.1.1 [ 25600 ]
          Fix Version/s Icebox [ 22302 ]
          drrtuy Roman made changes -
          Assignee Todd Stoffel [ toddstoffel ] Roman [ drrtuy ]
          drrtuy Roman made changes -
          Summary New additional compression algorithms for ColumnStore LZ4 compression for on-disk columnar data
          drrtuy Roman made changes -
          Description We should look into adding additional compression algorithms to ColumnStore for additional use cases:

          1. LZ4 - should be faster than Snappy with a better compression ratio
          2. Zstd (not to be confused with Zlib) - slower than Snappy (but still faster than most disks), much better compression ratio. Its dictionary mode should probably be used.

          If LZ4 performs as well as expected it could possibly become our default.
          gdorman Gregory Dorman (Inactive) made changes -
          Assignee Roman [ drrtuy ] Denis Khalikov [ JIRAUSER48434 ]
          gdorman Gregory Dorman (Inactive) made changes -
          Sprint 2021-5 [ 504 ]
          gdorman Gregory Dorman (Inactive) made changes -
          Rank Ranked higher
          toddstoffel Todd Stoffel (Inactive) made changes -
          Rank Ranked lower
          denis0x0D Denis Khalikov (Inactive) made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
          gdorman Gregory Dorman (Inactive) made changes -
          Sprint 2021-5 [ 504 ] 2021-5, 2021-6 [ 504, 509 ]
          denis0x0D Denis Khalikov (Inactive) made changes -
          Status In Progress [ 3 ] Stalled [ 10000 ]
          denis0x0D Denis Khalikov (Inactive) made changes -
          Status Stalled [ 10000 ] In Progress [ 3 ]
          denis0x0D Denis Khalikov (Inactive) made changes -
          Status In Progress [ 3 ] Stalled [ 10000 ]
          denis0x0D Denis Khalikov (Inactive) made changes -
          Assignee Denis Khalikov [ JIRAUSER48434 ] Roman [ drrtuy ]
          gdorman Gregory Dorman (Inactive) made changes -
          Status Stalled [ 10000 ] In Progress [ 3 ]
          gdorman Gregory Dorman (Inactive) made changes -
          Assignee Roman [ drrtuy ] Gregory Dorman [ gdorman ]
          gdorman Gregory Dorman (Inactive) made changes -
          Assignee Gregory Dorman [ gdorman ] Roman [ drrtuy ]
          Status In Progress [ 3 ] In Review [ 10002 ]
          gdorman Gregory Dorman (Inactive) made changes -
          Sprint 2021-5, 2021-6 [ 504, 509 ] 2021-5, 2021-6, 2021-7 [ 504, 509, 514 ]
          gdorman Gregory Dorman (Inactive) made changes -
          Sprint 2021-5, 2021-6, 2021-7 [ 504, 509, 514 ] 2021-5, 2021-6, 2021-7, 2021-8 [ 504, 509, 514, 521 ]
          gdorman Gregory Dorman (Inactive) made changes -
          Sprint 2021-5, 2021-6, 2021-7, 2021-8 [ 504, 509, 514, 521 ] 2021-5, 2021-6, 2021-7, 2021-8, 2021-9 [ 504, 509, 514, 521, 541 ]
          drrtuy Roman made changes -
          Status In Review [ 10002 ] In Testing [ 10301 ]
          drrtuy Roman made changes -
          Assignee Roman [ drrtuy ] Daniel Lee [ dleeyh ]
          dleeyh Daniel Lee (Inactive) made changes -
          Resolution Fixed [ 1 ]
          Status In Testing [ 10301 ] Closed [ 6 ]

          People

            Assignee: Daniel Lee (Inactive)
            Reporter: Andrew Hutchings (Inactive)
            Votes: 4
            Watchers: 9

            Dates

              Created:
              Updated:
              Resolved:
