[MDEV-22762] Investigate if using Squash is viable for unifying compression APIs Created: 2020-06-01  Updated: 2021-04-05  Resolved: 2021-04-05

Status: Closed
Project: MariaDB Server
Component/s: Compiling, Packaging
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Robert Bindar Assignee: Robert Bindar
Resolution: Fixed Votes: 0
Labels: None

Attachments: File Storage Engines.md    
Issue Links:
Blocks
blocks MDEV-12933 sort out the compression library chaos Closed

 Description   

Whilst the balance is leaning towards implementing the project using MariaDB services, the Squash approach is still on the table. We need a bit more information before we can weigh accurately how much work and maintenance may go into the services approach and how much convincing work we might need to do if we pick Squash.

In either case, this research will be very helpful for the coding part of the project, so here are the items we need to work through:

1. Get the list of all Storage Engines that use compression libraries
2. Get the list of all compression libraries used by storage engines supported by MariaDB
3. Investigate whether Squash supports all the APIs used by storage engines. While we are already searching through these code paths, collect the names of the compression library functions that storage engines call and compile them into a list.



 Comments   
Comment by Robert Bindar [ 2020-06-01 ]

Hi KartikSoneji! This is the first task you'll tackle during the GSoC coding period. For some reason I can't assign it to you, so I'm tagging you here for notification purposes. Good luck, and let us know at any point if you have questions. Please attach the research results and your thoughts on them here, so that we have all the information in one place when we start coding the project.

Comment by Kartik Soneji [ 2020-06-09 ]

I have attached the report, but Jira does not support previewing Markdown files.
You can view a rendered preview here.

Comment by Kartik Soneji [ 2020-06-09 ]

Storage Engines

  • Archive
  • Blackhole
  • Cassandra
  • Connect
  • Csv
  • Federated
  • Federatedx
  • Heap
  • Innobase
  • Maria
  • Mroonga
  • Myisam
  • Myisammrg
  • Oqgraph
  • Perfschema
  • RocksDB
  • Sequence
  • Sphinx
  • Spider
  • Tokudb (deprecated)
| Storage Engine | Compression Libraries |
|---|---|
| Archive | AZlib |
| Connect | BZip2, Zlib |
| Innobase | BZip2, LZ4, LZMA, LZO, Snappy, Zlib |
| Aria | Zlib |
| Mroonga | LZ4, Zlib, ZStandard |
| RocksDB | BZip2, LZ4, LZ4HC, Snappy, Xpress, Zlib, ZStandard |
| Tokudb (deprecated) | FastLZ, LZ, LZF, LZMA, LZMA2, LZO, QuickLZ, Snappy, Zlib |
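| Engine | AZlib | BZip2 | LZ4 | LZ4HC | LZMA | LZO | Snappy | Xpress | Zlib | ZStandard |
|---|---|---|---|---|---|---|---|---|---|---|
| Archive | Y | | | | | | | | | |
| Connect | | Y | | | | | | | Y | |
| Innobase | | Y | Y | | Y | Y | Y | | Y | |
| Aria | | | | | | | | | Y | |
| Mroonga | | | Y | | | | | | Y | Y |
| RocksDB | | Y | Y | Y | | | Y | Y | Y | Y |
| Engines using it (of 20) | 1 | 3 | 3 | 1 | 1 | 1 | 2 | 1 | 5 | 2 |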
| Library | `apt install` | `gcc --print-file-name=` | `/usr/include` |
|---|---|---|---|
| AZlib | | | |
| BZip2 | libbz2-dev | libbz2.so | bzlib.h |
| LZ4 | liblz4-dev | liblz4.so | lz4.h, lz4frame.h, lz4frame_static.h |
| LZ4HC | liblz4-dev | liblz4.so | lz4hc.h |
| LZMA | liblzma-dev | liblzma.so | lzma.h, lzma/* |
| LZO | liblzo-dev | liblzo2.so | lzo/* |
| Snappy | libsnappy-dev | libsnappy.so | snappy-c.h, snappy-sinksource.h, snappy-stubs-public.h, snappy.h |
| Xpress | | | |
| Zlib | libz-dev | libz.so | zlib.h |
| ZStandard | libzstd-dev | libzstd.so | zstd.h, zstd_errors.h |

Install all libraries:
`sudo apt install -y libbz2-dev liblz4-dev liblzma-dev liblzo-dev libsnappy-dev libz-dev libzstd-dev`.

Header files in use

bzlib.h
lz4.h
lz4hc.h
lzma.h
lzo/lzo1x.h
snappy-c.h
snappy.h
zlib.h
zstd.h

Xpress is Windows only:
https://github.com/facebook/rocksdb/blob/master/port/xpress.h

#pragma once
 
// Xpress on Windows is implemented using Win API
#if defined(ROCKSDB_PLATFORM_POSIX)
    #error "Xpress compression not implemented"
#elif defined(OS_WIN)
    #include "port/win/xpress_win.h"
#endif

It has two functions:
https://github.com/facebook/rocksdb/blob/master/port/win/xpress_win.h

#pragma once
#include <string>
#include "rocksdb/rocksdb_namespace.h"
 
namespace ROCKSDB_NAMESPACE {
    namespace port {
        namespace xpress {
            bool Compress(const char* input, size_t length, std::string* output);
            char* Decompress(const char* input_data, size_t input_length, int* decompress_size);
        }
    }
}  // namespace ROCKSDB_NAMESPACE

Functions

Archive

AZlib

crc32
deflate
deflateEnd
gzclose
gzerror
gzflush
gzread
gzrewind
gzseek
gztell
gzwrite
inflate
inflateEnd
inflateReset
uncompress

Connect

BZip2

BZ2_bzCompress
BZ2_bzCompressEnd
BZ2_bzCompressInit
BZ2_bzDecompress
BZ2_bzDecompressEnd
BZ2_bzDecompressInit

Zlib

crc32
deflate
deflateEnd
get_crc_table
gzclose
gzeof
gzerror
gzflush
gzgets
gzopen
gzputs
gzread
gzrewind
gzseek
gztell
gzwrite
inflate
inflateEnd

Innobase

BZip2

BZ2_bzBuffToBuffCompress
BZ2_bzBuffToBuffDecompress

LZ4

LZ4_compress
LZ4_compress_default
LZ4_compress_limitedOutput
LZ4_decompress_safe

LZMA

lzma_easy_buffer_encode
lzma_stream_buffer_decode

LZO

lzo1x_1_15_compress
lzo1x_1_compress
lzo1x_decompress
lzo1x_decompress_safe

Snappy

snappy_compress
snappy_max_compressed_length
snappy_uncompress

Zlib

adler32
compress2
compressBound
deflate
deflateEnd
deflateReset
inflate
inflateEnd
inflateInit
inflateInit2
uncompress

Aria

Zlib

compress
crc32
uncompress

Mroonga

LZ4

LZ4_compress
LZ4_compressBound
LZ4_compress_default
LZ4_decompress_safe

Zlib

compress
compressBound
deflate
deflateBound
deflateEnd
deflateInit2
inflate
inflateEnd
inflateInit2
inflateReset
uncompress

ZStandard

ZSTD_compress
ZSTD_compressBound
ZSTD_decompress
ZSTD_getErrorName
ZSTD_isError

RocksDB

BZip2

BZ2_bzCompress
BZ2_bzCompressEnd
BZ2_bzCompressInit
BZ2_bzDecompress
BZ2_bzDecompressEnd
BZ2_bzDecompressInit

LZ4

LZ4_compressBound
LZ4_compress_fast_continue
LZ4_compress_limitedOutput
LZ4_compress_limitedOutput_continue
LZ4_createStream
LZ4_createStreamDecode
LZ4_decompress_safe
LZ4_decompress_safe_continue
LZ4_freeStream
LZ4_freeStreamDecode
LZ4_loadDict

LZ4HC

LZ4_compressHC2_limitedOutput
LZ4_compressHC_limitedOutput
LZ4_compressHC_limitedOutput_continue
LZ4_compress_HC_continue
LZ4_createStreamHC
LZ4_freeStreamHC
LZ4_loadDictHC
LZ4_resetStreamHC

Snappy

snappy::GetUncompressedLength
snappy::MaxCompressedLength
snappy::RawCompress

Xpress (Windows only)

Compress
Decompress

Zlib

crc32
deflate
deflateEnd
deflateSetDictionary
inflate
inflateEnd
inflateSetDictionary

ZStandard

ZDICT_isError
ZDICT_trainFromBuffer
ZSTD_compress
ZSTD_compressBound
ZSTD_compress_usingCDict
ZSTD_compress_usingDict
ZSTD_createCCtx
ZSTD_createCCtx_advanced
ZSTD_createCDict
ZSTD_createDCtx
ZSTD_createDCtx_advanced
ZSTD_createDDict_byReference
ZSTD_decompress
ZSTD_decompress_usingDDict
ZSTD_decompress_usingDict
ZSTD_freeCCtx
ZSTD_freeCDict
ZSTD_freeDCtx
ZSTD_freeDDict
ZSTD_sizeof_DDict
ZSTD_versionNumber

Comment by Sergei Golubchik [ 2021-03-24 ]

robertbindar, is this one done? As far as I can see, using Squash has been "investigated", right?

Comment by Robert Bindar [ 2021-03-24 ]

serg yes, it's done, but I kept it open for reference and to make it easier to find.

Comment by Sergei Golubchik [ 2021-03-24 ]

Then please, let's close it. You can find it either way, closed or open.

Generated at Thu Feb 08 09:17:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.