Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22839

ROW_FORMAT=COMPRESSED vs PAGE_COMPRESSION=1 - size comparison

Details

    • Task
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.6
    • Server
    • None

    Description

      I have the following tables:

      Table1:
      row_format=DYNAMIC
      Size = 5.87 GB
      Index = 3.88 GB
      Free = 4 MB
      Avg Row Length = 197 B
      Rows = 38686411
      (Actual file size: 15359541248 bytes)

      .
      Table2:
      row_format=COMPRESSED key_block_size=8
      Size = 4.26 GB
      Index = 2.72 GB
      Free = 3.5 MB
      Avg Row Length = 86 B
      Rows = 83063985
      (Actual file size: 12448694272 bytes)

      .
      Both tables have some meta data (ints/enums) and a main text field (100% variable length, with user messages/comments).

      .
      As experiment, for Table1, I made the text column COMPRESSED (using pt-online-schema-change), and converted to:
      row_format=DYNAMIC `PAGE_COMPRESSED`=1 `PAGE_COMPRESSION_LEVEL`=6

      It became like this:
      Size = 6.23 GB
      Index = 6.61 GB
      Free = 5 MB
      Avg Row Length = 173 B
      (Actual file size: 13988003840 bytes)

      .
      For Table2, using normal ALTER TABLE, I converted to:
      row_format=DYNAMIC `PAGE_COMPRESSED`=1 `PAGE_COMPRESSION_LEVEL`=9

      It became like this:
      Size = 12.59 GB
      Index = 7.06 GB
      Free = 4 MB
      Avg Row Length = 159 B
      (Actual file size: 21399339008 bytes)

      .
      These are values from information_schema.

      I understand that the actual file sizes don't matter so much (they can be sparse files),
      but looking at the Size/Index, the PAGE_COMPRESSED seems to be causing the tables to be bigger, rather than smaller.

      Table1 – From no compression at all, to be compressed, the data increased 1GB, and indexes doubled.
      (I wanted to compress the text field only, but the innodb default configs made pt-online-schema-change also put PAGE_COMPRESSED)

      Table2 – From ROW_FORMAT=COMPRESSED to maximum level of PAGE_COMPRESSION, both data and index increased considerably.

      Does it make sense that PAGE_COMPRESSION, even with level 9, is compressing less than ROW_FORMAT=COMPRESSED?

      Attachments

        Issue Links

          Activity

            Alex/AT Alex/AT added a comment -

            I know LZMA is already included but this option is not in RPM builds and requires customly building MariaDB i.e. for LZMA now which is less than convenient.

            Alex/AT Alex/AT added a comment - I know LZMA is already included but this option is not in RPM builds and requires customly building MariaDB i.e. for LZMA now which is less than convenient.
            nunop Nuno added a comment -

            Nuno, can you repeat the experiment on MariaDB 10.5 or 10.6 and get the data for the actual file size? Would using COMPRESSED columns (MDEV-11371) together with either page_compressed or thinly provisioned storage beat the ROW_FORMAT=COMPRESSED size?

            I can no longer test, as I have already migrated the database to a bigger storage (for significantly more monthly expense), one of the reasons being that MariaDB no longer supports ROW_FORMAT=COMPRESSED, so we no longer use that row format.

            I did test using COMPRESSED columns, at the time I was testing/rehearsing the migration, but I gave up on the idea because of the error reported below, and no one replied to me at the time:
            https://jira.mariadb.org/browse/MDEV-24797

            Thank you.

            nunop Nuno added a comment - Nuno, can you repeat the experiment on MariaDB 10.5 or 10.6 and get the data for the actual file size? Would using COMPRESSED columns ( MDEV-11371 ) together with either page_compressed or thinly provisioned storage beat the ROW_FORMAT=COMPRESSED size? I can no longer test, as I have already migrated the database to a bigger storage (for significantly more monthly expense), one of the reasons being that MariaDB no longer supports ROW_FORMAT=COMPRESSED, so we no longer use that row format. I did test using COMPRESSED columns, at the time I was testing/rehearsing the migration, but I gave up on the idea because of the error reported below, and no one replied to me at the time: https://jira.mariadb.org/browse/MDEV-24797 Thank you.

            nunop, thank you. MDEV-11371 and MDEV-24797 are outside the storage engine layer, so I cannot help with that myself.

            MariaDB still supports ROW_FORMAT=COMPRESSED. MDEV-22367 was filed to remove that support, but no decision has been made yet. I think that the minimum prerequisite for the removal would be that a viable replacement has been demonstrated to exist, by running extensive performance tests. I do not remember any for ROW_FORMAT=COMPRESSED, other than some Oracle internal benchmarks that were around 2007 and were not at all flattering.

            Alex/AT, I agree that it would be useful to cover also LZMA and ZSTD in MDEV-11068. ZSTD would have to be implemented first. It should not be hard, and code contributions are always welcome.

            marko Marko Mäkelä added a comment - nunop , thank you. MDEV-11371 and MDEV-24797 are outside the storage engine layer, so I cannot help with that myself. MariaDB still supports ROW_FORMAT=COMPRESSED . MDEV-22367 was filed to remove that support, but no decision has been made yet. I think that the minimum prerequisite for the removal would be that a viable replacement has been demonstrated to exist, by running extensive performance tests. I do not remember any for ROW_FORMAT=COMPRESSED , other than some Oracle internal benchmarks that were around 2007 and were not at all flattering. Alex/AT , I agree that it would be useful to cover also LZMA and ZSTD in MDEV-11068 . ZSTD would have to be implemented first. It should not be hard, and code contributions are always welcome.

            https://lwn.net/Articles/864363/ mentions that Linux 5.15 might finally include fixes for race conditions around hole-punching, which is what page_compressed tables are using. I can imagine that when not using innodb_flush_method=O_DIRECT in both the server and any backup programs, that race condition could result in data corruption. The patchset author comments that even kernel crashes are possible.

            marko Marko Mäkelä added a comment - https://lwn.net/Articles/864363/ mentions that Linux 5.15 might finally include fixes for race conditions around hole-punching, which is what page_compressed tables are using. I can imagine that when not using innodb_flush_method=O_DIRECT in both the server and any backup programs, that race condition could result in data corruption. The patchset author comments that even kernel crashes are possible.

            As noted in MDEV-25734, file systems can behave differently with regard to operations on sparse files. (XFS would allocate more physical space than ext4fs in one case.) Therefore, when testing page_compressed tables (which rely on sparse files), the operating system kernel version and the file system type need to be documented.

            marko Marko Mäkelä added a comment - As noted in MDEV-25734 , file systems can behave differently with regard to operations on sparse files. (XFS would allocate more physical space than ext4fs in one case.) Therefore, when testing page_compressed tables (which rely on sparse files), the operating system kernel version and the file system type need to be documented.

            People

              marko Marko Mäkelä
              nunop Nuno
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.