MariaDB Server / MDEV-14108

Log compression isn't working as expected

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not a Bug
    • Affects Version: 10.2.9
    • Fix Version: N/A
    • Component: Server
    • Environment: Debian Jessie, Fedora Core 26

    Description

      Hi guys, I found this great new MariaDB feature (log compression), but unfortunately I cannot get it working. We're writing terabytes of logs to SSD disks on our servers, so it would be great if I could get it enabled.

      log_bin_compress_min_len=10
      log_bin_compress=1

      log_bin_compress_min_len=10
      log_bin_compress='ON'

      log_bin_compress_min_len=128
      log_bin_compress='ON'

      Nothing works as expected. I tried enabling it with SET GLOBAL and also via the config file:
      https://mariadb.com/kb/en/library/compressing-events-to-reduce-size-of-the-binary-log/
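      For reference, the dynamic route looks like this (a sketch using the system variables documented on the KB page above; no server restart is needed):

      ```sql
      -- Enable binlog event compression at runtime:
      SET GLOBAL log_bin_compress = ON;
      -- Only events at least this many bytes long get compressed:
      SET GLOBAL log_bin_compress_min_len = 128;

      -- Verify the current settings:
      SHOW GLOBAL VARIABLES LIKE 'log_bin_compress%';
      ```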

      There's still plaintext in the log, and nothing seems to be getting compressed... Am I doing something wrong, or is this feature no longer supported / broken in 10.2.9?

      Log file dump:
      https://www.screencast.com/t/08Ig7s023Jzg

      Thanks,
      Slawomir.


        Activity

          pslawek83,

          What makes you think that the log in the screenshot is not compressed? It seems compressed to me.
          Let's take the first event as an example, the UPDATE.
          The fact that you can see the statement in plain text is not an indication of the events being or not being compressed. Most likely you have Annotate_rows events turned on (they are ON by default in 10.2), that's what you see as the plain text. You can switch binlog_annotate_row_events=off if you don't want them.
          Further, we can see accounts (probably the schema name) and ___targets_mapping... (the table name) – that's the table map event.
          But after that, we only see garbage. If it were still cleartext, we would see the old and new field values, including the easily recognizable long 'sssssss...', but they're not there. The next cleartext event is COMMIT, which is never compressed anyway.

          I didn't dig into other events, but I think it's the same for them.

          Further, if you are judging by the log size, please note that it's event compression, not log compression. Even if you are writing terabytes of logs, if each individual event is as small as the ones in the screenshot, you'll probably see minimal effect, if any at all.

          For a simple experiment to see that the events are written in the special compressed format, use mysqlbinlog to read the binlog. It will show event types such as Update_compressed_rows.
          For an even simpler experiment to see that the compression actually affects the binlog size, try to write some very long and well-compressible text into a large text column and compare binlog size of such an event written with compression turned on vs compression turned off. You can switch compression on/off dynamically, you don't need to restart the server for that.
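          The second experiment could be sketched like this (assuming a hypothetical table t with a large text column; SHOW BINARY LOGS reports each file's size, so rotating the log between runs makes the comparison easy):

          ```sql
          -- Write one large, well-compressible row with compression off:
          SET GLOBAL log_bin_compress = OFF;
          FLUSH BINARY LOGS;
          INSERT INTO t (big_text) VALUES (REPEAT('s', 1000000));

          -- Same row again with compression on:
          SET GLOBAL log_bin_compress = ON;
          FLUSH BINARY LOGS;
          INSERT INTO t (big_text) VALUES (REPEAT('s', 1000000));
          FLUSH BINARY LOGS;

          -- Compare the File_size column of the last two binlogs:
          SHOW BINARY LOGS;
          ```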

          elenst Elena Stepanova added a comment

          Hi Elena, sorry for the misunderstanding and thanks for the comment. Yes, I thought compression wasn't working because I was seeing these raw queries and COMMITs in the logs, and the log size didn't get any smaller (I was also misled because I had just switched from 10.1 to 10.2 mainly for this functionality). So I incorrectly assumed that if I enabled event compression I should see some difference in size... and I also assumed that the query cleartext and COMMIT would be compressed...

          So it was very strange to me that these logs (which I can compress from 1 GB to 60-80 MB using simple zip) stay the same size. Maybe you could later add some code to spawn a thread that compresses the whole logfile after the server is done writing to it.

          I added a quick comment to your documentation page, as I wasn't able to find anything else on this topic... so I think this can be closed as resolved / not a bug.

          Thanks,
          Slawomir.

          pslawek83 Slawomir Pryczek added a comment

          pslawek83, thanks for adding a note to the KB.

          So it was very strange to me that these logs (which I can compress from 1 GB to 60-80 MB using simple zip) stay the same size.

          It shouldn't really be strange to anyone who's using compression – that's exactly how it works, isn't it?
          For an extreme but easy-to-understand example, let's say you have a file filled with 1 million of the same letter 'a'. If you try to compress it, the effect will be great, easily 1000x or more.
          However, let's say that instead you have 10,000 files with 100 symbols 'a' in each file, so the total size and contents are the same. Every 100-byte file will compress maybe 4-5 times, so the overall effect for all 10,000 files will also be only 4-5x, nowhere close to 1000x.
          Same here, it's compression of events only, each one separately, and not even all events but only of selected types. The KB already says this much: "Compression will have the most impact when events are of a non-negligible size, as each event is compressed individually."
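          The same effect can be demonstrated from SQL with the built-in zlib-based COMPRESS() function (a sketch; exact sizes depend on the zlib version, so no precise numbers are given):

          ```sql
          -- One big run of 'a' compresses extremely well (to roughly a kilobyte):
          SELECT LENGTH(COMPRESS(REPEAT('a', 1000000)));

          -- The same data split into 10,000 pieces of 100 bytes, each
          -- compressed separately, yields a much larger total:
          SELECT 10000 * LENGTH(COMPRESS(REPEAT('a', 100)));
          ```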

          Maybe you could later add some code to spawn a thread that compresses the whole logfile after the server is done writing to it.

          You can create a feature request for it, although I personally don't see such functionality on the database server side being particularly useful. The benefit of binlog event compression is that it happens right away, when the event is created, before it's written; and that the event is transferred in the compressed form. If you compress the whole logfile instead, you'll first need to write the whole file in the uncompressed form, and everything that happens on per-event basis will be also happening with uncompressed data.

          If you just want to compress binlogs after the fact for more compact storage, the much easier and more flexible solution would be to do it externally – then you can choose the compression tool that works best for you, group files however you need, etc.

          elenst Elena Stepanova added a comment

          Sure, no problem, I'll try to add some comments when I find something worth mentioning.

          Yes, the surprising part was not that it isn't compressed as effectively as the whole file, just that there was no gain at all. I understand it can't be compressed as effectively as a whole file when the dictionary is local to each event... there will probably be a 1.5-2x gain once we turn off Annotate events after the DB restore is done... so that's very nice too (it's dynamic; the new setting just applies after reconnect).

          For the second part, yes, we probably have an uncommon use case. Lots of live stats on TokuDB, so the DB size is relatively small; only the logs grow very fast, because we're rewriting the same keys over and over. We do offline backups on replicas for third-party apps, so the logs still need to stay accessible to the master, since backups take a lot of time... so we'll probably explore some other options, like making these backups faster, as that should be very easy to do...

          pslawek83 Slawomir Pryczek added a comment

          People

            Assignee: Unassigned
            Reporter: pslawek83 Slawomir Pryczek
            Votes: 0
            Watchers: 2

