  1. MariaDB Server
  2. MDEV-17600

Mariabackup double compresses if qpress files are present

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.2.18
    • Fix Version/s: 10.2(EOL)
    • Component/s: Backup
    • Labels: None
    • Environment: ubuntu 16.04

    Description

      For some reason, after uncompressing a previously made compressed backup, the qpress files (.qp extension) were left behind in the data directory. This does no immediate harm to MariaDB, as the database itself does not use these files. The data directory therefore contains files like this:
      -rw-r--r-- 1 mysql root 16K Nov 2 12:36 aria_log.00000001
      -rw-r----- 1 mysql root 394 Nov 2 12:36 aria_log.00000001.qp
      -rw-r--r-- 1 mysql root 52 Nov 2 12:36 aria_log_control
      -rw-r----- 1 mysql root 129 Nov 2 12:36 aria_log_control.qp

      If mariabackup now creates a backup, it simply picks up all MariaDB-specific files that carry the .qp extension and includes them in the backup stream, where they are compressed a second time. Once this backup is extracted from the xbstream, the (temporary) directory looks like this:

      -rw-r--r-- 1 mysql root 16K Nov 2 12:36 aria_log.00000001.qp
      -rw-r----- 1 mysql root 394 Nov 2 12:36 aria_log.00000001.qp.qp
      -rw-r--r-- 1 mysql root 52 Nov 2 12:36 aria_log_control.qp
      -rw-r----- 1 mysql root 129 Nov 2 12:36 aria_log_control.qp.qp

      When mariabackup then decompresses these files, it may process them in the wrong order: for example, it may first decompress aria_log.00000001.qp.qp and then aria_log.00000001.qp, so that an older version of the aria_log file ends up being extracted.
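
The doubled extension makes affected backups easy to spot after extraction. A minimal sketch, assuming the backup has already been unpacked with mbstream; the function name `find_double_compressed` is hypothetical and not part of mariabackup:

```shell
# Sketch: list files that were compressed twice (name ends in ".qp.qp")
# in an extracted backup directory. Hypothetical helper, not a mariabackup feature.
find_double_compressed() {
    local target_dir="$1"
    # -type f: regular files only; -name matches the doubled extension.
    find "$target_dir" -type f -name '*.qp.qp' | sort
}

# Usage (path taken from the report):
#   find_double_compressed /var/lib/mysql_new
```

Any output means the source data directory contained leftover .qp files when the backup was taken, so the result of --decompress should not be trusted without checking.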

      Regardless of the qpress files left in the data directory, I would have expected mariabackup to ignore files with the .qp extension. Xtrabackup, on which mariabackup is based, does ignore the .qp extension and only backs up files that are actually used. I think this issue was introduced when MariaDB-specific files were added to xtrabackup to make it MariaDB compatible.

      My bug report is not about how these files came to be left there or why they had not been removed, but about the re-compression and inclusion of compressed files that should have been ignored.
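
Until mariabackup ignores such files itself, a pre-backup check can flag them. A rough sketch, assuming a leftover is any "<name>.qp" that sits next to an uncompressed "<name>"; the function name `find_leftover_qp` is hypothetical, and whether the reported files are safe to delete is left to the operator:

```shell
# Sketch: report leftover qpress files in a data directory, i.e. any
# "<name>.qp" that has an uncompressed "<name>" next to it.
# Hypothetical helper; it only prints paths and never deletes anything.
find_leftover_qp() {
    local datadir="$1"
    find "$datadir" -type f -name '*.qp' | sort | while IFS= read -r qp_file; do
        # Strip the trailing ".qp" to get the name of the uncompressed file.
        original="${qp_file%.qp}"
        if [ -f "$original" ]; then
            printf '%s\n' "$qp_file"
        fi
    done
}

# Usage (data directory path is an assumption):
#   find_leftover_qp /var/lib/mysql
```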

      To reproduce the issue, create a backup with compression enabled:

      /usr/bin/mariabackup --defaults-file=/etc/mysql/my.cnf --backup --galera-info --parallel 1 --compress=quicklz --stream=xbstream --no-timestamp > /some/path/to/backup.xbstream

      Then extract the backup:

      cat /some/path/to/backup.xbstream | mbstream -x -C /var/lib/mysql_new

      sudo /usr/bin/mariabackup --decompress --target-dir /var/lib/mysql_new

        Activity

          wlad Vladislav Vaintroub added a comment - edited

          I can elaborate on that. Individual file compression does not fit well into the architecture. There are only 2 backup types I can imagine:

          • to a directory. Compression and encryption are possible if the underlying filesystem supports them, which at the moment means BTRFS, ZFS or NTFS.
          • to stdout. Since every encryption and compression library works with streams, encryption, compression, and other things (such as streaming to AWS S3) are possible. Please take some time to evaluate the examples at https://mariadb.com/kb/en/library/using-encryption-and-compression-tools-with-mariabackup/ . You can actually use qpress as a 3rd party compression tool if you want to. It will compress the full stream, not individual files.

          There is no need to reinvent the wheel and provide built-in support for functionality that is freely available and actively developed by someone else. There are 3rd party multithreaded compressors and multithreaded encryptors, so backup can concentrate on its core competency, which is copying files.
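
The stream-based approach described above might look roughly like this. It is a sketch only: gzip stands in for whichever external compressor is preferred, the function names `backup_stream_gzip` and `restore_stream_gzip` are hypothetical, and the paths in the usage lines are illustrative.

```shell
# Sketch: compress the whole backup stream with an external tool (gzip here)
# instead of mariabackup's per-file --compress. Extra arguments are passed
# through to mariabackup unchanged. Function names are hypothetical.
backup_stream_gzip() {
    mariabackup --backup --stream=xbstream "$@" | gzip -c
}

# Counterpart: decompress the stream and unpack it with mbstream.
restore_stream_gzip() {
    local archive="$1"
    local target_dir="$2"
    gunzip -c "$archive" | mbstream -x -C "$target_dir"
}

# Usage (paths are illustrative):
#   backup_stream_gzip --defaults-file=/etc/mysql/my.cnf > /some/path/to/backup.xbstream.gz
#   restore_stream_gzip /some/path/to/backup.xbstream.gz /var/lib/mysql_new
```

Because the compressor sees a single stream, no per-file .qp files are written and no separate --decompress step is needed on the receiving side.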


          artvidaxl Art van Scheppingen added a comment

          Hi Vladislav,

          Thanks for looking into this.
          The reason we use the built-in compression is that we found it to be the best fit for our situation. Our infrastructure has two cross-data-center Galera clusters. The network bandwidth between two of the locations is (at this moment) limited to a maximum of 60MB/s, so when a MariaDB Galera cluster failure requires a full SST between the two locations, it can take hours to get the first copy through. This means our mean time to recovery is quite long. To lower it, we benchmarked various compression methods to see which one would offer the highest throughput and get the data to the other side quickest. Surprisingly, that wasn't the one with the highest compression ratio, but the one built into xtrabackup/mariabackup.

          To be honest, I didn't look up the options on the MariaDB website, but in the binary it's not marked as deprecated. Can you elaborate a bit more on why MariaDB decided to deprecate the compress option, while Percona has not deprecated it yet?
          Thanks!

          wlad Vladislav Vaintroub added a comment

          Out of curiosity, why do you use compress at all?

          We no longer encourage use of the --compress option. It is documented as deprecated. There is also documentation on how to use 3rd party compression and/or encryption tools when backing up to a stream, which is what you do: https://mariadb.com/kb/en/library/using-encryption-and-compression-tools-with-mariabackup/

          People

            vlad.lesin Vladislav Lesin
            artvidaxl Art van Scheppingen
