  1. MariaDB Server
  2. MDEV-32010

Deduplicated dump via zpaq technology

Details

    • Task
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • None
    • Backup
    • None

    Description

      mysqldump could be greatly enhanced by pairing it with a program that performs block deduplication. (Since a dump is ASCII text, even better approaches are possible, but I won't burden the discussion with them.)

      One such program, in my opinion the best, is zpaq.

      It has been packaged in major Linux distributions for years (and runs on Windows too), but it is very little known: http://mattmahoney.net/dc/zpaq.html

      However, it has a major "flaw" for use with mysqldump: it cannot read from stdin (that is, it cannot be used in a pipe).
      The dump must therefore be saved to a local file first and compressed afterwards.
      The developer has not supported this program for a few years now, so I stepped in with a fork:
      a small open-source project that adds, among other features, the "piece" needed for MySQL backups, namely support for pipes.
      https://github.com/fcorbelli/zpaqfranz

      With the fork it is possible to do things like:

      mysqldump -uroot -ppassword franco | zpaqfranz a archivio_franco.zpaq backup.sql -stdin
      mysqldump -uroot -ppassword --all-databases | zpaqfranz a archivio_franco.zpaq backup.sql -stdin
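
      When the pipeline above is run from a script, it helps to check the exit status of both programs, since a plain shell pipe normally reports only the last one. The following is a minimal sketch, not part of mysqldump or zpaqfranz: it reuses exactly the arguments shown above, and the function names `dump_commands` and `run_dump` are hypothetical.

```python
import subprocess


def dump_commands(database, archive, user="root", password="password"):
    """Build the argv lists for: mysqldump ... | zpaqfranz a ... -stdin"""
    dump = ["mysqldump", f"-u{user}", f"-p{password}", database]
    pack = ["zpaqfranz", "a", archive, "backup.sql", "-stdin"]
    return dump, pack


def run_dump(database, archive):
    """Run the pipeline and fail if either side exits with a non-zero status."""
    dump_cmd, pack_cmd = dump_commands(database, archive)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    pack = subprocess.Popen(pack_cmd, stdin=dump.stdout)
    # Close our copy of the pipe so mysqldump sees a broken pipe
    # if zpaqfranz exits early.
    dump.stdout.close()
    pack_rc = pack.wait()
    dump_rc = dump.wait()
    if dump_rc != 0 or pack_rc != 0:
        raise RuntimeError(
            f"backup failed: mysqldump={dump_rc}, zpaqfranz={pack_rc}"
        )
```

      Calling `run_dump("franco", "archivio_franco.zpaq")` would then correspond to the first command above; passing `"--all-databases"` as the first argument corresponds to the second.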

      My suggestion is: why not extend mysqldump so that it can also write a versioned (snapshot-style) binary archive?

      Basically, instead of using more or less complex scripts to keep each day's dump separate (Monday, Tuesday, Wednesday...) in order to maintain a backup history, you can store them all together in one archive, as if they were snapshots.
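
      To illustrate why keeping all the dumps in one deduplicated archive is cheap, here is a toy sketch of a snapshot store. It is not how zpaq is implemented: it uses fixed-size blocks and SHA-256 purely for illustration, and the class name `SnapshotStore` and the sample data are hypothetical.

```python
import hashlib


class SnapshotStore:
    """Toy deduplicating store: each distinct block is kept only once."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}     # SHA-256 digest -> block bytes
        self.snapshots = []  # one list of digests per stored dump

    def add(self, dump):
        """Store one dump (bytes) and return its 1-based version number."""
        digests = []
        for start in range(0, len(dump), self.block_size):
            block = dump[start:start + self.block_size]
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)  # keep only unseen blocks
            digests.append(digest)
        self.snapshots.append(digests)
        return len(self.snapshots)

    def restore(self, version):
        """Rebuild the dump that was stored as the given version."""
        return b"".join(self.blocks[d] for d in self.snapshots[version - 1])

    def stored_bytes(self):
        """Bytes actually kept, counting each distinct block once."""
        return sum(len(b) for b in self.blocks.values())


# Hypothetical data: Tuesday's dump is Monday's dump plus 100 new rows.
monday = b"".join(b"INSERT INTO t VALUES (%06d);\n" % i for i in range(2000))
tuesday = monday + b"".join(
    b"INSERT INTO t VALUES (%06d);\n" % i for i in range(2000, 2100)
)

store = SnapshotStore()
v1 = store.add(monday)
v2 = store.add(tuesday)
```

      Here `store.restore(v1)` and `store.restore(v2)` return the two dumps unchanged, while `store.stored_bytes()` is only slightly larger than Tuesday's dump alone. Fixed-size blocks deduplicate well only when data is appended; a real tool would need block boundaries that survive insertions in the middle of the dump.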

      I am not proposing the use of "my" program specifically, but of "a" program, integrated with mysqldump in the main MariaDB codebase, that compresses and stores deduplicated dumps.

      Something like:
      mysqldump (...) -snapshot foo.whatever

      This would make life much, much easier for any DBA, and it may not be too hard to implement (I think that somewhere mysqldump streams its output to stdout one byte at a time, so a compressor that reads its input as a stream, one byte at a time, would fit, just like zpaq).
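
      The streaming idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's zlib as a stand-in compressor (not zpaq, and without deduplication), reads in chunks rather than single bytes, and the function name `compress_stream` is hypothetical.

```python
import io
import zlib


def compress_stream(src, dst, chunk_size=64 * 1024):
    """Copy src to dst through a streaming compressor; return bytes read."""
    compressor = zlib.compressobj()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # end of the dump
            break
        total += len(chunk)
        # The compressor keeps its own state, so the whole dump
        # never has to be held in memory or written to disk first.
        dst.write(compressor.compress(chunk))
    dst.write(compressor.flush())  # emit whatever is still buffered
    return total


# Stand-in for mysqldump's stdout (hypothetical data).
dump = b"INSERT INTO t VALUES (1);\n" * 10000
packed = io.BytesIO()
read = compress_stream(io.BytesIO(dump), packed)
```

      In a real pipe, `src` would be `sys.stdin.buffer` and `dst` a file opened in binary mode.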

      Just a suggestion!

      _I have other suggestions for more complex situations (databases too large for mysqldump), but they are less straightforward._

      I apologize for opening this as a task (I can only choose task, bug, or epic), but I am a beginner with Jira here, so I hope to be forgiven.

          People

            Assignee: Unassigned
            Reporter: fcorbelli (Franco Corbelli)
            Votes: 0
            Watchers: 1
