[MDEV-32010] Deduplicated dump via zpaq technology Created: 2023-08-25  Updated: 2023-08-25

Status: Open
Project: MariaDB Server
Component/s: Backup
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Franco Corbelli Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

mysqldump could be greatly enhanced by pairing it with a program that performs block-level deduplication. (Since a dump is ASCII text, even better approaches are possible, but I won't burden the discussion with them.)

One such program, in my opinion the best, is zpaq.

It has been available in major Linux distributions for years (and runs on Windows too), but it is very little known: http://mattmahoney.net/dc/zpaq.html

However, it has a major "flaw" for use with mysqldump: it does not read from stdin (in other words, it cannot be used in a pipe).
The dump therefore has to be saved to a local file first and compressed afterwards.
The developer has not maintained the program for a few years now, so I stepped in with a fork:
a small open-source project that adds, among other features, the "piece" needed for MySQL backups, namely support for pipes.
https://github.com/fcorbelli/zpaqfranz

This makes it possible to do "things" like:

mysqldump -uroot -ppassword franco | zpaqfranz a archivio_franco.zpaq backup.sql -stdin
mysqldump -uroot -ppassword --all-databases | zpaqfranz a archivio_franco.zpaq backup.sql -stdin

My suggestion: why not give mysqldump a versioned (snapshot) binary archive output as well?

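Basically, instead of using more or less complex scripts to keep each day's dump (Monday, Tuesday, Wednesday...) in a separate file in order to maintain a backup history, you could store them all together in a single archive, as if they were snapshots. With deduplication, data that is unchanged between dumps is stored only once.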
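
To illustrate the snapshot idea, here is a minimal sketch of a block-deduplicating store written with standard tools. It is a toy, not zpaq's format or algorithm: the function names `store_snapshot` and `restore_snapshot`, the directory layout, and the fixed 4 KiB block size are all assumptions made for the example, and it assumes GNU coreutils (`split`, `sha256sum`).

```shell
#!/bin/sh
# Toy block-level deduplicating store (illustration only, NOT zpaq's format).
# All names and the 4 KiB block size are hypothetical choices for this sketch.

# store_snapshot STORE_DIR SNAPSHOT_NAME INPUT_FILE
store_snapshot() {
    store=$1; name=$2; input=$3
    mkdir -p "$store/blocks" "$store/snapshots"
    tmp=$(mktemp -d)
    # Cut the dump into fixed-size blocks; -a 6 allows many output pieces.
    split -a 6 -b 4096 "$input" "$tmp/blk."
    : > "$store/snapshots/$name"
    for blk in "$tmp"/blk.*; do
        [ -f "$blk" ] || continue   # empty input: the glob matched nothing
        hash=$(sha256sum "$blk" | cut -d' ' -f1)
        # A block that is already in the store is not written again.
        [ -f "$store/blocks/$hash" ] || cp "$blk" "$store/blocks/$hash"
        # A snapshot is just the ordered list of its block hashes.
        echo "$hash" >> "$store/snapshots/$name"
    done
    rm -rf "$tmp"
}

# restore_snapshot STORE_DIR SNAPSHOT_NAME   (writes the dump to stdout)
restore_snapshot() {
    while read -r hash; do
        cat "$1/blocks/$hash"
    done < "$1/snapshots/$2"
}
```

Storing Monday's dump and then Tuesday's would add only the blocks that differ, while either day can still be restored in full from its hash list. Fixed-size blocks are the weak point of this sketch: an insertion near the start of the dump shifts every later block, so a real deduplicator would normally choose block boundaries from the content instead.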

I am not proposing the use of "my" program, but of "a" program, integrated with mysqldump in the main MariaDB codebase, that compresses and stores deduplicated dumps.

Something like:
mysqldump (...) -snapshot foo.whatever

This would make life much, much easier for any DBA, and might not be too hard to implement: I think mysqldump somewhere streams its output to stdout one byte at a time, so a compressor that reads its input as a stream, one byte at a time, would fit. That is just how zpaq works.

Just a suggestion!

_I would have other suggestions for more complex situations (databases too large to use mysqldump), but they are less straightforward._

I apologize if opening this as a task was the wrong choice (I could only pick task, bug or epic). I am a beginner on Jira, so I hope to be forgiven.

