The purpose of this work is to improve current situation around backups by implementing a SQL BACKUP command.
BACKUP command sends the file segments to the client via MySQL's protocol ResultSet, i.e client issues BACKUP statement, server sends a result set back
The ResultSet consists of following fields
- VARCHAR(N) filename
- INT(SET?) flags (OPEN,CLOSE, SPARSE, RENAME, DELETE, etc)
- BIGINT offset
- LONGBLOB data
There will be a simple client program which will issue "BACKUP" statement ,read the result set from server, and apply it to create a directory with a copy of the database. Due to abiliity to use client API (in C, .NET, Java, etc), it should be relatively easy to write specialized backup applications for special needs (e.g "streaming to clould", distributing to multiple machines)
Most of the actual work is done by the storage engines, server initiates backup by calling storage engine's backup()_begin function. For every file chunk, storage engine calls server's backup_file(filename, flag, offset, length, data) function, that streams file chink as result set row to client. When backup is finished, engine calls `backup_end()` so that server would know when backup is finally done.
There default implementation of backup_start() enumerates engines own files, reads them, and signals backup_file() for every chunk read. FLUSH TABLES FOR BACKUP, must be issued for that.
Engines that support online backup, like Innodb would have much more complicated implementation.
Incremental/differential (changes since specific backup) and partial backups( only some tables) will be supported as well, and the client-server interface does not change much for that . For incremental backup, the client application might need to invent some "patch" format, or use some standard for binary diffs (rdiff? bsdiff? xdelta? ), and save diffs to be applied to "base backup" directory later.
There is a need to store different kind of metadata in the backup, usually binlog position, and some kind of timestamp (LSN) for incremental backup. This information can be stored as session variable available after BACKUP command, or send as extra result set after the backup one (protocol does support multi-result sets).
Unlike the current solution with mariabackup or xtrabackup, there won't be any kinds of "prepare" or "restore" application,
The directory that was made when full backup was stored should be usable for new database.
files from partial backups could just be copied to destination (and maybe "IMPORT TABLESPACE", if we still support that).
There should still be some kind of "patch" application for applying incremental backups to the base directory. Dependend on how our client stores incremental backup, we can do without it , and users would only need to use 3rd party bsdiff, rdiff,, mspatcha.