[MDEV-5336] Implement BACKUP STAGE for safe external backups Created: 2013-11-26  Updated: 2024-01-19  Resolved: 2019-02-04

Status: Closed
Project: MariaDB Server
Component/s: Backup, Server
Fix Version/s: 10.4.2

Type: Task
Priority: Critical
Reporter: erkan yanar
Assignee: Michael Widenius
Resolution: Fixed
Votes: 19
Labels: Backup

Issue Links:
Duplicate
duplicates MDEV-8436 Implement "Backup Locks" such as "LOC... Closed
duplicates MDEV-12031 BACKUP LOCKS Closed
PartOf
includes MDEV-17308 BACKUP: mariabackup support Closed
includes MDEV-17309 BACKUP LOCK: DDL locking of tables du... Closed
Problem/Incident
causes MDEV-18067 Server crash in backup_end or asserti... Closed
causes MDEV-18068 Assertion `this == ticket->get_ctx()'... Closed
causes MDEV-18069 Server hang or crash in MDL_lock::inc... Closed
causes MDEV-18213 Unexpected ER_LOCK_DEADLOCK upon BACK... Open
causes MDEV-19749 MDL scalability regression after back... Open
causes MDEV-20945 BACKUP UNLOCK + FTWRL assertion failu... Closed
causes MDEV-20946 Hard FTWRL deadlock under user level ... Closed
Relates
relates to MDEV-11803 perfschema.stage_mdl_global fails in ... Open
relates to MDEV-14992 BACKUP: in-server backup Open
relates to MDEV-17310 BACKUP: Aria Closed
relates to MDEV-17311 BACKUP: RocksDB Open
relates to MDEV-17312 BACKUP: track and report DDLs Closed
relates to MDEV-17772 3 way lock : ALTER, MDL, BACKUP STAG... Closed
relates to MDEV-18023 Document Implement LOCK FOR BACKUP Closed
relates to MDEV-18465 Logging of DDL statements during backup Closed
relates to MDEV-21546 main.backup_stages occasionally fails... Closed
relates to MDEV-24845 Oddities around innodb_fatal_semaphor... Closed
relates to MDEV-31393 partial server freeze after LOCK TABL... Stalled
relates to MDEV-32970 mariabackup.data_directory test does ... Open
relates to MDEV-15636 mariabackup --lock-ddl-per-table hang... Closed
relates to MDEV-18917 Don't create xtrabackup_binlog_pos_in... Closed
relates to MDEV-19712 Backup stage queries commented out in... Closed
relates to MDEV-25899 intermediate files operations are not... Closed
relates to MDEV-32932 port backup features from ES In Testing

 Description   

The purpose of this task is to ensure that mariabackup will be able to copy
any table from a local disk-based storage engine with a minimum of server performance
impact and a minimum of locks. Main data for transactional tables will be copied
without any locks. Non-transactional tables will be copied under a lock, but with
less waiting than the current FLUSH TABLES WITH READ LOCK.

Instead of using FLUSH TABLES WITH READ LOCK in mariabackup,
introduce a new "BACKUP LOCK" that will not flush (i.e. close) InnoDB
tables and will only block InnoDB commits, new DDLs and the final rename
that is part of ALTER TABLE.

  • Taking the BACKUP LOCKs should be “instant” in almost all cases (when
    using InnoDB or other crash-safe storage handlers) as it only has to
    wait for transactions in the commit stage to complete.
  • BACKUP LOCKs shouldn't have to wait for running transactions that are using
    InnoDB. Running ALTER TABLEs will also not block BACKUP
    LOCKs.
  • At the last stage, writing to the binlog and new commits should be blocked.
  • This lock will also solve the problem with MDEV-15636 (killing
    running queries that conflict with FLUSH) as the backup locks will
    not conflict with other DDL locks.
  • Log tables (general log and slow log) and statistics tables should
    not be locked until the last stage (BLOCK_COMMIT), but we would need a separate phase
    to lock and copy log tables in the last copy phase to ensure the
    tables are consistent.
  • Percona backup locks don't block any SELECT. This will cause
    backed up MyISAM and Aria tables to be regarded as not closed. If we
    do the same, we should, as part of the backup, run aria_chk --fast
    --update-state and myisamchk --fast --update-state on all MyISAM and
    Aria files.
    https://www.percona.com/blog/2014/03/11/introducing-backup-locks-percona-server-2/.
    The proposed solution will not have this problem.

With the above in mind, here is a detailed description of how the BACKUP STAGEs
should work:

  • Introduce a new "log changed tables" service that will log all DDLs
    on tables: CREATE, RENAME, DROP, TRUNCATE, ALTER. This is needed to be
    able to detect DDLs done during the backup for all storage engines
    during phase BLOCK_COMMIT. The current mariabackup can only detect DDLs for
    InnoDB that are stored in the redo log, not DDL on any other type of table.
    The current idea is to create a file in mariadb_data/backup_ddl.log.
  • In the following text, transactional means InnoDB or "InnoDB-like
    engine with redo log that can lock redo purges and can be copied
    without locks by an outside process".
  • MyRocks is "non-transactional" in this context and is copied in stage BLOCK_COMMIT.
  • During the backup, any files with a prefix of "#sql-" should be ignored.

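The "#sql-" rule above can be sketched in Python. This is only an illustration of the filtering idea, not mariabackup's actual code; the helper name `files_to_copy` and the sample file names are hypothetical.

```python
import os

TEMP_PREFIX = "#sql-"  # prefix named in the description above


def files_to_copy(paths):
    """Return the paths whose file name does not start with '#sql-'."""
    return [p for p in paths
            if not os.path.basename(p).startswith(TEMP_PREFIX)]


# Hypothetical datadir listing taken while an ALTER TABLE is running.
candidates = [
    "db1/t1.frm",
    "db1/t1.MYD",
    "db1/#sql-1a2b_3.frm",
    "db1/#sql-1a2b_3.MAI",
]
kept = files_to_copy(candidates)
print(kept)
```

Matching on the base name rather than the full path keeps the check independent of which database directory the temporary file lives in.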
BACKUP STAGE START

  • Start service to log changed tables.
  • Block purge of redo files (needed at least for Aria, not needed for
    InnoDB as InnoDB redo logs are created at startup). Requires new
    handler call.
  • Make a checkpoint for all transactional tables (to speed up recovery of
    backup). Requires new handler call. Note that the checkpoint is not critical,
    just a minor optimization.
  • Both of the above can be done with a 'prepare_for_backup()' handler call.

mariabackup can now copy all transactional tables and aria_log_control, aria_log.# and
other engines' redo logs.
The next stage is to be run after all this copying is done.

BACKUP STAGE FLUSH

  • FLUSH all changes for inactive non-transactional tables, except for statistics and log
    tables. Close all tables that are not in use, to ensure they are marked as closed for
    the backup. One can get a list of all in-use tables with "SHOW OPEN TABLES".
  • BLOCK all new write row locks for all non-transactional tables
    (except statistics and log tables).
  • Mark all active non-transactional tables (except statistics and log
    tables) to be flushed and closed at end of statement. When the last
    instance of a table is flushed (and the table is marked as read only
    by all users), we should call handler->extra(EXTRA_MARK_CLOSED). This
    is needed to handle the case where someone opens a table as read only
    while the table is still in use, in which case the table would never
    have been closed by everyone.
  • DDLs don't have to be blocked yet at this stage as they can't put the table in an
    inconsistent state. CREATE ... SELECT may be blocked; we will know more when
    doing the actual implementation.

The next lock can be taken directly after this lock. While waiting for the
next lock mariabackup can start copying all non-transactional tables that are
not in use. The list of used tables can be found in information schema with
"SHOW OPEN TABLES".
mariabackup can also copy all new changes to the aria_log.# files.
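A minimal sketch of how a backup client might split non-transactional tables into "copy now" and "copy later" after BACKUP STAGE FLUSH. It assumes rows shaped like the output of SHOW OPEN TABLES (columns Database, Table, In_use, Name_locked); the function name, the table list and the sample rows are hypothetical.

```python
def split_by_usage(all_tables, open_table_rows):
    """Split tables into (idle, busy) using SHOW OPEN TABLES-style rows.

    all_tables:      list of (database, table) pairs that must be backed up
    open_table_rows: iterable of (database, table, in_use, name_locked)
    A table that is missing from open_table_rows is not in the table cache,
    so it is treated as idle.
    """
    busy = {(db, tbl)
            for db, tbl, in_use, _name_locked in open_table_rows
            if in_use > 0}
    idle = [t for t in all_tables if t not in busy]
    return idle, sorted(busy)


# Hypothetical example data.
tables = [("shop", "orders"), ("shop", "audit"), ("shop", "archive")]
rows = [("shop", "orders", 1, 0), ("shop", "audit", 0, 0)]

idle, busy = split_by_usage(tables, rows)
print("copy now:", idle)
print("copy after BLOCK_DDL:", busy)
```

Tables reported as busy are left for the copy phase that follows BACKUP STAGE BLOCK_DDL, when the statements using them have ended.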

BACKUP STAGE BLOCK_DDL

  • Wait for all statements using write-locked non-transactional tables to end. This should
    be done as we do with FTWRL, which aborts any current locks. This solves the deadlock
    that Sergei commented upon.
  • While waiting, it could report non-transactional tables to the client as soon as they
    become unused, so that the client could copy them while waiting for other tables.
  • Block TRUNCATE TABLE, CREATE TABLE, DROP TABLE and RENAME TABLE. Also block
    the start of a new ALTER TABLE and the final rename phase of ALTER TABLE.
  • Running ALTER TABLEs are not blocked.
  • Running ALGORITHM=INPLACE ALTER TABLEs should be blocked just before copying is completed.
    This may require a callback from the InnoDB code.

The next lock can be taken directly after this lock. While waiting for the
next lock the mariabackup tool can start copying:

  • The rest of the non-transactional tables (as found from information schema).
  • All .frm, .trn and other system files.
  • New tables created before BLOCK_DDL. The file names can be read from the
    new changed tables service. This log also allows the backup to do renames
    of tables on which RENAMEs were done instead of copying them.
  • Changes to system log tables (this is easy as these are append only).
  • Changes to aria_log.# files (this is easy as these are append only).
  • If there are a lot of new tables to copy (found by examining the backup DDL log) before going to
    BACKUP STAGE BLOCK_COMMIT, one could do a second loop and copy these before
    going to BLOCK_COMMIT as this would allow DDLs to proceed while copying.
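The use of the changed-tables log can be sketched as follows. The description does not define the format of backup_ddl.log, so the (operation, table, new_name) tuples below are purely an assumption for illustration, as is the function name `plan_from_ddl_log`. The sketch shows how a backup tool could turn logged DDLs into renames inside the backup, stale copies to discard, and tables to copy again.

```python
def plan_from_ddl_log(copied, entries):
    """Work out follow-up actions for tables touched by DDL during backup.

    copied:  set of table names already present in the backup
    entries: list of (operation, table, new_name_or_None), oldest first
             (hypothetical format, not the real backup_ddl.log layout)
    Returns (renames, stale, to_copy).
    """
    in_backup = {name: name for name in copied}  # server name -> backup name
    to_copy, stale = set(), []
    for op, table, new_name in entries:
        if op == "RENAME":
            if table in to_copy:
                # Not copied yet: just remember the new name.
                to_copy.discard(table)
                to_copy.add(new_name)
            elif table in in_backup:
                # Already copied: a rename in the backup is enough.
                in_backup[new_name] = in_backup.pop(table)
        elif op == "DROP":
            to_copy.discard(table)
            if table in in_backup:
                stale.append(in_backup.pop(table))
        else:
            # CREATE, ALTER, TRUNCATE: contents must be copied (again).
            if table in in_backup:
                stale.append(in_backup.pop(table))
            to_copy.add(table)
    renames = sorted((old, new) for new, old in in_backup.items() if old != new)
    return renames, stale, sorted(to_copy)


log = [
    ("RENAME", "t1", "t1_old"),
    ("CREATE", "t4", None),
    ("DROP", "t2", None),
    ("ALTER", "t3", None),
    ("RENAME", "t4", "t5"),
]
renames, stale, to_copy = plan_from_ddl_log({"t1", "t2", "t3"}, log)
print(renames, stale, to_copy)
```

Running this pass before BACKUP STAGE BLOCK_COMMIT and once more after it corresponds to the "second loop" idea: most of the new tables are copied while DDLs are still allowed to proceed.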

BACKUP STAGE BLOCK_COMMIT

  • Lock the binary log and commit/rollback to ensure that no changes are
    committed to any tables. If there are active commits or data to be copied to
    the binary log, these will be allowed to finish before the lock is granted.
  • This doesn't lock temporary tables that are not used by replication. However,
    these will be blocked when it's time to write to the binary log.
  • Lock system log tables and statistics tables and close them.

When stage BLOCK_COMMIT returns, this is the 'backup time'.
Everything committed will be in the backup and everything not committed will roll back.
Transactional engines will continue to make changes to the redo log
during stage BLOCK_COMMIT, but this is not important as all of these will roll
back later as the changes will not be committed.

mariabackup can now copy the last changes to the redo files for InnoDB
and Aria (aria_log.#), and the part of the binary log that was not copied before.
MyRocks files can also be hard linked.
The end of the system log tables (slow_log and general_log) and all statistics tables (table_stats, column_stats and index_stats) should also be copied.

BACKUP STAGE END

  • Unlock all BACKUP LOCKs.
  • Call the new 'end_backup()' handler call, which will enable
    purge of redo files.
    After this one can potentially copy the MyRocks files as long as one doesn't
    copy anything new that happened after BACKUP STAGE END.
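The overall sequence described in the stages above can be summarized in a short Python sketch. It only records which statement a backup client would send and what it would copy afterwards; `execute` and `copy` are hypothetical callbacks standing in for a real database connection and a real file copier, and the copy descriptions are paraphrased from this description.

```python
# (statement to send, what the client copies once the statement returns)
BACKUP_PLAN = [
    ("BACKUP STAGE START",
     "transactional tables and engine redo logs"),
    ("BACKUP STAGE FLUSH",
     "non-transactional tables that are not in use"),
    ("BACKUP STAGE BLOCK_DDL",
     "remaining non-transactional tables, .frm/.trn and system files"),
    ("BACKUP STAGE BLOCK_COMMIT",
     "tail of the redo logs, binary log, log and statistics tables"),
    ("BACKUP STAGE END", None),
]


def run_backup(execute, copy):
    """Send each stage statement, then copy what that stage makes safe."""
    for statement, files in BACKUP_PLAN:
        execute(statement)
        if files is not None:
            copy(files)


# Dry run: collect the statements and copy steps instead of performing them.
executed, copied = [], []
run_backup(executed.append, copied.append)
for statement in executed:
    print(statement)
```

Keeping the plan as data makes the ordering explicit: each copy step happens only after the statement for its stage has returned.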

Other things:

  • Only one connection can run BACKUP STAGE START. If a second one tries, it will wait until the first one has executed BACKUP STAGE END.
  • If the user skips a BACKUP STAGE, all intermediate backup stages will automatically be run. This will allow us to add new BACKUP STAGEs in the future with even more precise locks without causing problems for tools using an earlier version of BACKUP STAGEs.
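The "skipped stages are run automatically" rule can be modelled with a small state machine. This is a sketch of the rule as described, not the server implementation; the class name is hypothetical, and two behaviours are assumptions that the description does not spell out: a backup must begin with START, and moving backwards or repeating a stage is an error.

```python
STAGES = ["START", "FLUSH", "BLOCK_DDL", "BLOCK_COMMIT", "END"]


class BackupStages:
    """Tracks the current backup stage for one connection."""

    def __init__(self):
        self.current = None   # None means no backup is running
        self.history = []     # every stage that was actually run

    def advance(self, target):
        """Move to `target`, running any skipped stages on the way."""
        goal = STAGES.index(target)  # ValueError for an unknown stage
        if self.current is None and target != "START":
            raise RuntimeError("a backup must begin with START")  # assumption
        nxt = 0 if self.current is None else STAGES.index(self.current) + 1
        if goal < nxt:
            raise RuntimeError("cannot repeat or go back to " + target)  # assumption
        ran = STAGES[nxt:goal + 1]
        self.history.extend(ran)
        self.current = None if target == "END" else target
        return ran


session = BackupStages()
first = session.advance("START")
# FLUSH and BLOCK_DDL are skipped by the caller and run implicitly.
second = session.advance("BLOCK_COMMIT")
last = session.advance("END")
print(first, second, last)
```

Because the stage list is the single source of ordering, a stage added later between two existing ones would be picked up by older clients that never name it explicitly.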


 Comments   
Comment by erkan yanar [ 2015-01-28 ]

In the end we need something like LOCK TABLES FOR BACKUP like with Percona:
http://www.percona.com/doc/percona-server/5.6/management/backup_locks.html

Comment by Peter McLarty [ 2015-03-16 ]

Oracle has tables as part of a tablespace, and the tablespace can be anywhere on disk. In Oracle you can place a tablespace in backup mode, and all data within that tablespace is kept consistent. Oracle writes logs of the activity to be applied to that tablespace until the lock is released. This might be a step too far but may help formulate a way forward to get this to work. Tablespaces probably need to be implemented within the engine but could still be kept in the mariadb variables.
This would allow the concept of then managing data with a lock that doesn't necessarily affect the total server instance only the tablespace/s you wish to back up in that window for consistency.
For example now I don't necessarily need to lock the mysql database at the same time as other databases and not have a consistent databases. Often we can have different applications without interaction and can be backed up separately without the data becoming inconsistent even when those backups are at different times. Lots of sites do this now with mysqldump for their server backups.

Comment by guo feng [ 2015-11-17 ]

Hope backup locks come to MariaDB soon~

Comment by Vladislav Vaintroub [ 2018-03-27 ]

Perhaps we should already adopt 8.0 syntax, not sure about it https://dev.mysql.com/doc/refman/8.0/en/lock-instance-for-backup.html

Comment by Sergei Golubchik [ 2018-09-17 ]

Notes:

  • stage 1 assumes that InnoDB logs RENAME in the redo log
    Currently it doesn't, but there's a plan to fix it.
  • stage 2 assumes that "block writes" also blocks RENAME.
    it's a bit tricky, as thr_lock isn't taken for rename, and mdl is taken too early, before the table is opened
  • everywhere "transactional" means "InnoDB"
    or "InnoDB-like engine with redo log that can lock redo purges, logs RENAME, and can be copied without locks by an outside process"
  • RocksDB is "non-transactional" in this context
    copied in stage 3, see MDEV-13122
  • InnoDB inplace ALTER, probably, not handled. Final frm rename happens when the alter is completely finished and committed internally. If frm rename is blocked, the backup will be inconsistent.

Comment by Sergei Golubchik [ 2018-09-17 ]

UI Notes. Better to avoid explicit stage numbers. Ideas, in the order from small to big

  • LOCK FOR BACKUP NEXT STAGE (this completely avoids the question of specifying stage numbers out of order, but still allows an early unlock)
  • LOCK FOR BACKUP, LOCK FOR BACKUP ... — every such statement returns in a Note what happens now, for example "3 stages left" or "now, copy innodb data files" etc. The final LOCK FOR BACKUP says "Note: backup finished all locks removed".
  • It's confusing that LOCK FOR BACKUP will unlock, so don't call it LOCK, but BACKUP (or BACKUP START/BACKUP CONTINUE etc). This allows future extensions for non-xtrabackup backups, like filesystem snapshot (BACKUP ... ALGORITHM=snapshot) etc.
  • Make BACKUP statement to return a useful result set, like, file names to copy or tables that were write-locked or created, so that mariabackup wouldn't need to scan the datadir over and over again. This extends nicely to a BACKUP statement that returns actual data, if we'd ever decide to implement that too.

Comment by Marko Mäkelä [ 2018-09-17 ]

Stage 1 is using confusing terminology:

Block purge of redo files

In InnoDB, redo log files can only be removed at startup, when creating new redo log files due to size mismatch, upgrade, or a change of the parameter innodb_encrypt_log. The InnoDB purge is associated with undo log records, which reside in data pages in the system tablespace or undo tablespaces. These data pages are redo-logged just like any other data pages.

For stage 1, InnoDB could issue a redo log checkpoint. This would require new handler API.

Stage 3 would block TRUNCATE TABLE among other things. I believe that this is unnecessary after the MDEV-13564 TRUNCATE, which is present in the latest 10.3, 10.4, and has also been successfully tested on top of 10.2, but not pushed yet.

As a prerequisite of MDEV-13564, we would have transactional rename operations inside InnoDB (MDEV-14717). With that, I believe that the only thing that needs to be blocked are operations on .frm files.

Note: For ALTER TABLE…ALGORITHM=INPLACE, the .frm file is replaced after the operation has been committed in the storage engine. The commit inside InnoDB cannot be blocked, because data dictionary transactions are special. So, the block would have to be somewhere before the call to handler::commit_inplace_alter_table(commit=true).

Note: If ALTER TABLE is blocked before the final rename operations, then the backup can end up containing tables whose name starts with #sql. Inside InnoDB, such tables would be dropped at startup by MDEV-14585 (available starting with 10.3; would be backported to 10.2 as part of the TRUNCATE fix). The #sql*.frm files would not be removed by InnoDB; those would have to be removed or omitted by the backup program.

Stage 4 would again be ignored by InnoDB. InnoDB can basically write redo and data files at any time, so the ib_logfile* would have to be copied until the very end, which marks the logical time of the backup.

I strongly believe that the TRUNCATE fix (MDEV-13564) should be pushed to 10.2, even though the result is that Percona Xtrabackup will no longer work with MariaDB Server 10.2. Without that, we would have an issue with orphan #sql tables if any table-rebuilding operations were blocked between stage 3 and the end of the backup.

Comment by Sergey Vojtovich [ 2018-09-19 ]

From what I understand this framework aims to be engine independent. If so it should be reflected in description, so that we can think of other storage engines impact in advance.

Stage 1
-------
Should we implement an additional handlerton method, like prepare_for_backup()? Then we need a symmetric method to restore state on UNLOCK.

Stage 2
-------

FLUSH all changes for not active non transactional tables. Mark the tables as closed.

At low level it means: close unused non-transactional TABLE instances and mark their TABLE_SHARE's flushed? Anything else?

BLOCK all new write locks for the above non transactional tables.

Are we going to block all new write locks for all unused non-transactional tables, or only for just flushed tables?

Are we going to block write locks, but not DDL?

Is it going to be implemented via MDL or every non-transactional storage engine has to implement this functionality? Or some other way?

How do we pass list of used non-transactional tables to the client (that is tables that it should skip at this point)?

Stage 3
-------
There seems to be a deadlock at least with RENAME TABLE here:

backup thread: writes to non-transactional tables are blocked
user thread: RENAME TABLE t1 TO t2;
user thread: takes MDL_EXCLUSIVE for t1 and t2
user thread: attempts to write to non-transactional mysql.table_stats and mysql.index_stats, which are blocked
user thread: waits for backup thread
backup thread: attempts to block RENAME TABLE
backup thread: waits for user thread

Comment by Sergei Golubchik [ 2018-09-19 ]

Another possible deadlock:

  1. user thread: locks some table
  2. stage 2: block writes to all unused tables
  3. stage 3: wait for statements using other tables to end
  4. user thread: tries to write-lock into until-now-unused table

With MyISAM tables one can solve a deadlock with a fallback-and-retry approach, as MyISAM tables should not be locked in the middle of the statement. But non-InnoDB transactional tables are locked till the end of the transaction, so fallback-and-retry doesn't work for them. And RocksDB is one of these "non-InnoDB transactional tables"; currently it's backed up in a one-shot action under FTWRL (see MDEV-13122).

Comment by Michael Widenius [ 2018-09-19 ]

Comment to Sergei's deadlock:

  • user thread: locks some table
  • stage 2: block writes to all unused tables
  • stage 3: wait for statements using other tables to end
  • user thread: tries to write-lock into until-now-unused table

This can be solved as we do with FTWRL, where we abort any transaction that is waiting for a lock on a non-transactional table. As transactional tables will not be locked, the user thread will either be able to continue (not blocked in thr_lock) or, if locked in thr_lock, FTWRL will abort the thr_lock and stage 3 can continue.

Comment by Vladislav Vaintroub [ 2018-09-19 ]

serg monty : Why it is called "BACKUP STAGE X", if server only takes locks. The SQL command does not do backup.
We have a plan for such command, but this one is not that.

Comment by Vladislav Vaintroub [ 2018-09-20 ]

we do not always need BACKUP stage 4.
taking binlog current position is optional in mariabackup. Why should we block the commits, if we can rely on Innodb to ensure proper recovery if transactions did not finish.

Comment by Sergey Vojtovich [ 2018-09-21 ]

Why do we need to start "service to log changed tables" at STAGE 1?

Why this service has to log changes for all storage engines, given that "mariabackup can detect DDL's for InnoDB that are stored in the redo log"?

Blocking purge of redo files (aria specific) doesn't also seem to be required at STAGE 1. However if it is instant and doesn't produce measurable overhead, it is probably fine.

Edge cases:
What if storage engine is unloaded using UNINSTALL PLUGIN between prepare_for_backup() and end_backup()?

What if storage engine is loaded using INSTALL PLUGIN between prepare_for_backup() and end_backup()?

Comment by Sergey Vojtovich [ 2018-11-15 ]

Just for the record: previous review base before force push https://github.com/MariaDB/server/commits/6bb480cc58ab137d675497205e0d3566cbb39dd0

Comment by Michael Widenius [ 2018-11-23 ]

matthias, the above assert is fixed in the current code. I just added a test case to cover this case.

Comment by Michael Widenius [ 2018-11-27 ]

Will fix the privileges to only allow root to execute backup stages.
Statement timeouts work with backup stages. This is tested in backup_locks:
SET STATEMENT max_statement_time=1 FOR backup stage block_ddl;
I have now also added the above test case to backup_locks.test

Comment by Vladislav Vaintroub [ 2018-11-27 ]

If you only allow "root" for the BACKUP STAGE statements, then users, who were able to run backup in 10.3, would not be able to do so in 10.4 . If this is by design, it should be documented.
10.3 only needed privileges necessary for FTWRL (RELOAD? LOCK TABLES?)

Comment by Michael Widenius [ 2018-11-27 ]

I will ensure it will use same privileges as FTWRL

Comment by VAROQUI Stephane [ 2019-01-25 ]

Quick question on myisam: BLOCK all new write row locks for all non transactional tables
(except statistics and log tables), can this not be done by introducing a binlog copy in row based service? Each non transactional table would start reporting a binlog copy into backup-binlog.000001, get locked one by one and report the binlog copy until the end of the backup; restore would apply the binlog flashback copy?

Comment by Michael Widenius [ 2019-02-04 ]

Discussions to implement usage of backup stages in mariabackup
