Details
- Type: Task
- Status: Closed
- Priority: Critical
- Resolution: Fixed
Description
The purpose of this task is to ensure that mariabackup can copy
any table from a local disk-based storage engine with minimal impact on server
performance and a minimum of locks. Main data for transactional tables will be copied
without any locks. Non-transactional tables will be copied under a lock, but with
less waiting than with the current FLUSH TABLES WITH READ LOCK.
Instead of using FLUSH TABLES WITH READ LOCK in mariabackup,
introduce a new "BACKUP LOCK" that will not flush (i.e. close) InnoDB
tables and will only block InnoDB commits, new DDLs and the final rename
that is part of ALTER TABLE.
- Taking the BACKUP LOCKs should be “instant” in almost all cases (when
using InnoDB or other crash-safe storage handlers), as it only has to
wait for transactions at the commit stage to complete.
- BACKUP LOCKs shouldn't have to wait for running transactions that are using
InnoDB. Running ALTER TABLEs will also not block BACKUP LOCKs.
- At the last stage, writing to the binlog and new commits should be blocked.
- This lock will also solve the problem with MDEV-15636 (killing
running queries that conflict with FLUSH), as the backup locks will
not conflict with other DDL locks.
- Log tables (general log and slow log) and statistics tables should
not be locked until the last stage (BLOCK COMMIT), but we would need a separate phase
to lock and copy log tables in the last copy phase to ensure the
tables are consistent.
- Percona backup locks don't block any SELECT. This will cause
backed up MyISAM and Aria tables to be regarded as not closed. If we
do the same, we should, as part of the backup, run aria_chk --fast
--update-state and myisamchk --fast --update-state on all MyISAM and
Aria files. See
https://www.percona.com/blog/2014/03/11/introducing-backup-locks-percona-server-2/
for the Percona implementation.
The proposed solution will not have this problem.
With the above in mind, here is a detailed description of how the BACKUP STAGEs
should work:
- Introduce a new "log changed tables" service that will log all DDLs
on tables: CREATE, RENAME, DROP, TRUNCATE, ALTER. This is needed to be
able to detect DDLs done during the backup for all storage engines
during phase BLOCK_COMMIT. The current mariabackup can only detect DDLs for
InnoDB that are stored in the redo log, not DDLs on any other type of table.
The current idea is to create a file in mariadb_data/backup_ddl.log.
- In the following text, transactional means InnoDB or "InnoDB-like
engine with redo log that can lock redo purges and can be copied
without locks by an outside process".
- MyRocks is "non-transactional" in this context and is copied in stage BLOCK_COMMIT.
- During the backup, any files with a prefix of "#sql-" should be ignored.
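The rule about "#sql-" files can be illustrated with a small sketch. This is not mariabackup code; the function name files_to_copy and the idea of walking the data directory from Python are assumptions made only for illustration.

```python
from pathlib import Path

# Prefix of temporary files that the description says must be ignored.
TEMP_PREFIX = "#sql-"


def files_to_copy(datadir):
    """Return all regular files under datadir, skipping "#sql-" temporaries.

    Hypothetical helper: a real backup tool would also filter by engine
    and by backup stage, which is left out here.
    """
    root = Path(datadir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and not p.name.startswith(TEMP_PREFIX)
    )
```

Only the file name is checked, so a temporary file is skipped regardless of which database directory it lives in.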
BACKUP STAGE START
- Start service to log changed tables.
- Block purge of redo files (needed at least for Aria, not needed for
InnoDB as InnoDB redo logs are created at startup). Requires new
handler call.
- Make a checkpoint for all transactional tables (to speed up recovery of
backup). Requires new handler call. Note that the checkpoint is not critical,
just a minor optimization.
- Both of the above can be done with a 'prepare_for_backup()' handler call.
mariabackup can now copy all transactional tables and aria_log_control, aria_log.# and
other engines' redo logs.
The next stage is to be run after all of this copying is done.
BACKUP STAGE FLUSH
- FLUSH all changes for inactive non-transactional tables, except for statistics and log
tables. Close all tables that are not in use, to ensure they are marked as closed for
the backup. One can get a list of all in-use tables with "SHOW OPEN TABLES".
- BLOCK all new write row locks for all non-transactional tables
(except statistics and log tables).
- Mark all active non-transactional tables (except statistics and log
tables) to be flushed and closed at end of statement. When the last
instance of a table is flushed (and the table is marked as read only
by all users), we should call handler->extra(EXTRA_MARK_CLOSED). This
is needed to handle the case where someone opens a table as read only
while the table is still in use, in which case the table would never
have been closed by everyone.
- DDLs don't have to be blocked yet at this stage as they can't put the table in an
inconsistent state. CREATE ... SELECT may be blocked; we will know more when
doing the actual implementation.
The next lock can be taken directly after this lock. While waiting for the
next lock, mariabackup can start copying all non-transactional tables that are
not in use. The list of used tables can be obtained with
"SHOW OPEN TABLES".
mariabackup can also copy all new changes to the aria_log.# tables.
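As a rough sketch of how a client could decide which non-transactional tables are safe to copy at this point, the function below splits a table list using rows shaped like the SHOW OPEN TABLES result (Database, Table, In_use, Name_locked). The function name split_by_use and the way the table list is obtained are assumptions, not part of the proposal.

```python
def split_by_use(tables, open_rows):
    """Split tables into (idle, busy) lists.

    tables:    iterable of (database, table) pairs for non-transactional
               tables; how this list is built is outside this sketch.
    open_rows: rows shaped like SHOW OPEN TABLES output, i.e.
               (Database, Table, In_use, Name_locked).
    """
    # A table counts as busy when SHOW OPEN TABLES reports In_use > 0.
    in_use = {(db, tbl) for db, tbl, used, _locked in open_rows if used > 0}
    idle = [t for t in tables if t not in in_use]
    busy = [t for t in tables if t in in_use]
    return idle, busy
```

Tables in the idle list can be copied while waiting for the next lock; the busy ones are left for a later stage.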
BACKUP STAGE BLOCK_DDL
- Wait for all statements using write-locked non-transactional tables to end. This should
be done as we do with FTWRL, which aborts any current locks. This solves the deadlock
that Sergei commented upon.
- While waiting, it could report non-transactional tables to the client as soon as they
become unused, so that the client could copy them while waiting for other tables.
- Block TRUNCATE TABLE, CREATE TABLE, DROP TABLE and RENAME TABLE. Also block
the start of a new ALTER TABLE and the final rename phase of ALTER TABLE.
- Running ALTER TABLEs are not blocked.
- Running ALGORITHM=INPLACE ALTER TABLEs should be blocked just before copying is completed.
This may require a callback from the InnoDB code.
The next lock can be taken directly after this lock. While waiting for the
next lock, the mariabackup tool can start copying:
- The rest of the non-transactional tables (as found from information schema).
- All .frm, .trn and other system files.
- New tables created before BLOCK_DDL. The file names can be read from the
new changed tables service. This log also allows the backup to rename
tables on which RENAMEs were done instead of copying them.
- Changes to system log tables (this is easy as these are append only).
- Changes to aria_log.# tables (this is easy as these are append only).
- If there are a lot of new tables to copy (found by examining the backup DDL log),
one could do a second loop and copy these before going to
BACKUP STAGE BLOCK_COMMIT, as this would allow DDLs to proceed while copying.
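The idea of using the DDL log to rename already-copied tables instead of copying them again can be sketched as follows. The on-disk format of the log is not defined in this task, so the event tuples used here are purely hypothetical, as is the function name plan_from_ddl_log. Only CREATE, DROP and RENAME are modelled; ALTER and TRUNCATE are left out.

```python
def plan_from_ddl_log(copied, events):
    """Work out how to bring an already-copied table set up to date.

    copied: set of table names already present in the backup.
    events: DDL events in log order, as hypothetical tuples:
            ("CREATE", name), ("DROP", name), ("RENAME", old, new).
    Returns (renames, to_copy, to_delete) as sorted lists.
    """
    origin = {name: name for name in copied}  # live name -> name in backup
    to_copy = set()                           # tables with no copy yet
    for event in events:
        op = event[0]
        if op == "CREATE":
            to_copy.add(event[1])
        elif op == "DROP":
            origin.pop(event[1], None)
            to_copy.discard(event[1])
        elif op == "RENAME":
            old, new = event[1], event[2]
            if old in origin:
                origin[new] = origin.pop(old)  # reuse the existing copy
            elif old in to_copy:
                to_copy.discard(old)
                to_copy.add(new)
    renames = sorted((src, live) for live, src in origin.items() if src != live)
    to_delete = sorted(set(copied) - set(origin.values()))
    return renames, sorted(to_copy), to_delete
```

A table that was copied and later renamed ends up in renames, so only new tables need to be read from the server again. Chains of renames that swap two names would need extra care when the renames are applied to the backup files.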
BACKUP STAGE BLOCK_COMMIT
- Lock the binary log and commit/rollback to ensure that no changes are
committed to any tables. If there are active commits or data to be copied to
the binary log, these will be allowed to finish before the lock is granted.
- This doesn't lock temporary tables that are not used by replication. However,
these will be blocked when it's time to write to the binary log.
- Lock system log tables and statistics tables and close them.
When stage BLOCK_COMMIT returns, this is the 'backup time'.
Everything committed will be in the backup and everything not committed will roll back.
Transactional engines will continue to make changes to the redo log
during stage BLOCK_COMMIT, but this is not important as all of these will roll
back later as the changes will not be committed.
mariabackup can now copy the last changes to the redo files for InnoDB
and Aria (aria_log.#), and the part of the binary log that was not copied before.
MyRocks files can also be hard linked.
The end of the system log tables (slow_log and general_log) and all statistics tables (table_stats, column_stats and index_stats) should also be copied.
BACKUP STAGE END
- Unlock all BACKUP LOCKs.
- Call the new 'end_backup()' handler call, which will enable
purge of redo files.
After this, one can potentially copy the MyRocks files as long as one doesn't
copy anything new that happened after BACKUP STAGE END.
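Seen from a client, the stages above form a fixed sequence with copy work between them. A minimal sketch of such a driver is shown below. It assumes a DB-API style cursor with an execute() method; the function name run_backup and the copy_steps mapping are hypothetical, and mariabackup itself is not written this way.

```python
# Locking stages in the order described above; END is issued separately.
STAGES = ("START", "FLUSH", "BLOCK_DDL", "BLOCK_COMMIT")


def run_backup(cursor, copy_steps):
    """Run each BACKUP STAGE, then the copy work that stage permits.

    cursor:     any object with an execute(sql) method (assumption).
    copy_steps: dict mapping a stage name to a callable that copies files.
    """
    started = False
    try:
        for stage in STAGES:
            cursor.execute(f"BACKUP STAGE {stage}")
            started = True
            copy_steps.get(stage, lambda: None)()
    finally:
        # Always release the backup locks once START has succeeded,
        # even if a copy step failed.
        if started:
            cursor.execute("BACKUP STAGE END")
```

Issuing BACKUP STAGE END in a finally block keeps a failed copy step from leaving commits and DDLs blocked on the server.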
Other things:
- Only one connection can run BACKUP STAGE START. If a second one tries, it will wait until the first one has executed BACKUP STAGE END.
- If the user skips a BACKUP STAGE, all intermediate backup stages will automatically be run. This will allow us to add new BACKUP STAGEs in the future with even more precise locks without causing problems for tools using an earlier version of the BACKUP STAGEs.
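The rule about skipped stages can be modelled with a few lines. This is only a sketch of the described behaviour for the four locking stages; BACKUP STAGE END is left out, and the checks that a backup must begin with START and that stages cannot go backwards are assumptions of the sketch. The name stages_to_run is hypothetical.

```python
STAGE_ORDER = ["START", "FLUSH", "BLOCK_DDL", "BLOCK_COMMIT"]


def stages_to_run(current, requested):
    """Return the stages that would run to reach `requested`.

    current is None when no backup is running, otherwise the last
    stage that was executed on this connection.
    """
    cur = -1 if current is None else STAGE_ORDER.index(current)
    req = STAGE_ORDER.index(requested)
    if current is None and requested != "START":
        raise ValueError("a backup must begin with START (assumption)")
    if req <= cur:
        raise ValueError("stages cannot be repeated or run backwards (assumption)")
    # Every stage between the current one and the requested one is included.
    return STAGE_ORDER[cur + 1:req + 1]
```

For example, a tool that only knows START and BLOCK_COMMIT would still have FLUSH and BLOCK_DDL run on its behalf, which is what lets new stages be added later without breaking older tools.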
Issue Links
- causes
  - MDEV-18067 Server crash in backup_end or assertion failure `ticket->m_duration == MDL_EXPLICIT' upon BACKUP STAGE END after FLUSH TABLE with locks (Closed)
  - MDEV-18068 Assertion `this == ticket->get_ctx()' failed in MDL_context::release_lock upon BACKUP STAGE END (Closed)
  - MDEV-18069 Server hang or crash in MDL_lock::incompatible_granted_types_bitmap or ASAN heap-use-after-free in MDL_ticket::has_stronger_or_equal_type (Closed)
  - MDEV-18213 Unexpected ER_LOCK_DEADLOCK upon BACKUP STAGE BLOCK_COMMIT (Open)
  - MDEV-19749 MDL scalability regression after backup locks (Open)
  - MDEV-20945 BACKUP UNLOCK + FTWRL assertion failure | SIGSEGV in I_P_List from MDL_context::release_lock on INSERT w/ BACKUP LOCK (on optimized builds) | Assertion `ticket->m_duration == MDL_EXPLICIT' failed. (Closed)
  - MDEV-20946 Hard FTWRL deadlock under user level locks (Closed)
- duplicates
  - MDEV-8436 Implement "Backup Locks" such as "LOCK TABLES FOR BACKUP" and "LOCK BINLOG FOR BACKUP" (Closed)
  - MDEV-12031 BACKUP LOCKS (Closed)
- includes
  - MDEV-17308 BACKUP: mariabackup support (Closed)
  - MDEV-17309 BACKUP LOCK: DDL locking of tables during backup (Closed)
- relates to
  - MDEV-11803 perfschema.stage_mdl_global fails in buildbot on very slow builders (Open)
  - MDEV-14992 BACKUP: in-server backup (Open)
  - MDEV-17310 BACKUP: Aria (Closed)
  - MDEV-17311 BACKUP: RocksDB (Open)
  - MDEV-17312 BACKUP: track and report DDLs (Closed)
  - MDEV-17772 3 way lock : ALTER, MDL, BACKUP STAGE BLOCK_DDL (Closed)
  - MDEV-18023 Document Implement LOCK FOR BACKUP (Closed)
  - MDEV-18465 Logging of DDL statements during backup (Closed)
  - MDEV-21546 main.backup_stages occasionally fails with lock wait timeout (Closed)
  - MDEV-24845 Oddities around innodb_fatal_semaphore_wait_threshold and global.innodb_disallow_writes (Closed)
  - MDEV-31393 partial server freeze after LOCK TABLES ...; FLUSH TABLES; (Stalled)
  - MDEV-32970 mariabackup.data_directory test does not delete a directory with backup and it causes failure of the next test (Closed)
  - MDEV-34871 BACKUP STAGE BLOCK_COMMIT had better return an InnoDB LSN (Open)
  - MDEV-34876 The table mysql.general_log is not protected during backup (Open)
  - MDEV-15636 mariabackup --lock-ddl-per-table hangs in FLUSH TABLES due to MDL conflict if ALTER TABLE issued (Closed)
  - MDEV-18917 Don't create xtrabackup_binlog_pos_innodb with Mariabackup (Closed)
  - MDEV-19712 Backup stage queries commented out in mariabackup's backup_mysql.cc (Closed)
  - MDEV-25899 intermediate files operations are not protected by backup locks (Closed)
  - MDEV-32932 port backup features from MariaDB Enterprise (Closed)