[MDEV-22965] Mariabackup does not lock sql_thread on slaves Created: 2020-06-19  Updated: 2022-01-25  Resolved: 2021-02-15

Status: Closed
Project: MariaDB Server
Component/s: mariabackup, Replication
Affects Version/s: 10.3
Fix Version/s: N/A

Type: Bug
Priority: Critical
Reporter: Todd Stoffel (Inactive)
Assignee: Todd Stoffel (Inactive)
Resolution: Incomplete
Votes: 1
Labels: need_feedback


 Description   

When using a replica as a donor with mariabackup, the sql_thread is not stopped and queries coming from the master are allowed to continue. This creates a dirty backup and duplication when the joiner is started.

Yes, I used the words donor and joiner (Galera terms), but only to set the context; this is not a Galera system.



 Comments   
Comment by Vladislav Lesin [ 2020-12-10 ]

toddstoffel, mariabackup 10.3 uses "FLUSH TABLES WITH READ LOCK" at some point in the backup process to get slave info. This lock prevents the slave applier thread from committing. The following test proves it:

--source include/have_innodb.inc
--source include/master-slave.inc

--connection master

CREATE TABLE t(i INT) ENGINE INNODB;

--sync_slave_with_master
--connection slave
# Take the global read lock on the slave, as mariabackup 10.3 does
FLUSH TABLES WITH READ LOCK;
SET GLOBAL lock_wait_timeout=1;

--connection master
INSERT INTO t VALUES(1);

--connection slave
# Use a short timeout while waiting for the slave to catch up
--let $slave_timeout=5
--source include/sync_with_master.inc
--source include/rpl_end.inc

The test fails with a timeout, because the slave cannot apply the INSERT while the lock is held.
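
For a manual check outside the test framework, the same behavior could be observed roughly as follows. This is only a sketch: it assumes a working master/slave pair on which the table t from the test above has already been replicated, and the expectations in the comments are inferred from the test result, not from a recorded run.

```sql
-- Session A, on the slave: take the global read lock and keep the session open
FLUSH TABLES WITH READ LOCK;

-- On the master: generate an event for the slave to apply
INSERT INTO t VALUES (1);

-- Session A, on the slave: the new row is expected to stay invisible
-- while the lock is held
SELECT COUNT(*) FROM t;

-- Session A, on the slave: the applier is expected to be blocked, so
-- Exec_Master_Log_Pos should stop advancing
SHOW SLAVE STATUS;

-- Session A, on the slave: release the lock so the applier can continue
UNLOCK TABLES;
```

If the row appears on the slave before UNLOCK TABLES is issued, that would be worth attaching to this report.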

If you are talking about a Galera cluster, there is a case we are currently working on, MENT-939, which shows that Galera code bypasses the global MDL for at least DDLs.

There is also MDEV-23080, which shows some issues with the new backup locks while getting slave info, but it does not cause a "dirty" backup, and it affects 10.4+.

Could you please expand your description and explain in detail why you think that "When using a replica as a donor with mariabackup, the sql_thread is not stopped and queries coming from the master are allowed to continue"?
