[MDEV-27919] mariabackup --log-copy-interval is measured in milliseconds in 10.5 and in microseconds in 10.6 Created: 2022-02-23  Updated: 2022-08-25  Resolved: 2022-04-21

Status: Closed
Project: MariaDB Server
Component/s: mariabackup
Affects Version/s: 10.6.7
Fix Version/s: 10.6.8, 10.7.4, 10.8.3, 10.9.1

Type: Bug Priority: Major
Reporter: Harvey Cooper Assignee: Vladislav Lesin
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Blocks
Relates
relates to MDEV-25765 Mariabackup reduced verbosity option ... Closed

 Description   

Hi, all MariaBackup developers.

We have two MariaDB clusters, which both version 10.0.x

And recently upgraded from 10.0.x to 10.6.2
But I found that the log ">> log scanned up to (...)" print too fast, overwhelming the stdout/stderr. (the log file size is occupied by several gigabytes)

But as I tested, when I upgraded from 10.0.x to 10.5.x, the log ">> log scanned up to (...)" is not so overwhelming. (the log file size is occupied by several megabytes)

FYI, Recovered from the same database backup.

It sounds like a bug in mariabackup, since the --log-copy-interval (default is 1000ms) can't control the log output frequency.

# 10.5 MariaBackup
# One log per second
[00] 2022-02-14 17:50:25 >> log scanned up to (46200984432003)
[00] 2022-02-14 17:50:26 >> log scanned up to (46200984454330)
[00] 2022-02-14 17:50:27 >> log scanned up to (46200984469322)
[00] 2022-02-14 17:50:28 >> log scanned up to (46200984491853)
[00] 2022-02-14 17:50:29 >> log scanned up to (46200984529292)
[00] 2022-02-14 17:50:30 >> log scanned up to (46200984561329)
 
# 10.6 MariaBackup
# --log-copy-interval=1000 is not working for 10.6
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221905582)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221905582)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221905582)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221905582)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221905582)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221905582)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221905582)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221905582)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221905582)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221905582)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221905582)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221906153)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221906805)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221906805)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221906805)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221907586)
[00] 2022-02-22 17:11:48 >> log scanned up to (46276221908255)



 Comments   
Comment by Marko Mäkelä [ 2022-03-25 ]

Our regression tests (mtr) filter out all output of mariabackup or mariadb-backup.

I noticed this spamming when removing the log block structure and log block parser in MDEV-14425 for 10.8, but I did not investigate it very deep.

If this really is a regression in 10.6, then MDEV-21452 could have caused this.

Comment by Vladislav Lesin [ 2022-04-15 ]

In 10.5 polling interval is measured in microseconds:

static os_thread_ret_t DECLARE_THREAD(log_copying_thread)(void*)                
{                                                                               
...                                                                             
        for (;;) {                                                              
                os_event_reset(log_copying_stop);                               
                os_event_wait_time_low(log_copying_stop,                        
                                       xtrabackup_log_copy_interval * 1000U,    
                                       0);                                      
                if (xtrabackup_copy_logfile()) {                                
                        break;                                                  
                }                                                               
...                                                                             
        }                                                                       
...                                                                             
}                                                                               
                                                                                
os_event_wait_time_low(                                                         
/*===================*/                                                         
        os_event_t      event,                  /*!< in/out: event to wait */   
        ulint           time_in_usec,           /*!< in: timeout in             
                                                microseconds, or                
                                                OS_SYNC_INFINITE_TIME */        
        int64_t         reset_sig_count)        /*!< in: zero or the value

While in 10.6 the interval is measured in nanoseconds:

static void log_copying_thread()                                                
{                                                                               
...                                                                             
  while (!xtrabackup_copy_logfile() &&                                          
         (!metadata_to_lsn || metadata_to_lsn > log_copy_scanned_lsn))          
  {                                                                             
    timespec abstime;                                                           
    set_timespec_nsec(abstime, 1000ULL * xtrabackup_log_copy_interval);         
    mysql_cond_timedwait(&log_copying_stop, &log_sys.mutex, &abstime);          
  }                                                                             
...                                                                             
}

The workaround is to increase --log-copy-interval by a thousand times until it's fixed.

Comment by Vladislav Lesin [ 2022-04-19 ]

The bug was caused by 38fd7b7d9170.

Comment by Marko Mäkelä [ 2022-04-19 ]

Thank you for the analysis. I’m glad that it was so trivial after all. I hope that you can figure out some way to test it in mtr.

Generated at Thu Feb 08 09:56:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.