Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29896

mariadb-backup got stuck with --throttle option doing incremental backup

Details

    Description

      1. create a table
      2. make a first part of backup:

      sudo mariadb-backup --backup -utest_user -pct570c3521fCCR# --target-dir=backup1 --throttle=2
      

      3. create another table
      4. try to create a second part of incremental backup

      sudo mariadb-backup --backup -utest_user -pct570c3521fCCR# --target-dir=backup1_2 --incremental-basedir=backup1 --throttle=2
      

      Result:
      process stops forever at:

      [00] 2022-10-27 11:44:33 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
      [00] 2022-10-27 11:44:33 mariabackup: The latest check point (for incremental): '42171'
      mariabackup: Stopping log copying thread.....[00] 2022-10-27 11:44:34 >> log scanned up to (751195)
      

      throttle-1.sh - script to reproduce

      Attachments

        Issue Links

          Activity

            I think that I have something to proceed with.

            marko Marko Mäkelä added a comment - I think that I have something to proceed with.

            I can repeat a hang with the following:

            diff --git a/mysql-test/suite/mariabackup/incremental_backup.test b/mysql-test/suite/mariabackup/incremental_backup.test
            index ec38af4c27d..5db5646ce57 100644
            --- a/mysql-test/suite/mariabackup/incremental_backup.test
            +++ b/mysql-test/suite/mariabackup/incremental_backup.test
            @@ -33,7 +33,7 @@ INSERT INTO t VALUES(1);
             
             echo # Create full backup , modify table, then create incremental/differential backup;
             --disable_result_log
            -exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --parallel=10 --target-dir=$basedir;
            +exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --parallel=10 --target-dir=$basedir --throttle=2;
             --enable_result_log
             BEGIN;
             INSERT INTO t VALUES(0);
            

            However, it is a different hang: log_copying_thread() would keep invoking xtrabackup_copy_logfile() every 2 seconds, apparently never finishing, even though there should be no log to copy.

            marko Marko Mäkelä added a comment - I can repeat a hang with the following: diff --git a/mysql-test/suite/mariabackup/incremental_backup.test b/mysql-test/suite/mariabackup/incremental_backup.test index ec38af4c27d..5db5646ce57 100644 --- a/mysql-test/suite/mariabackup/incremental_backup.test +++ b/mysql-test/suite/mariabackup/incremental_backup.test @@ -33,7 +33,7 @@ INSERT INTO t VALUES(1); echo # Create full backup , modify table, then create incremental/differential backup; --disable_result_log -exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --parallel=10 --target-dir=$basedir; +exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --parallel=10 --target-dir=$basedir --throttle=2; --enable_result_log BEGIN; INSERT INTO t VALUES(0); However, it is a different hang: log_copying_thread() would keep invoking xtrabackup_copy_logfile() every 2 seconds, apparently never finishing, even though there should be no log to copy.

            My above change to a test does not always reproduce an actual hang. In one case that I debugged (innodb_page_size=32k, innodb_undo_tablespaces=0), it only limited the speed of copying files to about 2 operations per second. When I set a data watchpoint on io_ticket in GDB, small progress was being made, copying some non-InnoDB files. Eventually the test did complete, in 3 to 4 minutes. But, a next test with innodb_undo_tablespaces=3 did hang, with only 2 threads being active: timer_handler waiting for something, and the thread that tries to execute mysql_cond_destroy(&log_copying_stop).

            I suspect that there could be a problem that two threads are executing timed waits on that condition variable. I will try to reduce that to 1 and see what would happen.

            marko Marko Mäkelä added a comment - My above change to a test does not always reproduce an actual hang. In one case that I debugged (innodb_page_size=32k, innodb_undo_tablespaces=0), it only limited the speed of copying files to about 2 operations per second. When I set a data watchpoint on io_ticket in GDB, small progress was being made, copying some non-InnoDB files. Eventually the test did complete, in 3 to 4 minutes. But, a next test with innodb_undo_tablespaces=3 did hang, with only 2 threads being active: timer_handler waiting for something, and the thread that tries to execute mysql_cond_destroy(&log_copying_stop) . I suspect that there could be a problem that two threads are executing timed waits on that condition variable. I will try to reduce that to 1 and see what would happen.

            Finally, I figured it out. The io_watching_thread() was being created before the condition variable log_copying_stop had been initialized in the main thread. So, the first mysql_cond_timedwait() could be executed concurrently with the initialization of the condition variable.

            I would clean up also some other race conditions around this.

            The following test seems to run about as fast with my fix, and rather reliably reproduce the hang (5 of 10 tests would complete in 6 seconds, and 5 would hang when I tried it):

            diff --git a/mysql-test/suite/mariabackup/incremental_backup.test b/mysql-test/suite/mariabackup/incremental_backup.test
            index fed2d5e60a9..62e4f9c6875 100644
            --- a/mysql-test/suite/mariabackup/incremental_backup.test
            +++ b/mysql-test/suite/mariabackup/incremental_backup.test
            @@ -34,7 +34,7 @@ INSERT INTO t VALUES(1);
             
             echo # Create full backup , modify table, then create incremental/differential backup;
             --disable_result_log
            -exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --parallel=10 --target-dir=$basedir;
            +exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --parallel=10 --target-dir=$basedir --throttle=1000;
             --enable_result_log
             BEGIN;
             INSERT INTO t VALUES(0);
            

            This race condition was caused by MDEV-21452, which made io_watching_thread() depend on the condition variable log_copying_stop.

            marko Marko Mäkelä added a comment - Finally, I figured it out. The io_watching_thread() was being created before the condition variable log_copying_stop had been initialized in the main thread. So, the first mysql_cond_timedwait() could be executed concurrently with the initialization of the condition variable. I would clean up also some other race conditions around this. The following test seems to run about as fast with my fix, and rather reliably reproduce the hang (5 of 10 tests would complete in 6 seconds, and 5 would hang when I tried it): diff --git a/mysql-test/suite/mariabackup/incremental_backup.test b/mysql-test/suite/mariabackup/incremental_backup.test index fed2d5e60a9..62e4f9c6875 100644 --- a/mysql-test/suite/mariabackup/incremental_backup.test +++ b/mysql-test/suite/mariabackup/incremental_backup.test @@ -34,7 +34,7 @@ INSERT INTO t VALUES(1); echo # Create full backup , modify table, then create incremental/differential backup; --disable_result_log -exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --parallel=10 --target-dir=$basedir; +exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --parallel=10 --target-dir=$basedir --throttle=1000; --enable_result_log BEGIN; INSERT INTO t VALUES(0); This race condition was caused by MDEV-21452 , which made io_watching_thread() depend on the condition variable log_copying_stop .

            test is OK. Bug can be closed

            tturenko Timofey Turenko added a comment - test is OK. Bug can be closed

            People

              marko Marko Mäkelä
              tturenko Timofey Turenko
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.