MariaDB Server: MDEV-31382

SET GLOBAL innodb_undo_log_truncate=ON does not free space when no undo logs exist

Details

    Description

      The following simple test demonstrates that innodb_undo_log_truncate=ON fails to truncate undo tablespaces:

      --source include/have_innodb.inc
      --source include/have_sequence.inc
       
      SET GLOBAL innodb_fast_shutdown=0, innodb_undo_log_truncate=OFF;
       
      CREATE TABLE t(a INT PRIMARY KEY, b INT UNIQUE) ENGINE=InnoDB;
      INSERT INTO t SELECT seq, NULL FROM seq_1_to_500000;
       
      --source include/restart_mysqld.inc
      SET GLOBAL innodb_fast_shutdown=0, innodb_undo_log_truncate=ON;
      --source include/restart_mysqld.inc
       
      DROP TABLE t;
      

      Invocation:

      ./mtr --mysqld=--innodb-undo-tablespaces=2 name_of_test
      wc -c var/mysqld.1/data/undo*
      

      Output on 10.5 (commit bb9da13baf5e5a4a435408fc05fd46253a00ea69):

      10485760 var/mysqld.1/data/undo001
      13631488 var/mysqld.1/data/undo002
      24117248 total
      

      The expected outcome is that all undo tablespaces are truncated to the default soft limit size (innodb_max_undo_log_size=10M). Instead, one of the undo tablespace files (undo002) remains larger than that.
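A quick check of the numbers above, written as a compile-time C++ sketch. It converts the `wc -c` byte counts to MiB and compares them with the 10M soft limit. The constant and function names are hypothetical. Only the sizes and the 10M value come from the report, and "exceeds" is read as strictly greater than the limit.

```cpp
#include <cstdint>

// Byte counts reported by `wc -c` for the two undo tablespaces.
constexpr std::uint64_t kMiB = 1024 * 1024;
constexpr std::uint64_t undo001_bytes = 10485760;
constexpr std::uint64_t undo002_bytes = 13631488;

// innodb_max_undo_log_size=10M, read as 10 MiB.
constexpr std::uint64_t max_undo_log_size = 10 * kMiB;

// Size in whole MiB (all values here are exact multiples of 1 MiB).
constexpr std::uint64_t to_mib(std::uint64_t bytes) { return bytes / kMiB; }

// "Exceeds the soft limit" is read as strictly greater than the limit.
constexpr bool exceeds_soft_limit(std::uint64_t bytes) {
  return bytes > max_undo_log_size;
}
```

With these definitions, undo001 sits exactly at the 10 MiB limit, while undo002 is 13 MiB, which is 3 MiB above it.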

      I think that undo tablespace truncation also needs to work while InnoDB is running (mostly idle, with occasional writes). Here, the parameter innodb_purge_rseg_truncate_frequency caused the call to trx_purge_truncate_history() to be skipped during the last purge batch. That batch made the undo logs logically empty, but the space was never reclaimed.
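The skipped call can be illustrated with a simplified model of the purge coordinator. The names are hypothetical and this is not InnoDB code. The model assumes that truncation runs only on every `frequency`-th purge batch, where `frequency` stands for innodb_purge_rseg_truncate_frequency. It also assumes that nothing wakes the coordinator again after the batch that empties the history list.

```cpp
#include <cstdint>

// Outcome of a run of purge batches in this simplified model.
struct PurgeOutcome {
  std::uint32_t truncate_calls;     // how often the truncation step ran
  bool truncated_after_last_batch;  // did the final batch run it?
};

// Hypothetical model: `batches` purge batches run back to back, and the
// last one leaves the history list empty. Truncation runs only on every
// `frequency`-th batch; `frequency` must be at least 1.
constexpr PurgeOutcome simulate_purge(std::uint32_t batches,
                                      std::uint32_t frequency) {
  PurgeOutcome out{0, false};
  for (std::uint32_t batch = 1; batch <= batches; ++batch) {
    const bool run_truncate = batch % frequency == 0;
    if (run_truncate) ++out.truncate_calls;
    out.truncated_after_last_batch = run_truncate;
  }
  return out;
}
```

Take a frequency of 128, with the history list becoming empty at batch 300. In that case, truncation runs at batches 128 and 256 but not at batch 300, so in this model the pages freed after batch 256 are never reclaimed. With a frequency of 1, every batch runs the truncation step.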

      I originally noticed this when testing an upgrade from a server that is affected by MDEV-31234.

          Activity

            Marko Mäkelä added a comment:

            In 10.5, if I run the test with ./mtr --rr, the second slow shutdown is so slow that mtr kills the process. In 10.6, the shutdown completes. During the server run that ends in the second shutdown, purge_coordinator_callback() is not invoked at all. The function trx_sys.history_size() returned 0 both times it was called, and both calls came from innodb_preshutdown().

            It looks like the condition in srv_wake_purge_thread_if_not_active() needs to be revised so that it triggers the purge even if no history exists, as long as undo tablespace truncation is enabled and useful. Similarly, the purge coordinator task needs to invoke trx_purge_truncate_history() once after the history list has become empty.
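The revised wake-up logic proposed in this comment can be sketched as a few predicates in C++. This is a hypothetical model. The PurgeState fields and the function names are invented for illustration and only approximate what srv_wake_purge_thread_if_not_active() and the purge coordinator check. "Useful" is assumed to mean that some undo tablespace exceeds innodb_max_undo_log_size.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical snapshot of what the wake-up decision looks at.
struct PurgeState {
  std::size_t history_size;          // stands in for trx_sys.history_size()
  bool undo_log_truncate;            // innodb_undo_log_truncate
  std::uint64_t largest_undo_bytes;  // largest undo tablespace, in bytes
  std::uint64_t max_undo_log_size;   // innodb_max_undo_log_size, in bytes
  bool purge_active;                 // is the coordinator already running?
};

// Behaviour described in the comment: purge is only woken if history exists.
constexpr bool should_wake_old(PurgeState s) {
  return !s.purge_active && s.history_size > 0;
}

// Revised idea: also wake when truncation is enabled and would be useful,
// assumed here to mean that some undo tablespace exceeds the soft limit.
constexpr bool should_wake_new(PurgeState s) {
  return !s.purge_active &&
         (s.history_size > 0 ||
          (s.undo_log_truncate && s.largest_undo_bytes > s.max_undo_log_size));
}

// Second part: run truncation once after the batch that empties the history
// list, even if the frequency counter would otherwise skip it.
constexpr bool truncate_after_batch(std::size_t history_before,
                                    std::size_t history_after,
                                    bool frequency_says_truncate) {
  return frequency_says_truncate ||
         (history_before > 0 && history_after == 0);
}
```

Consider the idle server from the test: empty history, truncation enabled, one 13 MiB undo tablespace. Under these predicates, the old check never wakes purge, while the revised check does.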

            Marko Mäkelä added a comment:

            So far, I got the undo log truncation during slow shutdown to work for my test case. While working on it, I had to revise an unnecessarily strict condition that had originally been added in MDEV-30671:

            @@ -643,7 +644,8 @@ TRANSACTIONAL_TARGET static void trx_purge_truncate_history()
             
                   rseg.latch.rd_lock(SRW_LOCK_CALL);
                   ut_ad(rseg.skip_allocation());
            -      if (rseg.is_referenced() || rseg.needs_purge > head.trx_no)
            +      if (rseg.is_referenced() ||
            +          (rseg.needs_purge > head.trx_no && head.trx_no))
                   {
             not_free:
                     rseg.latch.rd_unlock();

            This condition must be revised in MDEV-31355 anyway.
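To show what the diff changes, here is a compile-time C++ comparison of the two conditions. The function names and the `trx_id_t` alias are hypothetical stand-ins, and the parameters mirror rseg.is_referenced(), rseg.needs_purge and head.trx_no. The sketch reproduces only the boolean logic. When head.trx_no is 0, the old condition sends every rollback segment with a nonzero needs_purge to `not_free` (so its undo tablespace is not truncated), while the new condition does not. Judging by the added guard, head.trx_no == 0 is the situation this change targets.

```cpp
#include <cstdint>

using trx_id_t = std::uint64_t;  // stand-in for InnoDB's transaction number type

// true means "take the not_free path": the rollback segment may still be
// needed, so the undo tablespace is not truncated.

// Before the change (condition originally added in MDEV-30671).
constexpr bool not_free_old(bool referenced, trx_id_t needs_purge,
                            trx_id_t head_trx_no) {
  return referenced || needs_purge > head_trx_no;
}

// After the change: needs_purge is only compared when head.trx_no != 0
// (equivalent to `&& head.trx_no` in the diff).
constexpr bool not_free_new(bool referenced, trx_id_t needs_purge,
                            trx_id_t head_trx_no) {
  return referenced || (needs_purge > head_trx_no && head_trx_no != 0);
}
```

The two conditions agree whenever head.trx_no is nonzero or the segment is referenced. They differ only in the zero case.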

            Marko Mäkelä added a comment:

            A call to trx_purge_truncate_history() will attempt to truncate all undo tablespaces whose size exceeds the soft limit innodb_max_undo_log_size.

            I tested my fix also outside shutdown:

            --source include/have_innodb.inc
            --source include/have_sequence.inc
            SET GLOBAL innodb_undo_log_truncate=OFF;
            CREATE TABLE t(a INT PRIMARY KEY, b INT UNIQUE) ENGINE=InnoDB;
            INSERT INTO t SELECT seq, NULL FROM seq_1_to_130000;
            UPDATE t SET b=a;
            DROP TABLE t;
            SET GLOBAL innodb_undo_log_truncate=ON;
            SET GLOBAL innodb_max_purge_lag_wait=0;

            My fix will cause SET GLOBAL innodb_undo_log_truncate=ON to wake up the purge coordinator in case it is not running.
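The following is a single-threaded C++ sketch of the two points in this comment. First, enabling innodb_undo_log_truncate submits the purge coordinator even though no history exists. Second, one truncation pass shrinks every undo tablespace that is above the soft limit. Everything here is hypothetical: UndoModel, run_pending_tasks(), the flag that stands in for submitting a task, and the assumption that a truncated tablespace returns to a fixed initial size. A real server would run the coordinator asynchronously.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an idle server whose undo log history is empty.
struct UndoModel {
  std::vector<std::uint64_t> undo_bytes;  // current undo tablespace sizes
  std::uint64_t max_undo_log_size;        // soft limit, in bytes
  std::uint64_t initial_bytes;            // assumed size after truncation
  bool undo_log_truncate = false;         // innodb_undo_log_truncate
  bool coordinator_submitted = false;     // stands in for a queued task

  // Models the update hook of SET GLOBAL innodb_undo_log_truncate=...:
  // turning it ON wakes the coordinator even though there is no history.
  void set_undo_log_truncate(bool on) {
    undo_log_truncate = on;
    if (on) coordinator_submitted = true;
  }

  // Models one trx_purge_truncate_history() pass: every tablespace larger
  // than the soft limit is shrunk back to the initial size.
  std::size_t truncate_history() {
    // A target size above the limit would keep the file a candidate forever.
    assert(initial_bytes <= max_undo_log_size);
    std::size_t truncated = 0;
    for (std::uint64_t& bytes : undo_bytes) {
      if (bytes > max_undo_log_size) {
        bytes = initial_bytes;
        ++truncated;
      }
    }
    return truncated;
  }

  // Runs the coordinator if it was submitted. With no history to purge,
  // the only work left is truncation; returns how many files shrank.
  std::size_t run_pending_tasks() {
    if (!coordinator_submitted) return 0;
    coordinator_submitted = false;
    return undo_log_truncate ? truncate_history() : 0;
  }
};
```

For example, a model built with the sizes from the description ({10485760, 13631488}, a 10 MiB limit and a 10 MiB initial size) does nothing until set_undo_log_truncate(true) is called. After that, one run_pending_tasks() call shrinks only undo002.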

            Vladislav Lesin added a comment:

            LGTM

            Marko Mäkelä added a comment:

            Related to this, I was wondering if it would make sense to change the default value of the confusingly named parameter innodb_purge_rseg_truncate_frequency to 1 (for the maximum frequency), so that undo log pages would be freed more frequently even when using the default setting innodb_undo_log_truncate=OFF. axel tested that and found that it would slightly reduce throughput.

            People

              Assignee: Marko Mäkelä
              Reporter: Marko Mäkelä
              Votes: 1
              Watchers: 8
