[MDEV-31382] SET GLOBAL innodb_undo_log_truncate=ON does not free space when no undo logs exist - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL)
Fix Version/s: 10.5.22, 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2
Component/s: Storage Engine - InnoDB
Labels:
- purge

Description

The following simple test demonstrates that innodb_undo_log_truncate=ON fails to truncate undo tablespaces:

--source include/have_innodb.inc
--source include/have_sequence.inc
SET GLOBAL innodb_fast_shutdown=0, innodb_undo_log_truncate=OFF;
CREATE TABLE t(a INT PRIMARY KEY, b INT UNIQUE) ENGINE=InnoDB;
INSERT INTO t SELECT seq, NULL FROM seq_1_to_500000;
--source include/restart_mysqld.inc
SET GLOBAL innodb_fast_shutdown=0, innodb_undo_log_truncate=ON;
--source include/restart_mysqld.inc
DROP TABLE t;

Invocation:

./mtr --mysqld=--innodb-undo-tablespaces=2 name_of_test

wc -c var/mysqld.1/data/undo*

10.5 bb9da13baf5e5a4a435408fc05fd46253a00ea69
10485760 var/mysqld.1/data/undo001
13631488 var/mysqld.1/data/undo002
24117248 total

The expected outcome would be that all undo tablespaces have been truncated to their default soft limit size (innodb_max_undo_log_size=10M). Instead, one of the undo tablespace files (undo002 in the output above, at 13631488 bytes) remains larger than that limit.
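To make the sizes above concrete, here is a minimal C++ sketch of the arithmetic. It assumes that 10M means 10 MiB, which matches the 10485760-byte size of undo001. The names `kSoftLimitBytes`, `excess_bytes` and `check_reported_sizes` are hypothetical and not part of InnoDB.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: innodb_max_undo_log_size=10M is interpreted as 10 MiB.
constexpr std::uint64_t kSoftLimitBytes = 10ULL * 1024 * 1024;

// Hypothetical helper: number of bytes by which a file exceeds the
// soft limit, or 0 if the file is at or below the limit.
constexpr std::uint64_t excess_bytes(std::uint64_t file_size) {
  return file_size > kSoftLimitBytes ? file_size - kSoftLimitBytes : 0;
}

// Worked example using the two file sizes from the wc -c output above.
inline void check_reported_sizes() {
  assert(excess_bytes(10485760) == 0);        // undo001: exactly at the limit
  assert(excess_bytes(13631488) == 3145728);  // undo002: 3 MiB over the limit
}
```

Under this assumption, undo002 is the file that was not truncated: it is 3 MiB above the soft limit.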

I think that undo tablespace truncation also needs to work while InnoDB is running (mostly idle, with some writes every now and then). Here, the parameter innodb_purge_rseg_truncate_frequency caused a call to trx_purge_truncate_history() to be skipped during the last purge batch. That batch made the undo logs logically empty, but the space was never reclaimed.

I originally noticed this when testing an upgrade from a server that is affected by MDEV-31234.

Issue Links

is blocked by
- MDEV-31355 innodb_undo_log_truncate=ON fails to wait for purge of enough transaction history (Closed)

relates to
- MDEV-29593 Purge misses a chance to free not-yet-reused undo pages (Closed)
- MDEV-31234 InnoDB does not free UNDO after the fix of MDEV-30671, thus shared tablespace (ibdata1) may grow indefinitely for no good reason (Closed)

Activity

Marko Mäkelä added a comment - 2023-06-02 14:18

In 10.5, if I run the test with ./mtr --rr, the second slow shutdown is so slow that mtr kills the process. In 10.6, the shutdown completes. During the server run that ends in the second shutdown, purge_coordinator_callback() is not invoked at all. The function trx_sys.history_size() returns 0 both times it is called, both calls being in innodb_preshutdown().

It looks like the condition in srv_wake_purge_thread_if_not_active() needs to be revised so that it triggers the purge even if no history exists, as long as undo tablespace truncation is enabled and useful. Similarly, the purge coordinator task needs to invoke trx_purge_truncate_history() once after the history list has become empty.
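The revised wake-up rule suggested above might be sketched as follows. This is an illustration only, not the body of srv_wake_purge_thread_if_not_active(): the struct `PurgeState`, its fields and the function `should_wake_purge` are hypothetical, and "useful" is assumed here to mean that some undo tablespace exceeds innodb_max_undo_log_size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the state that a wake-up check could consult.
struct PurgeState {
  std::uint64_t history_size;       // stand-in for trx_sys.history_size()
  bool truncate_enabled;            // innodb_undo_log_truncate=ON
  bool tablespace_over_soft_limit;  // assumed meaning of "useful"
  bool purge_running;               // purge coordinator already active
};

// Sketch of the proposed rule: wake the purge coordinator when there is
// history to purge, or when truncation is enabled and could free space.
constexpr bool should_wake_purge(const PurgeState &s) {
  return !s.purge_running &&
         (s.history_size != 0 ||
          (s.truncate_enabled && s.tablespace_over_soft_limit));
}

// The situation from this report: no history, truncation ON, oversized file.
inline void check_reported_case() {
  assert(should_wake_purge(PurgeState{0, true, true, false}));
  // With truncation OFF and no history, nothing would need to be woken.
  assert(!should_wake_purge(PurgeState{0, false, true, false}));
}
```

A rule that looked only at the history size would return false for the first case, which corresponds to the behaviour described in this report.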

Marko Mäkelä added a comment - 2023-06-05 09:17

So far, I got the undo log truncation during slow shutdown to work for my test case. While working on it, I had to revise an unnecessarily strict condition that had originally been added in MDEV-30671:

@@ -643,7 +644,8 @@ TRANSACTIONAL_TARGET static void trx_purge_truncate_history()
       rseg.latch.rd_lock(SRW_LOCK_CALL);
       ut_ad(rseg.skip_allocation());
-      if (rseg.is_referenced() || rseg.needs_purge > head.trx_no)
+      if (rseg.is_referenced() ||
+          (rseg.needs_purge > head.trx_no && head.trx_no))
       {
 not_free:
         rseg.latch.rd_unlock();

This condition must be revised in MDEV-31355 anyway.
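The effect of the extra `&& head.trx_no` term can be seen by evaluating both versions of the condition on plain integers. This is a stand-alone sketch: `old_not_free`, `new_not_free` and `check_conditions` are hypothetical names, and the parameters merely stand in for rseg.is_referenced(), rseg.needs_purge and head.trx_no.

```cpp
#include <cassert>
#include <cstdint>

using trx_id = std::uint64_t;

// Condition before the change: true means "take the not_free path".
constexpr bool old_not_free(bool referenced, trx_id needs_purge,
                            trx_id head_trx_no) {
  return referenced || needs_purge > head_trx_no;
}

// Condition after the change: the comparison only counts when
// head_trx_no is nonzero.
constexpr bool new_not_free(bool referenced, trx_id needs_purge,
                            trx_id head_trx_no) {
  return referenced || (needs_purge > head_trx_no && head_trx_no != 0);
}

inline void check_conditions() {
  // head_trx_no == 0: the only case where the two versions differ.
  assert(old_not_free(false, 5, 0));
  assert(!new_not_free(false, 5, 0));
  // head_trx_no != 0: both versions agree.
  assert(old_not_free(false, 5, 3) && new_not_free(false, 5, 3));
  assert(!old_not_free(false, 2, 3) && !new_not_free(false, 2, 3));
  // A referenced rollback segment is never treated as free.
  assert(old_not_free(true, 0, 0) && new_not_free(true, 0, 0));
}
```

With head_trx_no equal to 0, the old condition treats an unreferenced rollback segment with a nonzero needs_purge as not free, while the revised condition no longer does.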

Marko Mäkelä added a comment - 2023-06-05 11:40

A call to trx_purge_truncate_history() will attempt to truncate all undo tablespaces whose size exceeds the soft limit innodb_max_undo_log_size.

I also tested my fix outside shutdown:

--source include/have_innodb.inc
--source include/have_sequence.inc
SET GLOBAL innodb_undo_log_truncate=OFF;
CREATE TABLE t(a INT PRIMARY KEY, b INT UNIQUE) ENGINE=InnoDB;
INSERT INTO t SELECT seq, NULL FROM seq_1_to_130000;
UPDATE t SET b=a;
DROP TABLE t;
SET GLOBAL innodb_undo_log_truncate=ON;
SET GLOBAL innodb_max_purge_lag_wait=0;

My fix will cause SET GLOBAL innodb_undo_log_truncate=ON to wake up the purge coordinator in case it is not running.

Vladislav Lesin added a comment - 2023-06-08 06:14

LGTM

Marko Mäkelä added a comment - 2023-06-27 08:51

Related to this, I was wondering if it would make sense to change the default value of the confusingly named parameter innodb_purge_rseg_truncate_frequency to 1 (for the maximum frequency), so that undo log pages would be freed more frequently even when using the default setting innodb_undo_log_truncate=OFF. axel tested that and found that it would slightly reduce throughput.
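For illustration, the effect of the parameter can be modelled as a modulo test on a purge batch counter. This is a simplified sketch under the assumption that truncation is attempted on every N-th purge batch, where N is the parameter value. The function `truncates_on_batch`, the 1-based batch numbering and the example value 128 are hypothetical, and the real purge coordinator logic is more involved.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model: a truncation attempt happens on batch numbers that are
// multiples of the frequency setting (batches are numbered from 1).
constexpr bool truncates_on_batch(std::uint64_t batch_no,
                                  std::uint64_t frequency) {
  return frequency != 0 && batch_no % frequency == 0;
}

inline void check_frequency_model() {
  // A value of 1 attempts truncation on every batch.
  assert(truncates_on_batch(1, 1));
  assert(truncates_on_batch(2, 1));
  // A larger value such as 128 skips most batches, including possibly
  // the last batch that empties the history.
  assert(!truncates_on_batch(5, 128));
  assert(truncates_on_batch(128, 128));
}
```

In this model, a workload that finishes on a batch number that is not a multiple of the setting leaves the final truncation attempt undone, which is consistent with the skipped call described in the issue description.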
