[MDEV-27812] Allow innodb_log_file_size to change without server restart
Created: 2022-02-11  Updated: 2024-02-02  Resolved: 2022-03-02

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: 10.9.0

Type: Task
Priority: Critical
Reporter: Marko Mäkelä
Assignee: Marko Mäkelä
Resolution: Fixed
Votes: 0
Labels: Preview_10.9, usability

Issue Links:
PartOf
is part of MDEV-28112 prepare 10.9.0 preview releases Closed
Problem/Incident
causes MDEV-31311 The test innodb.log_file_size_online ... Closed
causes MDEV-33361 Excessive delays in SET GLOBAL innodb... Closed
Relates
relates to MDEV-21870 Deprecate and ignore innodb_scrub_log... Closed
relates to MDEV-14425 Change the InnoDB redo log format to ... Closed
relates to MDEV-14992 BACKUP: in-server backup Open
relates to MDEV-27199 Require ib_logfile0 to exist unless i... Closed
relates to MDEV-27772 Performance regression with default c... Open

 Description   

Currently, if the parameters of the redo log change, InnoDB in the MariaDB Community Server will rebuild the redo log at server startup.

MariaDB Enterprise Server 10.5 and 10.6 allow dynamic tuning of the redo log parameters, rebuilding the redo log in a crash-safe manner without restarting the server.

SET GLOBAL innodb_log_file_size=…;

Before MDEV-14425 changed the redo log file format in 10.8, it was not possible to enable or disable innodb_encrypt_log without a server restart, because starting with MDEV-12041 in MariaDB 10.4, encrypted redo log blocks have 4 bytes less payload per 512-byte log block.
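
The effect of the 4-byte difference can be illustrated with a little arithmetic. The following Python sketch assumes that each 512-byte block in the pre-10.8 format carries a 12-byte header and a 4-byte checksum trailer; those two sizes and the function names are assumptions made for illustration and are not stated in this ticket.

```python
BLOCK_SIZE = 512          # log block size, from the description above
HEADER_SIZE = 12          # assumed block header size (not stated in this ticket)
TRAILER_SIZE = 4          # assumed checksum trailer size (not stated in this ticket)
ENCRYPTION_OVERHEAD = 4   # "4 bytes less payload" for encrypted blocks


def payload_per_block(encrypted: bool) -> int:
    """Return the number of log record bytes that fit in one block."""
    payload = BLOCK_SIZE - HEADER_SIZE - TRAILER_SIZE
    return payload - ENCRYPTION_OVERHEAD if encrypted else payload


def blocks_needed(record_bytes: int, encrypted: bool) -> int:
    """Return how many blocks are needed to store record_bytes of log records."""
    per_block = payload_per_block(encrypted)
    return -(-record_bytes // per_block)  # ceiling division
```

Under these assumptions, the same stream of log records occupies a different number of blocks depending on whether it is encrypted, which suggests why the setting could not simply be toggled on a log that was already being written.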

Log resizing is tied to checkpoints. We can start writing a second redo log ib_logfile101 with the requested new size, starting from something close to the last written log sequence number. On log checkpoint completion, we can switch files, provided that the checkpoint LSN was not earlier than the start LSN of the resized log file. When resizing to a small log file during a heavy write workload, multiple checkpoints may be necessary.
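
The switch condition described above can be sketched as a small Python model. This is only an illustration of the rule "switch once the checkpoint LSN is not earlier than the start LSN of the resized log"; the class and function names are hypothetical and do not correspond to InnoDB's actual data structures.

```python
from typing import Iterable


class ResizeState:
    """Hypothetical model of an in-progress resize; not InnoDB's real structure."""

    def __init__(self, resize_start_lsn: int):
        self.resize_start_lsn = resize_start_lsn  # first LSN also written to ib_logfile101
        self.switched = False                     # True once ib_logfile101 has replaced ib_logfile0


def on_checkpoint_completed(state: ResizeState, checkpoint_lsn: int) -> bool:
    """Switch files if the checkpoint is not earlier than the start of the resized log."""
    if not state.switched and checkpoint_lsn >= state.resize_start_lsn:
        # Recovery would start at checkpoint_lsn, which the new file covers.
        state.switched = True
    return state.switched


def checkpoints_until_switch(state: ResizeState, checkpoint_lsns: Iterable[int]) -> int:
    """Count the checkpoints that complete before the switch; -1 if it never happens."""
    for count, lsn in enumerate(checkpoint_lsns, start=1):
        if on_checkpoint_completed(state, lsn):
            return count
    return -1
```

For example, with a resize that started at LSN 1000, checkpoints at LSNs 800, 950 and 1200 would lead to a switch only at the third checkpoint, which matches the remark that several checkpoints may be needed.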

While technically it would be possible to rebuild the log for changing innodb_encrypt_log, this task does not implement it, because it would require a non-trivial transformation between the log record streams that are being written to the current log file (ib_logfile0) and the future log file (ib_logfile101 that will replace ib_logfile0).

Rebuilding the log file will obviously cause disruption to mariadb-backup --backup, because the old log file will stop receiving writes once the server has switched to another log file. This could be addressed in MDEV-14992 by letting the server provide a log record stream directly.



 Comments   
Comment by Marko Mäkelä [ 2022-02-23 ]

wlad, please review.

Edit: The MoveFileEx() failure on Windows that prevented log resizing from succeeding was fixed by closing both file handles before the rename operation, and reopening ib_logfile0 afterwards (on Windows only). We already did something similar when resizing the log on server startup.
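
The close, rename, reopen sequence can be sketched in Python as follows. This is a stand-in for illustration, not the server's C++ code: switch_log_files and demo_switch are hypothetical names, os.replace is used as a portable rename, and unlike the actual fix the sketch reopens the file on every platform.

```python
import os
import tempfile


def switch_log_files(old_path, new_path, old_file, new_file):
    """Replace old_path with new_path, closing both handles first.

    Returns a freshly opened handle to old_path, which now has the new contents.
    """
    # Renaming over a file that is still open can fail on Windows,
    # so both handles are closed before the rename.
    old_file.close()
    new_file.close()
    os.replace(new_path, old_path)
    return open(old_path, "r+b")  # reopen under the final name


def demo_switch():
    """Create two small files in a scratch directory, switch them, and report the result."""
    with tempfile.TemporaryDirectory() as directory:
        old_path = os.path.join(directory, "ib_logfile0")
        new_path = os.path.join(directory, "ib_logfile101")
        for path, content in ((old_path, b"old log"), (new_path, b"resized log")):
            with open(path, "wb") as f:
                f.write(content)
        old_file = open(old_path, "r+b")
        new_file = open(new_path, "r+b")
        with switch_log_files(old_path, new_path, old_file, new_file) as reopened:
            return reopened.read(), os.path.exists(new_path)
```

Calling demo_switch() returns the contents now stored under the ib_logfile0 name and whether ib_logfile101 still exists.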

Comment by Marko Mäkelä [ 2022-02-23 ]

I initiated a non-PMEM test with 30×10000 runs of the test (which performs both a normal restart and a kill followed by a restart across log resizing).

bb-10.9-MDEV-27812 ec563aa54435ac4bb32d2a5165eb8652ff8cd293

innodb.log_file_size_online 'encrypted,innodb' w16 [ 1009 pass ]  14198
innodb.log_file_size_online 'innodb,slow' w17 [ 1021 pass ]  13764
innodb.log_file_size_online 'encrypted,innodb' w10 [ 1010 pass ]  14187
innodb.log_file_size_online 'innodb,slow' w4 [ 1021 pass ]  14030
innodb.log_file_size_online 'innodb,slow' w29 [ 1021 pass ]  13921
innodb.log_file_size_online 'innodb,slow' w1 [ 1021 pass ]  14125
innodb.log_file_size_online 'innodb,slow' w8 [ 1021 pass ]  14119
innodb.log_file_size_online 'innodb,slow' w22 [ 1021 pass ]  13931
innodb.log_file_size_online 'innodb,slow' w5 [ 1022 pass ]  14000
innodb.log_file_size_online 'encrypted,innodb' w26 [ 1009 pass ]  14175
innodb.log_file_size_online 'encrypted,innodb' w13 [ 1009 pass ]  13996
innodb.log_file_size_online 'encrypted,innodb' w21 [ 1010 pass ]  14104
innodb.log_file_size_online 'innodb,slow' w11 [ 1022 pass ]  13828
innodb.log_file_size_online 'encrypted,innodb' w20 [ 1010 pass ]  14160
innodb.log_file_size_online 'encrypted,innodb' w15 [ 1010 pass ]  14188
innodb.log_file_size_online 'encrypted,innodb' w18 [ 1010 pass ]  14398
innodb.log_file_size_online 'encrypted,innodb' w12 [ 1010 pass ]  14111

I thought that it was 30×1000, which would have taken a few hours. I aborted it here. To run it on /dev/shm while having libpmem installed, I applied the following patch:

diff a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
--- a/storage/innobase/log/log0log.cc
+++ b/storage/innobase/log/log0log.cc
@@ -177,7 +177,7 @@ static void *log_mmap(os_file_t file, os_offset_t size)
     my_mmap(0, size_t(size),
             srv_read_only_mode ? PROT_READ : PROT_READ | PROT_WRITE,
             MAP_SHARED_VALIDATE | MAP_SYNC, file, 0);
-#ifdef __linux__
+#if 0
   if (ptr == MAP_FAILED)
   {
     struct stat st;

The PMEM based log resizing was much easier to get working.

I will force-push the branch to address some review comments.

Comment by Marko Mäkelä [ 2022-02-23 ]

As a byproduct, log resizing will perform ‘log scrubbing’, an old MariaDB feature that was removed in MDEV-21870 because it did not work correctly.

Comment by Vladislav Vaintroub [ 2022-02-24 ]

Looks ok to me

Comment by Marko Mäkelä [ 2022-02-25 ]

wlad, thank you. Since your review, I had to refine the code a bit in the SET GLOBAL innodb_log_file_size update callback. The mtr-based test that I conducted a few days ago only ensured that the resized log is restart-safe and crash-safe.

mleich produced rr replay traces and some core dumps for several race conditions that occurred when multiple threads were attempting to change the size concurrently, while some of the connections were being killed. I will wait for his final verdict before pushing this.

Comment by Marko Mäkelä [ 2022-02-28 ]

There were some problems with the log file wrap-around. I developed the following test to exercise that code. The idea of this test is to generate mini-transaction log records of varying length from two DML connections while another connection alternates the log file size between the two smallest allowed values, to maximize the probability of log buffer wrap-around events.

--source include/have_innodb.inc
 
CREATE TABLE t1(a TINYINT PRIMARY KEY, b INT NOT NULL) ENGINE=InnoDB;
INSERT INTO t1 VALUES(1,1);
 
delimiter //;
create procedure uproc(repeat_count int)
begin
  declare current_num int;
  set current_num = 0;
  while current_num < repeat_count do
    update t1 set b=0;
    update t1 set b=256;
    update t1 set b=65536;
    update t1 set b=16777216;
    set current_num = current_num + 1;
  end while;
end//
 
create procedure sproc(repeat_count int)
begin
  declare current_num int;
  set current_num = 0;
  while current_num < repeat_count do
    SET GLOBAL innodb_log_file_size=4096*1024;
    SET GLOBAL innodb_log_file_size=4096*1025;
    set current_num = current_num + 1;
  end while;
end//
 
delimiter ;//
 
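# Two DML connections generate redo log records of varying length.
connect (u,localhost,root);
send call uproc(1000000);
connect (v,localhost,root);
send call uproc(1000000);

# Meanwhile, the default connection keeps resizing the log.
connection default;
call sproc(100000);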
 
connection u;
reap;
disconnect u;
connection v;
reap;
disconnect v;
 
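connection default;

# Clean up so that no objects are left behind after the test.
drop procedure uproc;
drop procedure sproc;
drop table t1;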

This test would be killed by mtr after 15 minutes (900 seconds) both with and without PMEM. I was only able to repeat the problems when running multiple instances of the test concurrently, and without using rr record:

./mtr --parallel=auto innodb.MDEV-27812{,,,,,,,,,,,,,,,}
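
The trailing brace group in the command above relies on shell brace expansion: a group that contains n commas and nothing else expands the word into n+1 identical copies, so mtr receives the same test name many times and can run the copies in parallel. The Python sketch below emulates that expansion for this one form only; expand_empty_alternatives is a hypothetical helper and is not part of mtr or of any shell.

```python
from typing import List


def expand_empty_alternatives(word: str) -> List[str]:
    """Emulate shell brace expansion for a word of the form prefix{,,,}suffix.

    Only a single brace group with at least one comma is handled.
    """
    prefix, brace, rest = word.partition("{")
    body, close, suffix = rest.partition("}")
    if not brace or not close or "," not in body:
        return [word]  # nothing to expand; the word is left unchanged
    return [prefix + alternative + suffix for alternative in body.split(",")]
```

For instance, a group with two commas yields three copies of the test name.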

After some fixes, the implementation survived my tests for 2×15 minutes with PMEM, and another 2×15 minutes without PMEM.

Comment by Marko Mäkelä [ 2022-03-02 ]

Starting, aborting, and finishing the log resizing must all be protected by flush_lock, write_lock, and the exclusive log_sys.latch, to avoid race conditions with concurrent log_write_up_to(). Sufficient locking has been in place in log_sys.resize_abort() for quite some time. The race conditions in starting and finishing the resizing were fixed today.
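
The locking rule can be illustrated with a toy Python model. It is a sketch under stated assumptions, not the server's implementation: the LogSys class, its methods, the acquisition order, and the use of plain mutexes (which model only the exclusive mode of log_sys.latch) are all hypothetical.

```python
import threading


class LogSys:
    """Toy stand-in for the redo log subsystem; lock names mirror the comment above."""

    def __init__(self, file_size: int):
        self.flush_lock = threading.Lock()
        self.write_lock = threading.Lock()
        self.latch = threading.Lock()   # models only the exclusive mode of log_sys.latch
        self.file_size = file_size
        self.lsn = 0
        self.resize_start_lsn = None
        self.resize_target_size = None

    def write_up_to(self, nbytes: int) -> None:
        """Hypothetical writer: advances the LSN while holding write_lock and the latch."""
        with self.write_lock, self.latch:
            self.lsn += nbytes

    def resize_start(self, new_size: int) -> None:
        """Begin a resize while holding all three locks, so that no writer can interleave."""
        with self.flush_lock, self.write_lock, self.latch:
            self.resize_start_lsn = self.lsn
            self.resize_target_size = new_size


def demo_locking(writers: int = 4, writes_per_thread: int = 1000) -> LogSys:
    """Run several writers concurrently with one resize request and return the final state."""
    log = LogSys(file_size=4096 * 1024)

    def write_many() -> None:
        for _ in range(writes_per_thread):
            log.write_up_to(1)

    threads = [threading.Thread(target=write_many) for _ in range(writers)]
    threads.append(threading.Thread(target=log.resize_start, args=(4096 * 1025,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because both code paths take write_lock before the latch, the resize start observes a consistent LSN and the sketch cannot deadlock on lock ordering.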

Comment by Matthias Leich [ 2022-03-02 ]

The tree
origin/bb-10.9-MDEV-27812 05d1faec3661176b039db6beee60bcdbb3bc00d8 2022-03-02T14:14:47+02:00
behaved well in RQG testing.

Comment by Marko Mäkelä [ 2022-07-26 ]

For the record, MySQL 8.0.30 includes a conceptually similar change to the MariaDB one (fixup 1, 2):
WL#12527 InnoDB: Dynamic configuration of space occupied by redo log files
