[MDEV-34389] Don't write FILE_CHECKPOINT during early recovery when InnoDB log file in insufficient - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.5, 10.6, 11.1(EOL), 11.2(EOL), 11.4
Fix Version/s: 10.5.26, 10.6.19, 10.11.9, 11.1.6, 11.2.5, 11.4.3
Component/s: Storage Engine - InnoDB
Labels:
None

Description

Noticed this issue while writing the test case for ~~MDEV-34156~~. Problem is that InnoDB
tries to write FILE_CHECKPOINT marker even though Innodb log file size is
insufficient. This could corrupt the log file and make it unrecoverable.

Better to add the condition to check :

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc

index 6b6a686823c..52ad0306d90 100644

--- a/storage/innobase/log/log0recv.cc

+++ b/storage/innobase/log/log0recv.cc

@@ -4728,7 +4728,9 @@ dberr_t recv_recovery_from_checkpoint_start()

 		log_sys.set_buf_free(recv_sys.offset);

 		if (recv_needed_recovery

-	            && srv_operation <= SRV_OPERATION_EXPORT_RESTORED) {

+	            && srv_operation <= SRV_OPERATION_EXPORT_RESTORED

+		    && recv_sys.lsn - log_sys.last_checkpoint_lsn

+		    < log_sys.log_capacity) {

 			/* Write a FILE_CHECKPOINT marker as the first thing,

 			before generating any other redo log. This ensures

 			that subsequent crash recovery will be possible even

Wrote a test case to repeat the error and tried to avoid writing redo log for it.
But log_checkpoint_margin() eventually calls log_checkpoint and it triggers
fil_names_clear() and this could still corrupt the log file again.

Thread 2 hit Breakpoint 1, log_overwrite_warning (lsn=9826465) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/mtr/mtr0mtr.cc:883

883	  if (log_sys.overwrite_warned)

(rr) where

#0  log_overwrite_warning (lsn=9826465) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/mtr/mtr0mtr.cc:883

#1  0x000055f5acd4852c in log_close (lsn=9826465) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/mtr/mtr0mtr.cc:1086

#2  0x000055f5acd5138b in mtr_t::finish_writer<false, false> (mtr=0x7faefbffd6e0, len=118)

    at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/mtr/mtr0mtr.cc:1393

#3  0x000055f5acd4cab9 in mtr_t::finish_write (this=0x7faefbffd6e0, len=118)

    at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/include/mtr0mtr.h:724

#4  0x000055f5acd4795f in mtr_t::commit_files (this=0x7faefbffd6e0, checkpoint_lsn=63744)

    at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/mtr/mtr0mtr.cc:785

#5  0x000055f5acfcd4d8 in fil_names_clear (lsn=63744) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/fil/fil0fil.cc:3141

#6  0x000055f5acf578d5 in log_checkpoint_low (oldest_lsn=63744, end_lsn=9826347)

    at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc:1950

#7  0x000055f5acf588e4 in buf_flush_sync_for_checkpoint (lsn=1112053)

    at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc:2195

#8  0x000055f5acf59b8a in buf_flush_page_cleaner () at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc:2439

#9  0x000055f5acf5d8c4 in std::__invoke_impl<void, void (*)()> (__f=@0x55f5af294e28: 0x55f5acf59913 <buf_flush_page_cleaner()>)

    at /usr/include/c++/8/bits/invoke.h:60

#10 0x000055f5acf5d5f4 in std::__invoke<void (*)()> (__fn=@0x55f5af294e28: 0x55f5acf59913 <buf_flush_page_cleaner()>)

    at /usr/include/c++/8/bits/invoke.h:95

#11 0x000055f5acf5dbe4 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x55f5af294e28)

    at /usr/include/c++/8/thread:244

#12 0x000055f5acf5dbb6 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x55f5af294e28) at /usr/include/c++/8/thread:253

#13 0x000055f5acf5db96 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x55f5af294e20)

    at /usr/include/c++/8/thread:196

#14 0x00007faf1e6e62b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6

#15 0x00007faf1e294ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442

#16 0x00007faf1e325bf4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

t 1

[Switching to thread 1 (Thread 269939.269939)]

#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f5adf74f18 <buf_pool+17560>)

    at ./nptl/futex-internal.c:57

57	./nptl/futex-internal.c: No such file or directory.

(rr) where

#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f5adf74f18 <buf_pool+17560>)

    at ./nptl/futex-internal.c:57

#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x55f5adf74f18 <buf_pool+17560>)

    at ./nptl/futex-internal.c:87

#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55f5adf74f18 <buf_pool+17560>, expected=expected@entry=0,

    clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139

#3  0x00007faf1e293a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55f5adf74da8 <buf_pool+17192>,

    cond=0x55f5adf74ef0 <buf_pool+17520>) at ./nptl/pthread_cond_wait.c:503

#4  ___pthread_cond_wait (cond=0x55f5adf74ef0 <buf_pool+17520>, mutex=0x55f5adf74da8 <buf_pool+17192>) at ./nptl/pthread_cond_wait.c:627

#5  0x000055f5ad0f98d8 in safe_cond_wait (cond=0x55f5adf74ef0 <buf_pool+17520>, mp=0x55f5adf74d80 <buf_pool+17152>,

    file=0x55f5ad808940 "/home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc", line=2023)

    at /home/thiru/mariadb_parent/server/10.11-work/mysys/thr_mutex.c:489

#6  0x000055f5acf57d1a in buf_flush_wait (lsn=1112053) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc:2022

#7  0x000055f5acf581e2 in buf_flush_wait_flushed (sync_lsn=1112053)

    at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc:2080

#8  0x000055f5acd0718f in log_checkpoint_margin () at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/log/log0log.cc:1055

#9  0x000055f5acd072bf in log_free_check () at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/log/log0log.cc:1070

#10 0x000055f5acd9f9e5 in que_run_threads_low (thr=0x55f5af431f28)

    at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/que/que0que.cc:639

#11 0x000055f5acd9fb09 in que_run_threads (thr=0x55f5af431f28)

    at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/que/que0que.cc:664

#12 0x000055f5acd9fce6 in que_eval_sql (info=0x55f5af44c2b8,

    sql=0x55f5ad7841a0 <row_merge_drop_temp_indexes()::sql> "PROCEDURE DROP_TEMP_INDEXES_PROC () IS\nixid CHAR;\nfound INT;\nDECLARE FUNCTION drop_fts;\nDECLARE CURSOR fts_cur IS\n SELECT TABLE_ID,ID FROM SYS_INDEXES\n WHERE TYPE=32 AND SUBSTR(NAME,0,1)='\377'\n FOR UPDA"...,

    trx=0x7faf14720b80) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/que/que0que.cc:703

#13 0x000055f5acdffd01 in row_merge_drop_temp_indexes ()

    at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/row/row0merge.cc:4330

#14 0x000055f5ace80bf7 in srv_start (create_new_db=false)

    at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/srv/srv0start.cc:1795

#15 0x000055f5acc276fa in innodb_init (p=0x55f5af17ccd8)

    at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/handler/ha_innodb.cc:4230

#16 0x000055f5ac7e8b3f in ha_initialize_handlerton (plugin=0x55f5af0bdfe0)

    at /home/thiru/mariadb_parent/server/10.11-work/sql/handler.cc:655

#17 0x000055f5ac440060 in plugin_do_initialize (plugin=0x55f5af0bdfe0, state=@0x7ffea2adb3f8: 4)

    at /home/thiru/mariadb_parent/server/10.11-work/sql/sql_plugin.cc:1454

#18 0x000055f5ac440401 in plugin_initialize (tmp_root=0x7ffea2adb9b0, plugin=0x55f5af0bdfe0, argc=0x55f5ae16b618 <remaining_argc>,

    argv=0x55f5af00aca8, options_only=false) at /home/thiru/mariadb_parent/server/10.11-work/sql/sql_plugin.cc:1507

Attachments

Activity

Ascending order - Click to sort in descending order

Thirunarayanan Balathandayuthapani added a comment - 2024-06-14 08:32

Test case to repeat the issue:

--source include/have_innodb.inc

--source include/have_sequence.inc

--source include/have_debug.inc

--source include/not_embedded.inc

create table t1(f1 int not null, f2 text)engine=innodb;

create table t2(f1 int not null, f2 text)engine=innodb;

let $shutdown_timeout=;

set global innodb_fast_shutdown=0;

let $restart_parameters=--debug_dbug=+d,ib_log_checkpoint_avoid_hard --innodb_flush_sync=0;

--source include/restart_mysqld.inc

insert into t1 select seq, repeat('b', 8000) from seq_1_to_128;

insert into t2 select seq, repeat('a', 8000) from seq_1_to_6;

delete from t2;

set global innodb_max_purge_lag_wait=0;

insert into t1 select seq, repeat('b', 8000) from seq_1_to_1024;

insert into t2 select seq, repeat('a', 8000) from seq_1_to_32;

let $restart_parameters=;

let $shutdown_timeout=0;

--source include/restart_mysqld.inc

alter table t1 force;

alter table t2 force;

drop table t1, t2;

This gives error during recovery:

2024-06-05 13:46:15 0 [ERROR] InnoDB: Crash recovery is broken due to insufficient innodb_log_file_size; last checkpoint LSN=63493, current LSN=9826481

Thirunarayanan Balathandayuthapani added a comment - 2024-06-14 08:32 Test case to repeat the issue: --source include/have_innodb.inc --source include/have_sequence.inc --source include/have_debug.inc --source include/not_embedded.inc create table t1(f1 int not null, f2 text)engine=innodb; create table t2(f1 int not null, f2 text)engine=innodb; let $shutdown_timeout=; set global innodb_fast_shutdown=0; let $restart_parameters=--debug_dbug=+d,ib_log_checkpoint_avoid_hard --innodb_flush_sync=0; --source include/restart_mysqld.inc insert into t1 select seq, repeat('b', 8000) from seq_1_to_128; insert into t2 select seq, repeat('a', 8000) from seq_1_to_6; delete from t2; set global innodb_max_purge_lag_wait=0; insert into t1 select seq, repeat('b', 8000) from seq_1_to_1024; insert into t2 select seq, repeat('a', 8000) from seq_1_to_32; let $restart_parameters=; let $shutdown_timeout=0; --source include/restart_mysqld.inc alter table t1 force; alter table t2 force; drop table t1, t2; This gives error during recovery: 2024-06-05 13:46:15 0 [ERROR] InnoDB: Crash recovery is broken due to insufficient innodb_log_file_size; last checkpoint LSN=63493, current LSN=9826481

Thirunarayanan Balathandayuthapani added a comment - 2024-06-14 08:33

https://github.com/MariaDB/server/pull/3337

Thirunarayanan Balathandayuthapani added a comment - 2024-06-14 08:33 https://github.com/MariaDB/server/pull/3337

Marko Mäkelä added a comment - 2024-06-17 07:40

The bug title needs to emphasize "during early recovery" because it is not possible or feasible to avoid writing a checkpoint marker later on in log_checkpoint_low(). The point of this fix is that if the server is killed during recovery, we are narrowing down the window of opportunity when the log will not be recoverable. During the log checkpoint at the end of recovery completed, we must already have written out all pending changes to the persistent files. What remains to be done is to write some log records for the checkpoint and to update the checkpoint header. Only if the server were killed between the two writes to the ib_logfile0, the log would be unrecoverable.

Marko Mäkelä added a comment - 2024-06-17 07:40 The bug title needs to emphasize "during early recovery" because it is not possible or feasible to avoid writing a checkpoint marker later on in log_checkpoint_low(). The point of this fix is that if the server is killed during recovery, we are narrowing down the window of opportunity when the log will not be recoverable. During the log checkpoint at the end of recovery completed, we must already have written out all pending changes to the persistent files. What remains to be done is to write some log records for the checkpoint and to update the checkpoint header. Only if the server were killed between the two writes to the ib_logfile0 , the log would be unrecoverable.

MariaDB Server

Don't write FILE_CHECKPOINT during early recovery when InnoDB log file in insufficient

Details

Description

Attachments

Activity

People

Dates

Git Integration