Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34389

Don't write FILE_CHECKPOINT during early recovery when InnoDB log file in insufficient

Details

    Description

      Noticed this issue while writing the test case for MDEV-34156. Problem is that InnoDB
      tries to write FILE_CHECKPOINT marker even though Innodb log file size is
      insufficient. This could corrupt the log file and make it unrecoverable.

      Better to add the condition to check :

      diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
      index 6b6a686823c..52ad0306d90 100644
      --- a/storage/innobase/log/log0recv.cc
      +++ b/storage/innobase/log/log0recv.cc
      @@ -4728,7 +4728,9 @@ dberr_t recv_recovery_from_checkpoint_start()
       		}
       		log_sys.set_buf_free(recv_sys.offset);
       		if (recv_needed_recovery
      -	            && srv_operation <= SRV_OPERATION_EXPORT_RESTORED) {
      +	            && srv_operation <= SRV_OPERATION_EXPORT_RESTORED
      +		    && recv_sys.lsn - log_sys.last_checkpoint_lsn
      +		    < log_sys.log_capacity) {
       			/* Write a FILE_CHECKPOINT marker as the first thing,
       			before generating any other redo log. This ensures
       			that subsequent crash recovery will be possible even
      

      Wrote a test case to repeat the error and tried to avoid writing redo log for it.
      But log_checkpoint_margin() eventually calls log_checkpoint and it triggers
      fil_names_clear() and this could still corrupt the log file again.

      Thread 2 hit Breakpoint 1, log_overwrite_warning (lsn=9826465) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/mtr/mtr0mtr.cc:883
      883	  if (log_sys.overwrite_warned)
      (rr) where
      #0  log_overwrite_warning (lsn=9826465) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/mtr/mtr0mtr.cc:883
      #1  0x000055f5acd4852c in log_close (lsn=9826465) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/mtr/mtr0mtr.cc:1086
      #2  0x000055f5acd5138b in mtr_t::finish_writer<false, false> (mtr=0x7faefbffd6e0, len=118)
          at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/mtr/mtr0mtr.cc:1393
      #3  0x000055f5acd4cab9 in mtr_t::finish_write (this=0x7faefbffd6e0, len=118)
          at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/include/mtr0mtr.h:724
      #4  0x000055f5acd4795f in mtr_t::commit_files (this=0x7faefbffd6e0, checkpoint_lsn=63744)
          at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/mtr/mtr0mtr.cc:785
      #5  0x000055f5acfcd4d8 in fil_names_clear (lsn=63744) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/fil/fil0fil.cc:3141
      #6  0x000055f5acf578d5 in log_checkpoint_low (oldest_lsn=63744, end_lsn=9826347)
          at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc:1950
      #7  0x000055f5acf588e4 in buf_flush_sync_for_checkpoint (lsn=1112053)
          at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc:2195
      #8  0x000055f5acf59b8a in buf_flush_page_cleaner () at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc:2439
      #9  0x000055f5acf5d8c4 in std::__invoke_impl<void, void (*)()> (__f=@0x55f5af294e28: 0x55f5acf59913 <buf_flush_page_cleaner()>)
          at /usr/include/c++/8/bits/invoke.h:60
      #10 0x000055f5acf5d5f4 in std::__invoke<void (*)()> (__fn=@0x55f5af294e28: 0x55f5acf59913 <buf_flush_page_cleaner()>)
          at /usr/include/c++/8/bits/invoke.h:95
      #11 0x000055f5acf5dbe4 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x55f5af294e28)
          at /usr/include/c++/8/thread:244
      #12 0x000055f5acf5dbb6 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x55f5af294e28) at /usr/include/c++/8/thread:253
      #13 0x000055f5acf5db96 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x55f5af294e20)
          at /usr/include/c++/8/thread:196
      #14 0x00007faf1e6e62b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #15 0x00007faf1e294ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #16 0x00007faf1e325bf4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
       
       
      t 1
      [Switching to thread 1 (Thread 269939.269939)]
      #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f5adf74f18 <buf_pool+17560>)
          at ./nptl/futex-internal.c:57
      57	./nptl/futex-internal.c: No such file or directory.
      (rr) where
      #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f5adf74f18 <buf_pool+17560>)
          at ./nptl/futex-internal.c:57
      #1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x55f5adf74f18 <buf_pool+17560>)
          at ./nptl/futex-internal.c:87
      #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55f5adf74f18 <buf_pool+17560>, expected=expected@entry=0, 
          clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
      #3  0x00007faf1e293a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55f5adf74da8 <buf_pool+17192>, 
          cond=0x55f5adf74ef0 <buf_pool+17520>) at ./nptl/pthread_cond_wait.c:503
      #4  ___pthread_cond_wait (cond=0x55f5adf74ef0 <buf_pool+17520>, mutex=0x55f5adf74da8 <buf_pool+17192>) at ./nptl/pthread_cond_wait.c:627
      #5  0x000055f5ad0f98d8 in safe_cond_wait (cond=0x55f5adf74ef0 <buf_pool+17520>, mp=0x55f5adf74d80 <buf_pool+17152>, 
          file=0x55f5ad808940 "/home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc", line=2023)
          at /home/thiru/mariadb_parent/server/10.11-work/mysys/thr_mutex.c:489
      #6  0x000055f5acf57d1a in buf_flush_wait (lsn=1112053) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc:2022
      #7  0x000055f5acf581e2 in buf_flush_wait_flushed (sync_lsn=1112053)
          at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/buf/buf0flu.cc:2080
      #8  0x000055f5acd0718f in log_checkpoint_margin () at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/log/log0log.cc:1055
      #9  0x000055f5acd072bf in log_free_check () at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/log/log0log.cc:1070
      #10 0x000055f5acd9f9e5 in que_run_threads_low (thr=0x55f5af431f28)
          at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/que/que0que.cc:639
      #11 0x000055f5acd9fb09 in que_run_threads (thr=0x55f5af431f28)
          at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/que/que0que.cc:664
      #12 0x000055f5acd9fce6 in que_eval_sql (info=0x55f5af44c2b8, 
          sql=0x55f5ad7841a0 <row_merge_drop_temp_indexes()::sql> "PROCEDURE DROP_TEMP_INDEXES_PROC () IS\nixid CHAR;\nfound INT;\nDECLARE FUNCTION drop_fts;\nDECLARE CURSOR fts_cur IS\n SELECT TABLE_ID,ID FROM SYS_INDEXES\n WHERE TYPE=32 AND SUBSTR(NAME,0,1)='\377'\n FOR UPDA"..., 
          trx=0x7faf14720b80) at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/que/que0que.cc:703
      #13 0x000055f5acdffd01 in row_merge_drop_temp_indexes ()
          at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/row/row0merge.cc:4330
      #14 0x000055f5ace80bf7 in srv_start (create_new_db=false)
          at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/srv/srv0start.cc:1795
      #15 0x000055f5acc276fa in innodb_init (p=0x55f5af17ccd8)
          at /home/thiru/mariadb_parent/server/10.11-work/storage/innobase/handler/ha_innodb.cc:4230
      #16 0x000055f5ac7e8b3f in ha_initialize_handlerton (plugin=0x55f5af0bdfe0)
          at /home/thiru/mariadb_parent/server/10.11-work/sql/handler.cc:655
      #17 0x000055f5ac440060 in plugin_do_initialize (plugin=0x55f5af0bdfe0, state=@0x7ffea2adb3f8: 4)
          at /home/thiru/mariadb_parent/server/10.11-work/sql/sql_plugin.cc:1454
      #18 0x000055f5ac440401 in plugin_initialize (tmp_root=0x7ffea2adb9b0, plugin=0x55f5af0bdfe0, argc=0x55f5ae16b618 <remaining_argc>, 
          argv=0x55f5af00aca8, options_only=false) at /home/thiru/mariadb_parent/server/10.11-work/sql/sql_plugin.cc:1507
      

      Attachments

        Activity

          Test case to repeat the issue:

          --source include/have_innodb.inc
          --source include/have_sequence.inc
          --source include/have_debug.inc
          --source include/not_embedded.inc
          create table t1(f1 int not null, f2 text)engine=innodb;
          create table t2(f1 int not null, f2 text)engine=innodb;
          let $shutdown_timeout=;
          set global innodb_fast_shutdown=0;
          let $restart_parameters=--debug_dbug=+d,ib_log_checkpoint_avoid_hard --innodb_flush_sync=0;
          --source include/restart_mysqld.inc
          insert into t1 select seq, repeat('b', 8000) from seq_1_to_128;
          insert into t2 select seq, repeat('a', 8000) from seq_1_to_6;
          delete from t2;
          set global innodb_max_purge_lag_wait=0;
          insert into t1 select seq, repeat('b', 8000) from seq_1_to_1024;
          insert into t2 select seq, repeat('a', 8000) from seq_1_to_32;
          let $restart_parameters=;
          let $shutdown_timeout=0;
          --source include/restart_mysqld.inc
          alter table t1 force;
          alter table t2 force;
          drop table t1, t2;
          

          This gives error during recovery:

          2024-06-05 13:46:15 0 [ERROR] InnoDB: Crash recovery is broken due to insufficient innodb_log_file_size; last checkpoint LSN=63493, current LSN=9826481
          

          thiru Thirunarayanan Balathandayuthapani added a comment - Test case to repeat the issue: --source include/have_innodb.inc --source include/have_sequence.inc --source include/have_debug.inc --source include/not_embedded.inc create table t1(f1 int not null, f2 text)engine=innodb; create table t2(f1 int not null, f2 text)engine=innodb; let $shutdown_timeout=; set global innodb_fast_shutdown=0; let $restart_parameters=--debug_dbug=+d,ib_log_checkpoint_avoid_hard --innodb_flush_sync=0; --source include/restart_mysqld.inc insert into t1 select seq, repeat('b', 8000) from seq_1_to_128; insert into t2 select seq, repeat('a', 8000) from seq_1_to_6; delete from t2; set global innodb_max_purge_lag_wait=0; insert into t1 select seq, repeat('b', 8000) from seq_1_to_1024; insert into t2 select seq, repeat('a', 8000) from seq_1_to_32; let $restart_parameters=; let $shutdown_timeout=0; --source include/restart_mysqld.inc alter table t1 force; alter table t2 force; drop table t1, t2; This gives error during recovery: 2024-06-05 13:46:15 0 [ERROR] InnoDB: Crash recovery is broken due to insufficient innodb_log_file_size; last checkpoint LSN=63493, current LSN=9826481

          The bug title needs to emphasize "during early recovery" because it is not possible or feasible to avoid writing a checkpoint marker later on in log_checkpoint_low(). The point of this fix is that if the server is killed during recovery, we are narrowing down the window of opportunity when the log will not be recoverable. During the log checkpoint at the end of recovery completed, we must already have written out all pending changes to the persistent files. What remains to be done is to write some log records for the checkpoint and to update the checkpoint header. Only if the server were killed between the two writes to the ib_logfile0, the log would be unrecoverable.

          marko Marko Mäkelä added a comment - The bug title needs to emphasize "during early recovery" because it is not possible or feasible to avoid writing a checkpoint marker later on in log_checkpoint_low(). The point of this fix is that if the server is killed during recovery, we are narrowing down the window of opportunity when the log will not be recoverable. During the log checkpoint at the end of recovery completed, we must already have written out all pending changes to the persistent files. What remains to be done is to write some log records for the checkpoint and to update the checkpoint header. Only if the server were killed between the two writes to the ib_logfile0 , the log would be unrecoverable.

          People

            thiru Thirunarayanan Balathandayuthapani
            thiru Thirunarayanan Balathandayuthapani
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.