Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28879

Assertion `l->lsn <= log_sys.get_lsn()' failed around recv_recover_page

Details

    Description

      The MTR test encryption.innodb-redo-nokeys fails with

      mariadbd: /data/Server/10.9/storage/innobase/log/log0recv.cc:2921: buf_block_t* recv_recover_page(buf_block_t*, mtr_t&, const iterator&, fil_space_t*, mlog_init_t::init*): Assertion `l->lsn <= log_sys.get_lsn()' failed.
       
      Thread 2 (Thread 2532084.2532368):
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #1  0x00007f3c35d10859 in __GI_abort () at abort.c:79
      #2  0x00007f3c35d10729 in __assert_fail_base (fmt=0x7f3c35ea6588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55eb75e86d00 "l->lsn <= log_sys.get_lsn()", file=0x55eb75e82060 "/data/Server/10.9/storage/innobase/log/log0recv.cc", line=2921, function=<optimized out>) at assert.c:92
      #3  0x00007f3c35d21f36 in __GI___assert_fail (assertion=0x55eb75e86d00 "l->lsn <= log_sys.get_lsn()", file=0x55eb75e82060 "/data/Server/10.9/storage/innobase/log/log0recv.cc", line=2921, function=0x55eb75e86940 "buf_block_t* recv_recover_page(buf_block_t*, mtr_t&, const iterator&, fil_space_t*, mlog_init_t::init*)") at assert.c:101
      #4  0x000055eb74b0e7ce in recv_recover_page (block=0x7f3c31ad01a0, mtr=..., p=Traceback (most recent call last):
        File "/lib/x86_64-linux-gnu/../../share/gcc/python/libstdcxx/v6/printers.py", line 1548, in __call__
          return self.lookup[basename].invoke(val)
        File "/lib/x86_64-linux-gnu/../../share/gcc/python/libstdcxx/v6/printers.py", line 1485, in invoke
          return self.function(self.name, value)
        File "/lib/x86_64-linux-gnu/../../share/gcc/python/libstdcxx/v6/printers.py", line 649, in __init__
          self.link_type = nodetype.pointer()
      AttributeError: 'NoneType' object has no attribute 'pointer'
      ..., space=0x612000023a40, init=0x0) at /data/Server/10.9/storage/innobase/log/log0recv.cc:2921
      #5  0x000055eb74b105e6 in recv_recover_page (space=0x612000023a40, bpage=0x7f3c31ad01a0) at /data/Server/10.9/storage/innobase/log/log0recv.cc:3195
      #6  0x000055eb74f0cba5 in buf_page_t::read_complete (this=0x7f3c31ad01a0, node=...) at /data/Server/10.9/storage/innobase/buf/buf0buf.cc:3697
      #7  0x000055eb75027b15 in fil_aio_callback (request=...) at /data/Server/10.9/storage/innobase/fil/fil0fil.cc:2889
      #8  0x000055eb74b9984b in io_callback (cb=0x631000026a58) at /data/Server/10.9/storage/innobase/os/os0file.cc:3512
      #9  0x000055eb75152800 in tpool::task_group::execute (this=0x612000023650, t=0x631000026ab0) at /data/Server/10.9/tpool/task_group.cc:55
      #10 0x000055eb751530c9 in tpool::task::execute (this=0x631000026ab0) at /data/Server/10.9/tpool/task.cc:32
      #11 0x000055eb7513f70b in tpool::thread_pool_generic::worker_main (this=0x618000000880, thread_var=0x63000001fd80) at /data/Server/10.9/tpool/tpool_generic.cc:588
      #12 0x000055eb75152261 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=@0x604000000a68: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x55eb7513f528 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x604000000a60: 0x618000000880) at /usr/include/c++/9/bits/invoke.h:73
      --Type <RET> for more, q to quit, c to continue without paging--
      #13 0x000055eb7515200c in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=@0x604000000a68: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x55eb7513f528 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/9/bits/invoke.h:95
      #14 0x000055eb75151e9d in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x604000000a58) at /usr/include/c++/9/thread:244
      #15 0x000055eb75151de7 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x604000000a58) at /usr/include/c++/9/thread:251
      #16 0x000055eb75151d4c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x604000000a50) at /usr/include/c++/9/thread:195
      #17 0x00007f3c3611fde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #18 0x00007f3c3623a609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #19 0x00007f3c35e0d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      pluto:/data/49
      _RR_TRACE_DIR=log/mysqld.1.rr rr replay
       
      origin/10.9 commit baef53a70c675da6d19ac3c7f23c7b8b4ed8458c
       
      
      

      Attachments

        Issue Links

          Activity

            mleich Matthias Leich created issue -
            mleich Matthias Leich added a comment - - edited

            ./mtr --parallel=72 --repeat=64 --rr innodb-redo-nokeys{,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,}   | tee prt 2>&1
            

            mleich Matthias Leich added a comment - - edited ./mtr --parallel=72 --repeat=64 --rr innodb-redo-nokeys{,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} | tee prt 2>&1
            serg Sergei Golubchik made changes -
            Field Original Value New Value
            Description
            {noformat}
            The MTR test encryption.innodb-redo-nokeys fails with
            mariadbd: /data/Server/10.9/storage/innobase/log/log0recv.cc:2921: buf_block_t* recv_recover_page(buf_block_t*, mtr_t&, const iterator&, fil_space_t*, mlog_init_t::init*): Assertion `l->lsn <= log_sys.get_lsn()' failed.

            Thread 2 (Thread 2532084.2532368):
            #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            #1 0x00007f3c35d10859 in __GI_abort () at abort.c:79
            #2 0x00007f3c35d10729 in __assert_fail_base (fmt=0x7f3c35ea6588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55eb75e86d00 "l->lsn <= log_sys.get_lsn()", file=0x55eb75e82060 "/data/Server/10.9/storage/innobase/log/log0recv.cc", line=2921, function=<optimized out>) at assert.c:92
            #3 0x00007f3c35d21f36 in __GI___assert_fail (assertion=0x55eb75e86d00 "l->lsn <= log_sys.get_lsn()", file=0x55eb75e82060 "/data/Server/10.9/storage/innobase/log/log0recv.cc", line=2921, function=0x55eb75e86940 "buf_block_t* recv_recover_page(buf_block_t*, mtr_t&, const iterator&, fil_space_t*, mlog_init_t::init*)") at assert.c:101
            #4 0x000055eb74b0e7ce in recv_recover_page (block=0x7f3c31ad01a0, mtr=..., p=Traceback (most recent call last):
              File "/lib/x86_64-linux-gnu/../../share/gcc/python/libstdcxx/v6/printers.py", line 1548, in __call__
                return self.lookup[basename].invoke(val)
              File "/lib/x86_64-linux-gnu/../../share/gcc/python/libstdcxx/v6/printers.py", line 1485, in invoke
                return self.function(self.name, value)
              File "/lib/x86_64-linux-gnu/../../share/gcc/python/libstdcxx/v6/printers.py", line 649, in __init__
                self.link_type = nodetype.pointer()
            AttributeError: 'NoneType' object has no attribute 'pointer'
            ..., space=0x612000023a40, init=0x0) at /data/Server/10.9/storage/innobase/log/log0recv.cc:2921
            #5 0x000055eb74b105e6 in recv_recover_page (space=0x612000023a40, bpage=0x7f3c31ad01a0) at /data/Server/10.9/storage/innobase/log/log0recv.cc:3195
            #6 0x000055eb74f0cba5 in buf_page_t::read_complete (this=0x7f3c31ad01a0, node=...) at /data/Server/10.9/storage/innobase/buf/buf0buf.cc:3697
            #7 0x000055eb75027b15 in fil_aio_callback (request=...) at /data/Server/10.9/storage/innobase/fil/fil0fil.cc:2889
            #8 0x000055eb74b9984b in io_callback (cb=0x631000026a58) at /data/Server/10.9/storage/innobase/os/os0file.cc:3512
            #9 0x000055eb75152800 in tpool::task_group::execute (this=0x612000023650, t=0x631000026ab0) at /data/Server/10.9/tpool/task_group.cc:55
            #10 0x000055eb751530c9 in tpool::task::execute (this=0x631000026ab0) at /data/Server/10.9/tpool/task.cc:32
            #11 0x000055eb7513f70b in tpool::thread_pool_generic::worker_main (this=0x618000000880, thread_var=0x63000001fd80) at /data/Server/10.9/tpool/tpool_generic.cc:588
            #12 0x000055eb75152261 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=@0x604000000a68: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x55eb7513f528 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x604000000a60: 0x618000000880) at /usr/include/c++/9/bits/invoke.h:73
            --Type <RET> for more, q to quit, c to continue without paging--
            #13 0x000055eb7515200c in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=@0x604000000a68: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x55eb7513f528 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/9/bits/invoke.h:95
            #14 0x000055eb75151e9d in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x604000000a58) at /usr/include/c++/9/thread:244
            #15 0x000055eb75151de7 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x604000000a58) at /usr/include/c++/9/thread:251
            #16 0x000055eb75151d4c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x604000000a50) at /usr/include/c++/9/thread:195
            #17 0x00007f3c3611fde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
            #18 0x00007f3c3623a609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #19 0x00007f3c35e0d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

            pluto:/data/49
            _RR_TRACE_DIR=log/mysqld.1.rr rr replay

            origin/10.9 commit baef53a70c675da6d19ac3c7f23c7b8b4ed8458c


            {noformat}
            The MTR test encryption.innodb-redo-nokeys fails with
            {noformat}
            mariadbd: /data/Server/10.9/storage/innobase/log/log0recv.cc:2921: buf_block_t* recv_recover_page(buf_block_t*, mtr_t&, const iterator&, fil_space_t*, mlog_init_t::init*): Assertion `l->lsn <= log_sys.get_lsn()' failed.

            Thread 2 (Thread 2532084.2532368):
            #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            #1 0x00007f3c35d10859 in __GI_abort () at abort.c:79
            #2 0x00007f3c35d10729 in __assert_fail_base (fmt=0x7f3c35ea6588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55eb75e86d00 "l->lsn <= log_sys.get_lsn()", file=0x55eb75e82060 "/data/Server/10.9/storage/innobase/log/log0recv.cc", line=2921, function=<optimized out>) at assert.c:92
            #3 0x00007f3c35d21f36 in __GI___assert_fail (assertion=0x55eb75e86d00 "l->lsn <= log_sys.get_lsn()", file=0x55eb75e82060 "/data/Server/10.9/storage/innobase/log/log0recv.cc", line=2921, function=0x55eb75e86940 "buf_block_t* recv_recover_page(buf_block_t*, mtr_t&, const iterator&, fil_space_t*, mlog_init_t::init*)") at assert.c:101
            #4 0x000055eb74b0e7ce in recv_recover_page (block=0x7f3c31ad01a0, mtr=..., p=Traceback (most recent call last):
              File "/lib/x86_64-linux-gnu/../../share/gcc/python/libstdcxx/v6/printers.py", line 1548, in __call__
                return self.lookup[basename].invoke(val)
              File "/lib/x86_64-linux-gnu/../../share/gcc/python/libstdcxx/v6/printers.py", line 1485, in invoke
                return self.function(self.name, value)
              File "/lib/x86_64-linux-gnu/../../share/gcc/python/libstdcxx/v6/printers.py", line 649, in __init__
                self.link_type = nodetype.pointer()
            AttributeError: 'NoneType' object has no attribute 'pointer'
            ..., space=0x612000023a40, init=0x0) at /data/Server/10.9/storage/innobase/log/log0recv.cc:2921
            #5 0x000055eb74b105e6 in recv_recover_page (space=0x612000023a40, bpage=0x7f3c31ad01a0) at /data/Server/10.9/storage/innobase/log/log0recv.cc:3195
            #6 0x000055eb74f0cba5 in buf_page_t::read_complete (this=0x7f3c31ad01a0, node=...) at /data/Server/10.9/storage/innobase/buf/buf0buf.cc:3697
            #7 0x000055eb75027b15 in fil_aio_callback (request=...) at /data/Server/10.9/storage/innobase/fil/fil0fil.cc:2889
            #8 0x000055eb74b9984b in io_callback (cb=0x631000026a58) at /data/Server/10.9/storage/innobase/os/os0file.cc:3512
            #9 0x000055eb75152800 in tpool::task_group::execute (this=0x612000023650, t=0x631000026ab0) at /data/Server/10.9/tpool/task_group.cc:55
            #10 0x000055eb751530c9 in tpool::task::execute (this=0x631000026ab0) at /data/Server/10.9/tpool/task.cc:32
            #11 0x000055eb7513f70b in tpool::thread_pool_generic::worker_main (this=0x618000000880, thread_var=0x63000001fd80) at /data/Server/10.9/tpool/tpool_generic.cc:588
            #12 0x000055eb75152261 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=@0x604000000a68: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x55eb7513f528 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x604000000a60: 0x618000000880) at /usr/include/c++/9/bits/invoke.h:73
            --Type <RET> for more, q to quit, c to continue without paging--
            #13 0x000055eb7515200c in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=@0x604000000a68: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x55eb7513f528 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/9/bits/invoke.h:95
            #14 0x000055eb75151e9d in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x604000000a58) at /usr/include/c++/9/thread:244
            #15 0x000055eb75151de7 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x604000000a58) at /usr/include/c++/9/thread:251
            #16 0x000055eb75151d4c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x604000000a50) at /usr/include/c++/9/thread:195
            #17 0x00007f3c3611fde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
            #18 0x00007f3c3623a609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #19 0x00007f3c35e0d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

            pluto:/data/49
            _RR_TRACE_DIR=log/mysqld.1.rr rr replay

            origin/10.9 commit baef53a70c675da6d19ac3c7f23c7b8b4ed8458c


            {noformat}
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Blocker [ 1 ]

            increased the priority, as it fails in 10.10 now in literally every build

            serg Sergei Golubchik added a comment - increased the priority, as it fails in 10.10 now in literally every build
            marko Marko Mäkelä made changes -

            The scenario involves recovery in multiple batches:

            10.9 20b8e5a07e50cd101b2d9d1d1099dd11577a9cc2

            2022-06-17 14:51:03 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=60381
            2022-06-17 14:51:04 0 [Note] InnoDB: Starting a batch to recover 224 pages from redo log.
            2022-06-17 14:51:04 0 [ERROR] InnoDB: Cannot apply log to [page id: space=5, page number=0] of corrupted file './test/t1.ibd'
            …
            2022-06-17 14:51:04 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.
            mariadbd: /data/Server/10.9/storage/innobase/log/log0recv.cc:2921: buf_block_t* recv_recover_page(buf_block_t*, mtr_t&, const iterator&, fil_space_t*, mlog_init_t::init*): Assertion `l->lsn <= log_sys.get_lsn()' failed.
            

            The first quoted message says "a batch", not "final batch". The l->lsn is not larger than recv_sys.lsn (the recovered LSN), but it is larger than log_sys.lsn, which was last updated at the start of the recovery batch:

            10.9 20b8e5a07e50cd101b2d9d1d1099dd11577a9cc2

            #1  log_t::set_recovered_lsn (this=0x55eb77bdcac0 <log_sys>, lsn=1837116)
                at /data/Server/10.9/storage/innobase/include/log0log.h:399
            #2  0x000055eb74b14dd5 in recv_scan_log (last_phase=true)
                at /data/Server/10.9/storage/innobase/log/log0recv.cc:3741
            #3  0x000055eb74b187be in recv_recovery_from_checkpoint_start ()
                at /data/Server/10.9/storage/innobase/log/log0recv.cc:4212
            #4  0x000055eb74dacdd8 in srv_start (create_new_db=false)
                at /data/Server/10.9/storage/innobase/srv/srv0start.cc:1192
            #5  0x000055eb74930b17 in innodb_init (p=0x615000003298)
                at /data/Server/10.9/storage/innobase/handler/ha_innodb.cc:4251
            

            For debug builds, the probability of this failure may have been significantly increased by MDEV-18976.

            I think that the assertion is documenting a constraint in buf_page_t::flush() or buf_dblwr_t::flush_buffered_writes_completed() that does not actually exist:

                  log_write_up_to(lsn, true);
            

            The callee would be a no-op during the non-final recovery batch:

            void log_write_up_to(lsn_t lsn, bool durable,
                                 const completion_callback *callback)
            {
              ut_ad(!srv_read_only_mode);
              ut_ad(lsn != LSN_MAX);
             
              if (UNIV_UNLIKELY(recv_no_ibuf_operations))
              {
                /* A non-final batch of recovery is active no writes to the log
                are allowed yet. */
                ut_a(!callback);
                return;
              }
            

            This code would be executed by buf_flush_page_cleaner(), invoked from another thread that runs recv_sys_t::apply(), via buf_flush_sync_batch(lsn). Because of the condition, only recovered data pages and no log will be written. It does not really matter that the recovered FIL_PAGE_LSN is ahead of the log_sys.lsn. It suffices to update log_sys.lsn at the end of parsing the log records for the final recovery batch: log_sys.set_recovered_lsn(recv_sys.lsn) will be called at the end of recv_scan_log(true).

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index 171afa1f23c..14dad1bf6bb 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -2918,7 +2918,7 @@ static buf_block_t *recv_recover_page(buf_block_t *block, mtr_t &mtr,
             		const log_phys_t* l = static_cast<const log_phys_t*>(recv);
             		ut_ad(l->lsn);
             		ut_ad(end_lsn <= l->lsn);
            -		ut_ad(l->lsn <= log_sys.get_lsn());
            +		ut_ad(l->lsn <= recv_sys.lsn);
             
             		ut_ad(l->start_lsn);
             		ut_ad(recv_start_lsn <= l->start_lsn);
            @@ -3738,7 +3738,6 @@ static bool recv_scan_log(bool last_phase)
                     else
                     {
                       ut_ad(store == STORE_IF_EXISTS);
            -          log_sys.set_recovered_lsn(recv_sys.lsn);
                       recv_sys.apply(false);
                     }
                   }
            

            Before MDEV-14425, the expression of the now failing assertion was l->lsn <= log_sys.log.scanned_lsn. That field was duplicating recv_sys.lsn.

            marko Marko Mäkelä added a comment - The scenario involves recovery in multiple batches: 10.9 20b8e5a07e50cd101b2d9d1d1099dd11577a9cc2 2022-06-17 14:51:03 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=60381 2022-06-17 14:51:04 0 [Note] InnoDB: Starting a batch to recover 224 pages from redo log. 2022-06-17 14:51:04 0 [ERROR] InnoDB: Cannot apply log to [page id: space=5, page number=0] of corrupted file './test/t1.ibd' … 2022-06-17 14:51:04 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages. mariadbd: /data/Server/10.9/storage/innobase/log/log0recv.cc:2921: buf_block_t* recv_recover_page(buf_block_t*, mtr_t&, const iterator&, fil_space_t*, mlog_init_t::init*): Assertion `l->lsn <= log_sys.get_lsn()' failed. The first quoted message says "a batch", not "final batch". The l->lsn is not larger than recv_sys.lsn (the recovered LSN), but it is larger than log_sys.lsn , which was last updated at the start of the recovery batch: 10.9 20b8e5a07e50cd101b2d9d1d1099dd11577a9cc2 #1 log_t::set_recovered_lsn (this=0x55eb77bdcac0 <log_sys>, lsn=1837116) at /data/Server/10.9/storage/innobase/include/log0log.h:399 #2 0x000055eb74b14dd5 in recv_scan_log (last_phase=true) at /data/Server/10.9/storage/innobase/log/log0recv.cc:3741 #3 0x000055eb74b187be in recv_recovery_from_checkpoint_start () at /data/Server/10.9/storage/innobase/log/log0recv.cc:4212 #4 0x000055eb74dacdd8 in srv_start (create_new_db=false) at /data/Server/10.9/storage/innobase/srv/srv0start.cc:1192 #5 0x000055eb74930b17 in innodb_init (p=0x615000003298) at /data/Server/10.9/storage/innobase/handler/ha_innodb.cc:4251 For debug builds, the probability of this failure may have been significantly increased by MDEV-18976 . I think that the assertion is documenting a constraint in buf_page_t::flush() or buf_dblwr_t::flush_buffered_writes_completed() that does not actually exist: log_write_up_to(lsn, true ); The callee would be a no-op during the non-final recovery batch: void log_write_up_to(lsn_t lsn, bool durable, const completion_callback *callback) { ut_ad(!srv_read_only_mode); ut_ad(lsn != LSN_MAX);   if (UNIV_UNLIKELY(recv_no_ibuf_operations)) { /* A non-final batch of recovery is active no writes to the log are allowed yet. */ ut_a(!callback); return ; } This code would be executed by buf_flush_page_cleaner() , invoked from another thread that runs recv_sys_t::apply() , via buf_flush_sync_batch(lsn) . Because of the condition, only recovered data pages and no log will be written. It does not really matter that the recovered FIL_PAGE_LSN is ahead of the log_sys.lsn . It suffices to update log_sys.lsn at the end of parsing the log records for the final recovery batch: log_sys.set_recovered_lsn(recv_sys.lsn) will be called at the end of recv_scan_log(true) . diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 171afa1f23c..14dad1bf6bb 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -2918,7 +2918,7 @@ static buf_block_t *recv_recover_page(buf_block_t *block, mtr_t &mtr, const log_phys_t* l = static_cast<const log_phys_t*>(recv); ut_ad(l->lsn); ut_ad(end_lsn <= l->lsn); - ut_ad(l->lsn <= log_sys.get_lsn()); + ut_ad(l->lsn <= recv_sys.lsn); ut_ad(l->start_lsn); ut_ad(recv_start_lsn <= l->start_lsn); @@ -3738,7 +3738,6 @@ static bool recv_scan_log(bool last_phase) else { ut_ad(store == STORE_IF_EXISTS); - log_sys.set_recovered_lsn(recv_sys.lsn); recv_sys.apply(false); } } Before MDEV-14425 , the expression of the now failing assertion was l->lsn <= log_sys.log.scanned_lsn . That field was duplicating recv_sys.lsn .
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2022-06-20 08:09:22.0 2022-06-20 08:09:22.452
            marko Marko Mäkelä made changes -
            Fix Version/s 10.8.4 [ 27503 ]
            Fix Version/s 10.9.2 [ 27115 ]
            Fix Version/s 10.10.0 [ 27912 ]
            Fix Version/s 10.9 [ 26905 ]
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.