Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30551

InnoDB recovery hangs when buffer pool ran out of memory

    XMLWordPrintable

Details

    Description

      InnoDB ran out of buffer pool free page during apply phase of recovery.
      At the time, InnoDB flushes LRU pages to get the free page from buffer
      pool. As a part of flushing the LRU page, InnoDB tries to
      flush the redo log buffer to the log file (log_write_up_to()).

      At the time, there is a possibility of hanging between buf_flush_thread()
      and main recovery thread:

       
      (gdb) where
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      #1  0x000055a7bf555e52 in binary_semaphore::wait (this=0x7f1d20296250) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0sync.cc:127
      #2  0x000055a7bf556879 in group_commit_lock::acquire (this=0x55a7c3294380 <flush_lock>, num=64719439, callback=0x0) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0sync.cc:273
      #3  0x000055a7bf4e8d2e in log_write_up_to (lsn=64719439, durable=true, callback=0x0) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0log.cc:931
      #4  0x000055a7bf4e9247 in log_buffer_flush_to_disk (durable=true) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0log.cc:965
      #5  0x000055a7bf949f35 in buf_flush_LRU (max_n=32) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/buf/buf0flu.cc:1633
      #6  0x000055a7bf95abe6 in buf_LRU_get_free_block (have_mutex=false) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/buf/buf0lru.cc:484
      #7  0x000055a7bf501e80 in recv_sys_t::apply (this=0x55a7c239fa00 <recv_sys>, last_batch=false) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0recv.cc:3454
      #8  0x000055a7bf5041d2 in recv_scan_log (last_phase=true) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0recv.cc:3699
      #9  0x000055a7bf507e3c in recv_recovery_from_checkpoint_start () at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0recv.cc:4169
      #10 0x000055a7bf7be66a in srv_start (create_new_db=false) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/srv/srv0start.cc:1495
      #11 0x000055a7bf3240ff in innodb_init (p=0x616000022308) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/handler/ha_innodb.cc:4157
      #12 0x000055a7be971026 in ha_initialize_handlerton (plugin=0x6210000c1f70) at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/handler.cc:649
      #13 0x000055a7be10598e in plugin_initialize (tmp_root=0x7ffca58170e0, plugin=0x6210000c1f70, argc=0x55a7c29a3c00 <remaining_argc>, argv=0x6170000001f8, options_only=false) at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/sql_plugin.cc:1465
      #14 0x000055a7be107881 in plugin_init (argc=0x55a7c29a3c00 <remaining_argc>, argv=0x6170000001f8, flags=0) at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/sql_plugin.cc:1758
      #15 0x000055a7bdcd2a45 in init_server_components () at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/mysqld.cc:5211
      #16 0x000055a7bdcd4cb8 in mysqld_main (argc=11, argv=0x6170000001f8) at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/mysqld.cc:5833
      #17 0x000055a7bdcbd185 in main (argc=10, argv=0x7ffca5819898) at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/main.cc:34
       
       
      (gdb) t 5
      [Switching to thread 5 (Thread 0x7f1cffc10700 (LWP 12777))]
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      38	../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
      (gdb) where
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      #1  0x000055a7bf7cab4f in srw_mutex_impl<false>::wait (this=0x55a7c3294548 <log_sys+72>, lk=2147483650) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/sync/srw_lock.cc:239
      #2  0x000055a7bf7c9557 in srw_mutex_impl<false>::wait_and_lock (this=0x55a7c3294548 <log_sys+72>) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/sync/srw_lock.cc:326
      #3  0x000055a7bf39b5a9 in srw_mutex_impl<false>::wr_lock (this=0x55a7c3294548 <log_sys+72>) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/include/srw_lock.h:134
      #4  0x000055a7bf39823e in ssux_lock_impl<false>::wr_lock (this=0x55a7c3294548 <log_sys+72>) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/include/srw_lock.h:257
      #5  0x000055a7bf4ed157 in srw_lock_impl<false>::wr_lock (this=0x55a7c3294540 <log_sys+64>, file=0x55a7c0dbe2a0 "/home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0log.cc", line=942) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/include/srw_lock.h:528
      #6  0x000055a7bf4e8f8b in log_write_up_to (lsn=56396260, durable=true, callback=0x0) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0log.cc:942
      #7  0x000055a7bf953996 in buf_page_t::flush (this=0x7f1d05184b00, lru=false, space=0x6130000003d8) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/buf/buf0flu.cc:864
      #8  0x000055a7bf948df0 in buf_do_flush_list_batch (max_n=200, lsn=18446744073709551615) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/buf/buf0flu.cc:1421
      #9  0x000055a7bf949772 in buf_flush_list (max_n=200, lsn=18446744073709551615) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/buf/buf0flu.cc:1497
      #10 0x000055a7bf94ea51 in buf_flush_page_cleaner () at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/buf/buf0flu.cc:2456
      #11 0x000055a7bf956c6b in std::__invoke_impl<void, void (*)()> (__f=@0x602000000158: 0x55a7bf94d7d4 <buf_flush_page_cleaner()>) at /usr/include/c++/8/bits/invoke.h:60
      #12 0x000055a7bf956522 in std::__invoke<void (*)()> (__fn=@0x602000000158: 0x55a7bf94d7d4 <buf_flush_page_cleaner()>) at /usr/include/c++/8/bits/invoke.h:95
      #13 0x000055a7bf9574ec in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x602000000158) at /usr/include/c++/8/thread:244
      #14 0x000055a7bf957479 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x602000000158) at /usr/include/c++/8/thread:253
      #15 0x000055a7bf9573e2 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x602000000150) at /usr/include/c++/8/thread:196
      #16 0x00007f1d1e2084c0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #17 0x00007f1d1e74c6db in start_thread (arg=0x7f1cffc10700) at pthread_create.c:463
      #18 0x00007f1d1d8ae61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      
      

      t5 holds flush lock, write lock tries to acquire log_sys.latch in log_write_up_to(). But t1 holds log_sys.latch(in recv_recovery_from_checkpoint_start()) and tries
      to acquire flush lock in log_write_up_to()

      Let me paste the stack trace where t1 acquires log_sys.latch and t5 tries to acquires log_sys.latch:

      Thread 1 "mysqld" hit Hardware watchpoint 3: -location log_sys.latch.lock.readers._M_i
       
      Old value = 0
      New value = 2147483648
      ssux_lock_impl<false>::wr_lock (this=0x55555c444548 <log_sys+72>) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/include/srw_lock.h:264
      264	    if (uint32_t lk= readers.fetch_add(WRITER, std::memory_order_acquire))
      (gdb) where
      #0  ssux_lock_impl<false>::wr_lock (this=0x55555c444548 <log_sys+72>) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/include/srw_lock.h:264
      #1  0x000055555869d157 in srw_lock_impl<false>::wr_lock (this=0x55555c444540 <log_sys+64>, file=0x555559f769c0 "/home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0recv.cc", line=4059) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/include/srw_lock.h:528
      #2  0x00005555586b73ed in recv_recovery_from_checkpoint_start () at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0recv.cc:4059
      #3  0x000055555896e66a in srv_start (create_new_db=false) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/srv/srv0start.cc:1495
      #4  0x00005555584d40ff in innodb_init (p=0x616000021d08) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/handler/ha_innodb.cc:4157
      #5  0x0000555557b21026 in ha_initialize_handlerton (plugin=0x6210000c0b70) at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/handler.cc:649
      #6  0x00005555572b598e in plugin_initialize (tmp_root=0x7fffffffaf40, plugin=0x6210000c0b70, argc=0x55555bb53c00 <remaining_argc>, argv=0x6170000001f8, options_only=false) at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/sql_plugin.cc:1465
      #7  0x00005555572b7881 in plugin_init (argc=0x55555bb53c00 <remaining_argc>, argv=0x6170000001f8, flags=0) at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/sql_plugin.cc:1758
      #8  0x0000555556e82a45 in init_server_components () at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/mysqld.cc:5211
      #9  0x0000555556e84cb8 in mysqld_main (argc=9, argv=0x6170000001f8) at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/mysqld.cc:5833
      #10 0x0000555556e6d185 in main (argc=8, argv=0x7fffffffd6f8) at /home/thiru/mariarepo/server/10.11/10.11-sample/sql/main.cc:34
       
      Thread 5 "mysqld" hit Hardware watchpoint 2: -location log_sys.latch.lock.writer.lock._M_i
       
      Old value = 2147483649
      New value = 2147483650
      srw_mutex_impl<false>::wait_and_lock (this=0x55555c444548 <log_sys+72>) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/sync/srw_lock.cc:293
      293	  uint32_t lk= 1 + lock.fetch_add(1, std::memory_order_relaxed);
      (gdb) where
      #0  srw_mutex_impl<false>::wait_and_lock (this=0x55555c444548 <log_sys+72>) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/sync/srw_lock.cc:293
      #1  0x000055555854b5a9 in srw_mutex_impl<false>::wr_lock (this=0x55555c444548 <log_sys+72>) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/include/srw_lock.h:134
      #2  0x000055555854823e in ssux_lock_impl<false>::wr_lock (this=0x55555c444548 <log_sys+72>) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/include/srw_lock.h:257
      #3  0x000055555869d157 in srw_lock_impl<false>::wr_lock (this=0x55555c444540 <log_sys+64>, file=0x555559f6e2a0 "/home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0log.cc", line=942) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/include/srw_lock.h:528
      #4  0x0000555558698f8b in log_write_up_to (lsn=56396260, durable=true, callback=0x0) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/log/log0log.cc:942
      #5  0x0000555558b03996 in buf_page_t::flush (this=0x7fffdd6faca0, lru=false, space=0x6130000003d8) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/buf/buf0flu.cc:864
      #6  0x0000555558af8df0 in buf_do_flush_list_batch (max_n=200, lsn=18446744073709551615) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/buf/buf0flu.cc:1421
      #7  0x0000555558af9772 in buf_flush_list (max_n=200, lsn=18446744073709551615) at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/buf/buf0flu.cc:1497
      #8  0x0000555558afea51 in buf_flush_page_cleaner () at /home/thiru/mariarepo/server/10.11/10.11-sample/storage/innobase/buf/buf0flu.cc:2456
      #9  0x0000555558b06c6b in std::__invoke_impl<void, void (*)()> (__f=@0x602000000078: 0x555558afd7d4 <buf_flush_page_cleaner()>) at /usr/include/c++/8/bits/invoke.h:60
      #10 0x0000555558b06522 in std::__invoke<void (*)()> (__fn=@0x602000000078: 0x555558afd7d4 <buf_flush_page_cleaner()>) at /usr/include/c++/8/bits/invoke.h:95
      #11 0x0000555558b074ec in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x602000000078) at /usr/include/c++/8/thread:244
      #12 0x0000555558b07479 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x602000000078) at /usr/include/c++/8/thread:253
      #13 0x0000555558b073e2 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x602000000070) at /usr/include/c++/8/thread:196
      #14 0x00007ffff5f424c0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #15 0x00007ffff64866db in start_thread (arg=0x7fffd8186700) at pthread_create.c:463
      #16 0x00007ffff55e861f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      recv_sys.is_memory_exhausted() has been checked only during parsing of redo log records and it makes sure that
      there are 2/3rd of buffer pool exist for data pages to apply the redo logs.

      This problem exist from 10.5+ onwards.

      Attachments

        Issue Links

          Activity

            People

              thiru Thirunarayanan Balathandayuthapani
              thiru Thirunarayanan Balathandayuthapani
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.