MariaDB Server / MDEV-19176

Do not run out of InnoDB buffer pool during recovery

Details

    Description

      Reduce the memory usage during recovery.
      1) The innodb_gis.rtree_recovery,4k test case fails with the following error:

      [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 577 OS file reads, 1 OS file writes, 1 OS fsyncs.
      

      The buffer pool ran out of memory during recovery. Remove the recv_n_pool_free_frames hack.

      2) MDEV-12699 introduced mlog_reset. Basically, it stores all page-initialization records, the mlog_index_load records of all pages, and their respective LSNs.
      This is done to avoid I/O on mlog_init_ record pages during recovery.

      Remove mlog_reset and store this information as part of recv_addr.

      Attachments

        Issue Links

          Activity

            thiru Thirunarayanan Balathandayuthapani created issue -
            thiru Thirunarayanan Balathandayuthapani made changes -
            Field Original Value New Value
            Status Open [ 1 ] Confirmed [ 10101 ]
            thiru Thirunarayanan Balathandayuthapani made changes -

            elenst Elena Stepanova made changes -
            Labels affects-tests

            I got this on 10.4:

            innodb.undo_truncate_recover '2,4k,innodb,strict_crc32' [ fail ]
                    Test ended at 2019-08-22 13:08:44
             
            CURRENT_TEST: innodb.undo_truncate_recover
            mysqltest: In included file "./include/wait_until_connected_again.inc": 
            included from ./include/start_mysqld.inc at line 42:
            included from /mariadb/10.4/mysql-test/suite/innodb/t/undo_truncate_recover.test at line 55:
            At line 34: Server failed to restart
            

            The cause was at the end of the server error log:

            2019-08-22 13:00:20 0 [Note] InnoDB: Opened 2 undo tablespaces
            2019-08-22 13:00:20 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
            2019-08-22 13:00:20 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=19044026
            2019-08-22 13:00:20 0 [Note] InnoDB: Recovered page [page id: space=1, page number=1068] from the doublewrite buffer.
            2019-08-22 13:00:20 0 [Note] InnoDB: Recovered page [page id: space=1, page number=1008] from the doublewrite buffer.
            2019-08-22 13:00:20 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 1211 OS file reads, 61 OS file writes, 2 OS fsyncs.
            

            marko Marko Mäkelä added a comment -
            elenst Elena Stepanova made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä added a comment - - edited

            I think that we should do the following:

            • Remove the fixed-size 2MiB buffer recv_sys.buf, and instead of it, use log_sys.buf (whose minimum size is 16MiB) for reading from the redo log file.
            • Remove recv_t::len and recv_data_t. Directly store pointers to records.
            • If possible, avoid copying records from the ‘read buffer’, as currently done by recv_add_to_hash_table() or recv_sys.add().
              • Only copy or move records when the ‘read buffer’ fills up. Algorithm: Traverse recv_sys->addr_hash or recv_sys.pages. For any pointer that is within the read buffer, determine the length to copy, and then copy the log to another buffer. At the end of the traversal, the entire read buffer can be freed.
              • Note: New entries may be appended to log_sys.buf during recovery, especially by change buffer merge (before MDEV-19514) or by arbitrary operations (after MDEV-14481). We must be prepared to free up space from log_sys.buf at any time.
            • Remove recv_sys_t::heap and MEM_HEAP_FOR_RECV_SYS.
              • Allocate the recv_sys.pages and its elements directly by the system allocator.
              • Allocate buffer pool blocks directly by buf_block_alloc(). Use block->page.list for keeping track of allocated blocks.
              • Remove mem_heap_t::total_size that was introduced for speeding up mem_heap_get_size() calls during recovery.
              • Append log records to the last block, until the block->frame is filled up. (Similar to the current recv_sys.heap.)
              • Remove recv_data_copy_to_buf(). Multi-block records should be extremely rare; I observed them in the test innodb.blob-crash only. Without the mem_heap_t storage overhead, maybe all records will always fit in block->frame.
              • Repurpose buf_block_t::modify_clock to count the number of pointers from recv_sys.pages, so that we will know when a block is safe to be freed.
              • Feel free to repurpose other fields, such as buf_page_t::oldest_modification, for storing state information. These were never used on block->page.state == BUF_BLOCK_MEMORY blocks.
            • Implement robustness inside our buf_block_alloc() invocation above. If we cannot find a free block, do one of the following:
              • Flush and evict modified pages. (This is already being done by the current code.)
              • Apply buffered records to pages. Because of the reference counting (see buf_block_t::modify_clock above), this should free up blocks that held redo log records.
              • When allocating a block for redo log records, refuse to allocate the very last block(s). This will ensure that there will be memory available for allocating data pages for applying log to them.

            The last point is what would remove the infinite loop at recovery, with messages like

            2019-08-22 13:00:20 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 1211 OS file reads, 61 OS file writes, 2 OS fsyncs.
            

            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Attachment 0001-WIP-Remove-recv_sys.buf.patch [ 49347 ]

            0001-WIP-Remove-recv_sys.buf.patch against 10.5 51fb39bf175b275c91468ea6cd1cc82c1395c16d is a very early step towards this. It does not even compile.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Attachment 0001-WIP-Remove-recv_sys.buf.patch [ 49347 ]
            marko Marko Mäkelä made changes -
            Attachment 0001-WIP-Remove-recv_sys.buf.patch [ 49348 ]

            We are seeing a case when this is happening on 10.1.41 running on Ubuntu 18.04

            2019-11-07  4:00:12 140454363481856 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 0 failed attempts to flush a page!
            2019-11-07  4:00:12 140454363481856 [Note] InnoDB: Consider increasing the buffer pool size.
            2019-11-07  4:00:12 140454363481856 [Note] InnoDB: Pending flushes (fsync) log: 0 buffer pool: 0 OS file reads: 4579804 OS file writes: 34359 OS fsyncs: 13461
            

            Any solution yet?

            wagnerbianchi Wagner Bianchi (Inactive) added a comment -
            wagnerbianchi Wagner Bianchi (Inactive) made changes -
            Affects Version/s 10.1.41 [ 23406 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]

            I can repeat the issue with the following patch:

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index 46ab1702419..afba50217b4 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -3172,6 +3172,8 @@ recv_group_scan_log_recs(
                    lsn_t   end_lsn;
                    store_t store_to_hash   = recv_sys.mlog_checkpoint_lsn == 0
                            ? STORE_NO : (last_phase ? STORE_IF_EXISTS : STORE_YES);
            +
            +       recv_n_pool_free_frames = 64;
                    ulint   available_mem   = srv_page_size
                            * (buf_pool_get_n_pages()
                               - (recv_n_pool_free_frames * srv_buf_pool_instances));
            

            Basically, it sets recv_n_pool_free_frames to 64 (a smaller value).

            The following test case demonstrates the issue:

            --source include/have_innodb.inc
            --source include/have_sequence.inc
            CREATE TABLE t1(f1 char(200), f2 char(200), f3 char(200),
                            f4 char(200), f5 char(200), f6 char(200),
                            f7 char(200), f8 char(200))ENGINE=InnoDB;
            INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
            INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
            INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
            INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
            INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
            INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
            INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
            INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
            let $shutdown_timeout=0;
            --source include/restart_mysqld.inc
            INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_16384;
            drop table t1;
             
            .opt file:
            ==========
            --innodb_page_size=4k
            --innodb_buffer_pool_size=5M
            

            Run the above test case with --mysqld=--debug=d,ib_log_checkpoint_avoid so that, during recovery, there are more pages to recover.

            The log file contains the following message:
            2019-12-05 17:34:44 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 216 OS file reads, 0 OS file writes, 0 OS fsyncs.

            thiru Thirunarayanan Balathandayuthapani added a comment -

            Result of RQG testing on
            - bb-10.2-MDEV-19176 commit 40771b642681fa2c005cc71ad6e45979899b5186 2019-12-10
            - RQG YY grammars running DML and/or FLUSH ... or LOCK ... only (but no table-modifying DDL)
            - either: keep the load on the server + SIGKILL the server process + try a restart invoking recovery + check consistency;
              or: keep the main server up and under load the whole time, run mariabackup frequently, try to start a second server from that backup data,
              check consistency/compare the content of both servers, then remove the second server and its data
            1. No server crash/assert and never corrupted data or similar.
            2. Especially the variant "SIGKILL server process + try restart ..." behaved excellently.
            3. The variant with Mariabackup sometimes caused trouble (failed from RQG's built-in point of view):
                 a) "mariabackup --backup" failed because one or more of the concurrent sessions fiddled with FLUSH TABLES + LOCK ... in parallel.
                      IMHO a known problem which does not belong to the current MDEV.
                 b) Starting a server on the data generated via mariabackup (or similar) failed because a port was already occupied, not yet freed, or similar.
                      The reason might be a weakness of some RQG tool (such as not waiting long enough until a process is gone) or known issues
                      with servers around shutdown. IMHO again a mostly known problem which does not belong to the current MDEV.
            4. The test results obtained do not allow judging whether the memory usage during recovery has changed or not.
            

            mleich Matthias Leich added a comment -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2.31 [ 24017 ]
            Fix Version/s 10.3.22 [ 24018 ]
            Fix Version/s 10.4.12 [ 24019 ]
            Fix Version/s 10.5.1 [ 24029 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Summary Reduce the memory usage during recovery Do not run out of InnoDB buffer pool during recovery

            Some follow-up work to reduce the memory usage will be performed in MDEV-21351.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            thiru Thirunarayanan Balathandayuthapani made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels affects-tests ServiceNow affects-tests
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow affects-tests 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z affects-tests
            serg Sergei Golubchik made changes -
            Labels 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z affects-tests affects-tests
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 94254 ] MariaDB v4 [ 156035 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 201658
            Zendesk active tickets 201658

            People

              thiru Thirunarayanan Balathandayuthapani
              thiru Thirunarayanan Balathandayuthapani
              Votes: 0
              Watchers: 6

