I think that we should do the following:
- Remove the fixed-size 2MiB buffer recv_sys.buf, and instead use log_sys.buf (whose minimum size is 16MiB) for reading from the redo log file.
- Remove recv_t::len and recv_data_t. Directly store pointers to records.
- If possible, avoid copying records from the ‘read buffer’, as currently done by recv_add_to_hash_table() or recv_sys.add().
- Only copy or move records when the ‘read buffer’ fills up. Algorithm: Traverse recv_sys->addr_hash or recv_sys.pages. For any pointer that is within the read buffer, determine the length to copy, and then copy the log to another buffer. At the end of the traversal, the entire read buffer can be freed.
- Note: New entries may be appended to log_sys.buf during recovery, especially by change buffer merge (before MDEV-19514) or by arbitrary operations (after MDEV-14481). We must be prepared to free up space from log_sys.buf at any time.
- Remove recv_sys_t::heap and MEM_HEAP_FOR_RECV_SYS.
- Allocate recv_sys.pages and its elements directly with the system allocator.
- Allocate buffer pool blocks directly by buf_block_alloc(). Use block->page.list for keeping track of allocated blocks.
- Remove mem_heap_t::total_size that was introduced for speeding up mem_heap_get_size() calls during recovery.
- Append log records to the last allocated block until block->frame fills up. (Similar to the current recv_sys.heap.)
- Remove recv_data_copy_to_buf(). Multi-block records should be extremely rare; I observed them only in the test innodb.blob-crash. Without the mem_heap_t storage overhead, maybe all records will always fit in block->frame.
- Repurpose buf_block_t::modify_clock to count the number of pointers from recv_sys.pages, so that we will know when a block is safe to be freed.
- Feel free to repurpose other fields, such as buf_page_t::oldest_modification, for storing state information. These were never used on block->page.state == BUF_BLOCK_MEMORY blocks.
- Implement robustness inside our buf_block_alloc() invocation above. If we cannot find a free block, do one of the following:
- Flush and evict modified pages. (This is already being done by the current code.)
- Apply buffered records to pages. Because of the reference counting (see buf_block_t::modify_clock above), this should free up blocks that held redo log records.
- When allocating a block for redo log records, refuse to allocate the very last block(s). This will ensure that there will be memory available for allocating data pages for applying log to them.
The last point is what would remove the infinite loop during recovery, which produces messages like:
2019-08-22 13:00:20 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 1211 OS file reads, 61 OS file writes, 2 OS fsyncs.
I can repeat the issue with the following patch:
diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 46ab1702419..afba50217b4 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -3172,6 +3172,8 @@ recv_group_scan_log_recs(
 	lsn_t	end_lsn;
 	store_t	store_to_hash	= recv_sys.mlog_checkpoint_lsn == 0
 		? STORE_NO : (last_phase ? STORE_IF_EXISTS : STORE_YES);
+
+	recv_n_pool_free_frames = 64;
 	ulint	available_mem = srv_page_size
 		* (buf_pool_get_n_pages()
 		   - (recv_n_pool_free_frames * srv_buf_pool_instances));
Basically, the patch sets recv_n_pool_free_frames to 64, a lower value than the default.
The following test case demonstrates the issue:
--source include/have_innodb.inc
--source include/have_sequence.inc

CREATE TABLE t1(f1 char(200), f2 char(200), f3 char(200),
                f4 char(200), f5 char(200), f6 char(200),
                f7 char(200), f8 char(200))ENGINE=InnoDB;
INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_65536;
let $shutdown_timeout=0;
--source include/restart_mysqld.inc
INSERT INTO t1 SELECT '','','','','','','','' FROM seq_1_to_16384;
drop table t1;
.opt file:
==========
--innodb_page_size=4k
--innodb_buffer_pool_size=5M
Run the above test case with --mysqld=--debug=d,ib_log_checkpoint_avoid, so that log checkpoints are avoided and recovery has more pages to process.
The error log then contains the following message:
2019-12-05 17:34:44 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 216 OS file reads, 0 OS file writes, 0 OS fsyncs.