[MDEV-23380] InnoDB reads a page from disk despite parsing MLOG_INIT_FILE_PAGE2 record Created: 2020-08-03  Updated: 2020-10-06  Resolved: 2020-08-18

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.32, 10.3.23, 10.4.13
Fix Version/s: 10.2.35, 10.3.26, 10.4.16

Type: Bug Priority: Major
Reporter: Thirunarayanan Balathandayuthapani Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: not-10.5, rr-profile-analyzed

Issue Links:
Problem/Incident
is caused by MDEV-21572 buf_page_get_gen() should apply buffe... Closed
Relates
relates to MDEV-19514 Defer change buffer merge until pages... Closed
relates to MDEV-19586 Replace recv_sys_t::addr_hash with a ... Closed
relates to MDEV-21572 buf_page_get_gen() should apply buffe... Closed

 Description   

During recovery, InnoDB tries to fetch the page from disk even though
it has MLOG_INIT redo log for the page. I think buf_page_read_low()
should handle recovery scenario also.

The following stack trace tries to fetch the page even it has
MLOG_INIT_PAGE for it.

#0  buf_page_init (buf_pool=buf_pool@entry=0x61c000000080, page_id=..., 
    zip_size=zip_size@entry=0, block=block@entry=0x6400007da1c0)
    at /home/mleich/bb-10.4-MDEV-11799/storage/innobase/buf/buf0buf.cc:5280
#1  0x000055768a1b38f1 in buf_page_init_for_read (
    err=err@entry=0x7fed16d7f6c0, mode=mode@entry=132, page_id=..., 
    zip_size=zip_size@entry=0, unzip=unzip@entry=false)
    at /home/mleich/bb-10.4-MDEV-11799/storage/innobase/buf/buf0buf.cc:5422
#2  0x000055768a238f15 in buf_read_page_low (err=err@entry=0x7fed16d7f6c0, 
    sync=sync@entry=false, type=type@entry=192, mode=mode@entry=132, 
    page_id=..., zip_size=zip_size@entry=0, unzip=<optimized out>, 
    ignore_missing_space=<optimized out>)
    at /home/mleich/bb-10.4-MDEV-11799/storage/innobase/buf/buf0rea.cc:152
#3  0x000055768a23f482 in buf_read_page_background (page_id=..., 
    zip_size=zip_size@entry=0, sync=sync@entry=false)
    at /home/mleich/bb-10.4-MDEV-11799/storage/innobase/buf/buf0rea.cc:436
#4  0x000055768a0fcda7 in btr_cur_prefetch_siblings (
    block=block@entry=0x6400007d9f70)
    at /home/mleich/bb-10.4-MDEV-11799/storage/innobase/btr/btr0cur.cc:3353
#5  0x000055768a1355a3 in btr_cur_optimistic_delete_func (
    cursor=cursor@entry=0x7fed16d80260, flags=flags@entry=0, 
    mtr=mtr@entry=0x7fed16d805a0)
    at /home/mleich/bb-10.4-MDEV-11799/storage/innobase/btr/btr0cur.cc:5855
#6  0x0000557689b9b393 in ibuf_delete_rec (space=67, page_no=67, 
    pcur=pcur@entry=0x7fed16d80260, 
    search_tuple=search_tuple@entry=0x616000060108, 
    mtr=mtr@entry=0x7fed16d805a0)
    at /home/mleich/bb-10.4-MDEV-11799/storage/innobase/ibuf/ibuf0ibuf.cc:4194
#7  0x0000557689ba0915 in ibuf_merge_or_delete_for_page (
    block=block@entry=0x6400007d9880, page_id=..., zip_size=<optimized out>, 
    update_ibuf_bitmap=<optimized out>, update_ibuf_bitmap@entry=true)
    at /home/mleich/bb-10.4-MDEV-11799/storage/innobase/ibuf/ibuf0ibuf.cc:4636
#8  0x000055768a1c0d05 in buf_page_io_complete (bpage=0x6400007d9880, 
    dblwr=dblwr@entry=true, evict=evict@entry=false)
    at /home/mleich/bb-10.4-MDEV-11799/storage/innobase/buf/buf0buf.cc:6194
#9  0x000055768a354ba6 in fil_aio_wait (segment=segment@entry=3)
    at /home/mleich/bb-10.4-MDEV-11799/storage/innobase/fil/fil0fil.cc:4372
#10 0x0000557689f54f94 in io_handler_thread (arg=<optimized out>)
    at /home/mleich/bb-10.4-MDEV-11799/storage/innobase/srv/srv0start.cc:324
#11 0x00007fa71843f6db in start_thread (arg=0x7fed16d82700)
    at pthread_create.c:463
---Type <return> to continue, or q <return> to quit---
#12 0x0000000069502a3f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

It leads to infinite loop of buf_page_create() and recv_recovery_create_page_low()
and eventually leads to failure of the server startup.



 Comments   
Comment by Marko Mäkelä [ 2020-08-03 ]

thiru, thank you for the analysis. I think that the stack trace clearly demonstrates the problem.

The immediate problem appears to have been solved in the 10.5 release, mostly by MDEV-19514: we should no longer be merging the change buffer during recovery.

On a related note, are we already disabling both the linear and the random read-ahead during recovery? That is, let recovery only read pages via recv_read_in_area()? In MDEV-19586 (also 10.5), the reads are being issued in ascending order of page number.

Of course, disabling the background read-ahead will not address the problem that is demonstrated by the stack trace, because that read request occurs in the foreground, by btr_cur_prefetch_siblings(). Maybe we should extend buf_read_page_background() with similar checks as those that were to buf_page_get_gen() in MDEV-21572.

Note: even though 10.5 might not be affected by this problem, future major versions could be, after we implement MDEV-14481 (redo log apply in the background).

Comment by Thirunarayanan Balathandayuthapani [ 2020-08-03 ]

buf_read_ahead_random(), buf_read_ahead_linear() has the following check:

        if (srv_startup_is_before_trx_rollback_phase) {
                /* No read-ahead to avoid thread deadlocks */
                return(0);
        }

srv_startup_is_before_trx_rollback_phase is being set to FALSE after applying redo logs.

Comment by Thirunarayanan Balathandayuthapani [ 2020-08-03 ]

The following patch solves the issue:

diff --git a/storage/innobase/buf/buf0rea.cc b/storage/innobase/buf/buf0rea.cc
index d6f096fc80e..ae2b20e7fd3 100644
--- a/storage/innobase/buf/buf0rea.cc
+++ b/storage/innobase/buf/buf0rea.cc
@@ -456,6 +456,12 @@ buf_read_page_background(
        const page_size_t&      page_size,
        bool                    sync)
 {
+
+       if (srv_startup_is_before_trx_rollback_phase) {
+               /* No background read before applying the redo logs */
+               return(0);
+       }
+
        ulint           count;
        dberr_t         err;

Comment by Marko Mäkelä [ 2020-08-03 ]

thiru, thank you, that patch looks like it should be safe.

A possibly better fix might be to avoid invoking btr_cur_prefetch_siblings() for the change buffer tree. The SX-locking rules should not really apply to that tree, and operations on the non-leaf pages of the change buffer should be covered by ibuf_pessimistic_insert_mutex(). Here is an untested patch for 10.4:

diff --git a/storage/innobase/btr/btr0cur.cc b/storage/innobase/btr/btr0cur.cc
index 5696d184386..73409b922ec 100644
--- a/storage/innobase/btr/btr0cur.cc
+++ b/storage/innobase/btr/btr0cur.cc
@@ -3497,7 +3497,7 @@ btr_cur_optimistic_insert(
 
 		/* prefetch siblings of the leaf for the pessimistic
 		operation, if the page is leaf. */
-		if (page_is_leaf(page)) {
+		if (page_is_leaf(page) && !index->is_ibuf()) {
 			btr_cur_prefetch_siblings(block);
 		}
 fail_err:
@@ -4579,6 +4579,7 @@ btr_cur_optimistic_update(
 
 	if (rec_offs_any_extern(*offsets)) {
 any_extern:
+		ut_ad(!index->is_ibuf());
 		/* Externally stored fields are treated in pessimistic
 		update */
 
@@ -4775,7 +4776,7 @@ btr_cur_optimistic_update(
 		}
 	}
 
-	if (err != DB_SUCCESS) {
+	if (err != DB_SUCCESS && !index->is_ibuf()) {
 		/* prefetch siblings of the leaf for the pessimistic
 		operation. */
 		btr_cur_prefetch_siblings(block);
@@ -5866,7 +5867,7 @@ btr_cur_optimistic_delete_func(
 				ibuf_update_free_bits_low(block, max_ins, mtr);
 			}
 		}
-	} else {
+	} else if (!index->is_ibuf()) {
 		/* prefetch siblings of the leaf for the pessimistic
 		operation. */
 		btr_cur_prefetch_siblings(block);

Comment by Thirunarayanan Balathandayuthapani [ 2020-08-04 ]

Patch is in bb-10.2-MDEV-23380

Comment by Matthias Leich [ 2020-08-13 ]

The tree commit 935415a9cdecf87a66166921e3f7594bf80fce77 (HEAD, origin/bb-10.2-thiru)
containing fixes for MDEV-22934 and MDEV-23380 performed well during RQG testing.

Generated at Thu Feb 08 09:21:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.