[MDEV-24448] mariabackup: storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed. Created: 2020-12-18  Updated: 2020-12-22  Resolved: 2020-12-21

Status: Closed
Project: MariaDB Server
Component/s: mariabackup, Storage Engine - InnoDB
Affects Version/s: 10.6.0
Fix Version/s: 10.5.9, 10.6.0

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-24449 Corruption of system tablespace or la... Closed

 Description   

mariabackup.apply-log-only-incr 'innodb' w19 [ 2 pass ]  65328
mariabackup.apply-log-only-incr 'innodb' w8 [ 2 fail ]
        Test ended at 2020-12-17 19:37:54
 
CURRENT_TEST: mariabackup.apply-log-only-incr
...
/data/Server_bin/10.6C_debug/bin/mariabackup based on MariaDB server 10.6.0-MariaDB Linux (x86_64)
[00] 2020-12-17 19:36:51 incremental backup from 62952 is enabled.
[00] 2020-12-17 19:36:51 cd to /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup/
[00] 2020-12-17 19:36:56 This target seems to be already prepared.
[00] 2020-12-17 19:36:57 mariabackup: using the following InnoDB configuration for recovery:
[00] 2020-12-17 19:36:57 innodb_data_home_dir = .
[00] 2020-12-17 19:36:57 innodb_data_file_path = ibdata1:12M:autoextend
[00] 2020-12-17 19:36:57 innodb_log_group_home_dir = /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1/
[00] 2020-12-17 19:36:57 InnoDB: Using Linux native AIO
[00] 2020-12-17 19:36:57 mariabackup: Generating a list of tablespaces
[00] 2020-12-17 19:36:57 page size for /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//ibdata1.delta is 16384 bytes
[00] 2020-12-17 19:36:57 Applying /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//ibdata1.delta to ./ibdata1...
[00] 2020-12-17 19:36:57 page size for /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//test/t.ibd.delta is 16384 bytes
[00] 2020-12-17 19:36:57 Applying /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//test/t.ibd.delta to ./test/t.ibd...
[00] 2020-12-17 19:36:57 page size for /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/transaction_registry.ibd.delta is 16384 bytes
[00] 2020-12-17 19:36:57 Applying /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/transaction_registry.ibd.delta to ./mysql/transaction_registry.ibd...
[00] 2020-12-17 19:36:57 page size for /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/innodb_table_stats.ibd.delta is 16384 bytes
[00] 2020-12-17 19:36:57 Applying /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/innodb_table_stats.ibd.delta to ./mysql/innodb_table_stats.ibd...
[00] 2020-12-17 19:36:57 page size for /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/innodb_index_stats.ibd.delta is 16384 bytes
[00] 2020-12-17 19:36:57 Applying /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/innodb_index_stats.ibd.delta to ./mysql/innodb_index_stats.ibd...
[00] 2020-12-17 19:36:57 mariabackup: using the following InnoDB configuration for recovery:
[00] 2020-12-17 19:36:57 innodb_data_home_dir = .
[00] 2020-12-17 19:36:57 innodb_data_file_path = ibdata1:12M:autoextend
[00] 2020-12-17 19:36:57 innodb_log_group_home_dir = /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1/
[00] 2020-12-17 19:36:57 InnoDB: Using Linux native AIO
[00] 2020-12-17 19:36:57 Starting InnoDB instance for recovery.
[00] 2020-12-17 19:36:57 mariabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
2020-12-17 19:36:57 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
2020-12-17 19:36:57 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-12-17 19:36:57 0 [Note] InnoDB: Number of pools: 1
2020-12-17 19:36:57 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2020-12-17 19:36:57 0 [Warning] InnoDB: Linux Native AIO disabled.
2020-12-17 19:36:57 0 [Note] InnoDB: Initializing buffer pool, total size = 104857600, chunk size = 104857600
2020-12-17 19:36:57 0 [Note] InnoDB: Completed initialization of buffer pool
2020-12-17 19:36:57 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=62995,67272
FILE_CHECKPOINT(62995) read at 67272
FILE_CHECKPOINT(62995) reread at 67272
2020-12-17 19:36:57 0 [Note] InnoDB: Starting final batch to recover 4 pages from redo log.
2020-12-17 19:36:57 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-12-17 19:36:57 0 [Note] InnoDB: Applying log to page [page id: space=0, page number=312]
2020-12-17 19:36:57 0 [Note] InnoDB: Applying log to page [page id: space=0, page number=313]
2020-12-17 19:36:57 0 [Note] InnoDB: Applying log to page [page id: space=6, page number=3]
2020-12-17 19:36:59 0 [Note] InnoDB: Applying log to page [page id: space=0, page number=51]
mariabackup: /Server/10.6C/storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed.
 
#11 <signal handler called>
#12 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#13 0x0000319e1b0b9859 in __GI_abort () at abort.c:79
#14 0x0000319e1b0b9729 in __assert_fail_base (fmt=0x319e1b24f588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x564864f8dc66 "!buf_pool.any_io_pending()", file=0x564864f8dbc0 "/Server/10.6C/storage/innobase/srv/srv0start.cc", line=1582, function=<optimized out>) at assert.c:92
#15 0x0000319e1b0caf36 in __GI___assert_fail (assertion=0x564864f8dc66 "!buf_pool.any_io_pending()", file=0x564864f8dbc0 "/Server/10.6C/storage/innobase/srv/srv0start.cc", line=1582, function=0x564864f8e7a9 "dberr_t srv_start(bool)") at assert.c:101
#16 0x000056486481d5d6 in srv_start (create_new_db=false) at /Server/10.6C/storage/innobase/srv/srv0start.cc:1582
#17 0x0000564863ba1544 in innodb_init () at /Server/10.6C/extra/mariabackup/xtrabackup.cc:2175
#18 0x0000564863bac6be in xtrabackup_prepare_func (argv=0x51906fb7d308) at /Server/10.6C/extra/mariabackup/xtrabackup.cc:5739
#19 0x0000564863baf094 in main_low (argv=0x51906fb7d308) at /Server/10.6C/extra/mariabackup/xtrabackup.cc:6643
#20 0x0000564863bae8a7 in main (argc=5, argv=0x7ffe969c70e8) at /Server/10.6C/extra/mariabackup/xtrabackup.cc:6435
 
origin/bb-10.6-MDEV-24426, origin/HEAD, origin/10.6, 10.6 c36a2a0d1c03900796dc35c01a745bec8b1b54e2 2020-12-17T14:56:08+02:00



 Comments   
Comment by Matthias Leich [ 2020-12-18 ]

pluto:/data/Server_bin/10.6C_debug/mysql-test/rr_trace_dir
_RR_TRACE_DIR="." rr replay --mark-stdio -p 1026397

Comment by Marko Mäkelä [ 2020-12-18 ]

I was concurrently filing MDEV-24449 for this same issue.

Comment by Marko Mäkelä [ 2020-12-19 ]

At the time of the assertion failure, we have the following thread executing. It would have decremented the buf_pool.n_pend_reads from 1 to 0. Apparently, no log needed to be applied to any of the 4 pages for which redo log existed during this run (the pages were already up to date).

10.6 c36a2a0d1c03900796dc35c01a745bec8b1b54e2

#3  0x00005648646d259f in inline_mysql_mutex_unlock (that=0x5648655b0380 <recv_sys>, src_file=0x564864f17af0 "/Server/10.6C/storage/innobase/log/log0recv.cc", src_line=2515) at /Server/10.6C/include/mysql/psi/mysql_thread.h:831
#4  0x00005648646db2f2 in recv_recover_page (space=0x51906fbab318, bpage=0xbc0250f52a0) at /Server/10.6C/storage/innobase/log/log0recv.cc:2515
#5  0x000056486452a746 in buf_page_read_complete (bpage=0xbc0250f52a0, node=...) at /Server/10.6C/storage/innobase/buf/buf0buf.cc:4068
#6  0x00005648645a8936 in fil_aio_callback (request=...) at /Server/10.6C/storage/innobase/fil/fil0fil.cc:3366
#7  0x0000564864700820 in io_callback (cb=0x51906fc30050) at /Server/10.6C/storage/innobase/os/os0file.cc:3879
#8  0x000056486491aa2f in tpool::simulated_aio::simulated_aio_callback (param=0x51906fc30050) at /Server/10.6C/tpool/aio_simulated.cc:162

The execution of recv_recover_page() is practically over:

	mysql_mutex_unlock(&recv_sys.mutex);
	ut_ad(mtr.has_committed());
}

At the end of the calling function buf_page_read_complete(), we have the following code:

  if (recv_recovery_is_on())
    recv_recover_page(node.space, bpage);
 
  if (bpage->state() == BUF_BLOCK_FILE_PAGE && !recv_no_ibuf_operations &&
      (!id.space() || !is_predefined_tablespace(id.space())) &&
      fil_page_get_type(frame) == FIL_PAGE_INDEX &&
      page_is_leaf(frame))
    bpage->ibuf_exist= true;
 
  if (UNIV_UNLIKELY(MONITOR_IS_ON(MONITOR_MODULE_BUF_PAGE)))
    buf_page_monitor(bpage, BUF_IO_READ);
  DBUG_PRINT("ib_buf", ("read page %u:%u",
                        id.space(), id.page_no()));
 
  /* Because this thread which does the unlocking might not be the same that
  did the locking, we use a pass value != 0 in unlock, which simply
  removes the newest lock debug record, without checking the thread id. */
  if (bpage->state() == BUF_BLOCK_FILE_PAGE)
  {
    buf_block_t *block= reinterpret_cast<buf_block_t*>(bpage);
    block->lock.x_unlock(true);
  }
  bpage->io_unfix();
 
  ut_d(auto n=) buf_pool.n_pend_reads--;
  ut_ad(n > 0);
  buf_pool.stat.n_pages_read++;
 
  return DB_SUCCESS;
}

Contrary to what I initially wrote in MDEV-24449, I no longer think that this could result in redo log apply being skipped. The function recv_sys_t::apply() will wait for recv_sys.pages.empty() to hold. Any other potential bad effect (in the code quoted above) seems unlikely. Any concurrent access to the block (for allocating it for something else) should be prevented by the exclusive page latch that is being held until the end. Even a concurrent call to buf_pool_invalidate() would not seem to pose a real problem.

I was unable to reproduce this failure on my local system, on either 10.5 or 10.6, with or without the fix, using 25×100 repetitions. On buildbot.askmonty.org, the failures have not occurred before the custom InnoDB mutex implementation was eliminated (MDEV-21452). The failure seems to be very sensitive to timing and to compilation options.

Comment by Marko Mäkelä [ 2020-12-21 ]

I made one more attempt of reproducing the assertion failure on 10.5, but it failed to fail for me even with this help:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index fe691a4b256..4d669472970 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -2480,7 +2480,9 @@ void recv_recover_page(fil_space_t* space, buf_page_t* bpage)
 
 	mtr.commit();
 func_exit:
+	const bool sleep= recv_sys.pages.empty();
 	mutex_exit(&recv_sys.mutex);
+	if (sleep) os_thread_sleep(1000);
 	ut_ad(mtr.has_committed());
 }
 
@@ -2610,7 +2612,7 @@ void recv_sys_t::apply(bool last_batch)
     if (abort)
       return;
 
-    os_thread_sleep(500000);
+    os_thread_sleep(1000);
     mutex_enter(&mutex);
   }
 
@@ -2709,7 +2711,7 @@ void recv_sys_t::apply(bool last_batch)
 
       if (abort)
         return;
-      os_thread_sleep(500000);
+      os_thread_sleep(1000);
       mutex_enter(&mutex);
     }
   }

Comment by Marko Mäkelä [ 2020-12-21 ]

One more test joins the club:

10.6 30dc4287ec3d46bae7593f56383b9f3738e3c4e6

CURRENT_TEST: mariabackup.system_versioning
mysqltest: At line 14: exec of '/mnt/buildbot/build/mariadb-10.6.0/extra/mariabackup/mariabackup  --prepare --target-dir=/mnt/buildbot/build/mariadb-10.6.0/mysql-test/var/4/tmp/backup 2>&1' failed, error: 34304, status: 134, errno: 11
2020-12-19  9:38:01 0 [Note] InnoDB: Starting final batch to recover 6 pages from redo log.
mariabackup: /home/buildbot/buildbot/build/mariadb-10.6.0/storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed.

Comment by Marko Mäkelä [ 2020-12-21 ]

While this was not repeatable on 10.5, I pushed a fix there as well.

Generated at Thu Feb 08 09:30:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.