[MDEV-16092] Crash in encryption.create_or_replace Created: 2018-05-04  Updated: 2018-05-04  Resolved: 2018-05-04

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1, 10.2, 10.3
Fix Version/s: 10.1.33, 10.2.15, 10.3.7

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: crash, encryption


 Description   

The test encryption.create_or_replace can crash because a non-existent page is being requested:

10.1 74abc32d308cd4f9a23c4f897a76ea75c85a18c9

2018-04-30 16:54:47 140562482276096 [ERROR] InnoDB: Unable to read tablespace 913 page no 4 into the buffer pool after 100 attempts. The most probable cause of this error may be that the table has been corrupted. You can try to fix this problem by using innodb_force_recovery. Please see http://dev.mysql.com/doc/refman/5.6/en/ for more details. Aborting...

The stack trace in the error log does not disclose the caller of buf_page_get_gen(), but jplindst suggested that it could be the function fil_crypt_get_page_throttle_func(). Indeed, there seems to be a race condition between DROP TABLE and that function:

	buf_block_t* block = buf_page_try_get_func(space->id, offset, RW_X_LATCH,
						   true,
						   file, line, mtr);
	if (block != NULL) {
		/* page was in buffer pool */
		state->crypt_stat.pages_read_from_cache++;
		return block;
	}
 
	/* Before reading from tablespace we need to make sure that
	tablespace exists and is not just being dropped. */
	if (space->is_stopping()) {
		return NULL;
	}
 
	state->crypt_stat.pages_read_from_disk++;
 
	ullint start = ut_time_us(NULL);
	block = buf_page_get_gen(space->id, zip_size, offset,
				 RW_X_LATCH,
				 NULL, BUF_GET_POSSIBLY_FREED,
				 file, line, mtr);

If the tablespace is dropped or truncated after the space->is_stopping() check but before the buf_page_get_gen() call, the page request still goes ahead, fails on every retry, and ends in the fatal error shown above.
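The window between the check and the read can be illustrated with a small, deterministic toy model. This is only a sketch, not InnoDB code: ToySpace, ReadResult, read_page() and the `between` callback are hypothetical names, and the callback merely stands in for whatever a concurrent DROP TABLE would do between the two steps.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical stand-in for a tablespace; NOT the InnoDB fil_space_t.
struct ToySpace {
    std::atomic<bool> stopping{false};
    std::atomic<bool> exists{true};

    bool is_stopping() const { return stopping.load(); }

    // Models DROP TABLE: mark the space as stopping and remove it.
    void drop() {
        stopping.store(true);
        exists.store(false);
    }
};

enum class ReadResult { Skipped, Ok, MissingPage };

// Check-then-act, as in the quoted function: the check and the read are
// two separate steps. `between` runs in the gap between them and models
// a concurrent thread (an assumption made to keep the example deterministic).
ReadResult read_page(ToySpace& space, const std::function<void()>& between) {
    assert(static_cast<bool>(between));  // caller must supply a callable

    if (space.is_stopping()) {
        return ReadResult::Skipped;  // the check caught the drop in time
    }

    between();  // the race window: the space may be dropped here

    // The read is attempted regardless of what happened in the window.
    return space.exists.load() ? ReadResult::Ok : ReadResult::MissingPage;
}
```

Calling read_page() with a callback that invokes drop() on the same ToySpace yields MissingPage even though the is_stopping() check passed, which is the interleaving described above.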

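It seems that the simplest fix would be to skip the retry logic in buf_page_get_gen() when the page cannot be read and the mode is BUF_GET_POSSIBLY_FREED, and to return NULL instead. One other caller that uses this mode, lock_rec_block_validate(), would have to be adjusted to handle a NULL return value: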

diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
index e3c2337659e..cd9f08b96e8 100644
--- a/storage/innobase/buf/buf0buf.cc
+++ b/storage/innobase/buf/buf0buf.cc
@@ -3189,6 +3189,11 @@ buf_page_get_gen(
 					      ibuf_inside(mtr));
 
 			retries = 0;
+		} else if (mode == BUF_GET_POSSIBLY_FREED) {
+			if (err) {
+				*err = local_err;
+			}
+			return NULL;
 		} else if (retries < BUF_PAGE_READ_MAX_RETRIES) {
 			++retries;
 
diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
index 12c0051d09f..441fcace7c3 100644
--- a/storage/innobase/lock/lock0lock.cc
+++ b/storage/innobase/lock/lock0lock.cc
@@ -6895,10 +6895,10 @@ lock_rec_block_validate(
 			page_no, RW_X_LATCH, NULL,
 			BUF_GET_POSSIBLY_FREED,
 			__FILE__, __LINE__, &mtr);
-
-		buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);
-
-		ut_ad(lock_rec_validate_page(block));
+		if (block) {
+			buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);
+			ut_ad(lock_rec_validate_page(block));
+		}
 		mtr_commit(&mtr);
 
 		fil_space_release(space);
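
The control flow that the patch introduces can be sketched in isolation. The following is a simplified model under stated assumptions, not the actual buf_page_get_gen(): GetMode, Page, GetResult, get_page() and the `try_read` callback are hypothetical, and the fatal abort of the real code is represented by a `fatal` flag so that the example can be exercised.

```cpp
#include <cassert>
#include <functional>

// Hypothetical modes; PossiblyFreed plays the role of BUF_GET_POSSIBLY_FREED.
enum class GetMode { Normal, PossiblyFreed };

// Retry limit chosen to match the "100 attempts" in the error message.
constexpr int kMaxRetries = 100;

struct Page { int id; };

struct GetResult {
    Page* page;    // nullptr if the page could not be read
    int attempts;  // number of read attempts made
    bool fatal;    // true where the real code would abort the server
};

// `try_read` models one read attempt and returns nullptr on failure.
GetResult get_page(GetMode mode, const std::function<Page*()>& try_read) {
    assert(static_cast<bool>(try_read));  // caller must supply a callable

    int attempts = 0;
    for (;;) {
        ++attempts;
        if (Page* p = try_read()) {
            return {p, attempts, false};
        }
        if (mode == GetMode::PossiblyFreed) {
            // New branch: the caller accepts that the page may be gone,
            // so give up immediately instead of retrying.
            return {nullptr, attempts, false};
        }
        if (attempts >= kMaxRetries) {
            // Old behaviour for every other mode: retries exhausted.
            return {nullptr, attempts, true};
        }
    }
}
```

As in the lock_rec_block_validate() change, any caller that passes the possibly-freed mode has to check the returned pointer before using it.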

