Details
-
Bug
-
Status: Closed (View Workflow)
-
Blocker
-
Resolution: Fixed
-
10.5
Description
The fix of MDEV-24569 and MDEV-24695 introduced a race condition when a table is being rebuilt or altered during the check. The following patch illustrates it in the MDEV-24695 case:
diff --git a/storage/innobase/fil/fil0crypt.cc b/storage/innobase/fil/fil0crypt.cc
|
index 2a8fa875e90..f9ee1c0619d 100644
|
--- a/storage/innobase/fil/fil0crypt.cc
|
+++ b/storage/innobase/fil/fil0crypt.cc
|
@@ -1771,6 +1771,8 @@ fil_crypt_get_page_throttle_func(
|
return NULL;
|
}
|
|
+ os_thread_sleep(10000);
|
+
|
if (fseg_page_is_free(space, state->offset)) {
|
/* page is already freed */
|
return NULL; |
With this patch, the server will crash more easily during a test. Here is one example (it can also crash at other points of the test):
10.5 7fb528d7226819063046c6a70a21511b3620e844 |
CURRENT_TEST: encryption.create_or_replace
|
mysqltest: At line 68: query 'reap;' failed: 2013: Lost connection to MySQL server during query
|
…
|
----------SERVER LOG START-----------
|
2021-02-15 8:50:22 10 [Note] InnoDB: Creating #1 encryption thread id 140015958992640 total threads 4.
|
2021-02-15 8:50:22 10 [Note] InnoDB: Creating #2 encryption thread id 140015967385344 total threads 4.
|
2021-02-15 8:50:22 10 [Note] InnoDB: Creating #3 encryption thread id 140015396976384 total threads 4.
|
2021-02-15 8:50:22 10 [Note] InnoDB: Creating #4 encryption thread id 140015950599936 total threads 4.
|
2021-02-15 8:50:23 0 [Note] InnoDB: trying to read page [page id: space=559, page number=0] in nonexisting or being-dropped tablespace
|
…
|
2021-02-15 8:50:23 0 [Note] InnoDB: trying to read page [page id: space=559, page number=0] in nonexisting or being-dropped tablespace
|
2021-02-15 8:50:23 0 [ERROR] [FATAL] InnoDB: Unable to read page [page id: space=559, page number=0] into the buffer pool after 100. The most probable cause of this error may be that the table has been corrupted. See https://mariadb.com/kb/en/library/innodb-recovery-modes/
|
210215 8:50:23 [ERROR] mysqld got signal 6 ;
|
…
|
#5 0x00007f581d3ce537 in __GI_abort () at abort.c:79
|
#6 0x000056019735deca in ib::fatal::~fatal (this=0x7f57dfffe120) at /mariadb/10.5m/storage/innobase/ut/ut0ut.cc:576
|
#7 0x00005601973d1ad0 in buf_page_get_low (page_id={m_id = 2400886718464}, zip_size=0, zip_size@entry=140011974888008, rw_latch=1, rw_latch@entry=140011974887864, guess=<optimized out>, guess@entry=0x22f00000000, mode=<optimized out>, mode@entry=0, file=0x56019788eee0 "/mariadb/10.5m/storage/innobase/fsp/fsp0fsp.cc", file@entry=0x7f57dfffe368 "\001", line=414, mtr=0x7f57dfffe368, err=0x0, allow_ibuf_merge=<optimized out>) at /mariadb/10.5m/storage/innobase/buf/buf0buf.cc:3075
|
#8 0x00005601973d37d5 in buf_page_get_gen (page_id=<optimized out>, zip_size=<optimized out>, zip_size@entry=0, rw_latch=<optimized out>, rw_latch@entry=1, guess=<optimized out>, guess@entry=0x0, mode=<optimized out>, mode@entry=10, file=<optimized out>, line=16, mtr=0x7f57dfffe388, err=0x7f57dfffe388, allow_ibuf_merge=<optimized out>) at /mariadb/10.5m/storage/innobase/buf/buf0buf.cc:3479
|
#9 0x0000560197473ea8 in xdes_get_descriptor_const (space=0x7f57140709b8, page=<optimized out>, offset=4, mtr=<optimized out>) at /mariadb/10.5m/storage/innobase/fsp/fsp0fsp.cc:413
|
#10 fseg_page_is_free (space=space@entry=0x7f57140709b8, page=4) at /mariadb/10.5m/storage/innobase/fsp/fsp0fsp.cc:2576
|
#11 0x00005601974698c3 in fil_crypt_get_page_throttle_func (state=<optimized out>, offset=<optimized out>, mtr=<optimized out>, sleeptime_ms=<optimized out>, file=<optimized out>, line=1849) at /mariadb/10.5m/storage/innobase/fil/fil0crypt.cc:1776
|
#12 fil_crypt_rotate_page (key_state=<optimized out>, state=<optimized out>) at /mariadb/10.5m/storage/innobase/fil/fil0crypt.cc:1847
|
#13 fil_crypt_rotate_pages (key_state=<optimized out>, state=<optimized out>) at /mariadb/10.5m/storage/innobase/fil/fil0crypt.cc:1978
|
#14 fil_crypt_thread () at /mariadb/10.5m/storage/innobase/fil/fil0crypt.cc:2169
|
The fix (provided by thiru) is simple: Do not insist in fseg_page_is_free() that the tablespace is still ‘open for business’:
diff --git a/storage/innobase/fsp/fsp0fsp.cc b/storage/innobase/fsp/fsp0fsp.cc
|
index 650bf99695e..93b2744c6b7 100644
|
--- a/storage/innobase/fsp/fsp0fsp.cc
|
+++ b/storage/innobase/fsp/fsp0fsp.cc
|
@@ -410,8 +410,11 @@ xdes_get_descriptor_const(
|
|
const ulint zip_size = space->zip_size();
|
|
- if (buf_block_t* block = buf_page_get(page_id_t(space->id, page),
|
- zip_size, RW_S_LATCH, mtr)) {
|
+ if (buf_block_t* block = buf_page_get_gen(
|
+ page_id_t(space->id, page),
|
+ zip_size, RW_S_LATCH, NULL,
|
+ BUF_GET_POSSIBLY_FREED, __FILE__, __LINE__,
|
+ mtr)) {
|
buf_block_dbg_add_level(block, SYNC_FSP_PAGE);
|
|
ut_ad(page != 0 || space->free_limit == mach_read_from_4( |
The function xdes_get_descriptor_const() is only invoked by fseg_page_is_free(). Hence, the change cannot break any other code. With the fix and the above sleep, the test encryption.create_or_replace has passed for more than 6,500 times.
The crash should be much easier to encounter in the MDEV-24695 case. The crash in the change buffer is much harder to hit, but it should be possible in the MDEV-24569 scenario when ibuf_read_merge_pages() is triggered and ibuf_merge_or_delete_for_page() will be invoked on a tablespace that is being rebuilt, truncated, or dropped.
This problem (that the fil_space_t::STOPPING flag can be set during concurrent access) only affects code that is not protected by MDL, that is, the change buffer and the fil_crypt_thread(). All other access seems to be fine, except fil_crypt_read_crypt_data() is missing a BUF_GET_POSSIBLY_FREED flag:
diff --git a/storage/innobase/fil/fil0crypt.cc b/storage/innobase/fil/fil0crypt.cc
|
--- a/storage/innobase/fil/fil0crypt.cc
|
+++ b/storage/innobase/fil/fil0crypt.cc
|
@@ -991,10 +991,13 @@ fil_crypt_read_crypt_data(fil_space_t* space)
|
const ulint zip_size = space->zip_size();
|
mtr_t mtr;
|
mtr.start();
|
- if (buf_block_t* block = buf_page_get(page_id_t(space->id, 0),
|
- zip_size, RW_S_LATCH, &mtr)) {
|
+ if (buf_block_t* block = buf_page_get_gen(page_id_t(space->id, 0),
|
+ zip_size, RW_S_LATCH,
|
+ nullptr,
|
+ BUF_GET_POSSIBLY_FREED,
|
+ __FILE__, __LINE__, &mtr)) {
|
mutex_enter(&fil_system.mutex);
|
- if (!space->crypt_data) {
|
+ if (!space->crypt_data && !space->is_stopping()) {
|
space->crypt_data = fil_space_read_crypt_data(
|
zip_size, block->frame);
|
} |
Attachments
Issue Links
- is caused by
-
MDEV-24569 Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed in log_phys_t::apply
- Closed
-
MDEV-24695 Encryption is modifying a freed page
- Closed