Details
-
Bug
-
Status: In Testing (View Workflow)
-
Critical
-
Resolution: Unresolved
-
13.0, 13.0.1
-
Can result in data loss
Description
This issue was reported during the testing of the work-in-progress MDEV-14992. There appears to be a race condition between log_t::set_archive() and log_t::write_checkpoint() that allows log_sys.next_checkpoint_no to be set to an inconsistent value.
I tested the following patch on my local system:
diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
|
index cd3fc58c830..437691ffc20 100644
|
--- a/storage/innobase/buf/buf0flu.cc
|
+++ b/storage/innobase/buf/buf0flu.cc
|
@@ -2004,8 +2004,6 @@ inline lsn_t log_t::write_checkpoint(lsn_t checkpoint, lsn_t end_lsn) noexcept
|
(C[-1] && my_betoh64(C[-1]) < d));
|
ut_ad(!*C || offset + o == START_OFFSET - 8);
|
*C= my_htobe64(d);
|
- if (resize_log.m_file == OS_FILE_CLOSED)
|
- resize_log= log; /* Block concurrent set_archive() */
|
goto write_checkpoint;
|
}
|
|
@@ -2040,6 +2038,8 @@ inline lsn_t log_t::write_checkpoint(lsn_t checkpoint, lsn_t end_lsn) noexcept
|
{
|
write_checkpoint:
|
ut_ad(!is_mmap());
|
+ if (resize_log.m_file == OS_FILE_CLOSED)
|
+ resize_log= log; /* Block concurrent set_archive() */
|
latch.wr_unlock();
|
log_write_and_flush_prepare();
|
resizing= resize_lsn.load(std::memory_order_relaxed);
|
@@ -2073,6 +2073,15 @@ inline lsn_t log_t::write_checkpoint(lsn_t checkpoint, lsn_t end_lsn) noexcept
|
{
|
archived_checkpoint= checkpoint;
|
archived_lsn= end_lsn;
|
+ checkpoint_completed:
|
+ if (resize_log.m_file == log.m_file)
|
+ {
|
+ /* We may have assigned resize_log= log to keep set_archived() out. */
|
+#ifdef HAVE_PMEM
|
+ ut_ad(!is_mmap());
|
+#endif
|
+ resize_log.m_file= OS_FILE_CLOSED;
|
+ }
|
}
|
else if (archive_header_was_reset)
|
{
|
@@ -2095,14 +2104,8 @@ inline lsn_t log_t::write_checkpoint(lsn_t checkpoint, lsn_t end_lsn) noexcept
|
#endif
|
innodb_backup_checkpoint();
|
}
|
- else if (resize_log.m_file == log.m_file)
|
- {
|
- /* We may have assigned resize_log= log to keep set_archived() out. */
|
-#ifdef HAVE_PMEM
|
- ut_ad(!is_mmap());
|
-#endif
|
- resize_log.m_file= OS_FILE_CLOSED;
|
- }
|
+ else
|
+ goto checkpoint_completed;
|
|
DBUG_PRINT("ib_log", ("checkpoint ended at " LSN_PF ", flushed to " LSN_PF,
|
checkpoint, get_flushed_lsn())); |
With this patch and cmake -DWITH_INNODB_PMEM=OFF (to exercise this code path), the default regression test suites are passing for me.
I believe that to assess the correctness of this change, it will have to be tested on top of the same work-in-progress MDEV-14992 change for which the original failure was repeated, while executing a BACKUP SERVER statement that is being implemented in that branch. It will also have to be tested on top of the 13.0 branch, by executing the following concurrently with a write heavy workload, preferrably while using the minimum innodb_log_file_size=4m or something close to it:
SET GLOBAL innodb_log_archive=ON; |
SET GLOBAL innodb_log_archive=OFF; |
This bug only affects the pwrite based code path for log writes. By default (WITH_INNODB_PMEM=ON, if the log is stored in /dev/shm, it will be written via a memory mapping.
Attachments
Issue Links
- blocks
-
MDEV-14992 BACKUP SERVER to mounted file system
-
- In Progress
-
- is caused by
-
MDEV-37949 Implement innodb_log_archive
-
- Closed
-