Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.8.1, 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4
Description
The failure is observed in st-10.11-MDEV-33361. Since the assert looks unrelated to the change in the branch, the issue must be from the base branch.
The assert is introduced in MDEV-14425 and likely to have introduced the issue. I am setting the affected version as 10.8.1 when MDEV-14425 was merged.
https://buildbot.mariadb.org/#/builders/234/builds/26196/steps/9/logs/stdio
Test: innodb.import_corrupted
Branch: st-10.11-MDEV-33361
Commit: 93189df44
Platform: amd64-windows
innodb.import_corrupted 'innodb' w6 [ fail ] |
Test ended at 2024-02-02 10:01:21
|
|
CURRENT_TEST: innodb.import_corrupted
|
mysqltest: At line 65: query 'ALTER TABLE t2 IMPORT TABLESPACE' failed with wrong errno <Unknown> (2013): 'Lost connection to server during query', instead of ER_NOT_KEYFILE (1034)... |
Server [mysqld.1 - pid: 47680, winpid: 47680, exit: 0] failed during test run |
Server log from this test: |
----------SERVER LOG START-----------
|
2024-02-02 10:01:20 124 [Note] InnoDB: Sync to disk of `test`.`t1` started. |
2024-02-02 10:01:20 124 [Note] InnoDB: Stopping purge |
2024-02-02 10:01:21 124 [Note] InnoDB: Writing table metadata to './test/t1.cfg' |
2024-02-02 10:01:21 124 [Note] InnoDB: Table `test`.`t1` flushed to disk |
2024-02-02 10:01:21 124 [Note] InnoDB: Deleting the meta-data file './test/t1.cfg' |
2024-02-02 10:01:21 124 [Note] InnoDB: Resuming purge |
2024-02-02 10:01:21 124 [Note] InnoDB: Importing tablespace for table 'test/t1' that was exported from host 'WIN-5PHL1LR4M8G' |
2024-02-02 10:01:21 124 [Note] InnoDB: Phase I - Update all pages |
2024-02-02 10:01:21 124 [Note] InnoDB: Sync to disk |
2024-02-02 10:01:21 124 [Note] InnoDB: Sync to disk - done! |
2024-02-02 10:01:21 124 [Note] InnoDB: Phase II - Purge records from index `PRIMARY` |
Assertion failed: oldest_lsn > log_sys.last_checkpoint_lsn, file C:\Buildbot\amd64-windows\build\storage\innobase\buf\buf0flu.cc, line 1982 |
server.dll!my_sigabrt_handler()[my_thr_init.c:465] |
ucrtbase.dll!raise()
|
ucrtbase.dll!abort()
|
ucrtbase.dll!_get_wpgmptr()
|
ucrtbase.dll!_wassert()
|
server.dll!log_checkpoint_low()[buf0flu.cc:1982] |
server.dll!log_checkpoint()[buf0flu.cc:2042] |
server.dll!buf_flush_page_cleaner()[buf0flu.cc:2509] |
Attachments
Issue Links
- relates to
-
MDEV-22027 Assertion oldest_lsn >= log_sys.last_checkpoint_lsn failed in log_checkpoint()
-
- Closed
-
-
MDEV-14425 Change the InnoDB redo log format to reduce write amplification
-
- Closed
-
-
MDEV-32269 InnoDB after ALTER TABLE…IMPORT TABLESPACE may not be crash safe
-
- Closed
-
Here is some code up to the failing assertion in log_checkpoint_low():
10.11 93189df44eeb431110bf9ca759bafa856ae90c82
log_sys.latch.wr_unlock();
}
ut_ad(!recv_no_log_write);
ut_ad(oldest_lsn > log_sys.last_checkpoint_lsn);
This would seem to guarantee that the equality cannot hold. So, we must have had an oldest_lsn that is smaller than the log_sys.last_checkpoint_lsn. That would seem to be a sign of a severe error in the checkpoint logic.
The parameter oldest_lsn was computed by the following in log_checkpoint():
10.11 93189df44eeb431110bf9ca759bafa856ae90c82
log_sys.latch.wr_lock(SRW_LOCK_CALL);
mysql_mutex_lock(&buf_pool.flush_list_mutex);
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
The field log_sys.last_checkpoint_lsn is protected by log_sys.latch (and write_lock and flush_lock) when log_t::write_checkpoint() is modifying it. Also the end_lsn=log_sys.lsn is protected by the exclusive log_sys.latch. Here are some local variables from the stack trace:
10.11 93189df44eeb431110bf9ca759bafa856ae90c82
ucrtbase!wassert
Unable to enumerate locals, Win32 error 0n87
Private symbols (symbols.pri) are required for locals.
Type ".hh dbgerr005" for details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!log_checkpoint_low [C:\Buildbot\amd64-windows\build\storage\innobase\buf\buf0flu.cc @ 1982]
unsigned int64 oldest_lsn = 0xdbbfe9
unsigned int64 end_lsn = 0xdc0a71
unsigned int64 flush_lsn = 0x00000216`d35ab7b0
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!log_checkpoint [C:\Buildbot\amd64-windows\build\storage\innobase\buf\buf0flu.cc @ 2042]
unsigned int64 oldest_lsn = 0xdbbfe9
unsigned int64 end_lsn = 0xdc0a71
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!buf_flush_page_cleaner [C:\Buildbot\amd64-windows\build\storage\innobase\buf\buf0flu.cc @ 2509]
unsigned int64 n = 0
unsigned int64 dirty_blocks = 0x1b5
unsigned int64 n_flushed = 0
unsigned int64 oldest_lsn = 0xdbbfe9
double pct_lwm = 0
unsigned int64 soft_lsn_limit = 0
double dirty_pct = 87.2255489021956123
unsigned int64 lsn_limit = 0
struct timespec abstime = struct timespec
unsigned int64 last_pages = 0x71a
unsigned int64 last_activity_count = 0
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
The variable flush_lsn, whose scope had not started before the failing assertion was evaluated, is garbage, but the rest looks reasonable.
The oldest modified block in the buf_pool.flush_list must have an buf_page_t::oldest_modification_ of 0xdbbfe9. For some reason, the latest checkpoint LSN (log_sys.last_checkpoint_lsn) is larger than this. This means that the log checkpoint had been advanced prematurely on a previous invocation of log_checkpoint_low(), while some older page modifications had not yet been written back from the buffer pool.