Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 10.8.0, 10.5.9, 10.6.0, 10.7.0, 10.5, 10.6, 10.7(EOL), 10.8(EOL)
Description
On our CI systems, on builders that run on real storage and not RAM disk, we see occasional failures of the IMPORT TABLESPACE tests, because a wait for a log checkpoint is hanging with a stack trace like this:
buf_flush_wait_flushed
log_make_checkpoint
row_import_cleanup
row_import_for_mysql
Actually, the checkpoint there should be unnecessary, but that is not the main point here.
I was able to reproduce this on ext4fs on storage with a 4096-byte physical block size, using a RelWithDebInfo build. Previous attempts to reproduce it on an NVMe drive with a 512-byte physical block size did not succeed.
innodb.innodb-wl5522 'innodb,strict_crc32' w11 [ 7 fail ] timeout after 900 seconds
The test invocation was:
./mtr --parallel=100 --repeat=100 {innodb.innodb-wl5522{,-1},innodb_zip.wl5522_zip}{,,,,,,,,,,,,,,,,,,}
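For readers less familiar with the brace syntax in that command line: the shell, not mtr, expands the braces before mtr starts. The inner group yields the three test names, and the trailing group of empty alternatives repeats each name once per alternative, which gives the parallel workers enough test instances to run. A minimal sketch of the same pattern with a shorter repeat group (`{,,}` is used here purely for illustration; the variable names `expanded` and `count` are likewise just for this demo):

```shell
# Brace expansion is a bash feature, so run the demo explicitly through bash.
# {,,} has three empty alternatives, so every test name is emitted three times.
expanded=$(bash -c 'printf "%s\n" {innodb.innodb-wl5522{,-1},innodb_zip.wl5522_zip}{,,}')

# One expanded word per line.
echo "$expanded"

# Count the expanded words: 3 test names x 3 copies each.
count=$(printf '%s\n' "$expanded" | wc -l | tr -d ' ')
echo "$count"   # prints 9
```

Prefixing the original command with `echo` is a quick way to see the full argument list that mtr receives without running any tests.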
I think that applying the first commit of MDEV-26827 (to invoke buf_flush_list() from fewer threads) might fix this. That commit also removes the log_make_checkpoint() call from row_import_cleanup(), but when testing the fix we obviously must retain that call, because our test case cannot fail if that call is not present.
Issue Links
- causes: MDEV-27499 tps regression with 10.8 (w.r.t to 10.6.5) (Closed)
- is caused by: MDEV-24278 InnoDB page cleaner keeps waking up on idle server (Closed)
Unfortunately, the code cleanup did not fix this:
innodb.innodb-wl5522 'innodb,strict_crc32' w52 [ 5 fail ] timeout after 900 seconds
…
#3 0x000055a52aed8b8b in buf_flush_wait (lsn=2592590) at /home/marko/server/storage/innobase/buf/buf0flu.cc:1825
#4 0x000055a52a7f16b7 in buf_flush_wait_flushed (sync_lsn=2592590) at /home/marko/server/storage/innobase/buf/buf0flu.cc:1881
#5 0x000055a52a7f171f in log_make_checkpoint () at /usr/include/c++/9/bits/atomic_base.h:413
#6 0x000055a52a7d9853 in row_import_cleanup (prebuilt=0x7f174809d2a8, err=DB_ERROR) at /home/marko/server/storage/innobase/row/row0import.cc:2192
The core dump excludes the buffer pool, so I will have to revert MDEV-10814 and try again (tomorrow).