Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
Description
In MDEV-27774, log_sys.mutex and log_sys.flush_order_mutex were replaced with a shared log_sys.latch. As a result, concurrently executing mtr_t::commit() calls may insert blocks into buf_pool.flush_list at roughly the same time. Each insertion is still protected by buf_pool.flush_list_mutex.
In buf_pool_t::insert_into_flush_list() we attempted to compensate for this by searching for an appropriate insert position instead of unconditionally inserting blocks at the head of buf_pool.flush_list. This compensation does not appear to work in all cases. With the following command, I was able to reproduce a crash on my system:
./mtr --parallel=auto --repeat=100 encryption.innodb_encryption_filekeys{,,,,}{,,,}
10.8 8251a9fb93075a72074bd7fd10faee5165014b7f
encryption.innodb_encryption_filekeys 'cbc,innodb' w1 [ 26 fail ]
Test ended at 2022-02-17 10:28:31

CURRENT_TEST: encryption.innodb_encryption_filekeys

Server [mysqld.1 - pid: 488142, winpid: 488142, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2022-02-17 10:28:30 104 [Note] InnoDB: Creating #1 encryption thread id 140526397404736 total threads 4.
2022-02-17 10:28:30 104 [Note] InnoDB: Creating #2 encryption thread id 140526389012032 total threads 4.
2022-02-17 10:28:30 104 [Note] InnoDB: Creating #3 encryption thread id 140526405797440 total threads 4.
2022-02-17 10:28:30 104 [Note] InnoDB: Creating #4 encryption thread id 140526414190144 total threads 4.
mariadbd: /mariadb/10.8/storage/innobase/buf/buf0flu.cc:2538: void buf_flush_validate_low(): Assertion `om == 1 || !bpage || __builtin_expect(recv_sys.recovery_on, (0)) || om >= bpage->oldest_modification()' failed.
The assertion reports that buf_pool.flush_list is not ordered by buf_page_t::oldest_modification(), as it must be.
The impact of this bug is that log checkpoints and thus crash recovery and backup may work incorrectly.
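To illustrate the invariant that the failed assertion checks, here is a minimal, self-contained sketch. It is not the InnoDB code: Page, FlushList, flush_list_is_ordered(), insert_at_head() and insert_ordered() are hypothetical names made up for this example, a std::list stands in for buf_pool.flush_list, and neither locking nor recovery is modelled. The check mirrors the shape of the assertion above: walking from the head of the list, each oldest_modification must be greater than or equal to that of its successor, with the value 1 exempt as in the original condition.

```cpp
#include <cstdint>
#include <iterator>
#include <list>

// Hypothetical stand-in for buf_page_t; only the field relevant here.
struct Page {
  uint64_t oldest_modification;  // LSN of the oldest unflushed change
};

// Hypothetical stand-in for buf_pool.flush_list (head = front()).
using FlushList = std::list<Page>;

// Returns true if, walking from the head, every entry's LSN is >= the LSN
// of its successor. An entry whose value is 1 is exempt, mirroring the
// "om == 1" term of the assertion. Recovery is not modelled.
bool flush_list_is_ordered(const FlushList &list) {
  for (auto it = list.begin(); it != list.end(); ++it) {
    auto next = std::next(it);
    if (next == list.end()) break;
    const uint64_t om = it->oldest_modification;
    if (om != 1 && om < next->oldest_modification) return false;
  }
  return true;
}

// Naive insertion: always at the head. This keeps the list ordered only
// if callers arrive in ascending LSN order.
void insert_at_head(FlushList &list, uint64_t lsn) {
  list.push_front(Page{lsn});
}

// Position-searching insertion: skip entries with a larger LSN, then
// insert in front of the first entry whose LSN is not larger.
void insert_ordered(FlushList &list, uint64_t lsn) {
  auto pos = list.begin();
  while (pos != list.end() && pos->oldest_modification > lsn) ++pos;
  list.insert(pos, Page{lsn});
}
```

If three committers holding LSNs 100, 300 and 200 reach the list in that order, insert_at_head() leaves the list as 200, 300, 100, which fails the check, while insert_ordered() yields 300, 200, 100. This only demonstrates the ordering requirement in a single thread; it does not reproduce the actual failure in buf_pool_t::insert_into_flush_list().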
Issue Links
- causes: MDEV-28708 Increased congestion on buf_pool.flush_list_mutex (Closed)
- is caused by: MDEV-27774 Reduce scalability bottlenecks in mtr_t::commit() (Closed)
- relates to: MDEV-25113 Reduce effect of parallel background flush on select workload (Closed)