Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
Description
In MDEV-27774, the log_sys.mutex and log_sys.flush_order_mutex were replaced with a shared log_sys.latch. This means that concurrently executing mtr_t::commit() calls may insert blocks into buf_pool.flush_list at roughly the same time. Each insertion is still protected by buf_pool.flush_list_mutex.
In buf_pool_t::insert_into_flush_list() we attempted to compensate for this: instead of unconditionally inserting each block at the start of buf_pool.flush_list, we search for an appropriate insert position. This compensation does not always work. With the following command, I was able to reproduce a crash on my system:
./mtr --parallel=auto --repeat=100 encryption.innodb_encryption_filekeys{,,,,}{,,,}
10.8 8251a9fb93075a72074bd7fd10faee5165014b7f
encryption.innodb_encryption_filekeys 'cbc,innodb' w1 [ 26 fail ]
Test ended at 2022-02-17 10:28:31
CURRENT_TEST: encryption.innodb_encryption_filekeys
Server [mysqld.1 - pid: 488142, winpid: 488142, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2022-02-17 10:28:30 104 [Note] InnoDB: Creating #1 encryption thread id 140526397404736 total threads 4.
2022-02-17 10:28:30 104 [Note] InnoDB: Creating #2 encryption thread id 140526389012032 total threads 4.
2022-02-17 10:28:30 104 [Note] InnoDB: Creating #3 encryption thread id 140526405797440 total threads 4.
2022-02-17 10:28:30 104 [Note] InnoDB: Creating #4 encryption thread id 140526414190144 total threads 4.
mariadbd: /mariadb/10.8/storage/innobase/buf/buf0flu.cc:2538: void buf_flush_validate_low(): Assertion `om == 1 || !bpage || __builtin_expect(recv_sys.recovery_on, (0)) || om >= bpage->oldest_modification()' failed.
The assertion reports that buf_pool.flush_list is not ordered by buf_page_t::oldest_modification(), as it must be.
The impact of this bug is that log checkpoints and thus crash recovery and backup may work incorrectly.
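To make the violated invariant concrete, here is a minimal, self-contained sketch of an ordered list with a position-searching insert and a validation pass modeled on the assertion quoted above. It is not the InnoDB code: FlushList, Page, insert_ordered(), insert_front() and validate() are hypothetical names, the std::list and std::mutex stand in for buf_pool.flush_list and buf_pool.flush_list_mutex, and the recv_sys.recovery_on exemption is left out.

```cpp
#include <cstdint>
#include <iterator>
#include <list>
#include <mutex>

// Hypothetical simplified model; none of these names exist in InnoDB.
using lsn_t = std::uint64_t;

struct Page {
  lsn_t oldest_modification;
};

class FlushList {
  std::list<Page> pages_;  // stand-in for buf_pool.flush_list
  std::mutex mutex_;       // stand-in for buf_pool.flush_list_mutex

public:
  // Search from the start of the list for the first element whose
  // oldest_modification is not greater than lsn, and insert before it.
  // This keeps the list in non-increasing order from start to end.
  void insert_ordered(lsn_t lsn) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = pages_.begin();
    while (it != pages_.end() && it->oldest_modification > lsn)
      ++it;
    pages_.insert(it, Page{lsn});
  }

  // Unconditional insert at the start of the list. This is only safe
  // if callers are guaranteed to arrive in increasing LSN order.
  void insert_front(lsn_t lsn) {
    std::lock_guard<std::mutex> guard(mutex_);
    pages_.push_front(Page{lsn});
  }

  // Mirrors the shape of the quoted assertion: for each element, either
  // its value is 1 (exempted, as in the assertion), or there is no next
  // element, or its value is >= the next element's value.
  bool validate() {
    std::lock_guard<std::mutex> guard(mutex_);
    for (auto it = pages_.begin(); it != pages_.end(); ++it) {
      auto next = std::next(it);
      const lsn_t om = it->oldest_modification;
      if (om == 1 || next == pages_.end())
        continue;
      if (om < next->oldest_modification)
        return false;
    }
    return true;
  }
};
```

In this model, calling insert_ordered() with 30, 10 and 20 yields the order 30, 20, 10 and validate() returns true, whereas calling insert_front() with 30 and then 10 yields 10, 30 and validate() returns false. The sketch only illustrates the ordering requirement; it does not reproduce the race reported here.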
Issue Links
- causes: MDEV-28708 Increased congestion on buf_pool.flush_list_mutex (Closed)
- is caused by: MDEV-27774 Reduce scalability bottlenecks in mtr_t::commit() (Closed)
- relates to: MDEV-25113 Reduce effect of parallel background flush on select workload (Closed)
Another failure:
CURRENT_TEST: encryption.innodb-checksum-algorithm
mysqltest: At line 53: query 'ALTER TABLE tc DISCARD TABLESPACE' failed: <Unknown> (2013): Lost connection to server during query
…
buf/buf0flu.cc:2585(buf_flush_validate_low())[0x55d918032422]
buf/buf0flu.cc:112(buf_flush_validate_skip())[0x55d9180367fd]
buf/buf0flu.cc:220(buf_pool_t::delete_from_flush_list(buf_page_t*, bool))[0x55d918036a4c]
buf/buf0flu.cc:254(buf_flush_remove_pages(unsigned int))[0x55d917ef81d5]
row/row0mysql.cc:2543(row_discard_tablespace_for_mysql(dict_table_t*, trx_t*))[0x55d917d74fc3]