[MDEV-28731] Race condition on log checkpoint Created: 2022-06-02  Updated: 2022-06-03  Resolved: 2022-06-02

Status: Closed
Project: MariaDB Server
Component/s: Backup, Storage Engine - InnoDB
Affects Version/s: 10.5.3, 10.6.0, 10.5, 10.6, 10.7, 10.8, 10.9
Fix Version/s: 10.5.17, 10.6.9, 10.7.5, 10.8.4, 10.9.2

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, recovery, rr-profile-analyzed

Issue Links:
Blocks
Problem/Incident
is caused by MDEV-12353 Efficient InnoDB redo log record format Closed
is caused by MDEV-22107 Restore accidentally orphaned MTR_MEM... Closed

 Description   

This was found while testing MDEV-13542.

mysqld: /data/Server/bb-10.6-MDEV-13542L/storage/innobase/buf/buf0flu.cc:175: void buf_pool_t::insert_into_flush_list(buf_block_t*, lsn_t): Assertion `((&(&log_sys.flush_order_mutex)->m_mutex)->count > 0 && pthread_equal(pthread_self(), (&(&log_sys.flush_order_mutex)->m_mutex)->thread))' failed.

The corresponding source code is:

void buf_pool_t::insert_into_flush_list(buf_block_t *block, lsn_t lsn)
{
  mysql_mutex_assert_not_owner(&mutex);
  mysql_mutex_assert_owner(&log_sys.flush_order_mutex);

The assertion fails, because log_sys.flush_order_mutex is not being held, even though it is supposed to be.

This could lead to incorrect log checkpoints being written, that is, invalid crash recovery or backup. The fix is simple:

diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
index 2feb5a0583f..f13b677742b 100644
--- a/storage/innobase/mtr/mtr0mtr.cc
+++ b/storage/innobase/mtr/mtr0mtr.cc
@@ -1263,4 +1263,6 @@ void mtr_t::modify(const buf_block_t &block)
   }
   iteration.functor.found->type= static_cast<mtr_memo_type_t>
     (iteration.functor.found->type | MTR_MEMO_MODIFY);
+  if (is_block_dirtied(&block))
+    m_made_dirty= true;
 }

The flag mtr_t::m_made_dirty will inform mtr_t::commit() to acquire the mutex.



 Comments   
Comment by Marko Mäkelä [ 2022-06-03 ]

Even after this fix, we got a failure on FreeBSD 13.0 that I was not able to repeat:

  • In a FreeBSD 13.0 virtual machine with a CMAKE_BUILD_TYPE=RelWithDebInfo build
  • On Linux (liburing or innodb_use_native_aio=0) on ext4fs on NVMe, both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=RelWithDebInfo

The failure looks like this:

10.6 8920a6f80b91592f3bb072d0707ff21362702144

encryption.tempfiles_encrypted 'ctr,innodb' w4 [ pass ]  20339
worker[4] > Restart [mysqld.1 - pid: 9200, winpid: 9200] - running with different options '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --plugin-load-add=file_key_management.so --loose-file-key-management --loose-file-key-management-filename=/usr/local/share/mysql-test/std_data/keys.txt --file-key-management-encryption-algorithm=aes_cbc --skip-innodb-read-only-compressed --innodb-change-buffering=none --innodb-default-encryption-key-id=20' != '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --plugin-load-add=file_key_management.so --loose-file-key-management --loose-file-key-management-filename=/usr/local/share/mysql-test/std_data/keys.txt --file-key-management-encryption-algorithm=aes_ctr --skip-innodb-read-only-compressed --loose-sequence --loose-sequence --encrypt-tmp_files=ON'
encryption.innodb-redo-nokeys 'cbc,innodb' w4 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2022-06-02 15:22:32
line
2022-06-02 15:22:30 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT at 1166935 between the checkpoint 51825 and the end 1232384.
^ Found warnings in /tmp/var/4/log/mysqld.1.err

The test is not encrypting the log; it is testing recovery of encrypted tables when the keys are missing.

Generated at Thu Feb 08 10:03:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.