[MDEV-31253] Freed data pages are not always being scrubbed Created: 2023-05-12  Updated: 2023-06-07  Resolved: 2023-05-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1
Fix Version/s: 10.5.22, 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: security

Issue Links:
Problem/Incident
is caused by MDEV-8139 Fix scrubbing Closed
Relates
relates to MDEV-31234 InnoDB does not free UNDO after the f... Closed

 Description   

The initial implementation of MDEV-8139 introduced a condition to prevent violations of the write-ahead-logging protocol:

static void buf_flush_freed_pages(fil_space_t *space)
{
  ut_ad(space != NULL);
  if (!srv_immediate_scrub_data_uncompressed && !space->is_compressed())
    return;
  lsn_t flush_to_disk_lsn= log_sys.get_flushed_lsn();
 
  std::unique_lock<std::mutex> freed_lock(space->freed_range_mutex);
  if (space->freed_ranges.empty()
      || flush_to_disk_lsn < space->get_last_freed_lsn())
  {
    freed_lock.unlock();
    return;
  }

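However, there is no logic around log checkpoints that would ensure that all fil_space_t::freed_ranges are scrubbed before the checkpoint is advanced. When the function returns early because flush_to_disk_lsn < space->get_last_freed_lsn(), the freed ranges are left unscrubbed at that point, and nothing guarantees a retry before the checkpoint moves on.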

The simplest fix would be to invoke log_write_up_to(flush_to_disk_lsn, true) while not holding any mutex. This should not incur much of a performance regression, given that this code is not enabled by default.
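To illustrate the idea, here is a minimal single-threaded toy model. It is not the InnoDB code and not necessarily the committed fix. ToyLog, ToySpace, scrub_or_skip() and scrub_after_log_write() are hypothetical names invented for this sketch. The sketch also assumes that the log would be written up to the last freed LSN of the tablespace, which is one possible reading of the proposal above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical stand-in for the redo log: tracks only the durable LSN.
// Not thread-safe; the real log_sys is far more involved.
struct ToyLog {
  lsn_t flushed_lsn = 0;
  lsn_t get_flushed_lsn() const { return flushed_lsn; }
  // Toy analogue of a durable log write up to the given LSN.
  void write_up_to(lsn_t lsn) { if (flushed_lsn < lsn) flushed_lsn = lsn; }
};

// Hypothetical stand-in for fil_space_t with only the relevant fields.
struct ToySpace {
  std::mutex freed_range_mutex;
  std::vector<std::pair<std::uint32_t, std::uint32_t>> freed_ranges;
  lsn_t last_freed_lsn = 0;
  std::size_t scrubbed_ranges = 0;  // counts ranges that were "scrubbed"
};

// Mirrors the early return in the excerpt: if the log is not yet durable
// up to the last freed LSN, the freed ranges are simply left alone.
bool scrub_or_skip(ToyLog &log, ToySpace &space) {
  std::unique_lock<std::mutex> freed_lock(space.freed_range_mutex);
  if (space.freed_ranges.empty() ||
      log.get_flushed_lsn() < space.last_freed_lsn)
    return false;
  space.scrubbed_ranges += space.freed_ranges.size();
  space.freed_ranges.clear();
  return true;
}

// Sketch of the proposed direction: instead of giving up, make the log
// durable first (without holding the mutex), then scrub.
bool scrub_after_log_write(ToyLog &log, ToySpace &space) {
  std::unique_lock<std::mutex> freed_lock(space.freed_range_mutex);
  if (space.freed_ranges.empty())
    return false;
  // Loop because more pages could be freed while the mutex is released.
  while (log.get_flushed_lsn() < space.last_freed_lsn) {
    const lsn_t target = space.last_freed_lsn;
    freed_lock.unlock();      // do not hold any mutex during the log write
    log.write_up_to(target);
    freed_lock.lock();
  }
  // Write-ahead logging: the log must cover the frees before we overwrite.
  assert(log.get_flushed_lsn() >= space.last_freed_lsn);
  space.scrubbed_ranges += space.freed_ranges.size();
  space.freed_ranges.clear();
  return true;
}
```

With one freed range at LSN 100 and the log durable only up to LSN 50, scrub_or_skip() returns false and leaves the range in place, whereas scrub_after_log_write() first advances the durable LSN to 100 and then scrubs the range.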

It would be very hard to create a regression test for this, because there is no easy way to control log writes, especially after improvements like MDEV-24341 and MDEV-27774. I found this bug while diagnosing MDEV-31234.

I checked failures of the test innodb.innodb_scrub on the CI system. The majority of them are related to hangs while waiting for all history to be purged. I did not find an example where scrubbing a data page would have been missed.



 Comments   
Comment by Matthias Leich [ 2023-05-15 ]

origin/bb-10.5-MDEV-31254 c9eff1a144ba44846373660a30d342d3f0dc91a5 2023-05-12T15:04:50+03:00,
which contains the fix for this MDEV, performed well in RQG testing.
