[MDEV-8139] Fix scrubbing Created: 2015-05-11  Updated: 2023-09-12  Resolved: 2020-07-20

Status: Closed
Project: MariaDB Server
Component/s: Encryption, Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1.4, 10.2.0, 10.3.0, 10.4.0, 10.5.0
Fix Version/s: 10.5.5

Type: Bug Priority: Major
Reporter: Sergei Golubchik Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Blocks
blocks MDEV-17596 Assertion `block->page.flush_observer... Closed
Duplicate
is duplicated by MDEV-20476 Assertion `!"wrong page type"' failed... Closed
Problem/Incident
causes MDEV-22911 Fix the Valgrind & MSAN instrumentati... Closed
causes MDEV-22931 mtr_t::mtr_t() allocates some memory Closed
causes MDEV-22970 Possible corruption of page_compresse... Closed
causes MDEV-24445 Using innodb_undo_tablespaces corrupt... Closed
causes MDEV-24765 fseg_free_extent fails to call buf_pa... Closed
causes MDEV-31253 Freed data pages are not always being... Closed
Relates
relates to MDEV-11802 innodb.innodb_bug14676111 fails in bu... Closed
relates to MDEV-12288 Reset DB_TRX_ID when the history is r... Closed
relates to MDEV-13536 DB_TRX_ID is not actually being reset... Closed
relates to MDEV-14425 Change the InnoDB redo log format to ... Closed
relates to MDEV-15528 Avoid writing freed InnoDB pages Closed
relates to MDEV-32151 innodb_immediate_scrub_data_uncompres... Closed
relates to MDEV-11068 Review which innodb_compression_algor... Closed
relates to MDEV-15949 InnoDB: Failing assertion: space->n_p... Closed
relates to MDEV-20813 Assertion `!srv_safe_truncate || !new... Closed
relates to MDEV-24780 [FATAL] InnoDB: Trying to read page n... Closed
Sprint: 10.1.6-1, 10.1.6-2, 10.1.21

 Description   

Scrubbing tests somehow got encryption enabled in their opt files. After removing --innodb-encrypt-tables=ON and --innodb-encrypt-log=ON from innodb_scrub.opt, innodb_scrub_background.opt, innodb_scrub_compressed.opt, these tests start to fail.

Now that the problematic ‘background scrubbing’ and all scrubbing code that incorrectly skipped redo logging has been removed from MariaDB Server 10.5.2, we should continue where MDEV-15528 left off, and finally enable the scrubbing tests to prove that the theoretically sound way of scrubbing actually works:

  1. Remove the invalidate parameter of btr_free_root()
  2. Introduce a data structure that allows us to mark a range of data pages of a tablespace as freed, even when those pages are not in the buffer pool. This could be attached to fil_space_t.
  3. Ensure that crash recovery will replay the FREE_PAGE records (extend the above mentioned ranges of freed pages), and punch holes in page_compressed tables, or overwrite blocks with zeroes if innodb_immediate_scrub_data_uncompressed is set.
  4. On page flush, try to combine large punch_hole or zero-initialization requests to a small one.
  5. Optional: when freeing the last pages of a tablespace, write a TRIM_PAGES record, and on page flush, invoke ftruncate() to shrink the file. (Do this on any file, when neither scrubbing nor page_compressed is being used.)


 Comments   
Comment by Jan Lindström (Inactive) [ 2015-05-27 ]

innodb_scrub_background done. Not yet sure what is wrong with two others.

Comment by Jan Lindström (Inactive) [ 2015-07-01 ]

I do not know, sometimes test passes for 5 times and then fails. Sometimes less.

Comment by Marko Mäkelä [ 2017-01-04 ]

encryption.innodb_scrub fails to do a slow shutdown, and it unnecessarily does multiple shutdowns while it could easily test all combinations of row_format and operations with a single restart.
Once I fixed that, I get a test failure once in 10 to 20 runs. I believe that that is due to the shutdown bug MDEV-11638.

Comment by Marko Mäkelä [ 2017-01-04 ]

While cleaning up encryption.innodb_scrub_background I noticed some problems with the background scrubbing itself:

  1. Undo log pages fail to get scrubbed. The data for INDEX(b) is clearly visible. This is repeatable in encryption.innodb_scrub by adding INDEX(b) to each table.
  2. Sometimes, background scrubbing fails also for tables.
Comment by Marko Mäkelä [ 2017-01-04 ]

bb-10.1-mdev-8139

Comment by Jan Lindström (Inactive) [ 2017-01-05 ]

ok to push

Comment by Marko Mäkelä [ 2017-01-05 ]

The test encryption.innodb_scrub is now enabled in MariaDB 10.1.21. It is still disabled in MariaDB 10.2.4, because the end of page 2 (inode list page) of delete_3.ibd randomly gets corrupted with the string "repairmanun". That string should only have been written to B-tree pages, not to an allocation metadata page. This looks like a sign of a serious corruption bug in MariaDB 10.2.

The test encryption.innodb_scrub_background demonstrates that more work is needed in the scrubbing. Clearly, undo pages are not being scrubbed as they should, and background scrubbing of data files does not work reliably.

Even though I pushed some commits related to the tests, this issue must remain open until all unresolved issues have been addressed.

Comment by Marko Mäkelä [ 2017-01-05 ]

The test below (reduced from encryption.innodb_scrub) demonstrates 2 problems in 10.2.4:

  1. despite ROW_FORMAT=COMPRESSED, delete_3.ibd contains the strings "infimum" and "supremum" which should not be present in that format (see page_zip_compress())
  2. At byte offset 0x9ff5 to 0x9fff (when using innodb_page_size=16k) we sometimes get the string "repairmanun". This seems to be at the very end of a page where we should have some page trailer (at the minimum, 4 bytes for the LSN). Index pages would additionally have a directory of records at the end of a page. It is as if some bytes were written to the wrong page.

-- source include/have_innodb.inc
-- source include/not_embedded.inc
-- source include/have_example_key_management_plugin.inc
 
let $MYSQLD_DATADIR=`select @@datadir`;
let $t= delete_3;
 
eval create table $t (
  a int auto_increment primary key,
  b varchar(256),
  c text) engine = innodb row_format=compressed;
 
let $numinserts = 500;
--disable_query_log
begin;
while ($numinserts)
{
  dec $numinserts;
  eval insert into $t(b,c) values ('repairman', repeat('unicycle', 1000));
}
commit;
--enable_query_log
eval delete from $t;
SET GLOBAL innodb_fast_shutdown=0;
-- source include/shutdown_mysqld.inc
 
let SEARCH_ABORT= FOUND;
let SEARCH_PATTERN= (un|b|tr)icycle|(repair|breakhu|wonderwo)man;
let SEARCH_RANGE= 12582912;
let SEARCH_FILE= $MYSQLD_DATADIR/test/$t.ibd;
-- source include/search_pattern_in_file.inc
-- source include/start_mysqld.inc
eval drop table $t;

Note: Remember to copy encryption/t/innodb_scrub.opt if you write the above test into a new file.

Comment by Marko Mäkelä [ 2017-01-05 ]

I am able to repeat the "repairmanun" problem on MariaDB 10.1.21. I can also see "infimum" and "supremum" strings in the file. FSP_SPACE_FLAGS contains 0x29 both in 10.1 and 10.2.

Comment by Marko Mäkelä [ 2017-09-13 ]

I think that the scrubbing feature should be an integral part of the purge subsystem. Alas, as noted in MDEV-11802, purge does not always run when there is work to do.

We should redesign this in a later version (10.3 or 10.4) where MDEV-12288 and MDEV-13536 are available.

Comment by Marko Mäkelä [ 2018-03-14 ]

I believe that the scrubbing as it is implemented in MariaDB 10.1 to 10.3 is incomplete and cannot be fixed easily.

I think that the correct way to implement scrubbing would be to introduce a new redo log record MLOG_PAGE_FREE for marking a page freed. It would be written in the same mini-transaction that marks the page freed in the allocation bitmap.

We’d then modify the page flushing logic so that it would (when scrubbing is enabled) write zeroes or (for page_compressed=1 files) punch holes for the freed pages.

Additionally, we should consider getting rid of the mechanism that delays the deallocation and allocation of undo pages. Currently, there effectively is a separate page allocator for undo log pages, which does not zero out any preceding garbage on the pages when they are reused.

Comment by Marko Mäkelä [ 2018-08-09 ]

I suspect that there could also be a race condition between log scrubbing and writing. Specifically, I do not see where log_scrub_thread() would ensure that it is only overwriting log until the latest checkpoint. The log scrubbing could be implemented better after MDEV-14425, perhaps simply by punching a hole on the no-longer-needed portion of the write-ahead-log.

In MariaDB Server 10.5.2, MDEV-21870 deprecated and ignored the parameter innodb_scrub_log.

Comment by Marko Mäkelä [ 2019-04-08 ]

The function btr_page_free_low() was scrubbing pages in an inefficent and backup-unsafe way. Starting with MariaDB Server 10.2.24 and 10.3.15, it will write an appropriate redo log record for initializing the page.

MDEV-15528 in MariaDB 10.4 or later could make scrubbing mostly obsolete.

Comment by Marko Mäkelä [ 2020-03-10 ]

The scope of MDEV-15528 was reduced in order to get the necessary redo log format changes tested before the MariaDB Server 10.5 feature freeze. The rest of it will have to be done as part of this ticket (see the updated Description), to fix the scrubbing feature.

Comment by Marko Mäkelä [ 2020-05-28 ]

The shrinking of a file by ftruncate() (when all last pages of the file are freed) might block all concurrent access to the file. I think that we must do the ftruncate() when scrubbing is enabled. When scrubbing is disabled, we may choose to not shrink files, depending on what benchmarks indicate.

Comment by Marko Mäkelä [ 2020-06-03 ]

A quick review:

  • mtr.set_named_space(rseg->space); is not needed, because the file names of undo tablespaces are hard-coded.
  • clear_freed_ranges() should be merged to release_resources(), and we should also clear m_freed_in_system_tablespace.
  • mtr_t::commit_files() must assert ut_ad(m_freed_ranges.empty()); ut_ad(!m_freed_in_system_tablespace);
  • Do we really need the btr_invalidate_index_id() function? I do not think it is needed in btr_create() at all. In btr_free_if_exists(), we must skip that call for page_compressed tables (because hole-punching will be done) or if we know that scrubbing will be enabled when the page is flushed.

The interesting part of the patch is the call buf_flush_freed_pages(space, buf_flush_get_freed_pages(space)), which I would suggest to write as a single function. The preceding comment confused me, because it mentioned neighbor flushing, which was disabled on SSD by MDEV-17380, and I would expect HDD to represent a minority of the users soon. I think that we should perform that call irrespective of flush_type, no matter if we traverse buf_pool.LRU or buf_pool.flush_list. (Single-page flushing is an exception, because it should always be followed by page eviction.)

Also, we should invoke ftruncate() (after writing a TRIM_PAGES log record) when the last pages of a tablespace are freed. The recovery of TRIM_PAGES must be extended to cover all tablespaces, not only undo tablespaces. Related to this, a TODO comment about PUNCH_HOLE must be removed from trx_purge_truncate_history().

Comment by Matthias Leich [ 2020-06-11 ]

origin/bb-10.5-MDEV-8139 fd2c45631a65d3d0803a2445b965d51ecc147a34 2020-06-10T19:16:25+05:30
behaved very well when the RQG test battery for broad range functional coverage was executed.
A quite small fraction of tests failed because of already problems which are known
and affect actual 10.5 too.

Comment by Thirunarayanan Balathandayuthapani [ 2020-06-12 ]

Pushed the patch in

commit c92f7e287fc0e21dc1b181284b1f8e2139d1c331 (HEAD -> 10.5, origin/10.5, bb-10.5-MDEV-8139)
Author: Thirunarayanan Balathandayuthapani <thiru@mariadb.com>
Date:   Thu Jun 11 22:52:47 2020 +0530
 
    MDEV-8139 Fix Scrubbing

Comment by Marko Mäkelä [ 2020-06-22 ]

The feature was disabled in the MariaDB 10.5.4 release due to concerns over data corruption (MDEV-22970).

Comment by Matthias Leich [ 2020-07-07 ]

Bad effect 3. from above replays on actual 10.5 too.

Comment by Thirunarayanan Balathandayuthapani [ 2020-07-13 ]

Patch is in bb-10.5-thiru. Please take a look

Comment by Marko Mäkelä [ 2020-07-17 ]

I posted some review comments. It looks OK, except that crash recovery fails to scrub freed pages in page_compressed tables when innodb_immediate_scrub_data_uncompressed=OFF (which is the default setting).

OK to push after addressing my comments.

Comment by Matthias Leich [ 2020-07-20 ]

The tree
commit 4a47938d9c36605aa9fcbc7d3bbd34bea4e7f5f7 (HEAD, origin/bb-10.5-thiru)
Author: Thirunarayanan Balathandayuthapani <thiru@mariadb.com>
Date:   Fri Jul 17 21:03:50 2020 +0530
    - check empty container for range_set_t::iterator::find
commit 90c308c5e795ceda9c0f644bbde693507b597b74
Author: Thirunarayanan Balathandayuthapani <thiru@mariadb.com>
Date:   Fri Jul 17 19:56:33 2020 +0530
    MDEV-22970 Possible corruption of page_compressed tables, or
               when scrubbing is enabled
performed well during RQG testing.  The issues seen are in actual 10.5 too.

Generated at Thu Feb 08 07:24:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.