[MDEV-29694] Remove the InnoDB change buffer Created: 2022-10-04  Updated: 2023-09-19  Resolved: 2023-01-11

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: 11.0.1

Type: Task Priority: Blocker
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: Preview_11.0, corruption, performance

Issue Links:
Blocks
blocks MDEV-11658 Simpler, faster IMPORT of InnoDB tables Open
blocks MDEV-32042 Special handling of crash recovery in... Closed
is blocked by MDEV-27734 Set innodb_change_buffering=none by d... Closed
is blocked by MDEV-27735 Deprecate the parameter innodb_change... Closed
is blocked by MDEV-30106 InnoDB fails to validate the change b... Closed
Problem/Incident
causes MDEV-30895 Assertion `btr_cur->rtr_info->thr || ... Closed
causes MDEV-31443 assert [FATAL] InnoDB: Unable to find... Closed
Relates
relates to MDEV-12700 Allow innodb_read_only startup withou... Closed
relates to MDEV-23755 safety for upgrades (innodb) and pack... Closed
relates to MDEV-25267 Reported latching order violation in ... Open
relates to MDEV-30009 InnoDB shutdown hangs when the change... Closed
relates to MDEV-30134 buf_page_t::unfix(): Assertion `!((f ... Closed
relates to MDEV-30148 Race condition between non-persistent... Closed
relates to MDEV-30216 Read-ahead unnecessarily allocates an... Closed
relates to MDEV-31088 Server freeze due to innodb_change_bu... Closed
relates to MDEV-32044 Mariadb crash after upgrading to 11.0... Closed
relates to MDEV-32132 DROP INDEX followed by CREATE INDEX m... Closed
relates to MDEV-11634 Improve the InnoDB change buffer Closed
relates to MDEV-24449 Corruption of system tablespace or la... Closed
relates to MDEV-26917 InnoDB: Clustered record for sec rec ... Closed
relates to MDEV-30009 InnoDB shutdown hangs when the change... Closed
relates to MDEV-30819 InnoDB fails to start up after downgr... Closed

 Description   

MDEV-27734 disabled the problematic InnoDB change buffer by default, and MDEV-27735 deprecated the parameter in MariaDB Server 10.9.

The next logical step is to remove the parameter altogether, along with any code that would create the change buffer or add records to it. An attempt to downgrade to an earlier version will be caught and prevented by MDEV-30106.

In order to be able to simplify the buffer pool interfaces, upgrading will be split in two parts: First, check if an upgrade is needed. If it is, we will apply all redo log, possibly upgrade the redo log to the current format, and then attempt to upgrade the change buffer:

  1. Apply any buffered changes and clear the corresponding change buffer bitmap bits.
  2. Create an empty the change buffer root page.
  3. Free any pages that were allocated to the change buffer.
  4. Reset the change buffer root page, to mark the upgrade as completed and to prevent a downgrade to an earlier version.

If an error occurs during before the final upgrade step, it should be possible to downgrade to MariaDB Server 10.8 or a later version (using the currently latest redo log format that was changed in MDEV-14425).

As noted in MDEV-11634, the change buffer may improve performance in some cases, but it is also missing a lot of potential. The main motivation for removing the change buffer in its current form is that difficult-to-reproduce corruption bugs keep popping up, such as MDEV-24449, MDEV-26917, MDEV-30009.



 Comments   
Comment by Marko Mäkelä [ 2022-11-30 ]

The code to upgrade the MySQL 3.23 or MySQL 4.0 change buffer (before the introduction of innodb_file_per_table) was removed in MySQL 5.6.5 and in MariaDB 10.0.11.

Comment by Marko Mäkelä [ 2022-12-01 ]

mleich, I only tested the upgrade of an empty change buffer.

Can you please provide a copy of a 10.6 data directory where the change buffer is not empty, for some additional testing by me? Use innodb_change_buffering_debug=1 and update an indexed secondary index field, and shut down normally, without innodb_fast_shutdown=0. The current implementation is missing an upgrade of the redo log format before upgrading the change buffer.

Please pay special attention to testing SPATIAL INDEX, because as part of removing the change buffer, the field btr_cur_t::thr was removed. The innodb_gis test suite passed (including the disabled sporadically failing tests), but I think that we need some more testing there.

Comment by Marko Mäkelä [ 2022-12-02 ]

The upgrade from a nonempty change buffer can be tested with the following patch applied to an earlier version:

diff --git a/mysql-test/suite/innodb/t/ibuf_not_empty.test b/mysql-test/suite/innodb/t/ibuf_not_empty.test
index 9362f8daffa..84854607be7 100644
--- a/mysql-test/suite/innodb/t/ibuf_not_empty.test
+++ b/mysql-test/suite/innodb/t/ibuf_not_empty.test
@@ -76,6 +76,7 @@ EOF
 check table t1;
 
 --source include/shutdown_mysqld.inc
+stop here;
 
 # Truncate the file to 5 pages, as if it were empty
 perl;

The test will fail like this:

CURRENT_TEST: innodb.ibuf_not_empty
mysqltest: At line 79: query 'stop here' failed: <Unknown> (2006): Server has gone away

After that, we can find a data directory in mysql-test/var/log/innodb.ibuf_not_empty*/mysqld.1/data on which we can test the upgrade:

…/mariadbd --innodb-page-size=4k --datadir=/…/data

This will actually crash with the current code (trying an upgrade from 10.6):

2022-12-02  8:54:46 0 [Note] InnoDB: Upgrading the change buffer
221202  8:54:46 [ERROR] mysqld got signal 11 ;

The upgrade needs to be revised so that the log will be upgraded to the most recent format (MDEV-14425) first. An attempt to upgrade from 10.8 (or later) will not crash, but the index will be reported corrupted:

MariaDB [test]> check table t1;
+---------+-------+----------+---------------------------------------------------------+
| Table   | Op    | Msg_type | Msg_text                                                |
+---------+-------+----------+---------------------------------------------------------+
| test.t1 | check | Warning  | InnoDB: Index 'b' contains 990 entries, should be 1024. |
| test.t1 | check | error    | Corrupt                                                 |
+---------+-------+----------+---------------------------------------------------------+

What if we upgraded before corrupting the change buffer bitmap?

diff --git a/mysql-test/suite/innodb/t/ibuf_not_empty.test b/mysql-test/suite/innodb/t/ibuf_not_empty.test
index 9362f8daffa..50fe70d3b8a 100644
--- a/mysql-test/suite/innodb/t/ibuf_not_empty.test
+++ b/mysql-test/suite/innodb/t/ibuf_not_empty.test
@@ -37,6 +37,7 @@ let MYSQLD_DATADIR=`select @@datadir`;
 let PAGE_SIZE=`select @@innodb_page_size`;
 
 --source include/shutdown_mysqld.inc
+stop already here;
 
 # Corrupt the change buffer bitmap, to claim that pages are clean
 perl;

Unfortunately, the result is the same. I will fix these issues. This should not block the testing of the bulk of the changes (that everything still works during normal operation).

Comment by Marko Mäkelä [ 2022-12-02 ]

The change buffer contains 34 buffered insert operations, all for a single index page. That coincides with the reported mismatch. I fixed a few things, but I still have to add the opening of data files, so that the buffered changes can be merged. I think that we can find the tablespaces by scanning the clustered index of SYS_TABLES. Accessing the secondary indes on SYS_TABLES.ID is out of the question, because there may exist buffered changes for that index. The upgrade must follow a particular order:

  1. If the change buffer does not exist, continue with normal recovery (to not make life harder for MDEV-14481).
  2. If the change buffer exists:
    1. Apply all redo log records.
    2. If the redo log needs to be upgraded, upgrade it.
    3. Open the tablespaces for all tables found in SYS_TABLES.
    4. For each buffered change, in an atomic mini-transaction, apply the change and delete-mark the change buffer record.
    5. Empty the change buffer root page.
    6. Free the change buffer pages.
    7. Initialize the change buffer root page.
    8. Continue with normal recovery.
Comment by Marko Mäkelä [ 2022-12-02 ]

MDEV-30148 was found while testing this.

Comment by Marko Mäkelä [ 2022-12-02 ]

I fixed the upgrade from 10.6 and 10.8 using the above test. It only covers buffered inserts, not delete-mark or purge operations. And there were buffered inserts only to one index page.

Additional testing of the upgrades will be needed.

Comment by Marko Mäkelä [ 2022-12-05 ]

Sysbench oltp_update_index did not use the change buffer, no matter what I tried (yes, it included innodb_change_buffering_debug=1. I suppose that the test is searching via the indexed column, which would force change buffer merges.

I was a little more successful with the following test:

diff --git a/mysql-test/suite/innodb/t/ibuf_not_empty.test b/mysql-test/suite/innodb/t/ibuf_not_empty.test
index 9362f8daffa..46cd5bbce99 100644
--- a/mysql-test/suite/innodb/t/ibuf_not_empty.test
+++ b/mysql-test/suite/innodb/t/ibuf_not_empty.test
@@ -33,10 +33,14 @@ SET GLOBAL innodb_change_buffering=all;
 # used for modifying the secondary index page. There must be multiple
 # index pages, because changes to the root page are never buffered.
 INSERT INTO t1 SELECT 0,'x',1 FROM seq_1_to_1024;
+UPDATE t1 SET b='y';
+UPDATE t1 SET b='x';
+
 let MYSQLD_DATADIR=`select @@datadir`;
 let PAGE_SIZE=`select @@innodb_page_size`;
 
 --source include/shutdown_mysqld.inc
+stop already here;
 
 # Corrupt the change buffer bitmap, to claim that pages are clean
 perl;

As far as I can tell, this may not trigger purge buffering, but it still is better than my initial upgrade test, which only covered insert buffering.

Comment by Marko Mäkelä [ 2022-12-05 ]

The second UPDATE appears to trigger a change buffer merge. With the following, I got 79 INSERT operations and 3700 delete-mark operations buffered for 90 pages:

diff --git a/mysql-test/suite/innodb/t/ibuf_not_empty.test b/mysql-test/suite/innodb/t/ibuf_not_empty.test
index 9362f8daffa..00d766cbe72 100644
--- a/mysql-test/suite/innodb/t/ibuf_not_empty.test
+++ b/mysql-test/suite/innodb/t/ibuf_not_empty.test
@@ -32,11 +32,14 @@ SET GLOBAL innodb_change_buffering=all;
 # Create enough rows for the table, so that the change buffer will be
 # used for modifying the secondary index page. There must be multiple
 # index pages, because changes to the root page are never buffered.
-INSERT INTO t1 SELECT 0,'x',1 FROM seq_1_to_1024;
+INSERT INTO t1 SELECT 0,'x',1 FROM seq_1_to_4096;
+UPDATE t1 SET b='y';
+
 let MYSQLD_DATADIR=`select @@datadir`;
 let PAGE_SIZE=`select @@innodb_page_size`;
 
 --source include/shutdown_mysqld.inc
+stop already here;
 
 # Corrupt the change buffer bitmap, to claim that pages are clean
 perl;

The upgrade was successful, and CHECK TABLE did not report any errors. One more test with 65,536 rows: 193 pages, 33,426 delete-mark operations, 7 insert operations, no purge operations.

Comment by Matthias Leich [ 2022-12-06 ]

Scenario:
1. Start the server with the combination of innodb_fast_shutdown and innodb_change_buffering values mentioned.
2. Create a InnoDB table and fill it with records
3. Shutdown+Restart
4. CHECK TABLE test.t1 EXTENDED
 
OLD_MARIADB: origin/10.6 commit 6d40274f65b8d145fbf496e9b1b1d46f258de227 (HEAD, origin/10.6)
NEW_MARIADB: origin/bb-10.11-MDEV-29694 commit acc6ac8f6fc460df3d551f61fcdff5f5d0276cac
FAST_SHUTDOWN: 0 -- CHANGE_BUFFERING: 'inserts'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 0 -- CHANGE_BUFFERING: 'none'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 0 -- CHANGE_BUFFERING: 'deletes'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 0 -- CHANGE_BUFFERING: 'purges'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 0 -- CHANGE_BUFFERING: 'changes'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 0 -- CHANGE_BUFFERING: 'all'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 1 -- CHANGE_BUFFERING: 'inserts'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 1 -- CHANGE_BUFFERING: 'none'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 1 -- CHANGE_BUFFERING: 'deletes'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 1 -- CHANGE_BUFFERING: 'purges'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 1 -- CHANGE_BUFFERING: 'changes'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 1 -- CHANGE_BUFFERING: 'all'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 2 -- CHANGE_BUFFERING: 'inserts'
Table   Op  Msg_type    Msg_text
test.t1 check   Error   Unknown storage engine 'InnoDB'
test.t1 check   error   Corrupt
2022-12-05 11:15:01 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.6.12. You must start up and shut down MariaDB 10.7 or earlier.
2022-12-05 11:15:01 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1562] with error Generic error
2022-12-05 11:15:01 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-12-05 11:15:01 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
FAST_SHUTDOWN: 2 -- CHANGE_BUFFERING: 'none'
Table   Op  Msg_type    Msg_text
test.t1 check   Error   Unknown storage engine 'InnoDB'
test.t1 check   error   Corrupt
2022-12-05 11:15:09 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.6.12. You must start up and shut down MariaDB 10.7 or earlier.
2022-12-05 11:15:09 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1562] with error Generic error
2022-12-05 11:15:09 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-12-05 11:15:09 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
FAST_SHUTDOWN: 2 -- CHANGE_BUFFERING: 'deletes'
Table   Op  Msg_type    Msg_text
test.t1 check   Error   Unknown storage engine 'InnoDB'
test.t1 check   error   Corrupt
2022-12-05 11:15:17 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.6.12. You must start up and shut down MariaDB 10.7 or earlier.
2022-12-05 11:15:17 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1562] with error Generic error
2022-12-05 11:15:17 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-12-05 11:15:17 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
FAST_SHUTDOWN: 2 -- CHANGE_BUFFERING: 'purges'
Table   Op  Msg_type    Msg_text
test.t1 check   Error   Unknown storage engine 'InnoDB'
test.t1 check   error   Corrupt
2022-12-05 11:15:25 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.6.12. You must start up and shut down MariaDB 10.7 or earlier.
2022-12-05 11:15:25 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1562] with error Generic error
2022-12-05 11:15:25 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-12-05 11:15:25 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
FAST_SHUTDOWN: 2 -- CHANGE_BUFFERING: 'changes'
Table   Op  Msg_type    Msg_text
test.t1 check   Error   Unknown storage engine 'InnoDB'
test.t1 check   error   Corrupt
2022-12-05 11:15:33 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.6.12. You must start up and shut down MariaDB 10.7 or earlier.
2022-12-05 11:15:33 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1562] with error Generic error
2022-12-05 11:15:33 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-12-05 11:15:33 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
FAST_SHUTDOWN: 2 -- CHANGE_BUFFERING: 'all'
Table   Op  Msg_type    Msg_text
test.t1 check   Error   Unknown storage engine 'InnoDB'
test.t1 check   error   Corrupt
2022-12-05 11:15:41 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.6.12. You must start up and shut down MariaDB 10.7 or earlier.
2022-12-05 11:15:41 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1562] with error Generic error
2022-12-05 11:15:41 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-12-05 11:15:41 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
FAST_SHUTDOWN: 3 -- CHANGE_BUFFERING: 'inserts'
able   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 3 -- CHANGE_BUFFERING: 'none'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 3 -- CHANGE_BUFFERING: 'deletes'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 3 -- CHANGE_BUFFERING: 'purges'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 3 -- CHANGE_BUFFERING: 'changes'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK
FAST_SHUTDOWN: 3 -- CHANGE_BUFFERING: 'all'
Table   Op  Msg_type    Msg_text
test.t1 check   status  OK

Comment by Matthias Leich [ 2022-12-13 ]

origin/bb-10.11-new-innodb-defaults c434f870a346fb11a6b5932dbdf860ed7e2d2f74 2022-12-12T10:05:22+02:00
which contains MDEV-29986, MDEV-19506, MDEV-29694, MDEV-30136, MDEV-29983
performed well in RQG testing. No new problems

Comment by Marko Mäkelä [ 2023-01-25 ]

It is worth noting that starting with MDEV-30400, the test innodb.ibuf_not_empty no longer actually uses the change buffer, because the debug parameter innodb_change_buffering_debug lost many ‘teeth’ because of the locking changes. I retested an upgrade from 10.8 right before the merge to 11.0 with MDEV-30400, using my above patch to the 10.8 innodb.ibuf_not_empty test:

2023-01-25  9:58:03 0 [Note] InnoDB: Upgrading the change buffer
2023-01-25  9:58:03 0 [Note] InnoDB: Upgraded the change buffer: 1 tablespaces, 14 pages

Comment by Marko Mäkelä [ 2023-09-08 ]

Users should beware of dormant corruption, such as that one caused by MDEV-32132 a.k.a. MDEV-30009. We got a case of that in MDEV-32044.

greenman, I think that we must amend the recommended upgrade procedure from 10.x to 11.x along the following lines:

  1. If the change buffer had been enabled in the past (MDEV-27734 disabled it by default in 10.5.15, 10.6.7), and
  2. if DROP INDEX and CREATE INDEX had been executed such that buffered changes existed for the index that had been dropped,
  3. it would best to execute a shutdown with innodb_fast_shutdown=0 before upgrading, to prevent corruption of data.

The upgrade procedure ignores the "change buffer bitmaps" and trusts that all records in the change buffer are valid. The slow shutdown procedure of 10.4 (MDEV-20934; revised in MDEV-30009 for 10.5 and later) will discard bogus entries for pages to which there are exist buffered changes according to the change buffer bitmap.

We might choose to revise the upgrade logic in 11.0 to work in a similar fashion. Until or unless that is done, I think that we must document this.

Comment by Marko Mäkelä [ 2023-09-19 ]

My plan is to revise the upgrade logic in MDEV-32044.

Generated at Thu Feb 08 10:10:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.