[MDEV-26917] InnoDB: Clustered record for sec rec not found index Created: 2021-10-27  Updated: 2023-01-03  Resolved: 2023-01-03

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.12, 10.5.15
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Alexander Fleischer Assignee: Allen Lee (Inactive)
Resolution: Incomplete Votes: 3
Labels: None
Environment:

Debian 11.1, Debian 11.3


Issue Links:
Blocks
is blocked by MDEV-30009 InnoDB shutdown hangs when the change... Closed
Duplicate
duplicates MDEV-9663 InnoDB assertion failure: *cursor->in... Closed
is duplicated by MDEV-26977 mariadb 10.5.12 reboot loop in AWS | ... Closed
Relates
relates to MCOL-5310 Columnstore crashes in malloc() with ... Closed
relates to MDEV-27734 Set innodb_change_buffering=none by d... Closed
relates to MDEV-27949 [crash] Unable to find a record to de... Closed
relates to MDEV-28975 AWS RDS mariadb 10.5.12 crashes upon ... Closed
relates to MDEV-29694 Remove the InnoDB change buffer Closed
relates to MDEV-30134 buf_page_t::unfix(): Assertion `!((f ... Closed
relates to MDEV-26978 [ERROR] InnoDB: Clustered record for ... Closed
relates to MDEV-27765 MariaDB stopped to work randomly - mi... Closed

 Description   

I got about 1 GB of these messages in the logfile:

2021-10-27 14:41:28 17998619 [ERROR] InnoDB: Clustered record for sec rec not found index `fs_mtime` of table `nextcloud`.`oc_filecache`
InnoDB: sec index record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 80000000609b9914; asc     `   ;;
 1: len 8; hex 800000000269a5e6; asc      i  ;;
 
InnoDB: clust index record PHYSICAL RECORD: n_fields 18; compact format; info bits 0
 0: len 8; hex 800000000269a5e4; asc      i  ;;
 1: len 6; hex 000000000000; asc       ;;
 2: len 7; hex 80000000000000; asc        ;;
 3: len 8; hex 8000000000000002; asc         ;;
 4: len 30; hex 617070646174615f6f6378316f37396f786e69702f707265766965772f63; asc appdata_ocx1o79oxnip/preview/c; (total 51 bytes);
 5: len 30; hex 363233306533373536316561343864656463373035363934393837646538; asc 6230e37561ea48dedc705694987de8; (total 32 bytes);
 6: len 8; hex 800000000269a5e3; asc      i  ;;
 7: len 8; hex 3430343539393336; asc 40459936;;
 8: len 8; hex 8000000000000002; asc         ;;
 9: len 8; hex 8000000000000001; asc         ;;
 10: len 8; hex 8000000000000000; asc         ;;
 11: len 8; hex 8000000060ae224d; asc     ` "M;;
 12: len 8; hex 8000000060ae224d; asc     ` "M;;
 13: len 4; hex 80000000; asc     ;;
 14: len 8; hex 8000000000000000; asc         ;;
 15: len 13; hex 36303962393839653739333532; asc 609b989e79352;;
 16: len 4; hex 8000001f; asc     ;;
 17: len 0; hex ; asc ;;
 
TRANSACTION 421825623286288, ACTIVE 38 sec fetching rows
mysql tables in use 1, locked 0
0 lock struct(s), heap size 1128, 0 row lock(s)
MySQL thread id 17998619, OS thread handle 140350620571392, query id 731677120 172.21.243.109 nextcloud Sending data
SELECT `f`.* FROM `oc_filecache` `f` WHERE (((`f`.`storage` = '1594') AND (`f`.`path` LIKE 'files/...%')) OR ((`f`.`storage` = '2947') AND (`f`.`path` LIKE 'files/...%')) OR ((`f`.`storage` = '2947') AND (`f`.`path` LIKE 'files/...%')) OR...
 
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

(I shortend the filenames in the query for data protection reasons)

The error message contains always the same table and index name, the sql differs in the searched data, but is always the same structure.

CHECK TABLE oc_filecache gives only a "Ok".

OPTIMIZE TABLE worked, but there the errors are still there.

Dropping and re-creating the fs_mtime index worked, but the error messages are still there.

I also tried to export the whole table with mysqldump, dropping the table and re-importing the dump, but after this operation the errors are back again.

Server config:

[mysqld]
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /data/mysql
tmpdir          = /data/mysqltmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
key_buffer_size         = 16M
max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 8
myisam_recover_options  = BACKUP
query_cache_size        = 0
query_cache_type        = OFF
log_error = /data/log/mysql/error.log
slow_query_log_file     = /data/log/mysql/slow.log
long_query_time = 10
log_slow_rate_limit     = 1000
log_slow_verbosity      = query_plan
server-id               = 1
disable-log-bin
innodb_file_per_table=true
ssl-ca=/etc/mysql/nc1-db-ca-cert.pem
ssl-cert=/etc/mysql/nc1-db-server-cert.pem
ssl-key=/etc/mysql/nc1-db-server-key.pem
ssl=on
character-set-server  = utf8mb4
collation-server      = utf8mb4_general_ci
skip-name-resolve       = 1
join_buffer_size = 1M
transaction-isolation = READ-COMMITTED
max_connections = 128 # (> 151)
wait_timeout = 3600 # (< 28800)
interactive_timeout = 3600 # (< 28800)
join_buffer_size = 2M  #(> 1.0M, or always use indexes with joins)
table_open_cache = 4096 # (> 2000)
innodb_buffer_pool_size = 724M # (>= 724M) if possible.
tmp_table_size = 32M
max_heap_table_size = 32M
innodb_flush_method=O_DIRECT
max_statement_time = 90
performance_schema = OFF
thread_handling=pool-of-threads
innodb_log_file_size=192M
thread_pool_max_threads=128

Create table:

CREATE TABLE `oc_filecache` (
  `fileid` bigint(20) NOT NULL AUTO_INCREMENT,
  `storage` bigint(20) NOT NULL DEFAULT 0,
  `path` varchar(4000) COLLATE utf8mb4_bin DEFAULT NULL,
  `path_hash` varchar(32) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  `parent` bigint(20) NOT NULL DEFAULT 0,
  `name` varchar(250) COLLATE utf8mb4_bin DEFAULT NULL,
  `mimetype` bigint(20) NOT NULL DEFAULT 0,
  `mimepart` bigint(20) NOT NULL DEFAULT 0,
  `size` bigint(20) NOT NULL DEFAULT 0,
  `mtime` bigint(20) NOT NULL DEFAULT 0,
  `storage_mtime` bigint(20) NOT NULL DEFAULT 0,
  `encrypted` int(11) NOT NULL DEFAULT 0,
  `unencrypted_size` bigint(20) NOT NULL DEFAULT 0,
  `etag` varchar(40) COLLATE utf8mb4_bin DEFAULT NULL,
  `permissions` int(11) DEFAULT 0,
  `checksum` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
  PRIMARY KEY (`fileid`),
  UNIQUE KEY `fs_storage_path_hash` (`storage`,`path_hash`),
  KEY `fs_parent_name_hash` (`parent`,`name`),
  KEY `fs_storage_mimetype` (`storage`,`mimetype`),
  KEY `fs_storage_mimepart` (`storage`,`mimepart`),
  KEY `fs_storage_size` (`storage`,`size`,`fileid`),
  KEY `fs_size` (`size`),
  KEY `fs_mtime` (`mtime`)
) ENGINE=InnoDB AUTO_INCREMENT=46263064 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED



 Comments   
Comment by Marko Mäkelä [ 2021-10-27 ]

afleischer, with which exact commands did you try to drop and re-create the index? If you used a single ALTER TABLE statement, the SQL layer would optimize away any DROP and ADD pairs of indexes.

The OPTIMIZE TABLE statement would not rebuild the table if innodb_defragment or innodb_optimize_fulltext_only are set. A better way to rebuild would be

ALTER TABLE oc_filecache FORCE;

Unfortunately, I have no ideas regarding the cause of the corruption. Could the table have been corrupted already before you upgraded to 10.5.12? CHECK TABLE on a backup could be useful.

I suspect that there may still be something wrong with the change buffer. SET GLOBAL innodb_change_buffering=none; might prevent further corruption of this kind. But it will of course not fix any previous corruption.

Comment by Alexander Fleischer [ 2021-10-27 ]

Thanks für your advise. For recreating the index I used "ALTER TABLE .. DROP INDEX ..." and "ALTER TABLE ... ADD INDEX ..." in 2 commands.

I executed the "ALTER TABLE oc_filecache FORCE;" now. Table ibd file shrinked by about 1G. Until now, no new error messages.

Even if there was corruption before 10.5.12 (I always used the versions from the official debian repository), I dropped the whole table and re-imported it from a mysqldump, but the errors came back.

Comment by Alexander Fleischer [ 2021-10-29 ]

Thank you very much. Looks good so far, no new error messages after executing the "ALTER TABLE ... FORCE".

Comment by Marko Mäkelä [ 2021-12-13 ]

afleischer, bitte, gern geschehen.

Did you get these errors ever again? Something appears to be wrong with the change buffer, but we do not know what, as we are unable to reproduce it ourselves. See MDEV-26977.

Comment by Alexander Fleischer [ 2021-12-13 ]

Hello,

the errors are back againe, but it looks like not so often (about 4M log size/week, this also maybe because the number of requests is lower).

Always the same oc_filecache table and the fs_mtime index.

MariaDB is running on Debian 11.1 on a VMWare virtual machine. There are about 1000 to 2500 queries per second. The table oc_filecache is really big, about 18M entries with about 6 GB filesize of ibd file. Database files are on SSD.

2021-12-13  8:38:27 7870421 [ERROR] InnoDB: Clustered record for sec rec not found index `fs_mtime` of table `nextcloud`.`oc_filecache`
InnoDB: sec index record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 8000000061ae308f; asc     a 0 ;;
 1: len 8; hex 8000000002d931ce; asc       1 ;;
 
InnoDB: clust index record PHYSICAL RECORD: n_fields 18; compact format; info bits 0
 0: len 8; hex 8000000002d931cd; asc       1 ;;
 1: len 6; hex 000000000000; asc       ;;
 2: len 7; hex 80000000000000; asc        ;;
 3: len 8; hex 8000000000000002; asc         ;;
 4: len 30; hex 617070646174615f6f6378316f37396f786e69702f707265766965772f31; asc ***/preview/1; (total 68 bytes);
 5: len 30; hex 653463333634396138316335643231303933346532303337396638376162; asc e4c3649a81c5d210934e20379f87ab; (total 32 bytes);
 6: len 8; hex 800000000280d8ee; asc         ;;
 7: len 16; hex 3235362d3235362d63726f702e706e67; asc 256-256-crop.png;;
 8: len 8; hex 800000000000000c; asc         ;;
 9: len 8; hex 8000000000000008; asc         ;;
 10: len 8; hex 8000000000006798; asc       g ;;
 11: len 8; hex 8000000061ae308e; asc     a 0 ;;
 12: len 8; hex 8000000061ae308e; asc     a 0 ;;
 13: len 4; hex 80000000; asc     ;;
 14: len 8; hex 8000000000000000; asc         ;;
 15: len 30; hex 386335386637363263626262666233646165636661383363623665326230; asc 8c58f762cbbbfb3daecfa83cb6e2b0; (total 32 bytes);
 16: len 4; hex 8000001b; asc     ;;
 17: len 0; hex ; asc ;;
 
TRANSACTION 421198429919312, ACTIVE 6 sec fetching rows
mysql tables in use 1, locked 0
0 lock struct(s), heap size 1128, 0 row lock(s)
MySQL thread id 7870421, OS thread handle 139720891549440, query id 269136321 172.21*** nextcloud Sending data
SELECT `f`.* FROM `oc_filecache` `f` WHERE (((`f`.`storage` = '7184') AND (`f`.`path` LIKE '***/%')) OR ((`f`.`storage` = '13687') AND (`f`.`path` LIKE '***/%')) OR (`f`.`storage` = '10572')) AND ((`f`.`mimetype` <> 2) OR (`f`.`size` = 0)) AND (`f`.`path` NOT LIKE 'files_versions/%') AND (`f`.`path` NOT LIKE 'files_trashbin/%') ORDER BY `f`.`mtime` DESC LIMIT 500
 
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

I grep'ed the log for the corrupt message from MDEV-26977 and got it, but it was before the my first report and I exported and re-imported the table afterwards. It's a different index in this message, fs_time:

2021-10-05 10:11:02 1335216 [ERROR] InnoDB: Insert buffer insert fails; page free 19, dtuple size 21
InnoDB: Cannot insert index record DATA TUPLE: 2 fields;
 0: len 8; hex 800000005fd32cf4; asc     _ , ;;
 1: len 8; hex 8000000002b41595; asc         ;;
 
InnoDB: The table where this index record belongs
InnoDB: is now probably corrupt. Please run CHECK TABLE on
InnoDB: that table.
2021-10-05 10:11:02 1335216 [ERROR] InnoDB: page [page id: space=122050, page number=550089], size 8192, bitmap bits 0
2021-10-05 10:11:02 1335216 [ERROR] InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
2021-10-05 10:25:32 1346219 [ERROR] InnoDB: In pages [page id: space=122050, page number=554215] and [page id: space=122050, page number=667983] of index `fs_size` of table `nextcloud`.`oc_filecache`
InnoDB: records in wrong order on adjacent pages
InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 80000000610248d4; asc     a H ;;
 1: len 8; hex 8000000002b41610; asc         ;;
 
InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 8000000000012ef5; asc       . ;;
 1: len 8; hex 8000000001e4e822; asc        ";;
 
2021-10-05 10:25:32 1346219 [ERROR] InnoDB: Corruption of an index tree: table `nextcloud`.`oc_filecache` index `fs_size`, father ptr page no 676641, child page no 554215
PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 8; hex 8000000000012edd; asc       . ;;
 1: len 8; hex 8000000001bfc815; asc         ;;
2021-10-05 10:25:32 1346219 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 146
PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 8000000060b3bbc5; asc     `   ;;
 1: len 8; hex 8000000002b4514b; asc       QK;;
 2: len 4; hex 000a5321; asc   S!;;
2021-10-05 10:25:32 1346219 [Note] InnoDB: n_owned: 0; heap_no: 239; next rec: 1725

Comment by Arnaud Abélard [ 2022-01-17 ]

Hello,

We also are experiencing the exact them problem and also with nextcloud too. It started occuring after upgrading to mariadb 10.5 (mariadb's repository) from mariadb 10.1 (debian's packages). I've checked and we never had that problem with mariadb 10.1 during the past year.

Our oc_filecache table is 50M record big and we are using galera on 9 mariadb 10.5.13+maria~bullseye nodes with mariabackup and maxscale with the readwritesplit router. For some reason the problem doesn't seem to occurs on all nodes at the same time but it always occurs on several nodes.

Unfortunately, dumping and reimporting the table every odd day or so isn't an option for us.

our mariadb configuration:

wsrep_provider_options="gcs.fc_limit=480;gcs.fc_factor=0.8"
innodb_flush_log_at_trx_commit = 0
innodb_buffer_pool_size=68719476736
max_heap_table_size=268435456
tmp_table_size=268435456
innodb_log_file_size = 16G
wsrep_sync_wait=7
wsrep_retry_autocommit=10
tmpdir = /dev/shm
long_query_time = 30
wsrep_slave_threads=96
innodb_buffer_pool_instances=64

Arnaud

Comment by Marko Mäkelä [ 2022-01-31 ]

I would recommend to disable the InnoDB change buffer. Bugs in it could cause secondary indexes to get out of sync with the primary key.

SET GLOBAL innodb_change_buffering=none;

Note: This will not fix already incurred corruption. You would have rebuild the affected secondary indexes. I suspect that MDEV-26977 and part of the trouble reported in MDEV-27411 could be attributed to some bug in the change buffer that we have been unable to fix.

I had some ideas for improving the change buffer (MDEV-11634). According to some performance tests last week, any performance improvement via the change buffer was less than 5% when using SSD. The change buffer was originally developed to try to avoid random seeks on HDD, replacing them with more sequential page access. On SSD, there should be not much difference between random and sequential reads. I am trying to make a case for disabling the change buffer by default, issuing a deprecation warning, and removing the code altogether.

abelard-a, your configuration options suggest that a further source of your problems could be corner cases in the Galera snapshot transfer (SST) that could cause the database to be corrupted. Examples of that include MDEV-27524, MDEV-24845, MDEV-27459.

Comment by Marko Mäkelä [ 2022-02-11 ]

Do such failures go away if you disable the InnoDB change buffering (SET GLOBAL innodb_change_buffering=none) and rebuild the affected secondary indexes?

Comment by Arnaud Abélard [ 2022-02-11 ]

No, they don't go away after disabling the innodb change buffering unfortunately. We haven't rebuilt the table though so it might just be a consequence of the broken index but we are still getting errors in the logs.

Comment by Arnaud Abélard [ 2022-02-21 ]

Hello,

I can confirm that disabling InnoDB change buffering does fix the problem once the index has been repaired:

SET GLOBAL innodb_change_buffering=none;
ALTER TABLE oc_filecache FORCE;

Comment by Marko Mäkelä [ 2022-03-15 ]

afleischer, did you get any corruption after

SET GLOBAL innodb_change_buffering=none;
ALTER TABLE oc_filecache FORCE;

(or just rebuilding the corrupted index)?

Note that MariaDB does not have persistent parameters; you would have to set that parameter also in the server configuration file. Alternatively, you can upgrade to MariaDB 10.5.15, 10.6.7, or later version where MDEV-27734 disabled the change buffer by default. Again, that will not repair any corrupted indexes; ALTER TABLE will.

Comment by Alexander Fleischer [ 2022-03-16 ]

I missed your message before and changed the variable and rebuild the table today. But I have to wait some days to see the results in the logs.

Comment by Alexander Fleischer [ 2022-03-18 ]

No new error messages in log since the change & rebuild 2 days ago.

Comment by Rodolphe Cahen [ 2022-03-24 ]

Hello, had same kind of error after a nextcloud update to v23.
Debian11.2 10.5.12-MariaDB-0+deb11u1

SET GLOBAL innodb_change_buffering=none;
ALTER TABLE oc_filecache FORCE;

Problem fix. No new error messages in log since the change & rebuild 8 days ago.

Comment by Marko Mäkelä [ 2022-04-05 ]

Thank you for the feedback. I will keep this bug open until someone provides a way to reproduce the problem.

Comment by Marko Mäkelä [ 2022-11-02 ]

This inconsistency should not be related to buffering purge operations. The purge of committed transaction history in secondary indexes should only cover already delete-marked records. In the Description, we see that the secondary index record is not delete-marked:

InnoDB: sec index record PHYSICAL RECORD: …; info bits 0

Therefore, ibuf_delete() should be off the hook.

If there was a problem with applying delete-mark operations, there should have been prior messages like those in MDEV-27949:

Feb 25 01:49:56 ains mysqld: 2022-02-25 1:49:56 0 [ERROR] InnoDB: Unable to find a record to delete-mark

This error could simply be a consequence of that prior trouble. afleischer, stratege1401, any others, were there also such messages in the logs? Is there any way to reproduce the problem? Such as setting up a replication slave, with compression enabled?

Did this only occur in connection with ROW_FORMAT=COMPRESSED tables? (I know that NextCloud or OpenCloud used that format.)

Comment by Alexander Fleischer [ 2022-11-02 ]

I didn't save the gigabytes of logs, so I can't check now. But I'm sure, if there were other ERROR messages, I had also included it in my original issue.
I also cannot reproduce here, since the change of "nnodb_change_buffering=none;" i never had this message again. Unfortunatly I also couldn't reproduce these messages on my test nextcloud system (with less data), it only occured on the production system. I don't use replication at all. I got the messages only with the oc_filecache table, see first post for the show create. It's the biggest table with currently about 10 GB and about 25 million lines.

Comment by Joshua O'Leary [ 2022-11-15 ]

I think I've also encountered this issue too, it started off as a warning in the logs around 9.20 yesterday

2022-11-14  9:21:47 101439247 [ERROR] InnoDB: Clustered record for sec rec not found index `hub_id_activation_time` of table `mopo`.`rentals`
InnoDB: sec index record PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: SQL NULL;
 1: len 5; hex 0000000000; asc      ;;
 2: len 8; hex 93edf6010000a800; asc         ;;

But eventually resulted in MariaDB crashing and refusing to start at the end of the day, with corrupted data

2022-11-14 23:40:17 0 [ERROR] InnoDB: Corruption of an index tree: table `mopo`.`rentals` index `hub_id_activation_time`, father ptr page no 32, child page no 383377
PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: SQL NULL;
 1: len 5; hex 0000000000; asc      ;;
 2: len 8; hex 93edf6010000a800; asc         ;;
2022-11-14 23:40:17 0 [Note] InnoDB: n_owned: 0; heap_no: 20; next rec: 554
PHYSICAL RECORD: n_fields 4; compact format; info bits 16
 0: len 4; hex 80000000; asc     ;;
 1: SQL NULL;
 2: len 8; hex 0000000000000001; asc         ;;
 3: len 4; hex 00000020; asc     ;;
2022-11-14 23:40:17 0 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 148
2022-11-14 23:40:17 0 [ERROR] [FATAL] InnoDB: You should dump + drop + reimport the table to fix the corruption. If the crash happens at database startup. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. Then dump + drop + reimport.
221114 23:40:17 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.5.11-MariaDB-1:10.5.11+maria~bionic-log
key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=147
max_threads=153

I had to start MariaDB with recovery mode 2, dump the corrupted table (rentals) with mysqldump and restore it to repair the corrupted data

Comment by Marko Mäkelä [ 2022-11-23 ]

This bug was filed almost 8 months after the infamous MDEV-9663 had been closed as incomplete. Below, I am copying a comment that I just posted there.

The CHECK TABLE…EXTENDED that was implemented in MDEV-24402 will flag secondary indexes corrupted if they contain entries that should not exist.

Crashes due to this corruption should have been (mostly) fixed in MDEV-13542. Because we were not able to reproduce this corruption, I cannot fully guarantee it.

While analyzing a failure from a stress test of MDEV-30009, I may have found a possible explanation of this. The scenario is as follows.

  1. Some changes were buffered to a secondary index leaf page that was not located in the buffer pool.
  2. The page was freed (possibly as part of DROP INDEX).
  3. During ibuf_read_merge_pages(), we will reset the change buffer bitmap bits but will not remove the change buffer records.
  4. The same page is allocated and reused for something else.
  5. The page is evicted from the buffer pool.
  6. Something is added to the change buffer for the page.
  7. On a change buffer merge, we will apply both old (bogus) and new entries to the page.

The extra delete-unmarked records could simply originate from previously buffered inserts that were not discarded as they were supposed to, in the above scenario.

As far as I can tell, all MySQL and MariaDB versions are affected by this. The code changes that were applied in MDEV-20934 did not fix this, because that code would only be executed on shutdown with innodb_fast_shutdown=0.

The InnoDB change buffer was disabled by default in MDEV-27734.

Note: We still have many open bugs related to the corruption of indexes that include virtual columns. Unlike this corruption, those corruptions are much easier to reproduce. Implementing some file format changes such as MDEV-17598 could help a lot with those bugs.

Comment by Marko Mäkelä [ 2022-11-23 ]

We are still looking for the following feedback:

  1. Any way to reproduce this (while explicitly enabling the change buffer after MDEV-27734 disabled it by default).
  2. Whether this corruption would stop from occurring after MDEV-30009 is fixed.

Due to the difficulty of reproducing this corruption, I think that we will have to keep waiting several months after the fix of MDEV-30009 is available.

Comment by Marko Mäkelä [ 2022-11-23 ]

mleich provided an even more juicy rr replay trace than the one that I previously analyzed, with a slightly different scenario:

mysqld: /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:3615: dberr_t ibuf_insert_to_index_page_low(const dtuple_t*, rec_offs**, mem_heap_t*, mtr_t*, page_cur_t*): Assertion `!__builtin_expect(((page_cur->block)->page.zip.data) != 0, 0)' failed.
#4  0x000055d5889b3120 in ibuf_insert_to_index_page_low (entry=entry@entry=0x61600482bfc0, offsets=offsets@entry=0x7f12df525c40, heap=0x613000427400, mtr=mtr@entry=0x7f12df526250, page_cur=page_cur@entry=0x7f12df525ce0)
    at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:3615
#5  0x000055d5889b7a9f in ibuf_insert_to_index_page (entry=entry@entry=0x61600482bfc0, block=block@entry=0x7f12f9d58650, index=0x616004823808, mtr=mtr@entry=0x7f12df526250) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:3779
#6  0x000055d5889c95f4 in ibuf_merge_or_delete_for_page (block=block@entry=0x7f12f9d58650, page_id=..., zip_size=zip_size@entry=4096) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:4312
#7  0x000055d588f3ca3e in buf_page_get_low (page_id=<error reading variable: Cannot access memory at address 0xfa0>, zip_size=zip_size@entry=4096, rw_latch=rw_latch@entry=2, guess=<optimized out>, guess@entry=0x0, mode=mode@entry=10, mtr=mtr@entry=0x7f12df5282b0, 
    err=<optimized out>, allow_ibuf_merge=<optimized out>) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0buf.h:1014
#8  0x000055d588f3e20f in buf_page_get_gen (page_id=..., zip_size=zip_size@entry=4096, rw_latch=rw_latch@entry=2, guess=guess@entry=0x0, mode=mode@entry=10, mtr=mtr@entry=0x7f12df5282b0, err=0x7f12df5269f0, allow_ibuf_merge=true)
    at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/buf/buf0buf.cc:2962
#9  0x000055d588eda70d in btr_cur_search_to_nth_level (level=level@entry=0, tuple=tuple@entry=0x619000b37008, mode=mode@entry=PAGE_CUR_LE, latch_mode=latch_mode@entry=BTR_MODIFY_LEAF, cursor=cursor@entry=0x7f12df527e10, mtr=mtr@entry=0x7f12df5282b0, 
    autoinc=<optimized out>) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/btr/btr0cur.cc:1565
#10 0x000055d588ee0634 in btr_cur_t::search_leaf (this=this@entry=0x7f12df527e10, tuple=tuple@entry=0x619000b37008, mode=mode@entry=PAGE_CUR_LE, latch_mode=latch_mode@entry=BTR_MODIFY_LEAF, mtr=mtr@entry=0x7f12df5282b0, autoinc=autoinc@entry=0)
    at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/btr/btr0cur.cc:1206
#11 0x000055d588cb78f3 in btr_pcur_open (mtr=0x7f12df5282b0, autoinc=0, cursor=0x7f12df527e10, latch_mode=BTR_MODIFY_LEAF, mode=PAGE_CUR_LE, tuple=0x619000b37008) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/btr0pcur.inl:325
#12 row_search_index_entry (entry=entry@entry=0x619000b37008, mode=mode@entry=BTR_MODIFY_LEAF, pcur=pcur@entry=0x7f12df527e10, mtr=mtr@entry=0x7f12df5282b0) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/row/row0row.cc:1286
#13 0x000055d5891f11fd in row_undo_mod_del_unmark_sec_and_undo_update (mode=mode@entry=BTR_MODIFY_LEAF, thr=thr@entry=0x6160048415c0, index=index@entry=0x61600767d708, entry=entry@entry=0x619000b37008)
    at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/row/row0umod.cc:689

This fails in a rollback. This time, the dummy_index that was reconstructed from the change buffer record contains the same number of fields as the real index that row_undo_mod_del_unmark_sec_and_undo_update() is using (and which matches the PAGE_INDEX_ID in the secondary index leaf page). The code before the assertion expression includes a comment:

  if (page_cur_tuple_insert(page_cur, entry, offsets, &heap, 0, mtr))
    return DB_SUCCESS;
 
  /* Page reorganization or recompression should already have been
  attempted by page_cur_tuple_insert(). Besides, per
  ibuf_index_page_calc_free_zip() the page should not have been
  recompressed or reorganized. */
  ut_ad(!is_buf_block_get_page_zip(page_cur->block));

So, we end up having some extra records in the page. Because this is an rr replay trace, it is possible to set a data watchpoint on the change buffer bitmap bits and {{reverse-continue} to find out when the buffered changes should have been deleted:

(rr) reverse-finish
Run back to call of #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
0x00007f130dafc854 in __GI_abort () at abort.c:79
79	abort.c: No such file or directory.
(rr) reverse-finish
Run back to call of #0  0x000055d5889c95ef in ibuf_merge_or_delete_for_page (block=block@entry=0x7f12f9d58650, page_id={m_id = 47244640274}, zip_size=zip_size@entry=4096) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:4312
0x000055d588f3ca39 in buf_page_get_low (page_id=<error reading variable: Cannot access memory at address 0xfa0>, zip_size=zip_size@entry=4096, rw_latch=rw_latch@entry=2, guess=<optimized out>, guess@entry=0x0, mode=mode@entry=10, mtr=mtr@entry=0x7f12df5282b0, 
    err=<optimized out>, allow_ibuf_merge=<optimized out>) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0buf.h:1014
1014	  ulint zip_size() const { return page.zip_size(); }
(rr) step
ibuf_merge_or_delete_for_page (block=block@entry=0x7f12f9d58650, page_id=<error reading variable: Cannot access memory at address 0x3ffffb01>, zip_size=zip_size@entry=4096) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:4132
4132	{
(rr) next
4173				bitmap_bits = ibuf_bitmap_page_get_bits(
(rr) step
ibuf_bitmap_page_get_bits_low (page=page@entry=0x7f12fa2ee000 "", page_id=<optimized out>, zip_size=zip_size@entry=4096, latch_type=latch_type@entry=2, mtr=mtr@entry=0x7f12df526250, bit=bit@entry=2)
    at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:542
542	{
(rr) next
562		map_byte = mach_read_from_1(page + IBUF_BITMAP + byte_offset);
(rr) step
mach_read_from_1 (b=0x7f12fa2ee067 "\004\060") at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/mach0data.inl:73
73		return(uint8_t(*b));
(rr) watch -l *b
Hardware watchpoint 4: -location *b
(rr) reverse-continue
Continuing.
 
Thread 3 hit Hardware watchpoint 4: -location *b
 
Old value = 48 '0'
New value = 51 '3'
(rr) set print frame-arguments none
(rr) backtrace
#0  0x000055d5889d728c in mtr_t::write<1u, (mtr_t::write_type)1, unsigned char> (this=..., block=..., ptr=..., val=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/mtr0log.h:202
#1  0x000055d5889c32d6 in ibuf_bitmap_page_set_bits<0> (block=..., page_id=..., physical_size=..., val=..., mtr=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0types.h:135
#2  0x000055d5889c9e3c in ibuf_set_bitmap_for_bulk_load (block=..., mtr=..., reset=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0buf.h:665
#3  0x000055d588e8529b in PageBulk::commit (this=..., success=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/btr/btr0bulk.cc:567

CREATE INDEX is resetting the change buffer bitmap bits for the newly initialized page, but it fails to delete the previously buffered records. Thus, the "garbage" records that were destined to a since-then freed page (I think after DROP INDEX but I did not check it) may be inserted later.

Comment by Marko Mäkelä [ 2022-11-24 ]

The failure scenario that I mentioned today was fixed in MDEV-30009.

Comment by Thirunarayanan Balathandayuthapani [ 2022-11-30 ]

Following test case can fail to remove the change buffer entries from ibuf index. It leads to hang in slow shutdown

--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_sequence.inc
 
CREATE TABLE t1(
        a INT AUTO_INCREMENT PRIMARY KEY,
        b CHAR(1),
        c INT,
        INDEX idx(b))
ENGINE=InnoDB STATS_PERSISTENT=0;
 
# The flag innodb_change_buffering_debug is only available in debug builds.
# It instructs InnoDB to try to evict pages from the buffer pool when
# change buffering is possible, so that the change buffer will be used
# whenever possible.
SET GLOBAL innodb_change_buffering_debug = 1;
SET GLOBAL innodb_change_buffering=all;
 
# Create enough rows for the table, so that the change buffer will be
# used for modifying the secondary index page. There must be multiple
# index pages, because changes to the root page are never buffered.
INSERT INTO t1 SELECT 0,'x',1 FROM seq_1_to_4096;
let MYSQLD_DATADIR=`select @@datadir`;
let PAGE_SIZE=`select @@innodb_page_size`;
 
ALTER TABLE t1 DROP INDEX idx;
 
set global innodb_fast_shutdown=0;
--source include/restart_mysqld.inc
 
# Cleanup
DROP TABLE t1;

InnoDB reset the bitmap bit while applying the change buffer merge for the freed page and it fails to
remove the record from ibuf index

              if (bitmap_bits
                    && DB_SUCCESS
                    == fseg_page_is_allocated(space, page_id.page_no())) {
                        ibuf_mtr_start(&mtr);
                        mtr.set_named_space(space);
                        ibuf_reset_bitmap(block, page_id, zip_size, &mtr);
                        ibuf_mtr_commit(&mtr);
                        bitmap_bits = 0;
                }
 
                if (!bitmap_bits) {
                        /* No changes are buffered for this page. */
                        space->release();
                        return DB_SUCCESS;
                }

During slow shutdown, InnoDB iterates through change buffer index again and again. It leads to hang

In 10.3 the following test case can do the same. But in 10.3, change buffer merge is happening in freed page (ie wrong though during slow shutdown)

 
set global innodb_disable_background_merge=1;
CREATE TABLE t1(
        a INT AUTO_INCREMENT PRIMARY KEY,
        b CHAR(1),
        c INT,
        INDEX idx(b))
ENGINE=InnoDB STATS_PERSISTENT=0;
 
connect(con1,localhost,root,,,);
start transaction with consistent snapshot;
 
# The flag innodb_change_buffering_debug is only available in debug builds.
# It instructs InnoDB to try to evict pages from the buffer pool when
# change buffering is possible, so that the change buffer will be used
# whenever possible.
SET GLOBAL innodb_change_buffering=all;
 
# Create enough rows for the table, so that the change buffer will be
# used for modifying the secondary index page. There must be multiple
# index pages, because changes to the root page are never buffered.
INSERT INTO t1 SELECT 0,'x',1 FROM seq_1_to_4096;
let MYSQLD_DATADIR=`select @@datadir`;
let PAGE_SIZE=`select @@innodb_page_size`;
 
ALTER TABLE t1 DROP INDEX idx;
 
INSERT INTO t1 SELECT 0,'y',2 FROM seq_1_to_8192;
connection con1;
commit;
set global innodb_disable_background_merge=0;
set global innodb_fast_shutdown=0;
--source include/restart_mysqld.inc
 
# Cleanup
DROP TABLE t1;

During slow shutdown, InnoDB does apply the change buffer values to the freed page.

Comment by Marko Mäkelä [ 2022-11-30 ]

Until MDEV-30134 has been fixed, it is unsafe to use innodb_change_buffering=purges or innodb_change_buffering=all.

Generated at Thu Feb 08 09:48:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.