Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26917

InnoDB: Clustered record for sec rec not found index

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Incomplete
    • 10.5.12, 10.5.15
    • N/A
    • None
    • Debian 11.1, Debian 11.3

    Description

      I got about 1 GB of these messages in the logfile:

      2021-10-27 14:41:28 17998619 [ERROR] InnoDB: Clustered record for sec rec not found index `fs_mtime` of table `nextcloud`.`oc_filecache`
      InnoDB: sec index record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
       0: len 8; hex 80000000609b9914; asc     `   ;;
       1: len 8; hex 800000000269a5e6; asc      i  ;;
       
      InnoDB: clust index record PHYSICAL RECORD: n_fields 18; compact format; info bits 0
       0: len 8; hex 800000000269a5e4; asc      i  ;;
       1: len 6; hex 000000000000; asc       ;;
       2: len 7; hex 80000000000000; asc        ;;
       3: len 8; hex 8000000000000002; asc         ;;
       4: len 30; hex 617070646174615f6f6378316f37396f786e69702f707265766965772f63; asc appdata_ocx1o79oxnip/preview/c; (total 51 bytes);
       5: len 30; hex 363233306533373536316561343864656463373035363934393837646538; asc 6230e37561ea48dedc705694987de8; (total 32 bytes);
       6: len 8; hex 800000000269a5e3; asc      i  ;;
       7: len 8; hex 3430343539393336; asc 40459936;;
       8: len 8; hex 8000000000000002; asc         ;;
       9: len 8; hex 8000000000000001; asc         ;;
       10: len 8; hex 8000000000000000; asc         ;;
       11: len 8; hex 8000000060ae224d; asc     ` "M;;
       12: len 8; hex 8000000060ae224d; asc     ` "M;;
       13: len 4; hex 80000000; asc     ;;
       14: len 8; hex 8000000000000000; asc         ;;
       15: len 13; hex 36303962393839653739333532; asc 609b989e79352;;
       16: len 4; hex 8000001f; asc     ;;
       17: len 0; hex ; asc ;;
       
      TRANSACTION 421825623286288, ACTIVE 38 sec fetching rows
      mysql tables in use 1, locked 0
      0 lock struct(s), heap size 1128, 0 row lock(s)
      MySQL thread id 17998619, OS thread handle 140350620571392, query id 731677120 172.21.243.109 nextcloud Sending data
      SELECT `f`.* FROM `oc_filecache` `f` WHERE (((`f`.`storage` = '1594') AND (`f`.`path` LIKE 'files/...%')) OR ((`f`.`storage` = '2947') AND (`f`.`path` LIKE 'files/...%')) OR ((`f`.`storage` = '2947') AND (`f`.`path` LIKE 'files/...%')) OR...
       
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      

      (I shortend the filenames in the query for data protection reasons)

      The error message contains always the same table and index name, the sql differs in the searched data, but is always the same structure.

      CHECK TABLE oc_filecache gives only a "Ok".

      OPTIMIZE TABLE worked, but there the errors are still there.

      Dropping and re-creating the fs_mtime index worked, but the error messages are still there.

      I also tried to export the whole table with mysqldump, dropping the table and re-importing the dump, but after this operation the errors are back again.

      Server config:

      [mysqld]
      user            = mysql
      pid-file        = /var/run/mysqld/mysqld.pid
      socket          = /var/run/mysqld/mysqld.sock
      port            = 3306
      basedir         = /usr
      datadir         = /data/mysql
      tmpdir          = /data/mysqltmp
      lc-messages-dir = /usr/share/mysql
      skip-external-locking
      key_buffer_size         = 16M
      max_allowed_packet      = 16M
      thread_stack            = 192K
      thread_cache_size       = 8
      myisam_recover_options  = BACKUP
      query_cache_size        = 0
      query_cache_type        = OFF
      log_error = /data/log/mysql/error.log
      slow_query_log_file     = /data/log/mysql/slow.log
      long_query_time = 10
      log_slow_rate_limit     = 1000
      log_slow_verbosity      = query_plan
      server-id               = 1
      disable-log-bin
      innodb_file_per_table=true
      ssl-ca=/etc/mysql/nc1-db-ca-cert.pem
      ssl-cert=/etc/mysql/nc1-db-server-cert.pem
      ssl-key=/etc/mysql/nc1-db-server-key.pem
      ssl=on
      character-set-server  = utf8mb4
      collation-server      = utf8mb4_general_ci
      skip-name-resolve       = 1
      join_buffer_size = 1M
      transaction-isolation = READ-COMMITTED
      max_connections = 128 # (> 151)
      wait_timeout = 3600 # (< 28800)
      interactive_timeout = 3600 # (< 28800)
      join_buffer_size = 2M  #(> 1.0M, or always use indexes with joins)
      table_open_cache = 4096 # (> 2000)
      innodb_buffer_pool_size = 724M # (>= 724M) if possible.
      tmp_table_size = 32M
      max_heap_table_size = 32M
      innodb_flush_method=O_DIRECT
      max_statement_time = 90
      performance_schema = OFF
      thread_handling=pool-of-threads
      innodb_log_file_size=192M
      thread_pool_max_threads=128
      

      Create table:

      CREATE TABLE `oc_filecache` (
        `fileid` bigint(20) NOT NULL AUTO_INCREMENT,
        `storage` bigint(20) NOT NULL DEFAULT 0,
        `path` varchar(4000) COLLATE utf8mb4_bin DEFAULT NULL,
        `path_hash` varchar(32) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
        `parent` bigint(20) NOT NULL DEFAULT 0,
        `name` varchar(250) COLLATE utf8mb4_bin DEFAULT NULL,
        `mimetype` bigint(20) NOT NULL DEFAULT 0,
        `mimepart` bigint(20) NOT NULL DEFAULT 0,
        `size` bigint(20) NOT NULL DEFAULT 0,
        `mtime` bigint(20) NOT NULL DEFAULT 0,
        `storage_mtime` bigint(20) NOT NULL DEFAULT 0,
        `encrypted` int(11) NOT NULL DEFAULT 0,
        `unencrypted_size` bigint(20) NOT NULL DEFAULT 0,
        `etag` varchar(40) COLLATE utf8mb4_bin DEFAULT NULL,
        `permissions` int(11) DEFAULT 0,
        `checksum` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
        PRIMARY KEY (`fileid`),
        UNIQUE KEY `fs_storage_path_hash` (`storage`,`path_hash`),
        KEY `fs_parent_name_hash` (`parent`,`name`),
        KEY `fs_storage_mimetype` (`storage`,`mimetype`),
        KEY `fs_storage_mimepart` (`storage`,`mimepart`),
        KEY `fs_storage_size` (`storage`,`size`,`fileid`),
        KEY `fs_size` (`size`),
        KEY `fs_mtime` (`mtime`)
      ) ENGINE=InnoDB AUTO_INCREMENT=46263064 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED
      

      Attachments

        Issue Links

          Activity

            afleischer, with which exact commands did you try to drop and re-create the index? If you used a single ALTER TABLE statement, the SQL layer would optimize away any DROP and ADD pairs of indexes.

            The OPTIMIZE TABLE statement would not rebuild the table if innodb_defragment or innodb_optimize_fulltext_only are set. A better way to rebuild would be

            ALTER TABLE oc_filecache FORCE;
            

            Unfortunately, I have no ideas regarding the cause of the corruption. Could the table have been corrupted already before you upgraded to 10.5.12? CHECK TABLE on a backup could be useful.

            I suspect that there may still be something wrong with the change buffer. SET GLOBAL innodb_change_buffering=none; might prevent further corruption of this kind. But it will of course not fix any previous corruption.

            marko Marko Mäkelä added a comment - afleischer , with which exact commands did you try to drop and re-create the index? If you used a single ALTER TABLE statement, the SQL layer would optimize away any DROP and ADD pairs of indexes. The OPTIMIZE TABLE statement would not rebuild the table if innodb_defragment or innodb_optimize_fulltext_only are set. A better way to rebuild would be ALTER TABLE oc_filecache FORCE ; Unfortunately, I have no ideas regarding the cause of the corruption. Could the table have been corrupted already before you upgraded to 10.5.12? CHECK TABLE on a backup could be useful. I suspect that there may still be something wrong with the change buffer. SET GLOBAL innodb_change_buffering=none; might prevent further corruption of this kind. But it will of course not fix any previous corruption.
            afleischer Alexander Fleischer added a comment - - edited

            Thanks für your advise. For recreating the index I used "ALTER TABLE .. DROP INDEX ..." and "ALTER TABLE ... ADD INDEX ..." in 2 commands.

            I executed the "ALTER TABLE oc_filecache FORCE;" now. Table ibd file shrinked by about 1G. Until now, no new error messages.

            Even if there was corruption before 10.5.12 (I always used the versions from the official debian repository), I dropped the whole table and re-imported it from a mysqldump, but the errors came back.

            afleischer Alexander Fleischer added a comment - - edited Thanks für your advise. For recreating the index I used "ALTER TABLE .. DROP INDEX ..." and "ALTER TABLE ... ADD INDEX ..." in 2 commands. I executed the "ALTER TABLE oc_filecache FORCE;" now. Table ibd file shrinked by about 1G. Until now, no new error messages. Even if there was corruption before 10.5.12 (I always used the versions from the official debian repository), I dropped the whole table and re-imported it from a mysqldump, but the errors came back.

            Thank you very much. Looks good so far, no new error messages after executing the "ALTER TABLE ... FORCE".

            afleischer Alexander Fleischer added a comment - Thank you very much. Looks good so far, no new error messages after executing the "ALTER TABLE ... FORCE".

            afleischer, bitte, gern geschehen.

            Did you get these errors ever again? Something appears to be wrong with the change buffer, but we do not know what, as we are unable to reproduce it ourselves. See MDEV-26977.

            marko Marko Mäkelä added a comment - afleischer , bitte, gern geschehen. Did you get these errors ever again? Something appears to be wrong with the change buffer, but we do not know what, as we are unable to reproduce it ourselves. See MDEV-26977 .
            afleischer Alexander Fleischer added a comment - - edited

            Hello,

            the errors are back againe, but it looks like not so often (about 4M log size/week, this also maybe because the number of requests is lower).

            Always the same oc_filecache table and the fs_mtime index.

            MariaDB is running on Debian 11.1 on a VMWare virtual machine. There are about 1000 to 2500 queries per second. The table oc_filecache is really big, about 18M entries with about 6 GB filesize of ibd file. Database files are on SSD.

            2021-12-13  8:38:27 7870421 [ERROR] InnoDB: Clustered record for sec rec not found index `fs_mtime` of table `nextcloud`.`oc_filecache`
            InnoDB: sec index record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
             0: len 8; hex 8000000061ae308f; asc     a 0 ;;
             1: len 8; hex 8000000002d931ce; asc       1 ;;
             
            InnoDB: clust index record PHYSICAL RECORD: n_fields 18; compact format; info bits 0
             0: len 8; hex 8000000002d931cd; asc       1 ;;
             1: len 6; hex 000000000000; asc       ;;
             2: len 7; hex 80000000000000; asc        ;;
             3: len 8; hex 8000000000000002; asc         ;;
             4: len 30; hex 617070646174615f6f6378316f37396f786e69702f707265766965772f31; asc ***/preview/1; (total 68 bytes);
             5: len 30; hex 653463333634396138316335643231303933346532303337396638376162; asc e4c3649a81c5d210934e20379f87ab; (total 32 bytes);
             6: len 8; hex 800000000280d8ee; asc         ;;
             7: len 16; hex 3235362d3235362d63726f702e706e67; asc 256-256-crop.png;;
             8: len 8; hex 800000000000000c; asc         ;;
             9: len 8; hex 8000000000000008; asc         ;;
             10: len 8; hex 8000000000006798; asc       g ;;
             11: len 8; hex 8000000061ae308e; asc     a 0 ;;
             12: len 8; hex 8000000061ae308e; asc     a 0 ;;
             13: len 4; hex 80000000; asc     ;;
             14: len 8; hex 8000000000000000; asc         ;;
             15: len 30; hex 386335386637363263626262666233646165636661383363623665326230; asc 8c58f762cbbbfb3daecfa83cb6e2b0; (total 32 bytes);
             16: len 4; hex 8000001b; asc     ;;
             17: len 0; hex ; asc ;;
             
            TRANSACTION 421198429919312, ACTIVE 6 sec fetching rows
            mysql tables in use 1, locked 0
            0 lock struct(s), heap size 1128, 0 row lock(s)
            MySQL thread id 7870421, OS thread handle 139720891549440, query id 269136321 172.21*** nextcloud Sending data
            SELECT `f`.* FROM `oc_filecache` `f` WHERE (((`f`.`storage` = '7184') AND (`f`.`path` LIKE '***/%')) OR ((`f`.`storage` = '13687') AND (`f`.`path` LIKE '***/%')) OR (`f`.`storage` = '10572')) AND ((`f`.`mimetype` <> 2) OR (`f`.`size` = 0)) AND (`f`.`path` NOT LIKE 'files_versions/%') AND (`f`.`path` NOT LIKE 'files_trashbin/%') ORDER BY `f`.`mtime` DESC LIMIT 500
             
            InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
            

            I grep'ed the log for the corrupt message from MDEV-26977 and got it, but it was before the my first report and I exported and re-imported the table afterwards. It's a different index in this message, fs_time:

            2021-10-05 10:11:02 1335216 [ERROR] InnoDB: Insert buffer insert fails; page free 19, dtuple size 21
            InnoDB: Cannot insert index record DATA TUPLE: 2 fields;
             0: len 8; hex 800000005fd32cf4; asc     _ , ;;
             1: len 8; hex 8000000002b41595; asc         ;;
             
            InnoDB: The table where this index record belongs
            InnoDB: is now probably corrupt. Please run CHECK TABLE on
            InnoDB: that table.
            2021-10-05 10:11:02 1335216 [ERROR] InnoDB: page [page id: space=122050, page number=550089], size 8192, bitmap bits 0
            2021-10-05 10:11:02 1335216 [ERROR] InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
            2021-10-05 10:25:32 1346219 [ERROR] InnoDB: In pages [page id: space=122050, page number=554215] and [page id: space=122050, page number=667983] of index `fs_size` of table `nextcloud`.`oc_filecache`
            InnoDB: records in wrong order on adjacent pages
            InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
             0: len 8; hex 80000000610248d4; asc     a H ;;
             1: len 8; hex 8000000002b41610; asc         ;;
             
            InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
             0: len 8; hex 8000000000012ef5; asc       . ;;
             1: len 8; hex 8000000001e4e822; asc        ";;
             
            2021-10-05 10:25:32 1346219 [ERROR] InnoDB: Corruption of an index tree: table `nextcloud`.`oc_filecache` index `fs_size`, father ptr page no 676641, child page no 554215
            PHYSICAL RECORD: n_fields 2; compact format; info bits 0
             0: len 8; hex 8000000000012edd; asc       . ;;
             1: len 8; hex 8000000001bfc815; asc         ;;
            2021-10-05 10:25:32 1346219 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 146
            PHYSICAL RECORD: n_fields 3; compact format; info bits 0
             0: len 8; hex 8000000060b3bbc5; asc     `   ;;
             1: len 8; hex 8000000002b4514b; asc       QK;;
             2: len 4; hex 000a5321; asc   S!;;
            2021-10-05 10:25:32 1346219 [Note] InnoDB: n_owned: 0; heap_no: 239; next rec: 1725
            

            afleischer Alexander Fleischer added a comment - - edited Hello, the errors are back againe, but it looks like not so often (about 4M log size/week, this also maybe because the number of requests is lower). Always the same oc_filecache table and the fs_mtime index. MariaDB is running on Debian 11.1 on a VMWare virtual machine. There are about 1000 to 2500 queries per second. The table oc_filecache is really big, about 18M entries with about 6 GB filesize of ibd file. Database files are on SSD. 2021-12-13 8:38:27 7870421 [ERROR] InnoDB: Clustered record for sec rec not found index `fs_mtime` of table `nextcloud`.`oc_filecache` InnoDB: sec index record PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 8; hex 8000000061ae308f; asc a 0 ;; 1: len 8; hex 8000000002d931ce; asc 1 ;;   InnoDB: clust index record PHYSICAL RECORD: n_fields 18; compact format; info bits 0 0: len 8; hex 8000000002d931cd; asc 1 ;; 1: len 6; hex 000000000000; asc ;; 2: len 7; hex 80000000000000; asc ;; 3: len 8; hex 8000000000000002; asc ;; 4: len 30; hex 617070646174615f6f6378316f37396f786e69702f707265766965772f31; asc ***/preview/1; (total 68 bytes); 5: len 30; hex 653463333634396138316335643231303933346532303337396638376162; asc e4c3649a81c5d210934e20379f87ab; (total 32 bytes); 6: len 8; hex 800000000280d8ee; asc ;; 7: len 16; hex 3235362d3235362d63726f702e706e67; asc 256-256-crop.png;; 8: len 8; hex 800000000000000c; asc ;; 9: len 8; hex 8000000000000008; asc ;; 10: len 8; hex 8000000000006798; asc g ;; 11: len 8; hex 8000000061ae308e; asc a 0 ;; 12: len 8; hex 8000000061ae308e; asc a 0 ;; 13: len 4; hex 80000000; asc ;; 14: len 8; hex 8000000000000000; asc ;; 15: len 30; hex 386335386637363263626262666233646165636661383363623665326230; asc 8c58f762cbbbfb3daecfa83cb6e2b0; (total 32 bytes); 16: len 4; hex 8000001b; asc ;; 17: len 0; hex ; asc ;;   TRANSACTION 421198429919312, ACTIVE 6 sec fetching rows mysql tables in use 1, locked 0 0 lock struct(s), heap size 1128, 0 row lock(s) MySQL thread id 7870421, OS thread handle 139720891549440, query id 269136321 172.21*** nextcloud Sending data SELECT `f`.* FROM `oc_filecache` `f` WHERE (((`f`.`storage` = '7184') AND (`f`.`path` LIKE '***/%')) OR ((`f`.`storage` = '13687') AND (`f`.`path` LIKE '***/%')) OR (`f`.`storage` = '10572')) AND ((`f`.`mimetype` <> 2) OR (`f`.`size` = 0)) AND (`f`.`path` NOT LIKE 'files_versions/%') AND (`f`.`path` NOT LIKE 'files_trashbin/%') ORDER BY `f`.`mtime` DESC LIMIT 500   InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ I grep'ed the log for the corrupt message from MDEV-26977 and got it, but it was before the my first report and I exported and re-imported the table afterwards. It's a different index in this message, fs_time: 2021-10-05 10:11:02 1335216 [ERROR] InnoDB: Insert buffer insert fails; page free 19, dtuple size 21 InnoDB: Cannot insert index record DATA TUPLE: 2 fields; 0: len 8; hex 800000005fd32cf4; asc _ , ;; 1: len 8; hex 8000000002b41595; asc ;;   InnoDB: The table where this index record belongs InnoDB: is now probably corrupt. Please run CHECK TABLE on InnoDB: that table. 2021-10-05 10:11:02 1335216 [ERROR] InnoDB: page [page id: space=122050, page number=550089], size 8192, bitmap bits 0 2021-10-05 10:11:02 1335216 [ERROR] InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ 2021-10-05 10:25:32 1346219 [ERROR] InnoDB: In pages [page id: space=122050, page number=554215] and [page id: space=122050, page number=667983] of index `fs_size` of table `nextcloud`.`oc_filecache` InnoDB: records in wrong order on adjacent pages InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 8; hex 80000000610248d4; asc a H ;; 1: len 8; hex 8000000002b41610; asc ;;   InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 8; hex 8000000000012ef5; asc . ;; 1: len 8; hex 8000000001e4e822; asc ";;   2021-10-05 10:25:32 1346219 [ERROR] InnoDB: Corruption of an index tree: table `nextcloud`.`oc_filecache` index `fs_size`, father ptr page no 676641, child page no 554215 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 8; hex 8000000000012edd; asc . ;; 1: len 8; hex 8000000001bfc815; asc ;; 2021-10-05 10:25:32 1346219 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 146 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 8; hex 8000000060b3bbc5; asc ` ;; 1: len 8; hex 8000000002b4514b; asc QK;; 2: len 4; hex 000a5321; asc S!;; 2021-10-05 10:25:32 1346219 [Note] InnoDB: n_owned: 0; heap_no: 239; next rec: 1725

            Hello,

            We also are experiencing the exact them problem and also with nextcloud too. It started occuring after upgrading to mariadb 10.5 (mariadb's repository) from mariadb 10.1 (debian's packages). I've checked and we never had that problem with mariadb 10.1 during the past year.

            Our oc_filecache table is 50M record big and we are using galera on 9 mariadb 10.5.13+maria~bullseye nodes with mariabackup and maxscale with the readwritesplit router. For some reason the problem doesn't seem to occurs on all nodes at the same time but it always occurs on several nodes.

            Unfortunately, dumping and reimporting the table every odd day or so isn't an option for us.

            our mariadb configuration:

            wsrep_provider_options="gcs.fc_limit=480;gcs.fc_factor=0.8"
            innodb_flush_log_at_trx_commit = 0
            innodb_buffer_pool_size=68719476736
            max_heap_table_size=268435456
            tmp_table_size=268435456
            innodb_log_file_size = 16G
            wsrep_sync_wait=7
            wsrep_retry_autocommit=10
            tmpdir = /dev/shm
            long_query_time = 30
            wsrep_slave_threads=96
            innodb_buffer_pool_instances=64

            Arnaud

            abelard-a Arnaud Abélard added a comment - Hello, We also are experiencing the exact them problem and also with nextcloud too. It started occuring after upgrading to mariadb 10.5 (mariadb's repository) from mariadb 10.1 (debian's packages). I've checked and we never had that problem with mariadb 10.1 during the past year. Our oc_filecache table is 50M record big and we are using galera on 9 mariadb 10.5.13+maria~bullseye nodes with mariabackup and maxscale with the readwritesplit router. For some reason the problem doesn't seem to occurs on all nodes at the same time but it always occurs on several nodes. Unfortunately, dumping and reimporting the table every odd day or so isn't an option for us. our mariadb configuration: wsrep_provider_options="gcs.fc_limit=480;gcs.fc_factor=0.8" innodb_flush_log_at_trx_commit = 0 innodb_buffer_pool_size=68719476736 max_heap_table_size=268435456 tmp_table_size=268435456 innodb_log_file_size = 16G wsrep_sync_wait=7 wsrep_retry_autocommit=10 tmpdir = /dev/shm long_query_time = 30 wsrep_slave_threads=96 innodb_buffer_pool_instances=64 Arnaud

            I would recommend to disable the InnoDB change buffer. Bugs in it could cause secondary indexes to get out of sync with the primary key.

            SET GLOBAL innodb_change_buffering=none;
            

            Note: This will not fix already incurred corruption. You would have rebuild the affected secondary indexes. I suspect that MDEV-26977 and part of the trouble reported in MDEV-27411 could be attributed to some bug in the change buffer that we have been unable to fix.

            I had some ideas for improving the change buffer (MDEV-11634). According to some performance tests last week, any performance improvement via the change buffer was less than 5% when using SSD. The change buffer was originally developed to try to avoid random seeks on HDD, replacing them with more sequential page access. On SSD, there should be not much difference between random and sequential reads. I am trying to make a case for disabling the change buffer by default, issuing a deprecation warning, and removing the code altogether.

            abelard-a, your configuration options suggest that a further source of your problems could be corner cases in the Galera snapshot transfer (SST) that could cause the database to be corrupted. Examples of that include MDEV-27524, MDEV-24845, MDEV-27459.

            marko Marko Mäkelä added a comment - I would recommend to disable the InnoDB change buffer. Bugs in it could cause secondary indexes to get out of sync with the primary key. SET GLOBAL innodb_change_buffering=none; Note: This will not fix already incurred corruption. You would have rebuild the affected secondary indexes. I suspect that MDEV-26977 and part of the trouble reported in MDEV-27411 could be attributed to some bug in the change buffer that we have been unable to fix. I had some ideas for improving the change buffer ( MDEV-11634 ). According to some performance tests last week, any performance improvement via the change buffer was less than 5% when using SSD. The change buffer was originally developed to try to avoid random seeks on HDD, replacing them with more sequential page access. On SSD, there should be not much difference between random and sequential reads. I am trying to make a case for disabling the change buffer by default, issuing a deprecation warning, and removing the code altogether. abelard-a , your configuration options suggest that a further source of your problems could be corner cases in the Galera snapshot transfer (SST) that could cause the database to be corrupted. Examples of that include MDEV-27524 , MDEV-24845 , MDEV-27459 .

            Do such failures go away if you disable the InnoDB change buffering (SET GLOBAL innodb_change_buffering=none) and rebuild the affected secondary indexes?

            marko Marko Mäkelä added a comment - Do such failures go away if you disable the InnoDB change buffering ( SET GLOBAL innodb_change_buffering=none ) and rebuild the affected secondary indexes?

            No, they don't go away after disabling the innodb change buffering unfortunately. We haven't rebuilt the table though so it might just be a consequence of the broken index but we are still getting errors in the logs.

            abelard-a Arnaud Abélard added a comment - No, they don't go away after disabling the innodb change buffering unfortunately. We haven't rebuilt the table though so it might just be a consequence of the broken index but we are still getting errors in the logs.

            Hello,

            I can confirm that disabling InnoDB change buffering does fix the problem once the index has been repaired:

            SET GLOBAL innodb_change_buffering=none;
            ALTER TABLE oc_filecache FORCE;
            

            abelard-a Arnaud Abélard added a comment - Hello, I can confirm that disabling InnoDB change buffering does fix the problem once the index has been repaired: SET GLOBAL innodb_change_buffering=none; ALTER TABLE oc_filecache FORCE;

            afleischer, did you get any corruption after

            SET GLOBAL innodb_change_buffering=none;
            ALTER TABLE oc_filecache FORCE;
            

            (or just rebuilding the corrupted index)?

            Note that MariaDB does not have persistent parameters; you would have to set that parameter also in the server configuration file. Alternatively, you can upgrade to MariaDB 10.5.15, 10.6.7, or later version where MDEV-27734 disabled the change buffer by default. Again, that will not repair any corrupted indexes; ALTER TABLE will.

            marko Marko Mäkelä added a comment - afleischer , did you get any corruption after SET GLOBAL innodb_change_buffering=none; ALTER TABLE oc_filecache FORCE ; (or just rebuilding the corrupted index)? Note that MariaDB does not have persistent parameters; you would have to set that parameter also in the server configuration file. Alternatively, you can upgrade to MariaDB 10.5.15, 10.6.7, or later version where MDEV-27734 disabled the change buffer by default. Again, that will not repair any corrupted indexes; ALTER TABLE will.

            I missed your message before and changed the variable and rebuild the table today. But I have to wait some days to see the results in the logs.

            afleischer Alexander Fleischer added a comment - I missed your message before and changed the variable and rebuild the table today. But I have to wait some days to see the results in the logs.

            No new error messages in log since the change & rebuild 2 days ago.

            afleischer Alexander Fleischer added a comment - No new error messages in log since the change & rebuild 2 days ago.
            stratege1401 Rodolphe Cahen added a comment - - edited

            Hello, had same kind of error after a nextcloud update to v23.
            Debian11.2 10.5.12-MariaDB-0+deb11u1

            SET GLOBAL innodb_change_buffering=none;
            ALTER TABLE oc_filecache FORCE;

            Problem fix. No new error messages in log since the change & rebuild 8 days ago.

            stratege1401 Rodolphe Cahen added a comment - - edited Hello, had same kind of error after a nextcloud update to v23. Debian11.2 10.5.12-MariaDB-0+deb11u1 SET GLOBAL innodb_change_buffering=none; ALTER TABLE oc_filecache FORCE ; Problem fix. No new error messages in log since the change & rebuild 8 days ago.

            Thank you for the feedback. I will keep this bug open until someone provides a way to reproduce the problem.

            marko Marko Mäkelä added a comment - Thank you for the feedback. I will keep this bug open until someone provides a way to reproduce the problem.

            This inconsistency should not be related to buffering purge operations. The purge of committed transaction history in secondary indexes should only cover already delete-marked records. In the Description, we see that the secondary index record is not delete-marked:

            InnoDB: sec index record PHYSICAL RECORD: …; info bits 0
            

            Therefore, ibuf_delete() should be off the hook.

            If there was a problem with applying delete-mark operations, there should have been prior messages like those in MDEV-27949:

            Feb 25 01:49:56 ains mysqld: 2022-02-25 1:49:56 0 [ERROR] InnoDB: Unable to find a record to delete-mark
            

            This error could simply be a consequence of that prior trouble. afleischer, stratege1401, any others, were there also such messages in the logs? Is there any way to reproduce the problem? Such as setting up a replication slave, with compression enabled?

            Did this only occur in connection with ROW_FORMAT=COMPRESSED tables? (I know that NextCloud or OpenCloud used that format.)

            marko Marko Mäkelä added a comment - This inconsistency should not be related to buffering purge operations. The purge of committed transaction history in secondary indexes should only cover already delete-marked records. In the Description, we see that the secondary index record is not delete-marked: InnoDB: sec index record PHYSICAL RECORD: …; info bits 0 Therefore, ibuf_delete() should be off the hook. If there was a problem with applying delete-mark operations, there should have been prior messages like those in MDEV-27949 : Feb 25 01:49:56 ains mysqld: 2022-02-25 1:49:56 0 [ERROR] InnoDB: Unable to find a record to delete-mark This error could simply be a consequence of that prior trouble. afleischer , stratege1401 , any others, were there also such messages in the logs? Is there any way to reproduce the problem? Such as setting up a replication slave, with compression enabled? Did this only occur in connection with ROW_FORMAT=COMPRESSED tables? (I know that NextCloud or OpenCloud used that format.)
            afleischer Alexander Fleischer added a comment - - edited

            I didn't save the gigabytes of logs, so I can't check now. But I'm sure, if there were other ERROR messages, I had also included it in my original issue.
            I also cannot reproduce here, since the change of "nnodb_change_buffering=none;" i never had this message again. Unfortunatly I also couldn't reproduce these messages on my test nextcloud system (with less data), it only occured on the production system. I don't use replication at all. I got the messages only with the oc_filecache table, see first post for the show create. It's the biggest table with currently about 10 GB and about 25 million lines.

            afleischer Alexander Fleischer added a comment - - edited I didn't save the gigabytes of logs, so I can't check now. But I'm sure, if there were other ERROR messages, I had also included it in my original issue. I also cannot reproduce here, since the change of "nnodb_change_buffering=none;" i never had this message again. Unfortunatly I also couldn't reproduce these messages on my test nextcloud system (with less data), it only occured on the production system. I don't use replication at all. I got the messages only with the oc_filecache table, see first post for the show create. It's the biggest table with currently about 10 GB and about 25 million lines.
            jmoey139 Joshua O'Leary added a comment - - edited

            I think I've also encountered this issue too, it started off as a warning in the logs around 9.20 yesterday

            2022-11-14  9:21:47 101439247 [ERROR] InnoDB: Clustered record for sec rec not found index `hub_id_activation_time` of table `mopo`.`rentals`
            InnoDB: sec index record PHYSICAL RECORD: n_fields 3; compact format; info bits 0
             0: SQL NULL;
             1: len 5; hex 0000000000; asc      ;;
             2: len 8; hex 93edf6010000a800; asc         ;;
            

            But eventually resulted in MariaDB crashing and refusing to start at the end of the day, with corrupted data

            2022-11-14 23:40:17 0 [ERROR] InnoDB: Corruption of an index tree: table `mopo`.`rentals` index `hub_id_activation_time`, father ptr page no 32, child page no 383377
            PHYSICAL RECORD: n_fields 3; compact format; info bits 0
             0: SQL NULL;
             1: len 5; hex 0000000000; asc      ;;
             2: len 8; hex 93edf6010000a800; asc         ;;
            2022-11-14 23:40:17 0 [Note] InnoDB: n_owned: 0; heap_no: 20; next rec: 554
            PHYSICAL RECORD: n_fields 4; compact format; info bits 16
             0: len 4; hex 80000000; asc     ;;
             1: SQL NULL;
             2: len 8; hex 0000000000000001; asc         ;;
             3: len 4; hex 00000020; asc     ;;
            2022-11-14 23:40:17 0 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 148
            2022-11-14 23:40:17 0 [ERROR] [FATAL] InnoDB: You should dump + drop + reimport the table to fix the corruption. If the crash happens at database startup. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. Then dump + drop + reimport.
            221114 23:40:17 [ERROR] mysqld got signal 6 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
             
            To report this bug, see https://mariadb.com/kb/en/reporting-bugs
             
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed, 
            something is definitely wrong and this may fail.
             
            Server version: 10.5.11-MariaDB-1:10.5.11+maria~bionic-log
            key_buffer_size=16777216
            read_buffer_size=131072
            max_used_connections=147
            max_threads=153
            

            I had to start MariaDB with recovery mode 2, dump the corrupted table (rentals) with mysqldump and restore it to repair the corrupted data

            jmoey139 Joshua O'Leary added a comment - - edited I think I've also encountered this issue too, it started off as a warning in the logs around 9.20 yesterday 2022-11-14 9:21:47 101439247 [ERROR] InnoDB: Clustered record for sec rec not found index `hub_id_activation_time` of table `mopo`.`rentals` InnoDB: sec index record PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: SQL NULL; 1: len 5; hex 0000000000; asc ;; 2: len 8; hex 93edf6010000a800; asc ;; But eventually resulted in MariaDB crashing and refusing to start at the end of the day, with corrupted data 2022-11-14 23:40:17 0 [ERROR] InnoDB: Corruption of an index tree: table `mopo`.`rentals` index `hub_id_activation_time`, father ptr page no 32, child page no 383377 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: SQL NULL; 1: len 5; hex 0000000000; asc ;; 2: len 8; hex 93edf6010000a800; asc ;; 2022-11-14 23:40:17 0 [Note] InnoDB: n_owned: 0; heap_no: 20; next rec: 554 PHYSICAL RECORD: n_fields 4; compact format; info bits 16 0: len 4; hex 80000000; asc ;; 1: SQL NULL; 2: len 8; hex 0000000000000001; asc ;; 3: len 4; hex 00000020; asc ;; 2022-11-14 23:40:17 0 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 148 2022-11-14 23:40:17 0 [ERROR] [FATAL] InnoDB: You should dump + drop + reimport the table to fix the corruption. If the crash happens at database startup. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. Then dump + drop + reimport. 221114 23:40:17 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.   To report this bug, see https://mariadb.com/kb/en/reporting-bugs   We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.   Server version: 10.5.11-MariaDB-1:10.5.11+maria~bionic-log key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=147 max_threads=153 I had to start MariaDB with recovery mode 2, dump the corrupted table (rentals) with mysqldump and restore it to repair the corrupted data

            This bug was filed almost 8 months after the infamous MDEV-9663 had been closed as incomplete. Below, I am copying a comment that I just posted there.

            The CHECK TABLE…EXTENDED that was implemented in MDEV-24402 will flag secondary indexes corrupted if they contain entries that should not exist.

            Crashes due to this corruption should have been (mostly) fixed in MDEV-13542. Because we were not able to reproduce this corruption, I cannot fully guarantee it.

            While analyzing a failure from a stress test of MDEV-30009, I may have found a possible explanation of this. The scenario is as follows.

            1. Some changes were buffered to a secondary index leaf page that was not located in the buffer pool.
            2. The page was freed (possibly as part of DROP INDEX).
            3. During ibuf_read_merge_pages(), we will reset the change buffer bitmap bits but will not remove the change buffer records.
            4. The same page is allocated and reused for something else.
            5. The page is evicted from the buffer pool.
            6. Something is added to the change buffer for the page.
            7. On a change buffer merge, we will apply both old (bogus) and new entries to the page.

            The extra delete-unmarked records could simply originate from previously buffered inserts that were not discarded as they were supposed to, in the above scenario.

            As far as I can tell, all MySQL and MariaDB versions are affected by this. The code changes that were applied in MDEV-20934 did not fix this, because that code would only be executed on shutdown with innodb_fast_shutdown=0.

            The InnoDB change buffer was disabled by default in MDEV-27734.

            Note: We still have many open bugs related to the corruption of indexes that include virtual columns. Unlike this corruption, those corruptions are much easier to reproduce. Implementing some file format changes such as MDEV-17598 could help a lot with those bugs.

            marko Marko Mäkelä added a comment - This bug was filed almost 8 months after the infamous MDEV-9663 had been closed as incomplete. Below, I am copying a comment that I just posted there. The CHECK TABLE…EXTENDED that was implemented in MDEV-24402 will flag secondary indexes corrupted if they contain entries that should not exist. Crashes due to this corruption should have been (mostly) fixed in MDEV-13542 . Because we were not able to reproduce this corruption, I cannot fully guarantee it. While analyzing a failure from a stress test of MDEV-30009 , I may have found a possible explanation of this. The scenario is as follows. Some changes were buffered to a secondary index leaf page that was not located in the buffer pool. The page was freed (possibly as part of DROP INDEX ). During ibuf_read_merge_pages() , we will reset the change buffer bitmap bits but will not remove the change buffer records. The same page is allocated and reused for something else. The page is evicted from the buffer pool. Something is added to the change buffer for the page. On a change buffer merge, we will apply both old (bogus) and new entries to the page. The extra delete-unmarked records could simply originate from previously buffered inserts that were not discarded as they were supposed to, in the above scenario. As far as I can tell, all MySQL and MariaDB versions are affected by this. The code changes that were applied in MDEV-20934 did not fix this, because that code would only be executed on shutdown with innodb_fast_shutdown=0 . The InnoDB change buffer was disabled by default in MDEV-27734 . Note: We still have many open bugs related to the corruption of indexes that include virtual columns. Unlike this corruption, those corruptions are much easier to reproduce. Implementing some file format changes such as MDEV-17598 could help a lot with those bugs.

            We are still looking for the following feedback:

            1. Any way to reproduce this (while explicitly enabling the change buffer after MDEV-27734 disabled it by default).
            2. Whether this corruption would stop from occurring after MDEV-30009 is fixed.

            Due to the difficulty of reproducing this corruption, I think that we will have to keep waiting several months after the fix of MDEV-30009 is available.

            marko Marko Mäkelä added a comment - We are still looking for the following feedback: Any way to reproduce this (while explicitly enabling the change buffer after MDEV-27734 disabled it by default). Whether this corruption would stop from occurring after MDEV-30009 is fixed. Due to the difficulty of reproducing this corruption, I think that we will have to keep waiting several months after the fix of MDEV-30009 is available.

            mleich provided an even more juicy rr replay trace than the one that I previously analyzed, with a slightly different scenario:

            mysqld: /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:3615: dberr_t ibuf_insert_to_index_page_low(const dtuple_t*, rec_offs**, mem_heap_t*, mtr_t*, page_cur_t*): Assertion `!__builtin_expect(((page_cur->block)->page.zip.data) != 0, 0)' failed.
            #4  0x000055d5889b3120 in ibuf_insert_to_index_page_low (entry=entry@entry=0x61600482bfc0, offsets=offsets@entry=0x7f12df525c40, heap=0x613000427400, mtr=mtr@entry=0x7f12df526250, page_cur=page_cur@entry=0x7f12df525ce0)
                at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:3615
            #5  0x000055d5889b7a9f in ibuf_insert_to_index_page (entry=entry@entry=0x61600482bfc0, block=block@entry=0x7f12f9d58650, index=0x616004823808, mtr=mtr@entry=0x7f12df526250) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:3779
            #6  0x000055d5889c95f4 in ibuf_merge_or_delete_for_page (block=block@entry=0x7f12f9d58650, page_id=..., zip_size=zip_size@entry=4096) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:4312
            #7  0x000055d588f3ca3e in buf_page_get_low (page_id=<error reading variable: Cannot access memory at address 0xfa0>, zip_size=zip_size@entry=4096, rw_latch=rw_latch@entry=2, guess=<optimized out>, guess@entry=0x0, mode=mode@entry=10, mtr=mtr@entry=0x7f12df5282b0, 
                err=<optimized out>, allow_ibuf_merge=<optimized out>) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0buf.h:1014
            #8  0x000055d588f3e20f in buf_page_get_gen (page_id=..., zip_size=zip_size@entry=4096, rw_latch=rw_latch@entry=2, guess=guess@entry=0x0, mode=mode@entry=10, mtr=mtr@entry=0x7f12df5282b0, err=0x7f12df5269f0, allow_ibuf_merge=true)
                at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/buf/buf0buf.cc:2962
            #9  0x000055d588eda70d in btr_cur_search_to_nth_level (level=level@entry=0, tuple=tuple@entry=0x619000b37008, mode=mode@entry=PAGE_CUR_LE, latch_mode=latch_mode@entry=BTR_MODIFY_LEAF, cursor=cursor@entry=0x7f12df527e10, mtr=mtr@entry=0x7f12df5282b0, 
                autoinc=<optimized out>) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/btr/btr0cur.cc:1565
            #10 0x000055d588ee0634 in btr_cur_t::search_leaf (this=this@entry=0x7f12df527e10, tuple=tuple@entry=0x619000b37008, mode=mode@entry=PAGE_CUR_LE, latch_mode=latch_mode@entry=BTR_MODIFY_LEAF, mtr=mtr@entry=0x7f12df5282b0, autoinc=autoinc@entry=0)
                at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/btr/btr0cur.cc:1206
            #11 0x000055d588cb78f3 in btr_pcur_open (mtr=0x7f12df5282b0, autoinc=0, cursor=0x7f12df527e10, latch_mode=BTR_MODIFY_LEAF, mode=PAGE_CUR_LE, tuple=0x619000b37008) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/btr0pcur.inl:325
            #12 row_search_index_entry (entry=entry@entry=0x619000b37008, mode=mode@entry=BTR_MODIFY_LEAF, pcur=pcur@entry=0x7f12df527e10, mtr=mtr@entry=0x7f12df5282b0) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/row/row0row.cc:1286
            #13 0x000055d5891f11fd in row_undo_mod_del_unmark_sec_and_undo_update (mode=mode@entry=BTR_MODIFY_LEAF, thr=thr@entry=0x6160048415c0, index=index@entry=0x61600767d708, entry=entry@entry=0x619000b37008)
                at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/row/row0umod.cc:689
            

            This fails in a rollback. This time, the dummy_index that was reconstructed from the change buffer record contains the same number of fields as the real index that row_undo_mod_del_unmark_sec_and_undo_update() is using (and which matches the PAGE_INDEX_ID in the secondary index leaf page). The code before the assertion expression includes a comment:

              if (page_cur_tuple_insert(page_cur, entry, offsets, &heap, 0, mtr))
                return DB_SUCCESS;
             
              /* Page reorganization or recompression should already have been
              attempted by page_cur_tuple_insert(). Besides, per
              ibuf_index_page_calc_free_zip() the page should not have been
              recompressed or reorganized. */
              ut_ad(!is_buf_block_get_page_zip(page_cur->block));
            

            So, we end up having some extra records in the page. Because this is an rr replay trace, it is possible to set a data watchpoint on the change buffer bitmap bits and {{reverse-continue} to find out when the buffered changes should have been deleted:

            …
            (rr) reverse-finish
            Run back to call of #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            0x00007f130dafc854 in __GI_abort () at abort.c:79
            79	abort.c: No such file or directory.
            …
            (rr) reverse-finish
            Run back to call of #0  0x000055d5889c95ef in ibuf_merge_or_delete_for_page (block=block@entry=0x7f12f9d58650, page_id={m_id = 47244640274}, zip_size=zip_size@entry=4096) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:4312
            0x000055d588f3ca39 in buf_page_get_low (page_id=<error reading variable: Cannot access memory at address 0xfa0>, zip_size=zip_size@entry=4096, rw_latch=rw_latch@entry=2, guess=<optimized out>, guess@entry=0x0, mode=mode@entry=10, mtr=mtr@entry=0x7f12df5282b0, 
                err=<optimized out>, allow_ibuf_merge=<optimized out>) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0buf.h:1014
            1014	  ulint zip_size() const { return page.zip_size(); }
            (rr) step
            ibuf_merge_or_delete_for_page (block=block@entry=0x7f12f9d58650, page_id=<error reading variable: Cannot access memory at address 0x3ffffb01>, zip_size=zip_size@entry=4096) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:4132
            4132	{
            (rr) next
            …
            4173				bitmap_bits = ibuf_bitmap_page_get_bits(
            (rr) step
            ibuf_bitmap_page_get_bits_low (page=page@entry=0x7f12fa2ee000 "", page_id=<optimized out>, zip_size=zip_size@entry=4096, latch_type=latch_type@entry=2, mtr=mtr@entry=0x7f12df526250, bit=bit@entry=2)
                at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:542
            542	{
            (rr) next
            …
            562		map_byte = mach_read_from_1(page + IBUF_BITMAP + byte_offset);
            (rr) step
            mach_read_from_1 (b=0x7f12fa2ee067 "\004\060") at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/mach0data.inl:73
            73		return(uint8_t(*b));
            (rr) watch -l *b
            Hardware watchpoint 4: -location *b
            (rr) reverse-continue
            …
            Continuing.
             
            Thread 3 hit Hardware watchpoint 4: -location *b
             
            Old value = 48 '0'
            New value = 51 '3'
            …
            (rr) set print frame-arguments none
            (rr) backtrace
            #0  0x000055d5889d728c in mtr_t::write<1u, (mtr_t::write_type)1, unsigned char> (this=..., block=..., ptr=..., val=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/mtr0log.h:202
            #1  0x000055d5889c32d6 in ibuf_bitmap_page_set_bits<0> (block=..., page_id=..., physical_size=..., val=..., mtr=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0types.h:135
            #2  0x000055d5889c9e3c in ibuf_set_bitmap_for_bulk_load (block=..., mtr=..., reset=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0buf.h:665
            #3  0x000055d588e8529b in PageBulk::commit (this=..., success=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/btr/btr0bulk.cc:567
            

            CREATE INDEX is resetting the change buffer bitmap bits for the newly initialized page, but it fails to delete the previously buffered records. Thus, the "garbage" records that were destined to a since-then freed page (I think after DROP INDEX but I did not check it) may be inserted later.

            marko Marko Mäkelä added a comment - mleich provided an even more juicy rr replay trace than the one that I previously analyzed, with a slightly different scenario: mysqld: /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:3615: dberr_t ibuf_insert_to_index_page_low(const dtuple_t*, rec_offs**, mem_heap_t*, mtr_t*, page_cur_t*): Assertion `!__builtin_expect(((page_cur->block)->page.zip.data) != 0, 0)' failed. #4 0x000055d5889b3120 in ibuf_insert_to_index_page_low (entry=entry@entry=0x61600482bfc0, offsets=offsets@entry=0x7f12df525c40, heap=0x613000427400, mtr=mtr@entry=0x7f12df526250, page_cur=page_cur@entry=0x7f12df525ce0) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:3615 #5 0x000055d5889b7a9f in ibuf_insert_to_index_page (entry=entry@entry=0x61600482bfc0, block=block@entry=0x7f12f9d58650, index=0x616004823808, mtr=mtr@entry=0x7f12df526250) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:3779 #6 0x000055d5889c95f4 in ibuf_merge_or_delete_for_page (block=block@entry=0x7f12f9d58650, page_id=..., zip_size=zip_size@entry=4096) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:4312 #7 0x000055d588f3ca3e in buf_page_get_low (page_id=<error reading variable: Cannot access memory at address 0xfa0>, zip_size=zip_size@entry=4096, rw_latch=rw_latch@entry=2, guess=<optimized out>, guess@entry=0x0, mode=mode@entry=10, mtr=mtr@entry=0x7f12df5282b0, err=<optimized out>, allow_ibuf_merge=<optimized out>) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0buf.h:1014 #8 0x000055d588f3e20f in buf_page_get_gen (page_id=..., zip_size=zip_size@entry=4096, rw_latch=rw_latch@entry=2, guess=guess@entry=0x0, mode=mode@entry=10, mtr=mtr@entry=0x7f12df5282b0, err=0x7f12df5269f0, allow_ibuf_merge=true) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/buf/buf0buf.cc:2962 #9 0x000055d588eda70d in btr_cur_search_to_nth_level (level=level@entry=0, tuple=tuple@entry=0x619000b37008, mode=mode@entry=PAGE_CUR_LE, latch_mode=latch_mode@entry=BTR_MODIFY_LEAF, cursor=cursor@entry=0x7f12df527e10, mtr=mtr@entry=0x7f12df5282b0, autoinc=<optimized out>) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/btr/btr0cur.cc:1565 #10 0x000055d588ee0634 in btr_cur_t::search_leaf (this=this@entry=0x7f12df527e10, tuple=tuple@entry=0x619000b37008, mode=mode@entry=PAGE_CUR_LE, latch_mode=latch_mode@entry=BTR_MODIFY_LEAF, mtr=mtr@entry=0x7f12df5282b0, autoinc=autoinc@entry=0) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/btr/btr0cur.cc:1206 #11 0x000055d588cb78f3 in btr_pcur_open (mtr=0x7f12df5282b0, autoinc=0, cursor=0x7f12df527e10, latch_mode=BTR_MODIFY_LEAF, mode=PAGE_CUR_LE, tuple=0x619000b37008) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/btr0pcur.inl:325 #12 row_search_index_entry (entry=entry@entry=0x619000b37008, mode=mode@entry=BTR_MODIFY_LEAF, pcur=pcur@entry=0x7f12df527e10, mtr=mtr@entry=0x7f12df5282b0) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/row/row0row.cc:1286 #13 0x000055d5891f11fd in row_undo_mod_del_unmark_sec_and_undo_update (mode=mode@entry=BTR_MODIFY_LEAF, thr=thr@entry=0x6160048415c0, index=index@entry=0x61600767d708, entry=entry@entry=0x619000b37008) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/row/row0umod.cc:689 This fails in a rollback. This time, the dummy_index that was reconstructed from the change buffer record contains the same number of fields as the real index that row_undo_mod_del_unmark_sec_and_undo_update() is using (and which matches the PAGE_INDEX_ID in the secondary index leaf page). The code before the assertion expression includes a comment: if (page_cur_tuple_insert(page_cur, entry, offsets, &heap, 0, mtr)) return DB_SUCCESS;   /* Page reorganization or recompression should already have been attempted by page_cur_tuple_insert(). Besides, per ibuf_index_page_calc_free_zip() the page should not have been recompressed or reorganized. */ ut_ad(!is_buf_block_get_page_zip(page_cur->block)); So, we end up having some extra records in the page. Because this is an rr replay trace, it is possible to set a data watchpoint on the change buffer bitmap bits and {{reverse-continue} to find out when the buffered changes should have been deleted: … (rr) reverse-finish Run back to call of #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 0x00007f130dafc854 in __GI_abort () at abort.c:79 79 abort.c: No such file or directory. … (rr) reverse-finish Run back to call of #0 0x000055d5889c95ef in ibuf_merge_or_delete_for_page (block=block@entry=0x7f12f9d58650, page_id={m_id = 47244640274}, zip_size=zip_size@entry=4096) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:4312 0x000055d588f3ca39 in buf_page_get_low (page_id=<error reading variable: Cannot access memory at address 0xfa0>, zip_size=zip_size@entry=4096, rw_latch=rw_latch@entry=2, guess=<optimized out>, guess@entry=0x0, mode=mode@entry=10, mtr=mtr@entry=0x7f12df5282b0, err=<optimized out>, allow_ibuf_merge=<optimized out>) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0buf.h:1014 1014 ulint zip_size() const { return page.zip_size(); } (rr) step ibuf_merge_or_delete_for_page (block=block@entry=0x7f12f9d58650, page_id=<error reading variable: Cannot access memory at address 0x3ffffb01>, zip_size=zip_size@entry=4096) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:4132 4132 { (rr) next … 4173 bitmap_bits = ibuf_bitmap_page_get_bits( (rr) step ibuf_bitmap_page_get_bits_low (page=page@entry=0x7f12fa2ee000 "", page_id=<optimized out>, zip_size=zip_size@entry=4096, latch_type=latch_type@entry=2, mtr=mtr@entry=0x7f12df526250, bit=bit@entry=2) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/ibuf/ibuf0ibuf.cc:542 542 { (rr) next … 562 map_byte = mach_read_from_1(page + IBUF_BITMAP + byte_offset); (rr) step mach_read_from_1 (b=0x7f12fa2ee067 "\004\060") at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/mach0data.inl:73 73 return(uint8_t(*b)); (rr) watch -l *b Hardware watchpoint 4: -location *b (rr) reverse-continue … Continuing.   Thread 3 hit Hardware watchpoint 4: -location *b   Old value = 48 '0' New value = 51 '3' … (rr) set print frame-arguments none (rr) backtrace #0 0x000055d5889d728c in mtr_t::write<1u, (mtr_t::write_type)1, unsigned char> (this=..., block=..., ptr=..., val=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/mtr0log.h:202 #1 0x000055d5889c32d6 in ibuf_bitmap_page_set_bits<0> (block=..., page_id=..., physical_size=..., val=..., mtr=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0types.h:135 #2 0x000055d5889c9e3c in ibuf_set_bitmap_for_bulk_load (block=..., mtr=..., reset=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/include/buf0buf.h:665 #3 0x000055d588e8529b in PageBulk::commit (this=..., success=...) at /data/Server/bb-10.6-MDEV-30009A/storage/innobase/btr/btr0bulk.cc:567 CREATE INDEX is resetting the change buffer bitmap bits for the newly initialized page, but it fails to delete the previously buffered records. Thus, the "garbage" records that were destined to a since-then freed page (I think after DROP INDEX but I did not check it) may be inserted later.

            The failure scenario that I mentioned today was fixed in MDEV-30009.

            marko Marko Mäkelä added a comment - The failure scenario that I mentioned today was fixed in MDEV-30009 .

            Following test case can fail to remove the change buffer entries from ibuf index. It leads to hang in slow shutdown

            --source include/have_innodb.inc
            --source include/have_debug.inc
            --source include/have_sequence.inc
             
            CREATE TABLE t1(
                    a INT AUTO_INCREMENT PRIMARY KEY,
                    b CHAR(1),
                    c INT,
                    INDEX idx(b))
            ENGINE=InnoDB STATS_PERSISTENT=0;
             
            # The flag innodb_change_buffering_debug is only available in debug builds.
            # It instructs InnoDB to try to evict pages from the buffer pool when
            # change buffering is possible, so that the change buffer will be used
            # whenever possible.
            SET GLOBAL innodb_change_buffering_debug = 1;
            SET GLOBAL innodb_change_buffering=all;
             
            # Create enough rows for the table, so that the change buffer will be
            # used for modifying the secondary index page. There must be multiple
            # index pages, because changes to the root page are never buffered.
            INSERT INTO t1 SELECT 0,'x',1 FROM seq_1_to_4096;
            let MYSQLD_DATADIR=`select @@datadir`;
            let PAGE_SIZE=`select @@innodb_page_size`;
             
            ALTER TABLE t1 DROP INDEX idx;
             
            set global innodb_fast_shutdown=0;
            --source include/restart_mysqld.inc
             
            # Cleanup
            DROP TABLE t1;
            

            InnoDB reset the bitmap bit while applying the change buffer merge for the freed page and it fails to
            remove the record from ibuf index

                          if (bitmap_bits
                                && DB_SUCCESS
                                == fseg_page_is_allocated(space, page_id.page_no())) {
                                    ibuf_mtr_start(&mtr);
                                    mtr.set_named_space(space);
                                    ibuf_reset_bitmap(block, page_id, zip_size, &mtr);
                                    ibuf_mtr_commit(&mtr);
                                    bitmap_bits = 0;
                            }
             
                            if (!bitmap_bits) {
                                    /* No changes are buffered for this page. */
                                    space->release();
                                    return DB_SUCCESS;
                            }
            

            During slow shutdown, InnoDB iterates through change buffer index again and again. It leads to hang

            In 10.3 the following test case can do the same. But in 10.3, change buffer merge is happening in freed page (ie wrong though during slow shutdown)

             
            set global innodb_disable_background_merge=1;
            CREATE TABLE t1(
                    a INT AUTO_INCREMENT PRIMARY KEY,
                    b CHAR(1),
                    c INT,
                    INDEX idx(b))
            ENGINE=InnoDB STATS_PERSISTENT=0;
             
            connect(con1,localhost,root,,,);
            start transaction with consistent snapshot;
             
            # The flag innodb_change_buffering_debug is only available in debug builds.
            # It instructs InnoDB to try to evict pages from the buffer pool when
            # change buffering is possible, so that the change buffer will be used
            # whenever possible.
            SET GLOBAL innodb_change_buffering=all;
             
            # Create enough rows for the table, so that the change buffer will be
            # used for modifying the secondary index page. There must be multiple
            # index pages, because changes to the root page are never buffered.
            INSERT INTO t1 SELECT 0,'x',1 FROM seq_1_to_4096;
            let MYSQLD_DATADIR=`select @@datadir`;
            let PAGE_SIZE=`select @@innodb_page_size`;
             
            ALTER TABLE t1 DROP INDEX idx;
             
            INSERT INTO t1 SELECT 0,'y',2 FROM seq_1_to_8192;
            connection con1;
            commit;
            set global innodb_disable_background_merge=0;
            set global innodb_fast_shutdown=0;
            --source include/restart_mysqld.inc
             
            # Cleanup
            DROP TABLE t1;
            

            During slow shutdown, InnoDB does apply the change buffer values to the freed page.

            thiru Thirunarayanan Balathandayuthapani added a comment - Following test case can fail to remove the change buffer entries from ibuf index. It leads to hang in slow shutdown --source include/have_innodb.inc --source include/have_debug.inc --source include/have_sequence.inc   CREATE TABLE t1( a INT AUTO_INCREMENT PRIMARY KEY, b CHAR(1), c INT, INDEX idx(b)) ENGINE=InnoDB STATS_PERSISTENT=0;   # The flag innodb_change_buffering_debug is only available in debug builds. # It instructs InnoDB to try to evict pages from the buffer pool when # change buffering is possible, so that the change buffer will be used # whenever possible. SET GLOBAL innodb_change_buffering_debug = 1; SET GLOBAL innodb_change_buffering=all;   # Create enough rows for the table, so that the change buffer will be # used for modifying the secondary index page. There must be multiple # index pages, because changes to the root page are never buffered. INSERT INTO t1 SELECT 0,'x',1 FROM seq_1_to_4096; let MYSQLD_DATADIR=`select @@datadir`; let PAGE_SIZE=`select @@innodb_page_size`;   ALTER TABLE t1 DROP INDEX idx;   set global innodb_fast_shutdown=0; --source include/restart_mysqld.inc   # Cleanup DROP TABLE t1; InnoDB reset the bitmap bit while applying the change buffer merge for the freed page and it fails to remove the record from ibuf index if (bitmap_bits && DB_SUCCESS == fseg_page_is_allocated(space, page_id.page_no())) { ibuf_mtr_start(&mtr); mtr.set_named_space(space); ibuf_reset_bitmap(block, page_id, zip_size, &mtr); ibuf_mtr_commit(&mtr); bitmap_bits = 0; }   if (!bitmap_bits) { /* No changes are buffered for this page. */ space->release(); return DB_SUCCESS; } During slow shutdown, InnoDB iterates through change buffer index again and again. It leads to hang In 10.3 the following test case can do the same. But in 10.3, change buffer merge is happening in freed page (ie wrong though during slow shutdown)   set global innodb_disable_background_merge=1; CREATE TABLE t1( a INT AUTO_INCREMENT PRIMARY KEY, b CHAR(1), c INT, INDEX idx(b)) ENGINE=InnoDB STATS_PERSISTENT=0;   connect(con1,localhost,root,,,); start transaction with consistent snapshot;   # The flag innodb_change_buffering_debug is only available in debug builds. # It instructs InnoDB to try to evict pages from the buffer pool when # change buffering is possible, so that the change buffer will be used # whenever possible. SET GLOBAL innodb_change_buffering=all;   # Create enough rows for the table, so that the change buffer will be # used for modifying the secondary index page. There must be multiple # index pages, because changes to the root page are never buffered. INSERT INTO t1 SELECT 0,'x',1 FROM seq_1_to_4096; let MYSQLD_DATADIR=`select @@datadir`; let PAGE_SIZE=`select @@innodb_page_size`;   ALTER TABLE t1 DROP INDEX idx;   INSERT INTO t1 SELECT 0,'y',2 FROM seq_1_to_8192; connection con1; commit; set global innodb_disable_background_merge=0; set global innodb_fast_shutdown=0; --source include/restart_mysqld.inc   # Cleanup DROP TABLE t1; During slow shutdown, InnoDB does apply the change buffer values to the freed page.

            Until MDEV-30134 has been fixed, it is unsafe to use innodb_change_buffering=purges or innodb_change_buffering=all.

            marko Marko Mäkelä added a comment - Until MDEV-30134 has been fixed, it is unsafe to use innodb_change_buffering=purges or innodb_change_buffering=all .

            People

              allen.lee@mariadb.com Allen Lee (Inactive)
              afleischer Alexander Fleischer
              Votes:
              3 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.