Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34879

InnoDB fails to merge the change buffer to ROW_FORMAT=COMPRESSED tables

Details

    Description

      While testing MDEV-34830, mleich encountered a case where a ROW_FORMAT=COMPRESSED table is being recovered incorrectly in a multi-batch recovery.

      I can reproduce the corruption. Unfortunately, the data set is partially encrypted and a compressed copy of it would be too large to be attached in Jira. I even tried to reduce the archive by omitting some tables, but I could not get it below the 10MiB limit.

      $BUILD/sql/mariadbd --innodb-buffer-pool-size=5m --plugin-dir=$BUILD/plugin/file_key_management --plugin-load-add=file_key_management.so --innodb-page-size=4k --file-key-management-filename=/dev/shm/encryption_keys.txt --datadir /dev/shm/data
      

      10.6 2ed33f2fb660617aa39d4abb7e7383764e63ac8c

      2024-09-05 14:44:31 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=69313049,77981397
      2024-09-05 14:44:31 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-1a.ibd' with space ID 57. Another data file called ./test/t8.ibd exists with the same space ID.
      2024-09-05 14:44:31 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 72. Another data file called ./test/t4.ibd exists with the same space ID.
      2024-09-05 14:44:31 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 72. Another data file called ./test/t4.ibd exists with the same space ID.
      2024-09-05 14:44:31 0 [Note] InnoDB: Multi-batch recovery needed at LSN 71640495
      2024-09-05 14:44:31 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 72. Another data file called ./test/t4.ibd exists with the same space ID.
      2024-09-05 14:44:31 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-1a.ibd' with space ID 57. Another data file called ./test/t8.ibd exists with the same space ID.
      2024-09-05 14:44:31 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 76. Another data file called ./test/t5.ibd exists with the same space ID.
      2024-09-05 14:44:31 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 76. Another data file called ./test/t5.ibd exists with the same space ID.
      2024-09-05 14:44:31 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-2a.ibd' with space ID 78. Another data file called ./test/t2.ibd exists with the same space ID.
      2024-09-05 14:44:31 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-2a.ibd' with space ID 78. Another data file called ./test/t2.ibd exists with the same space ID.
      2024-09-05 14:44:31 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 76. Another data file called ./test/t5.ibd exists with the same space ID.
      2024-09-05 14:44:31 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-2a.ibd' with space ID 78. Another data file called ./test/t2.ibd exists with the same space ID.
      2024-09-05 14:44:32 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-1a.ibd' with space ID 57. Another data file called ./test/t8.ibd exists with the same space ID.
      2024-09-05 14:44:32 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-24.ibd' with space ID 71. Another data file called ./test/t7.ibd exists with the same space ID.
      2024-09-05 14:44:32 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 72. Another data file called ./test/t4.ibd exists with the same space ID.
      2024-09-05 14:44:32 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 72. Another data file called ./test/t4.ibd exists with the same space ID.
      2024-09-05 14:44:32 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 72. Another data file called ./test/t4.ibd exists with the same space ID.
      2024-09-05 14:44:32 0 [Note] InnoDB: To recover: LSN 71968140/71983616; 641 pages
      2024-09-05 14:44:32 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-1a.ibd' with space ID 57. Another data file called ./test/t8.ibd exists with the same space ID.
      2024-09-05 14:44:32 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 76. Another data file called ./test/t5.ibd exists with the same space ID.
      2024-09-05 14:44:32 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 76. Another data file called ./test/t5.ibd exists with the same space ID.
      2024-09-05 14:44:32 0 [Note] InnoDB: To recover: LSN 74921892/74932736; 671 pages
      2024-09-05 14:44:33 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-2a.ibd' with space ID 78. Another data file called ./test/t2.ibd exists with the same space ID.
      2024-09-05 14:44:33 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-2a.ibd' with space ID 78. Another data file called ./test/t2.ibd exists with the same space ID.
      2024-09-05 14:44:33 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-25.ibd' with space ID 76. Another data file called ./test/t5.ibd exists with the same space ID.
      2024-09-05 14:44:33 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-2a.ibd' with space ID 78. Another data file called ./test/t2.ibd exists with the same space ID.
      2024-09-05 14:44:33 0 [Note] InnoDB: To recover: LSN 77424997/77439488; 688 pages
      2024-09-05 14:44:33 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-24.ibd' with space ID 71. Another data file called ./test/t7.ibd exists with the same space ID.
      2024-09-05 14:44:33 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-24.ibd' with space ID 71. Another data file called ./test/t7.ibd exists with the same space ID.
      2024-09-05 14:44:33 0 [Note] InnoDB: To recover: LSN 80025251/80044544; 691 pages
      2024-09-05 14:44:33 0 [Note] InnoDB: Opened 3 undo tablespaces
      2024-09-05 14:44:33 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
      2024-09-05 14:44:33 0 [Note] InnoDB: To recover: 306 pages
      2024-09-05 14:44:33 0 [Note] InnoDB: Last binlog file './mysql-bin.000001', position 1698901
      2024-09-05 14:44:33 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
      2024-09-05 14:44:33 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
      2024-09-05 14:44:33 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2024-09-05 14:44:33 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2024-09-05 14:44:33 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2024-09-05 14:44:33 0 [Note] InnoDB: 10.6.20 started; log sequence number 81173989; transaction id 703
      2024-09-05 14:44:33 0 [Note] InnoDB: Loading buffer pool(s) from /dev/shm/data/ib_buffer_pool
      2024-09-05 14:44:33 0 [Note] Plugin 'FEEDBACK' is disabled.
      2024-09-05 14:44:33 0 [Note] InnoDB: Buffer pool(s) load completed at 240905 14:44:33
      2024-09-05 14:44:33 0 [Note] DDL_LOG: Crash recovery executed 8 entries
      2024-09-05 14:44:33 0 [Note] Server socket created on IP: '0.0.0.0'.
      2024-09-05 14:44:33 0 [Note] Server socket created on IP: '::'.
      2024-09-05 14:44:33 0 [Note] /mariadb/10.6/build/sql/mariadbd: ready for connections.
      Version: '10.6.20-MariaDB'  socket: '/dev/shm/murjutuupa'  port: 3306  Source distribution
      2024-09-05 14:44:34 3 [ERROR] InnoDB: Flagged corruption of `k` in table `test`.`t4` in CHECK TABLE; Wrong count
      

      If I start up the server on the same copy of the data directory with a larger innodb_buffer_pool_size, then CHECK TABLE will report no error.

      Attachments

        Issue Links

          Activity

            It looks like the secondary index root (page 4 of tablespace 0x48) would be recovered correctly, but the PAGE_LEVEL is 1, that is, the index consists of multiple pages. At the time of buf_pool_t::close() on server shutdown, the compressed copy of the secondary index page is identical between a failing and a succeeding run. For the uncompressed root page there is a insignificant difference, in the page checksum field (bytes 0‥3) that only matters on the compressed page. I went on to compare the leaf pages 0x3d, 0x3e, 0x3f, 0x40 between the good and bad recovery. For all uncompressed pages there was the insignificant checksum difference. For one leaf page, there was a significant difference:

            diff -u <(od -Ax -t x1 /dev/shm/good-3e.bin) <(od -Ax -t x1 /dev/shm/bad-3e.bin)
            --- /dev/fd/63	2024-09-06 16:06:10.619400726 +0300
            +++ /dev/fd/62	2024-09-06 16:06:10.619400726 +0300
            @@ -1,7 +1,7 @@
            -000000 00 00 00 00 00 00 00 3e 00 00 00 3d 00 00 00 3f
            -000010 00 00 00 00 04 d7 68 0e 45 bf 00 00 00 00 00 00
            -000020 00 00 00 00 00 48 00 25 08 09 80 97 00 00 00 00
            -000030 08 01 00 02 00 01 00 95 00 00 00 00 00 00 02 b9
            +000000 98 bf a7 56 00 00 00 3e 00 00 00 3d 00 00 00 3f
            +000010 00 00 00 00 04 86 8d 0b 45 bf 00 00 00 00 00 00
            +000020 00 00 00 00 00 48 00 25 07 fc 80 96 00 00 00 00
            +000030 07 f4 00 05 00 00 00 94 00 00 00 00 00 00 02 aa
             000040 00 00 00 00 00 00 00 00 00 8a 00 00 00 00 00 00
             000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00
             000060 02 00 1a 69 6e 66 69 6d 75 6d 00 08 00 0b 00 00
            @@ -95,7 +95,7 @@
             0005e0 00 00 00 00 00 fa a5 04 03 68 02 08 00 00 00 00
             0005f0 00 00 fa a6 00 03 70 00 0d 00 00 00 01 00 00 00
             000600 17 00 03 78 00 0d 00 00 00 01 00 00 00 22 00 03
            -000610 80 00 0d 00 00 00 01 00 00 00 26 06 03 88 00 0d
            +000610 80 00 0d 00 00 00 01 00 00 00 26 05 03 88 00 0d
             000620 00 00 00 01 00 00 00 43 00 03 90 00 0d 00 00 00
             000630 01 00 00 00 61 00 03 98 00 0d 00 00 00 01 00 00
             000640 00 77 00 03 a0 00 0d 00 00 00 01 00 00 00 dd 04
            @@ -125,9 +125,8 @@
             0007c0 00 00 00 02 00 00 00 69 00 04 90 00 0d 00 00 00
             0007d0 02 00 00 00 6a 00 04 98 00 0d 00 00 00 02 00 00
             0007e0 00 9f 00 04 a0 f8 89 00 00 00 02 00 00 00 b4 00
            -0007f0 04 a8 00 0d 00 00 00 00 00 00 fa a7 00 04 b0 fd
            -000800 f8 00 00 00 00 00 00 fa a8 00 00 00 00 00 00 00
            -000810 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            +0007f0 04 a8 fe 05 00 00 00 00 00 00 fa a7 00 00 00 00
            +000800 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
             000fa0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70
             000fb0 07 8c 07 58 07 24 06 f0 06 bc 06 88 06 54 06 20
            diff -u <(od -Ax -t x1 /dev/shm/goodz-3e.bin) <(od -Ax -t x1 /dev/shm/badz-3e.bin)
            --- /dev/fd/63	2024-09-06 16:07:02.001211689 +0300
            +++ /dev/fd/62	2024-09-06 16:07:01.997211548 +0300
            @@ -1,7 +1,7 @@
            -000000 64 b6 ff ff 00 00 00 3e 00 00 00 3d 00 00 00 3f
            -000010 00 00 00 00 04 d7 68 0e 45 bf 00 00 00 00 00 00
            -000020 00 00 00 00 00 48 00 25 08 09 80 97 00 00 00 00
            -000030 08 01 00 02 00 01 00 95 00 00 00 00 00 00 02 b9
            +000000 98 bf a7 56 00 00 00 3e 00 00 00 3d 00 00 00 3f
            +000010 00 00 00 00 04 86 8d 0b 45 bf 00 00 00 00 00 00
            +000020 00 00 00 00 00 48 00 25 07 fc 80 96 00 00 00 00
            +000030 07 f4 00 05 00 00 00 94 00 00 00 00 00 00 02 aa
             000040 00 00 00 00 00 00 00 00 00 8a 00 00 00 00 00 00
             000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 48 0d
             000060 12 64 00 00 00 00 ff ff 2d d2 4b 4b d4 51 18 07
            @@ -25,8 +25,7 @@
             000180 38 57 c8 fb 91 39 39 2c e4 61 96 f1 3e a7 f9 22
             000190 32 91 cf 02 16 b3 89 97 78 97 6b fc 9e 35 f4 2c
             0001a0 65 07 2f 30 fc e7 30 bf f8 0f 94 58 55 d6 81 28
            -0001b0 00 00 00 00 00 00 fa a7 81 2a 00 00 00 00 00 00
            -0001c0 fa a8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            -0001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            +0001b0 00 00 00 00 00 00 fa a7 00 00 00 00 00 00 00 00
            +0001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
             000400
            

            I must focus on the recovery of this page 0x3e in the tablespace 0x48. The correct FIL_PAGE_LSN would be 0x04d7680e while we only got the changes up to 0x04868d0b.

            marko Marko Mäkelä added a comment - It looks like the secondary index root (page 4 of tablespace 0x48) would be recovered correctly, but the PAGE_LEVEL is 1, that is, the index consists of multiple pages. At the time of buf_pool_t::close() on server shutdown, the compressed copy of the secondary index page is identical between a failing and a succeeding run. For the uncompressed root page there is a insignificant difference, in the page checksum field (bytes 0‥3) that only matters on the compressed page. I went on to compare the leaf pages 0x3d, 0x3e, 0x3f, 0x40 between the good and bad recovery. For all uncompressed pages there was the insignificant checksum difference. For one leaf page, there was a significant difference: diff -u <(od -Ax -t x1 /dev/shm/good-3e.bin) <(od -Ax -t x1 /dev/shm/bad-3e.bin) --- /dev/fd/63 2024-09-06 16:06:10.619400726 +0300 +++ /dev/fd/62 2024-09-06 16:06:10.619400726 +0300 @@ -1,7 +1,7 @@ -000000 00 00 00 00 00 00 00 3e 00 00 00 3d 00 00 00 3f -000010 00 00 00 00 04 d7 68 0e 45 bf 00 00 00 00 00 00 -000020 00 00 00 00 00 48 00 25 08 09 80 97 00 00 00 00 -000030 08 01 00 02 00 01 00 95 00 00 00 00 00 00 02 b9 +000000 98 bf a7 56 00 00 00 3e 00 00 00 3d 00 00 00 3f +000010 00 00 00 00 04 86 8d 0b 45 bf 00 00 00 00 00 00 +000020 00 00 00 00 00 48 00 25 07 fc 80 96 00 00 00 00 +000030 07 f4 00 05 00 00 00 94 00 00 00 00 00 00 02 aa 000040 00 00 00 00 00 00 00 00 00 8a 00 00 00 00 00 00 000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 000060 02 00 1a 69 6e 66 69 6d 75 6d 00 08 00 0b 00 00 @@ -95,7 +95,7 @@ 0005e0 00 00 00 00 00 fa a5 04 03 68 02 08 00 00 00 00 0005f0 00 00 fa a6 00 03 70 00 0d 00 00 00 01 00 00 00 000600 17 00 03 78 00 0d 00 00 00 01 00 00 00 22 00 03 -000610 80 00 0d 00 00 00 01 00 00 00 26 06 03 88 00 0d +000610 80 00 0d 00 00 00 01 00 00 00 26 05 03 88 00 0d 000620 00 00 00 01 00 00 00 43 00 03 90 00 0d 00 00 00 000630 01 00 00 00 61 00 03 98 00 0d 00 00 00 01 00 00 000640 00 77 00 03 a0 00 0d 00 00 00 01 00 00 00 dd 04 @@ -125,9 +125,8 @@ 0007c0 00 00 00 02 00 00 00 69 00 04 90 00 0d 00 00 00 0007d0 02 00 00 00 6a 00 04 98 00 0d 00 00 00 02 00 00 0007e0 00 9f 00 04 a0 f8 89 00 00 00 02 00 00 00 b4 00 -0007f0 04 a8 00 0d 00 00 00 00 00 00 fa a7 00 04 b0 fd -000800 f8 00 00 00 00 00 00 fa a8 00 00 00 00 00 00 00 -000810 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 +0007f0 04 a8 fe 05 00 00 00 00 00 00 fa a7 00 00 00 00 +000800 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 000fa0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70 000fb0 07 8c 07 58 07 24 06 f0 06 bc 06 88 06 54 06 20 diff -u <(od -Ax -t x1 /dev/shm/goodz-3e.bin) <(od -Ax -t x1 /dev/shm/badz-3e.bin) --- /dev/fd/63 2024-09-06 16:07:02.001211689 +0300 +++ /dev/fd/62 2024-09-06 16:07:01.997211548 +0300 @@ -1,7 +1,7 @@ -000000 64 b6 ff ff 00 00 00 3e 00 00 00 3d 00 00 00 3f -000010 00 00 00 00 04 d7 68 0e 45 bf 00 00 00 00 00 00 -000020 00 00 00 00 00 48 00 25 08 09 80 97 00 00 00 00 -000030 08 01 00 02 00 01 00 95 00 00 00 00 00 00 02 b9 +000000 98 bf a7 56 00 00 00 3e 00 00 00 3d 00 00 00 3f +000010 00 00 00 00 04 86 8d 0b 45 bf 00 00 00 00 00 00 +000020 00 00 00 00 00 48 00 25 07 fc 80 96 00 00 00 00 +000030 07 f4 00 05 00 00 00 94 00 00 00 00 00 00 02 aa 000040 00 00 00 00 00 00 00 00 00 8a 00 00 00 00 00 00 000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 48 0d 000060 12 64 00 00 00 00 ff ff 2d d2 4b 4b d4 51 18 07 @@ -25,8 +25,7 @@ 000180 38 57 c8 fb 91 39 39 2c e4 61 96 f1 3e a7 f9 22 000190 32 91 cf 02 16 b3 89 97 78 97 6b fc 9e 35 f4 2c 0001a0 65 07 2f 30 fc e7 30 bf f8 0f 94 58 55 d6 81 28 -0001b0 00 00 00 00 00 00 fa a7 81 2a 00 00 00 00 00 00 -0001c0 fa a8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 -0001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 +0001b0 00 00 00 00 00 00 fa a7 00 00 00 00 00 00 00 00 +0001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 000400 I must focus on the recovery of this page 0x3e in the tablespace 0x48. The correct FIL_PAGE_LSN would be 0x04d7680e while we only got the changes up to 0x04868d0b .

            The problem is related to our good friend innodb_change_buffering (MDEV-27734, MDEV-29694). In the good run, the FIL_PAGE_LSN was last changed by the following:

            (rr) backtrace
            #0  0x00005620d2fa0780 in memcpy (__dest=0x7f60f7178010, __src=0x7f60f7177010, __len=8) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29
            #1  memcpy_aligned<8ul> (dest=0x7f60f7178010, src=0x7f60f7177010, n=8) at /mariadb/10.6/include/assume_aligned.h:44
            #2  buf_flush_note_modification (b=0x7f60f56eb280, start=<optimized out>, end=<optimized out>) at /mariadb/10.6/storage/innobase/include/buf0flu.h:111
            #3  mtr_t::commit (this=this@entry=0x7f6116448140) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:204
            #4  0x00005620d2f80eb4 in ibuf_mtr_commit (mtr=0x7f6116448140) at /mariadb/10.6/storage/innobase/include/ibuf0ibuf.inl:64
            #5  ibuf_merge_or_delete_for_page (block=block@entry=0x7f60f56eb280, page_id=..., zip_size=<optimized out>) at /mariadb/10.6/storage/innobase/ibuf/ibuf0ibuf.cc:4314
            #6  0x00005620d290b938 in buf_page_ibuf_merge_try (block=block@entry=0x7f60f56eb280, rw_latch=rw_latch@entry=1, err=err@entry=0x7f611644864c) at /mariadb/10.6/storage/innobase/include/buf0buf.h:653
            #7  0x00005620d290baa0 in buf_page_get_low (page_id=..., zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=<optimized out>, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>) at /mariadb/10.6/storage/innobase/buf/buf0buf.cc:3018
            

            In the bad run (with innodb_buffer_pool_size=5m on my system), there are a few calls to this function, but not for this particular page. We apparently forgot that there may be unmerged changes for this page.

            I tried one more experiment: Restarting the server between the multi-batch recovery and the CHECK TABLE. In this case, the change buffer merge could take place on the CHECK TABLE. But also in this case, a corruption will be reported:

            $echo 'check table t4;'|mariadb -uroot test
            Table	Op	Msg_type	Msg_text
            test.t4	check	Warning	InnoDB: Index 'k' contains 1012 entries, should be 1013.
            test.t4	check	error	Corrupt
            

            marko Marko Mäkelä added a comment - The problem is related to our good friend innodb_change_buffering ( MDEV-27734 , MDEV-29694 ). In the good run, the FIL_PAGE_LSN was last changed by the following: (rr) backtrace #0 0x00005620d2fa0780 in memcpy (__dest=0x7f60f7178010, __src=0x7f60f7177010, __len=8) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29 #1 memcpy_aligned<8ul> (dest=0x7f60f7178010, src=0x7f60f7177010, n=8) at /mariadb/10.6/include/assume_aligned.h:44 #2 buf_flush_note_modification (b=0x7f60f56eb280, start=<optimized out>, end=<optimized out>) at /mariadb/10.6/storage/innobase/include/buf0flu.h:111 #3 mtr_t::commit (this=this@entry=0x7f6116448140) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:204 #4 0x00005620d2f80eb4 in ibuf_mtr_commit (mtr=0x7f6116448140) at /mariadb/10.6/storage/innobase/include/ibuf0ibuf.inl:64 #5 ibuf_merge_or_delete_for_page (block=block@entry=0x7f60f56eb280, page_id=..., zip_size=<optimized out>) at /mariadb/10.6/storage/innobase/ibuf/ibuf0ibuf.cc:4314 #6 0x00005620d290b938 in buf_page_ibuf_merge_try (block=block@entry=0x7f60f56eb280, rw_latch=rw_latch@entry=1, err=err@entry=0x7f611644864c) at /mariadb/10.6/storage/innobase/include/buf0buf.h:653 #7 0x00005620d290baa0 in buf_page_get_low (page_id=..., zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=<optimized out>, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>) at /mariadb/10.6/storage/innobase/buf/buf0buf.cc:3018 In the bad run (with innodb_buffer_pool_size=5m on my system), there are a few calls to this function, but not for this particular page. We apparently forgot that there may be unmerged changes for this page. I tried one more experiment: Restarting the server between the multi-batch recovery and the CHECK TABLE . In this case, the change buffer merge could take place on the CHECK TABLE . But also in this case, a corruption will be reported: $echo 'check table t4;'|mariadb -uroot test Table Op Msg_type Msg_text test.t4 check Warning InnoDB: Index 'k' contains 1012 entries, should be 1013. test.t4 check error Corrupt

            New working day, new generated rr replay trace. In today’s trace, after the page 0x48:0x3e was recovered and written out, there were 2 further recovery batches:

            10.6

            2024-09-09 13:18:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-24.ibd' with space ID 71. Another data file called ./test/t7.ibd exists with the same space ID.
            2024-09-09 13:18:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-24.ibd' with space ID 71. Another data file called ./test/t7.ibd exists with the same space ID.
            2024-09-09 13:18:06 0 [Note] InnoDB: To recover: LSN 80025251/80044544; 691 pages
            2024-09-09 13:18:06 0 [Note] InnoDB: Opened 3 undo tablespaces
            2024-09-09 13:18:06 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
            2024-09-09 13:18:06 0 [Note] InnoDB: To recover: 306 pages
            …
            2024-09-09 13:18:28 3 [ERROR] InnoDB: Flagged corruption of `k` in table `test`.`t4` in CHECK TABLE; Wrong count
            

            The issue seems to be that when we are reading the block into the buffer pool after the recovery, we are not checking whether any buffered changes exist for it. There is also a performance deficiency that buf_page_get_low() is invoking buf_read_page_low() to first read a compressed page only, instead of reserving both an uncompressed and a compressed page frame upfront. This in fact explains why we are seeing this issue at all. The following seems to fix the recovery in my environment:

            diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
            index 580e004ab3b..54cb04a113e 100644
            --- a/storage/innobase/buf/buf0buf.cc
            +++ b/storage/innobase/buf/buf0buf.cc
            @@ -3765,7 +3765,7 @@ dberr_t buf_page_t::read_complete(const fil_node_t &node)
               if (recovery && !recv_recover_page(node.space, this))
                 return DB_PAGE_CORRUPTED;
             
            -  const bool ibuf_may_exist= frame && !recv_no_ibuf_operations &&
            +  const bool ibuf_may_exist= !recv_no_ibuf_operations &&
                 (!expected_id.space() || !is_predefined_tablespace(expected_id.space())) &&
                 fil_page_get_type(read_frame) == FIL_PAGE_INDEX &&
                 page_is_leaf(read_frame);
            

            For pages that were read in the compressed-only format into the buffer pool, we’d pretend that no buffered changes existed, and therefore wrongly skipping the change buffer merge.

            marko Marko Mäkelä added a comment - New working day, new generated rr replay trace. In today’s trace, after the page 0x48:0x3e was recovered and written out, there were 2 further recovery batches: 10.6 2024-09-09 13:18:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-24.ibd' with space ID 71. Another data file called ./test/t7.ibd exists with the same space ID. 2024-09-09 13:18:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-cdb92-24.ibd' with space ID 71. Another data file called ./test/t7.ibd exists with the same space ID. 2024-09-09 13:18:06 0 [Note] InnoDB: To recover: LSN 80025251/80044544; 691 pages 2024-09-09 13:18:06 0 [Note] InnoDB: Opened 3 undo tablespaces 2024-09-09 13:18:06 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces 2024-09-09 13:18:06 0 [Note] InnoDB: To recover: 306 pages … 2024-09-09 13:18:28 3 [ERROR] InnoDB: Flagged corruption of `k` in table `test`.`t4` in CHECK TABLE; Wrong count The issue seems to be that when we are reading the block into the buffer pool after the recovery, we are not checking whether any buffered changes exist for it. There is also a performance deficiency that buf_page_get_low() is invoking buf_read_page_low() to first read a compressed page only, instead of reserving both an uncompressed and a compressed page frame upfront. This in fact explains why we are seeing this issue at all. The following seems to fix the recovery in my environment: diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc index 580e004ab3b..54cb04a113e 100644 --- a/storage/innobase/buf/buf0buf.cc +++ b/storage/innobase/buf/buf0buf.cc @@ -3765,7 +3765,7 @@ dberr_t buf_page_t::read_complete(const fil_node_t &node) if (recovery && !recv_recover_page(node.space, this)) return DB_PAGE_CORRUPTED; - const bool ibuf_may_exist= frame && !recv_no_ibuf_operations && + const bool ibuf_may_exist= !recv_no_ibuf_operations && (!expected_id.space() || !is_predefined_tablespace(expected_id.space())) && fil_page_get_type(read_frame) == FIL_PAGE_INDEX && page_is_leaf(read_frame); For pages that were read in the compressed-only format into the buffer pool, we’d pretend that no buffered changes existed, and therefore wrongly skipping the change buffer merge.

            The unnecessary call to buf_pool_t::unzip() can be avoided by having buf_read_page() invoke buf_read_page_low(…, unzip=true). Each caller of buf_read_page() is interested in an uncompressed page frame. The parameter unzip=false makes sense for asynchronous read-ahead.

            marko Marko Mäkelä added a comment - The unnecessary call to buf_pool_t::unzip() can be avoided by having buf_read_page() invoke buf_read_page_low(…, unzip=true) . Each caller of buf_read_page() is interested in an uncompressed page frame. The parameter unzip=false makes sense for asynchronous read-ahead.

            Thanks for the patch. Yet another issue with change buffering. Looks good to me.

            debarun Debarun Banerjee added a comment - Thanks for the patch. Yet another issue with change buffering. Looks good to me.

            The tree origin/st-10.11-merge b187414764ce7616017ab180e257b1f6be8be599 2024-09-16T10:58:40+03:0
            which contains fixes for MDEV-34791 and MDEV-34879 performed well in RQG testing.
            1. No new problems
            2. Test runs on origin/10.11 b187414764ce7616017ab180e257b1f6be8be599 2024-09-16T10:58:40+03:00
                showed a serious crowd of tests failing with
                - server hangs (MDEV-34791 should fix that)
                - corruptions (MDEV-34879 should fix that)
            

            mleich Matthias Leich added a comment - The tree origin/st-10.11-merge b187414764ce7616017ab180e257b1f6be8be599 2024-09-16T10:58:40+03:0 which contains fixes for MDEV-34791 and MDEV-34879 performed well in RQG testing. 1. No new problems 2. Test runs on origin/10.11 b187414764ce7616017ab180e257b1f6be8be599 2024-09-16T10:58:40+03:00 showed a serious crowd of tests failing with - server hangs (MDEV-34791 should fix that) - corruptions (MDEV-34879 should fix that)
            adelin Adelin added a comment -

            Hello everyone,

            We use a MariaDB Galera Cluster of three nodes, on Debian 11.
            After we've upgraded from 10.11.9 to 10.11.10, on all of our nodes we get the following message popping up every once in a while:

            [ERROR] InnoDB: Cannot load compressed BLOB [page id: space=6226, page number=35431]

            On each node the space_id / page_number is different, but considering this error appeared on all the nodes after the upgrade, I'm just wondering if the latest changes didn't break anything.

            This error makes me believe data corruption has occurred, however `CHECK TABLE` of the table having that reported `space_id` does not report any. Fortunately, I can recover the tables from the other nodes, since the corrupted tables are different on each node.

            All that being said, I was just wondering if these latest changes made for this bug didn't break anything, which corrupts data.

            Thanks,
            A.

            adelin Adelin added a comment - Hello everyone, We use a MariaDB Galera Cluster of three nodes, on Debian 11. After we've upgraded from 10.11.9 to 10.11.10, on all of our nodes we get the following message popping up every once in a while: [ERROR] InnoDB: Cannot load compressed BLOB [page id: space=6226, page number=35431] On each node the space_id / page_number is different, but considering this error appeared on all the nodes after the upgrade, I'm just wondering if the latest changes didn't break anything. This error makes me believe data corruption has occurred, however `CHECK TABLE` of the table having that reported `space_id` does not report any. Fortunately, I can recover the tables from the other nodes, since the corrupted tables are different on each node. All that being said, I was just wondering if these latest changes made for this bug didn't break anything, which corrupts data. Thanks, A.

            adelin, I think that your bug may be a duplicate of MDEV-35413, which is a "fake" corruption report due to a race condition during a read of a BLOB page in a ROW_FORMAT=COMPRESSED table. There is also one more bug related to ROW_FORMAT=COMPRESSED and the change buffer, which I don’t think has been filed yet; see MDEV-35626.

            marko Marko Mäkelä added a comment - adelin , I think that your bug may be a duplicate of MDEV-35413 , which is a "fake" corruption report due to a race condition during a read of a BLOB page in a ROW_FORMAT=COMPRESSED table. There is also one more bug related to ROW_FORMAT=COMPRESSED and the change buffer, which I don’t think has been filed yet; see MDEV-35626 .

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.