Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23231

[ERROR] [FATAL] InnoDB: Apparent corruption of an index page ... during crash recovery

Details

    Description

      Work flow:
      1. Start server
      2. 33 sessions run concurrend DDL and DML
      3. At some point of time SIGKILL gets sent to the server during 2. is ongoing
      4. Attempt to restart the server
       
      # 2020-07-17T19:57:26 [99933] | [rr 15697 73975]2020-07-17 19:56:49 0 [Note] Recovering after a crash using mysql-bin
      # 2020-07-17T19:57:26 [99933] | [rr 15697 82508]2020-07-17 19:56:53 0 [ERROR] InnoDB: n recs wrong 2817 2816
      # 2020-07-17T19:57:26 [99933] | [rr 15697 82512]2020-07-17 19:56:53 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 673628625, calculated checksums for field1: crc32 673628625, innodb 1039194117,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 673628625, calculated checksums for field2: crc32 673628625, innodb 500201189, none 3735928559,  page LSN 0 33227406, low 4 bytes of LSN at page end 33227406, page number (if stored to page already) 42, space id (if created with >= MySQL-4.1.1 and stored already) 12
      # 2020-07-17T19:57:26 [99933] | [rr 15697 82516]2020-07-17 19:56:53 0 [Note] InnoDB: Page may be an index page where index id is 40
      # 2020-07-17T19:57:26 [99933] | [rr 15697 82520]2020-07-17 19:56:53 0 [Note] InnoDB: Index 40 is `k` in table `test`.`t8`
      # 2020-07-17T19:57:26 [99933] | [rr 15697 82524]2020-07-17 19:56:53 0 [ERROR] [FATAL] InnoDB: Apparent corruption of an index page [page id: space=12, page number=42] to be written to data file. We intentionally crash the server to prevent corrupt data from ending up in data files.
      # 2020-07-17T19:57:26 [99933] | [rr 15697 82543]200717 19:56:53 [rr 15697 82546][ERROR] mysqld got signal 6 ;
       
      (rr) bt
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
      #1  0x0000640d09c208b1 in __GI_abort () at abort.c:79
      #2  0x000055754a97a0ae in ib::fatal::~fatal (this=<optimized out>, __in_chrg=<optimized out>) at /home/mleich/bb-10.4-thiru/storage/innobase/ut/ut0ut.cc:601
      #3  0x000055754aa15ecf in buf_dblwr_assert_on_corrupt_block (block=block@entry=0x757b0cbf4a60) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0dblwr.cc:843
      #4  0x000055754aa16008 in buf_dblwr_check_block (block=block@entry=0x757b0cbf4a60) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0dblwr.cc:898
      #5  0x000055754aa1751d in buf_dblwr_flush_buffered_writes () at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0dblwr.cc:1036
      #6  0x000055754aa18654 in buf_dblwr_add_to_batch (bpage=bpage@entry=0x757b0cc11b30) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0dblwr.cc:1170
      #7  0x000055754aa27cac in buf_flush_write_block_low (bpage=bpage@entry=0x757b0cc11b30, flush_type=flush_type@entry=BUF_FLUSH_LRU, sync=sync@entry=false) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1049
      #8  0x000055754aa2aace in buf_flush_page (buf_pool=buf_pool@entry=0x55754d4415d0, bpage=bpage@entry=0x757b0cc11b30, flush_type=flush_type@entry=BUF_FLUSH_LRU, sync=sync@entry=false) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1197
      #9  0x000055754aa2c877 in buf_flush_try_neighbors (page_id=page_id@entry=..., flush_type=flush_type@entry=BUF_FLUSH_LRU, n_flushed=119, n_to_flush=n_to_flush@entry=548) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1423
      #10 0x000055754aa2cb2b in buf_flush_page_and_try_neighbors (bpage=bpage@entry=0x757b0cc0b570, flush_type=flush_type@entry=BUF_FLUSH_LRU, n_to_flush=n_to_flush@entry=548, count=count@entry=0x3dff6cc42c60)
          at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1497
      #11 0x000055754aa2d30d in buf_flush_LRU_list_batch (buf_pool=buf_pool@entry=0x55754d4415d0, max=548, n=n@entry=0x3dff6cc42c60) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1631
      #12 0x000055754aa2d93a in buf_do_LRU_batch (buf_pool=buf_pool@entry=0x55754d4415d0, max=max@entry=548, n=n@entry=0x3dff6cc42c60) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1694
      #13 0x000055754aa2db52 in buf_flush_batch (buf_pool=buf_pool@entry=0x55754d4415d0, flush_type=flush_type@entry=BUF_FLUSH_LRU, min_n=min_n@entry=548, lsn_limit=lsn_limit@entry=0, n=n@entry=0x3dff6cc42c60)
          at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1823
      #14 0x000055754aa2dd44 in buf_flush_do_batch (buf_pool=buf_pool@entry=0x55754d4415d0, type=type@entry=BUF_FLUSH_LRU, min_n=548, lsn_limit=lsn_limit@entry=0, n=n@entry=0x3dff6cc42c60) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1984
      #15 0x000055754aa2e12b in buf_flush_LRU_list (buf_pool=buf_pool@entry=0x55754d4415d0) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:2246
      #16 0x000055754aa2e2a4 in pc_flush_slot () at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:2767
      #17 0x000055754aa2f9e3 in buf_flush_page_cleaner_coordinator () at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:3196
      #18 0x00006a93691756db in start_thread (arg=0x3dff6cc43700) at pthread_create.c:463
      #19 0x0000640d09d01a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      (rr)
       
      Observed on
      - origin/bb-10.4-thiru a13662dac1df224c607983815b2fe85ee72749c2 2020-07-17T22:49:03+05:30
      - some actual 10.5.5 ~ 2020-07-13
       
      RQG
      git clone https://github.com/mleich1/rqg --branch experimental RQG
      origin/experimental f3d72ce51b150ee1f52a93d18f2024af293d925e 2020-07-13T20:46:53+02:00
       
      perl rqg.pl \                        
      --grammar=conf/mariadb/innodb_compression_encryption.yy \
      --gendata=conf/mariadb/innodb_compression_encryption.zz \
      --mysqld=--plugin-load-add=file_key_management.so \
      --mysqld=--loose-file-key-management-filename=/home/mleich/RQG_mleich/conf/mariadb/encryption_keys.txt \
      --mysqld=--innodb-encrypt-log \
      --mysqld=--innodb-encrypt-tables \
      --mysqld=--innodb_use_native_aio=0 \
      --mysqld=--innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--net_read_timeout=30 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--lock-wait-timeout=86400 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --no-mask \
      --queries=10000000 \
      --seed=random \
      --reporters=Backtrace \
      --reporters=ErrorLog \
      --reporters=Deadlock1 \
      --reporters=CrashRecovery1 \
      --validators=None \
      --mysqld=--log_output=none \
      --mysqld=--log-bin \
      --mysqld=--log_bin_trust_function_creators=1 \
      --mysqld=--loose-max-statement-time=30 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --engine=InnoDB \
      --restart_timeout=120 \
      --duration=300 \
      --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      --threads=33 \
      --mysqld=--innodb_page_size=64K \
      --mysqld=--innodb-buffer-pool-size=256M \
      --duration=300 \
      --no_mask \
      --workdir=<local settings> \
      --vardir=<local settings> \
      --mtr-build-thread=<local settings> \
      --basedir1=<local settings> \
      --basedir2=<local settings> \
      --script_debug=_nix_ \
      --rr=Server \
      --rr_options=--chaos
      
      

      Attachments

        Issue Links

          Activity

            I did some quick initial analysis of the rr replay trace. The file is encrypted, and the decrypted value of PAGE_N_RECS before applying any redo log is 0xa92 and redo log apply is growing it to 0xaff (2815), which appears to be off by one:

            [rr 15697 82508]2020-07-17 19:56:53 0 [ERROR] InnoDB: n recs wrong 2817 2816

            10.4 956f21c3b0beba89835e8d24fde4b67b589ab142 with MDEV-23199 fix

            Thread 4 hit Hardware watchpoint 1: -location block->frame[0x36]@2
             
            Old value = "\n\222"
            New value = <incomplete sequence \371>
            __memmove_avx_unaligned_erms ()
                at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:249
            249	in ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
            (rr) when
            Current event: 9067
            …
            Thread 4 hit Hardware watchpoint 1: -location block->frame[0x36]@2
             
            Old value = "\n\377"
            New value = "\n\376"
            page_header_set_field (page=page@entry=0x757b0df20000 "\317\371:\376", 
                page_zip=page_zip@entry=0x0, field=field@entry=16, val=2814)
                at /home/mleich/bb-10.4-thiru/storage/innobase/include/page0page.ic:172
            172		if (page_zip) {
            Current event: 9067
             
            Thread 4 hit Hardware watchpoint 1: -location block->frame[0x36]@2
             
            Old value = "\n\376"
            New value = "\n\377"
            page_header_set_field (page=page@entry=0x757b0df20000 "\317\371:\376", page_zip=page_zip@entry=0x0, field=field@entry=16, val=2815) at /home/mleich/bb-10.4-thiru/storage/innobase/include/page0page.ic:172
            172		if (page_zip) {
            Current event: 9067
            …
            [rr 15697 71203]2020-07-17 19:56:48 0 [Note] InnoDB: Buffer pool(s) load completed at 200717 19:56:48
            [rr 15697 73269]2020-07-17 19:56:49 0 [Warning] /home/mleich/Server_bin/bb-10.4-thiru_debug_Og/bin/mysqld: unknown variable 'loose-table_lock_wait_timeout=50'
            [rr 15697 73975]2020-07-17 19:56:49 0 [Note] Recovering after a crash using mysql-bin
            [rr 15697 82508]2020-07-17 19:56:53 0 [ERROR] InnoDB: n recs wrong 2817 2816
            

            I had set up a hardware watchpoint on the PAGE_N_RECS: watch l block>frame[0x36]@2.

            mleich, what else should we copy than ../1/data_orig to be able to start up the recovery on a separate system? What are the encryption parameters?
            My preference would be to repeat the recovery failure locally, possibly with --debug=d,ib_log or some instrumentation added to the executable (such as added calls to page_simple_validate_new() after applying each record to the page), and then (if needed) use this information to find out what the server did wrong before being killed.

            marko Marko Mäkelä added a comment - I did some quick initial analysis of the rr replay trace. The file is encrypted, and the decrypted value of PAGE_N_RECS before applying any redo log is 0xa92 and redo log apply is growing it to 0xaff (2815), which appears to be off by one: [rr 15697 82508]2020-07-17 19:56:53 0 [ERROR] InnoDB: n recs wrong 2817 2816 10.4 956f21c3b0beba89835e8d24fde4b67b589ab142 with MDEV-23199 fix Thread 4 hit Hardware watchpoint 1: -location block->frame[0x36]@2   Old value = "\n\222" New value = <incomplete sequence \371> __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:249 249 in ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (rr) when Current event: 9067 … Thread 4 hit Hardware watchpoint 1: -location block->frame[0x36]@2   Old value = "\n\377" New value = "\n\376" page_header_set_field (page=page@entry=0x757b0df20000 "\317\371:\376", page_zip=page_zip@entry=0x0, field=field@entry=16, val=2814) at /home/mleich/bb-10.4-thiru/storage/innobase/include/page0page.ic:172 172 if (page_zip) { Current event: 9067   Thread 4 hit Hardware watchpoint 1: -location block->frame[0x36]@2   Old value = "\n\376" New value = "\n\377" page_header_set_field (page=page@entry=0x757b0df20000 "\317\371:\376", page_zip=page_zip@entry=0x0, field=field@entry=16, val=2815) at /home/mleich/bb-10.4-thiru/storage/innobase/include/page0page.ic:172 172 if (page_zip) { Current event: 9067 … [rr 15697 71203]2020-07-17 19:56:48 0 [Note] InnoDB: Buffer pool(s) load completed at 200717 19:56:48 [rr 15697 73269]2020-07-17 19:56:49 0 [Warning] /home/mleich/Server_bin/bb-10.4-thiru_debug_Og/bin/mysqld: unknown variable 'loose-table_lock_wait_timeout=50' [rr 15697 73975]2020-07-17 19:56:49 0 [Note] Recovering after a crash using mysql-bin [rr 15697 82508]2020-07-17 19:56:53 0 [ERROR] InnoDB: n recs wrong 2817 2816 I had set up a hardware watchpoint on the PAGE_N_RECS : watch l block >frame[0x36]@2 . mleich , what else should we copy than ../1/data_orig to be able to start up the recovery on a separate system? What are the encryption parameters? My preference would be to repeat the recovery failure locally, possibly with --debug=d,ib_log or some instrumentation added to the executable (such as added calls to page_simple_validate_new() after applying each record to the page), and then (if needed) use this information to find out what the server did wrong before being killed.

            I checked the contents of the doublewrite buffer for any page number 42. Could it be that the tablespace ID is encrypted?

            od -Ax -t x1 -j 0x400000 -N 0x800000 data_orig/ibdata1 | grep -A2 '^..0000 .. .. .. .. 00 00 00 2a'
            

            420000 de ad be ef 00 00 00 2a 00 00 00 2e 00 00 00 31
            420010 00 00 00 00 01 fb 02 92 92 19 00 00 00 01 a3 d3
            420020 28 b6 00 00 00 0c cf a5 00 02 bf dc b4 81 16 61
            --
            560000 cf f9 3a fe 00 00 00 2a 00 00 00 21 00 00 00 2c
            560010 00 00 00 00 01 08 c8 a1 45 bf 00 00 00 01 26 be
            560020 6e 7d 00 00 00 0c 21 82 bd df 05 84 33 86 4b c2
            

            Note: recv_sys.dblwr.pages[127] seems to correspond to the first file offset (0x400000). And it looks like the tablespace ID is encrypted in the file, but not in the buffer:

            (gdb) break buf_dblwr_process
            (gdb) run
            (gdb) p/x recv_sys.dblwr.pages[125][4]@4
            $1 = {0x0, 0x0, 0x0, 0x2a}
            (gdb) p/x recv_sys.dblwr.pages[125][34]@4
            $2 = {0x0, 0x0, 0x0, 0xc}
            (gdb) p/x recv_sys.dblwr.pages[105][4]@4
            $3 = {0x0, 0x0, 0x0, 0x2a}
            (gdb) p/x recv_sys.dblwr.pages[105][34]@4
            $4 = {0x0, 0x0, 0x0, 0xc}
            

            So, we indeed seem to have two copies of the page 12:42! And one of them has the 0xdeadbeef checksum. This might count as wrong usage (innodb_checksum_algorithm=none), but I would not dismiss it yet.

            marko Marko Mäkelä added a comment - I checked the contents of the doublewrite buffer for any page number 42. Could it be that the tablespace ID is encrypted? od -Ax -t x1 -j 0x400000 -N 0x800000 data_orig/ibdata1 | grep -A2 '^..0000 .. .. .. .. 00 00 00 2a' 420000 de ad be ef 00 00 00 2a 00 00 00 2e 00 00 00 31 420010 00 00 00 00 01 fb 02 92 92 19 00 00 00 01 a3 d3 420020 28 b6 00 00 00 0c cf a5 00 02 bf dc b4 81 16 61 -- 560000 cf f9 3a fe 00 00 00 2a 00 00 00 21 00 00 00 2c 560010 00 00 00 00 01 08 c8 a1 45 bf 00 00 00 01 26 be 560020 6e 7d 00 00 00 0c 21 82 bd df 05 84 33 86 4b c2 Note: recv_sys.dblwr.pages[127] seems to correspond to the first file offset (0x400000). And it looks like the tablespace ID is encrypted in the file, but not in the buffer: (gdb) break buf_dblwr_process (gdb) run (gdb) p/x recv_sys.dblwr.pages[125][4]@4 $1 = {0x0, 0x0, 0x0, 0x2a} (gdb) p/x recv_sys.dblwr.pages[125][34]@4 $2 = {0x0, 0x0, 0x0, 0xc} (gdb) p/x recv_sys.dblwr.pages[105][4]@4 $3 = {0x0, 0x0, 0x0, 0x2a} (gdb) p/x recv_sys.dblwr.pages[105][34]@4 $4 = {0x0, 0x0, 0x0, 0xc} So, we indeed seem to have two copies of the page 12:42! And one of them has the 0xdeadbeef checksum. This might count as wrong usage ( innodb_checksum_algorithm=none ), but I would not dismiss it yet.

            I can make the recovery succeed with the following gdb statements:

            Thread 1 "mysqld" hit Breakpoint 1, buf_dblwr_process ()
                at /mariadb/10.4/storage/innobase/buf/buf0dblwr.cc:533
            533		ulint		page_no_dblwr	= 0;
            (gdb) return
            Make buf_dblwr_process() return now? (y or n) y
            #0  0x0000555556154d81 in recv_recovery_from_checkpoint_start (
                flush_lsn=<optimized out>)
                at /mariadb/10.4/storage/innobase/log/log0recv.cc:3760
            warning: Source file is more recent than executable.
            3760			if (srv_operation == SRV_OPERATION_NORMAL) {
            (gdb) c
            Continuing.
            

            Before the doublewrite buffer recovery, the page contents (read_buf) matches with the following:

            od -Ax -t x1 -N 0x10000 -j 0x2a0000 test/t8.ibd
            

            which starts as follows:

            2a0000 de ad be ef 00 00 00 2a 00 00 00 2e 00 00 00 31
            2a0010 00 00 00 00 01 fb 02 92 92 19 00 00 00 01 a3 d3
            2a0020 28 b6 00 00 00 0c cf a5 00 02 bf dc b4 81 16 61
            

            This is also exactly identical to the contents of the doublewrite buffer. The following will only report some trailing garbage:

            od -An -t x1 -N 0x10000 -j 0x2a0000 test/t8.ibd |diff - <(od -An -t x1 -N 0x10000 -j 0x420000 ibdata1)
            

            --- -   2020-07-28 15:22:24.325525625 +0300
            +++ /dev/fd/63  2020-07-28 15:22:24.320068285 +0300
            @@ -3325,771 +3325,3 @@
              9a a1 2f 28 d9 a0 b5 e1 2d dc 81 b3 fd 0b 76 00
              00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
            - 86 7c d0 13 bb 9c c8 32 3e e4 bd 8b 7a 25 a3 23
            - 99 35 dd 61 c7 4d ed fb a4 89 4b c8 92 cc d1 5f
            - a3 5c 79 ad 5c 0b 26 f0 b4 61 df 8c fb 84 dd 5b
            …
            

            It is notable that we are replacing a large number of pages with the contents of the doublewrite buffer:

            10.4

            2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=34] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=34] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=33] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=33] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=31] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=31] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=27] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=27] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=23] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=23] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=15] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=15] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=42] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=42] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=3] from the doublewrite buffer.
            2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=3] from the doublewrite buffer.
            [New Thread 0x7ffefffff700 (LWP 946496)]
            2020-07-28 15:04:44 0 [Note] InnoDB: 6 transaction(s) which must be rolled back or cleaned up in total 13534 row operations to undo
            

            The FIL_PAGE_LSN in t8.ibd page was 0x1fb0292, and the uncompressed page in the doublewrite buffer was much older, 0x108c8a1. That is even older than the checkpoint LSN was 0x16f5eed, which should disqualify the page!

            2020-07-28 15:10:29 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=24076013
            

            Doublewrite buffer recovery must not:

            • Attempt to replace valid page_compressed or encrypted pages.
            • Consider pages whose FIL_PAGE_LSN is older than the checkpoint LSN or newer than the last scanned redo log LSN.
            • Consider invalid pages.
            marko Marko Mäkelä added a comment - I can make the recovery succeed with the following gdb statements: Thread 1 "mysqld" hit Breakpoint 1, buf_dblwr_process () at /mariadb/10.4/storage/innobase/buf/buf0dblwr.cc:533 533 ulint page_no_dblwr = 0; (gdb) return Make buf_dblwr_process() return now? (y or n) y #0 0x0000555556154d81 in recv_recovery_from_checkpoint_start ( flush_lsn=<optimized out>) at /mariadb/10.4/storage/innobase/log/log0recv.cc:3760 warning: Source file is more recent than executable. 3760 if (srv_operation == SRV_OPERATION_NORMAL) { (gdb) c Continuing. Before the doublewrite buffer recovery, the page contents ( read_buf ) matches with the following: od -Ax -t x1 -N 0x10000 -j 0x2a0000 test/t8.ibd which starts as follows: 2a0000 de ad be ef 00 00 00 2a 00 00 00 2e 00 00 00 31 2a0010 00 00 00 00 01 fb 02 92 92 19 00 00 00 01 a3 d3 2a0020 28 b6 00 00 00 0c cf a5 00 02 bf dc b4 81 16 61 This is also exactly identical to the contents of the doublewrite buffer. The following will only report some trailing garbage: od -An -t x1 -N 0x10000 -j 0x2a0000 test/t8.ibd |diff - <(od -An -t x1 -N 0x10000 -j 0x420000 ibdata1) --- - 2020-07-28 15:22:24.325525625 +0300 +++ /dev/fd/63 2020-07-28 15:22:24.320068285 +0300 @@ -3325,771 +3325,3 @@ 9a a1 2f 28 d9 a0 b5 e1 2d dc 81 b3 fd 0b 76 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * - 86 7c d0 13 bb 9c c8 32 3e e4 bd 8b 7a 25 a3 23 - 99 35 dd 61 c7 4d ed fb a4 89 4b c8 92 cc d1 5f - a3 5c 79 ad 5c 0b 26 f0 b4 61 df 8c fb 84 dd 5b … It is notable that we are replacing a large number of pages with the contents of the doublewrite buffer: 10.4 2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=34] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=34] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=33] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=33] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=31] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=31] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=27] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=27] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=23] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=23] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=15] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=15] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=42] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=42] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Trying to recover page [page id: space=12, page number=3] from the doublewrite buffer. 2020-07-28 15:04:44 0 [Note] InnoDB: Recovered page [page id: space=12, page number=3] from the doublewrite buffer. [New Thread 0x7ffefffff700 (LWP 946496)] 2020-07-28 15:04:44 0 [Note] InnoDB: 6 transaction(s) which must be rolled back or cleaned up in total 13534 row operations to undo The FIL_PAGE_LSN in t8.ibd page was 0x1fb0292, and the uncompressed page in the doublewrite buffer was much older, 0x108c8a1. That is even older than the checkpoint LSN was 0x16f5eed, which should disqualify the page! 2020-07-28 15:10:29 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=24076013 Doublewrite buffer recovery must not : Attempt to replace valid page_compressed or encrypted pages. Consider pages whose FIL_PAGE_LSN is older than the checkpoint LSN or newer than the last scanned redo log LSN. Consider invalid pages.

            I am unable to attach the archive of the data directory, because at 54MiB it exceeds the 10MiB limit. (Encrypted data compresses badly.)

            marko Marko Mäkelä added a comment - I am unable to attach the archive of the data directory, because at 54MiB it exceeds the 10MiB limit. (Encrypted data compresses badly.)
            645286697@qq.com chen han added a comment -

            Dear Marko, I see that the stack crashing in process of writing from the doublewrite memory buffer to the disk, but the code change is in the crash recovery process.
            Why does the code change affect the process of flushing doublewrite memory buffer?

            645286697@qq.com chen han added a comment - Dear Marko, I see that the stack crashing in process of writing from the doublewrite memory buffer to the disk, but the code change is in the crash recovery process. Why does the code change affect the process of flushing doublewrite memory buffer?

            People

              thiru Thirunarayanan Balathandayuthapani
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.