Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30479

OPT_PAGE_CHECKSUM mismatch during recovery after innodb_undo_log_truncate=ON

Details

    Description

      origin/10.7 d1a4f6e627b3ad2c86946da08b8e82909707da7c 2023-01-17T11:41:29+02:00
      rr
      origin/bb-10.6-MDEV-30400 de4030e4d49805a7ded5c0bfee01cc3fd7623522 2023-01-24T14:09:21+02:00
      Scenario:
      1. Start the DB server and generate some initial data
      2. Several sessions run a DML mix
      3. During 2. is ongoing SIGKILL the DB server and try to restart
      The restart fails with
      # 2023-01-26T12:43:46 [195747] | [rr 260971 8174]2023-01-26 12:42:58 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=2, page number=577]
      # 2023-01-26T12:43:46 [195747] | [rr 260971 8178]2023-01-26 12:42:58 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
      # 2023-01-26T12:43:46 [195747] | [rr 260971 8269]2023-01-26 12:42:58 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=2, page number=577]; set innodb_force_recovery to ignore
      # 2023-01-26T12:43:46 [195747] | [rr 260971 8273]2023-01-26 12:42:58 0 [ERROR] InnoDB: Failed to read page 577 from file './/undo002': Page read from tablespace is corrupted.
      # 2023-01-26T12:43:46 [195747] | [rr 260971 17093]2023-01-26 12:43:00 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1496] with error Data structure corruption
      # 2023-01-26T12:43:46 [195747] | [rr 260971 17097]2023-01-26 12:43:00 0 [Note] InnoDB: Starting shutdown...
      # 2023-01-26T12:43:46 [195747] | [rr 260971 17665]2023-01-26 12:43:01 0 [ERROR] Plugin 'InnoDB' init function returned error.
       
      pluto:/data/results/1674736672/TBR-1646$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio
       
      RQG
      ===
      # git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
      #
      # GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 36f1d5c64870abcdbe123ec98869f7045533d458 2023-01-23T16:20:15+01:00
      # rqg.pl  : Version 4.2.1 (2022-12)
      #
      # $RQG_HOME/rqg.pl \
      # --grammar=conf/engines/many_indexes.yy \
      # --gendata=conf/engines/many_indexes.zz \
      # --reporters=CrashRecovery \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --mysqld=--lock-wait-timeout=86400 \
      # --mysqld=--innodb-lock-wait-timeout=50 \
      # --no-mask \
      # --queries=10000000 \
      # --seed=random \
      # --reporters=Backtrace \
      # --reporters=ErrorLog \
      # --reporters=Deadlock \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --mysqld=--plugin-load-add=provider_lzo.so \
      # --mysqld=--plugin-load-add=provider_bzip2.so \
      # --mysqld=--plugin-load-add=provider_lzma.so \
      # --mysqld=--plugin-load-add=provider_snappy.so \
      # --mysqld=--plugin-load-add=provider_lz4.so \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--innodb_file_per_table=1 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--innodb_stats_persistent=off \
      # --mysqld=--innodb_adaptive_hash_index=off \
      # --mysqld=--innodb_sort_buffer_size=65536 \
      # --redefine=conf/mariadb/redefine_checks_off.yy \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=2 \
      # --mysqld=--innodb-use-native-aio=0 \
      # --mysqld=--loose-gdb \
      # --mysqld=--loose-debug-gdb \
      # --rr=Extended \
      # --rr_options=--wait \
      # --mysqld=--innodb_undo_tablespaces=3 \
      # --mysqld=--innodb_undo_log_truncate=ON \
      # --mysqld=--loose_innodb_change_buffering=all \
      # --mysqld=--innodb_rollback_on_timeout=OFF \
      # --mysqld=--innodb_page_size=4K \
      # --mysqld=--innodb-buffer-pool-size=256M \
      # <local settings>
      I had 3 replays of the problem per ~ 280 runs of the test.
      

      Attachments

        1. data.tar.xz
          6.65 MB
        2. TBR-1646.cc
          15 kB

        Issue Links

          Activity

            The page in question was written here:

            ssh pluto
            rr replay /data/results/1674736672/TBR-1646/1/rr/mysqld-1
            

            run 1928000
            watch -l log_sys.lsn._M_i
            reverse-continue
            backtrace
            

            …
            Thread 1 hit Hardware watchpoint 1: -location log_sys.lsn._M_i
             
            Old value = 93515647
            New value = 93515562
            0x0000563239a5a349 in std::__atomic_base<unsigned long>::store (
                __m=std::memory_order_release, __i=93515647, this=<optimized out>)
                at /usr/include/c++/9/bits/atomic_base.h:390
            390	      store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
            …
            #5  0x0000563239a4cb77 in mtr_t::commit (this=this@entry=0x7fbb382d6cb0)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/mtr/mtr0mtr.cc:148
            #6  0x0000563239d04be4 in trx_purge_free_segment (
                rseg=rseg@entry=0x56323b7c12c0 <trx_sys+28544>, hdr_addr=...)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:469
            #7  0x0000563239d062a7 in trx_purge_truncate_rseg_history (rseg=..., limit=...)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:549
            #8  0x0000563239d10371 in trx_purge_truncate_history ()
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:643
            #9  0x0000563239d16812 in trx_purge (n_tasks=<optimized out>, 
            …
            

            At this point of execution, we can access the block here:

            frame 4
            set $block=(buf_block_t*)m_memo.small[5].object
            

            Between this and the recovery, the page contents differs quite a bit:

            --- /dev/fd/63	2023-01-27 10:22:59.045904382 +0200
            +++ /dev/fd/62	2023-01-27 10:22:59.025903573 +0200
            @@ -1,6 +1,6 @@
             000000 00 00 00 00 00 00 02 41 ff ff ff ff ff ff ff ff
            -000010 00 00 00 00 05 92 ef 2a 00 03 00 00 00 00 00 00
            -000020 00 00 00 00 00 02 ff ff ff ff 00 00 00 00 01 8e
            +000010 00 00 00 00 03 82 bf 89 00 03 00 00 00 00 00 00
            +000020 00 00 00 00 00 02 00 00 02 16 00 26 00 00 01 af
             000030 00 26 00 00 00 00 00 00 00 d6 00 00 00 00 00 00
             000040 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
             *
            @@ -11,7 +11,14 @@
             000270 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
            -0006f0 00 00 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
            +0004b0 00 00 00 00 00 00 00 00 00 a4 00 00 00 00 00 00
            +0004c0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
            +*
            +0004e0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
            +0004f0 69 d2 00 00 02 47 ff ff ff ff ff ff ff ff ff ff
            +000500 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
            +*
            +0006f0 ff ff 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
             000700 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
             *
             000720 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
            @@ -21,5 +28,12 @@
             000930 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             000940 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
            -000ff0 00 00 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
            +000db0 00 00 00 00 00 00 00 00 00 d8 00 00 00 00 00 00
            +000dc0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
            +*
            +000de0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
            +000df0 69 d2 00 00 00 f9 ff ff ff ff ff ff ff ff ff ff
            +000e00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
            +*
            +000ff0 ff ff 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
             001000
            

            The mini-transaction generated the following 85 bytes of log:

            (rr) p/x *m_log.m_first_block.m_data@m_log.m_size
            $6 = {0x39, 0x2, 0x25, 0x32, 0xff, 0xff, 0xff, 0xff, 0x0, 0x0, 0xb7, 0x0, 
              0xff, 0xff, 0xff, 0xff, 0x0, 0x0, 0x34, 0x2, 0x25, 0x31, 0x0, 0x34, 0x2, 
              0x25, 0x2d, 0x0, 0x47, 0x2, 0x81, 0xc1, 0x82, 0x32, 0x4, 0xff, 0x34, 0x2, 
              0x0, 0x3d, 0x5b, 0x2, 0x2, 0x37, 0x35, 0x2, 0x0, 0x80, 0x3b, 0xea, 0x48, 
              0x2, 0x81, 0xc1, 0x81, 0xf2, 0x81, 0xc0, 0x0, 0x77, 0x2, 0x25, 0x0, 0x67, 
              0x57, 0x32, 0xe1, 0x78, 0x2, 0x81, 0xc1, 0x0, 0x19, 0xee, 0x8b, 0x3c, 0x77, 
              0x2, 0x0, 0x0, 0x67, 0x32, 0x82, 0x1f, 0x0}
            

            Let us filter out the records for this page 2:0x241:
            0x47, 0x2, 0x81, 0xc1, 0x82, 0x32, 0x4, 0xff, MEMSET, writing 4×0xff bytes at offset 690
            0x48, 0x2, 0x81, 0xc1, 0x81, 0xf2, 0x81, 0xc0, 0x0, MEMSET, writing 576×0x0 bytes at offset 626
            0x78, 0x2, 0x81, 0xc1, 0x0, 0x19, 0xee, 0x8b, 0x3c, PAGE_CHECKSUM 0x19ee8b3c (this trips the recovery failure)
            At the start of the mini-transaction (and before applying any redo log to it on recovery), the page contents differs quite a bit as well:

            --- /dev/fd/63	2023-01-27 10:48:35.250176986 +0200
            +++ /dev/fd/62	2023-01-27 10:48:35.250176986 +0200
            @@ -1,6 +1,6 @@
             000000 00 00 00 00 00 00 02 41 ff ff ff ff ff ff ff ff
            -000010 00 00 00 00 05 92 ee fa 00 03 00 00 00 00 00 00
            -000020 00 00 00 00 00 02 ff ff ff ff 00 00 00 00 01 8e
            +000010 00 00 00 00 03 82 bf 89 00 03 00 00 00 00 00 00
            +000020 00 00 00 00 00 02 00 00 02 16 00 26 00 00 01 af
             000030 00 26 00 00 00 00 00 00 00 d6 00 00 00 00 00 00
             000040 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
             *
            @@ -8,17 +8,17 @@
             000070 69 d2 00 00 00 f7 ff ff ff ff ff ff ff ff ff ff
             000080 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
             *
            -000270 ff ff 00 00 00 00 00 00 00 e7 00 00 00 00 00 00
            -000280 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
            +000270 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            +000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
            -0002a0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
            -0002b0 69 d2 00 00 00 37 00 00 00 38 ff ff ff ff ff ff
            -0002c0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
            +0004b0 00 00 00 00 00 00 00 00 00 a4 00 00 00 00 00 00
            +0004c0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
             *
            -0004b0 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            -0004c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            +0004e0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
            +0004f0 69 d2 00 00 02 47 ff ff ff ff ff ff ff ff ff ff
            +000500 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
             *
            -0006f0 00 00 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
            +0006f0 ff ff 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
             000700 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
             *
             000720 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
            @@ -28,5 +28,12 @@
             000930 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             000940 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
            -000ff0 00 00 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
            +000db0 00 00 00 00 00 00 00 00 00 d8 00 00 00 00 00 00
            +000dc0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
            +*
            +000de0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
            +000df0 69 d2 00 00 00 f9 ff ff ff ff ff ff ff ff ff ff
            +000e00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
            +*
            +000ff0 ff ff 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
             001000
            

            So, we will have to track earlier changes to the page, to find where some change was not logged.

            marko Marko Mäkelä added a comment - The page in question was written here: ssh pluto rr replay /data/results/1674736672/TBR-1646/1/rr/mysqld-1 run 1928000 watch -l log_sys.lsn._M_i reverse-continue backtrace … Thread 1 hit Hardware watchpoint 1: -location log_sys.lsn._M_i   Old value = 93515647 New value = 93515562 0x0000563239a5a349 in std::__atomic_base<unsigned long>::store ( __m=std::memory_order_release, __i=93515647, this=<optimized out>) at /usr/include/c++/9/bits/atomic_base.h:390 390 store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept … #5 0x0000563239a4cb77 in mtr_t::commit (this=this@entry=0x7fbb382d6cb0) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/mtr/mtr0mtr.cc:148 #6 0x0000563239d04be4 in trx_purge_free_segment ( rseg=rseg@entry=0x56323b7c12c0 <trx_sys+28544>, hdr_addr=...) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:469 #7 0x0000563239d062a7 in trx_purge_truncate_rseg_history (rseg=..., limit=...) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:549 #8 0x0000563239d10371 in trx_purge_truncate_history () at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:643 #9 0x0000563239d16812 in trx_purge (n_tasks=<optimized out>, … At this point of execution, we can access the block here: frame 4 set $block=(buf_block_t*)m_memo.small[5].object Between this and the recovery, the page contents differs quite a bit: --- /dev/fd/63 2023-01-27 10:22:59.045904382 +0200 +++ /dev/fd/62 2023-01-27 10:22:59.025903573 +0200 @@ -1,6 +1,6 @@ 000000 00 00 00 00 00 00 02 41 ff ff ff ff ff ff ff ff -000010 00 00 00 00 05 92 ef 2a 00 03 00 00 00 00 00 00 -000020 00 00 00 00 00 02 ff ff ff ff 00 00 00 00 01 8e +000010 00 00 00 00 03 82 bf 89 00 03 00 00 00 00 00 00 +000020 00 00 00 00 00 02 00 00 02 16 00 26 00 00 01 af 000030 00 26 00 00 00 00 00 00 00 d6 00 00 00 00 00 00 000040 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00 * @@ -11,7 +11,14 @@ 000270 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * -0006f0 00 00 00 00 00 00 00 00 00 d7 00 00 00 00 00 00 +0004b0 00 00 00 00 00 00 00 00 00 a4 00 00 00 00 00 00 +0004c0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00 +* +0004e0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6 +0004f0 69 d2 00 00 02 47 ff ff ff ff ff ff ff ff ff ff +000500 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff +* +0006f0 ff ff 00 00 00 00 00 00 00 d7 00 00 00 00 00 00 000700 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00 * 000720 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6 @@ -21,5 +28,12 @@ 000930 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000940 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * -000ff0 00 00 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b +000db0 00 00 00 00 00 00 00 00 00 d8 00 00 00 00 00 00 +000dc0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00 +* +000de0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6 +000df0 69 d2 00 00 00 f9 ff ff ff ff ff ff ff ff ff ff +000e00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff +* +000ff0 ff ff 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b 001000 The mini-transaction generated the following 85 bytes of log: (rr) p/x *m_log.m_first_block.m_data@m_log.m_size $6 = {0x39, 0x2, 0x25, 0x32, 0xff, 0xff, 0xff, 0xff, 0x0, 0x0, 0xb7, 0x0, 0xff, 0xff, 0xff, 0xff, 0x0, 0x0, 0x34, 0x2, 0x25, 0x31, 0x0, 0x34, 0x2, 0x25, 0x2d, 0x0, 0x47, 0x2, 0x81, 0xc1, 0x82, 0x32, 0x4, 0xff, 0x34, 0x2, 0x0, 0x3d, 0x5b, 0x2, 0x2, 0x37, 0x35, 0x2, 0x0, 0x80, 0x3b, 0xea, 0x48, 0x2, 0x81, 0xc1, 0x81, 0xf2, 0x81, 0xc0, 0x0, 0x77, 0x2, 0x25, 0x0, 0x67, 0x57, 0x32, 0xe1, 0x78, 0x2, 0x81, 0xc1, 0x0, 0x19, 0xee, 0x8b, 0x3c, 0x77, 0x2, 0x0, 0x0, 0x67, 0x32, 0x82, 0x1f, 0x0} Let us filter out the records for this page 2:0x241: 0x47, 0x2, 0x81, 0xc1, 0x82, 0x32, 0x4, 0xff, MEMSET, writing 4× 0xff bytes at offset 690 0x48, 0x2, 0x81, 0xc1, 0x81, 0xf2, 0x81, 0xc0, 0x0, MEMSET, writing 576× 0x0 bytes at offset 626 0x78, 0x2, 0x81, 0xc1, 0x0, 0x19, 0xee, 0x8b, 0x3c, PAGE_CHECKSUM 0x19ee8b3c (this trips the recovery failure) At the start of the mini-transaction (and before applying any redo log to it on recovery), the page contents differs quite a bit as well: --- /dev/fd/63 2023-01-27 10:48:35.250176986 +0200 +++ /dev/fd/62 2023-01-27 10:48:35.250176986 +0200 @@ -1,6 +1,6 @@ 000000 00 00 00 00 00 00 02 41 ff ff ff ff ff ff ff ff -000010 00 00 00 00 05 92 ee fa 00 03 00 00 00 00 00 00 -000020 00 00 00 00 00 02 ff ff ff ff 00 00 00 00 01 8e +000010 00 00 00 00 03 82 bf 89 00 03 00 00 00 00 00 00 +000020 00 00 00 00 00 02 00 00 02 16 00 26 00 00 01 af 000030 00 26 00 00 00 00 00 00 00 d6 00 00 00 00 00 00 000040 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00 * @@ -8,17 +8,17 @@ 000070 69 d2 00 00 00 f7 ff ff ff ff ff ff ff ff ff ff 000080 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff * -000270 ff ff 00 00 00 00 00 00 00 e7 00 00 00 00 00 00 -000280 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00 +000270 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 +000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * -0002a0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6 -0002b0 69 d2 00 00 00 37 00 00 00 38 ff ff ff ff ff ff -0002c0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff +0004b0 00 00 00 00 00 00 00 00 00 a4 00 00 00 00 00 00 +0004c0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00 * -0004b0 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 -0004c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 +0004e0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6 +0004f0 69 d2 00 00 02 47 ff ff ff ff ff ff ff ff ff ff +000500 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff * -0006f0 00 00 00 00 00 00 00 00 00 d7 00 00 00 00 00 00 +0006f0 ff ff 00 00 00 00 00 00 00 d7 00 00 00 00 00 00 000700 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00 * 000720 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6 @@ -28,5 +28,12 @@ 000930 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000940 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * -000ff0 00 00 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b +000db0 00 00 00 00 00 00 00 00 00 d8 00 00 00 00 00 00 +000dc0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00 +* +000de0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6 +000df0 69 d2 00 00 00 f9 ff ff ff ff ff ff ff ff ff ff +000e00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff +* +000ff0 ff ff 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b 001000 So, we will have to track earlier changes to the page, to find where some change was not logged.

            The FIL_PAGE_LSN of the page that was read in recovery was 0x382bf89 (58900361). That change was written to the buffer pool here:

            (rr) when
            Current event: 1276639
            (rr) bt
            #0  buf_flush_note_modification (end=58900361, start=58900260, 
                b=0x7fbb4551cb40)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/include/buf0flu.h:118
            #1  mtr_t::commit (this=this@entry=0x7fbb396e0cb0)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/mtr/mtr0mtr.cc:196
            #2  0x0000563239d04be4 in trx_purge_free_segment (
                rseg=rseg@entry=0x56323b7c0cc0 <trx_sys+27008>, hdr_addr=...)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:469
            #3  0x0000563239d062a7 in trx_purge_truncate_rseg_history (rseg=..., limit=...)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:549
            #4  0x0000563239d10371 in trx_purge_truncate_history ()
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:643
            #5  0x0000563239d16812 in trx_purge (n_tasks=<optimized out>, 
            

            A little later, this same thread would invoke mtr_t::commit_shrink() on a different tablespace (3, undo003). The last write of our problematic page happened after that:

            Thread 5 hit Hardware watchpoint 8: -location $block.page.oldest_modification_
             
            Old value = 
                  {m = {<std::__atomic_base<unsigned long>> = {_M_i = 22754130}, <No data fields>}}
            New value = 
                  {m = {<std::__atomic_base<unsigned long>> = {_M_i = 1}, <No data fields>}}
            0x0000563239ee7eca in Atomic_relaxed<unsigned long>::store (
                o=std::memory_order_release, i=1, this=0x7fbb4551cb50)
                at /usr/include/c++/9/bits/atomic_base.h:390
            390	      store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
            1: log_sys.lsn._M_i = 71354177
            (rr) when
            Current event: 1543240
            

            Probably soon after this, the checkpoint had been advanced to what the recovery reported:

            2023-01-26 12:42:54 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=71646474,71825142
            

            Some next interesting events for the page occur at:

            LSN when remarks
            75680224 1677754 update of FIL_PAGE_LSN
            75700082 1677868 update of FIL_PAGE_LSN
            89776724 1890477 mtr_t::page_checksum() calculates checksum 0xd732b6f5
            89777019 1890477 update of FIL_PAGE_LSN
            91986805 1903303 mtr_t::page_checksum() calculates checksum 0x2beb6bbd
            91986862 1903303 update of FIL_PAGE_LSN
            91986862 1903308 mtr_t::page_checksum() calculates checksum 0xeb825f4
            91986912 1903315 update of FIL_PAGE_LSN
            …
            93515647 1928020 trx_purge_truncate_history() invokes buf_page_t::reset_oldest_modification(), discarding changes to this page!
            93515647 1928220 mtr_t::commit_shrink() is invoked on this tablespace
            116667020 2494112 the server process receives SIGKILL

            The results of mtr_t::page_checksum() calls will be appended to the end of the log at mtr_t::commit(), right before the FIL_PAGE_LSN is updated, so they should be consistent.

            At LSN=93491365 and when 1926356 we also got a call to mtr_t::commit_shrink() to shrink undo001, a.k.a. tablespace 1.

            Something appears to be wrong on the recovery side. Recovery somehow misses the fact that the tablespace was shrunk, and tries to apply a log record that had been written at LSN=93515562, which was before LSN=93515647, which is when shrinking the tablespace was in progress. The recv_sys.recovered_lsn reached 116666679, which only misses 1 or 2 last 512-byte log blocks that would have been written after the SIGKILL. There was a message about completed truncation of undo002 long before that.

            marko Marko Mäkelä added a comment - The FIL_PAGE_LSN of the page that was read in recovery was 0x382bf89 (58900361). That change was written to the buffer pool here: (rr) when Current event: 1276639 (rr) bt #0 buf_flush_note_modification (end=58900361, start=58900260, b=0x7fbb4551cb40) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/include/buf0flu.h:118 #1 mtr_t::commit (this=this@entry=0x7fbb396e0cb0) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/mtr/mtr0mtr.cc:196 #2 0x0000563239d04be4 in trx_purge_free_segment ( rseg=rseg@entry=0x56323b7c0cc0 <trx_sys+27008>, hdr_addr=...) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:469 #3 0x0000563239d062a7 in trx_purge_truncate_rseg_history (rseg=..., limit=...) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:549 #4 0x0000563239d10371 in trx_purge_truncate_history () at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:643 #5 0x0000563239d16812 in trx_purge (n_tasks=<optimized out>, A little later, this same thread would invoke mtr_t::commit_shrink() on a different tablespace (3, undo003 ). The last write of our problematic page happened after that: Thread 5 hit Hardware watchpoint 8: -location $block.page.oldest_modification_   Old value = {m = {<std::__atomic_base<unsigned long>> = {_M_i = 22754130}, <No data fields>}} New value = {m = {<std::__atomic_base<unsigned long>> = {_M_i = 1}, <No data fields>}} 0x0000563239ee7eca in Atomic_relaxed<unsigned long>::store ( o=std::memory_order_release, i=1, this=0x7fbb4551cb50) at /usr/include/c++/9/bits/atomic_base.h:390 390 store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept 1: log_sys.lsn._M_i = 71354177 (rr) when Current event: 1543240 Probably soon after this, the checkpoint had been advanced to what the recovery reported: 2023-01-26 12:42:54 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=71646474,71825142 Some next interesting events for the page occur at: LSN when remarks 75680224 1677754 update of FIL_PAGE_LSN 75700082 1677868 update of FIL_PAGE_LSN 89776724 1890477 mtr_t::page_checksum() calculates checksum 0xd732b6f5 89777019 1890477 update of FIL_PAGE_LSN 91986805 1903303 mtr_t::page_checksum() calculates checksum 0x2beb6bbd 91986862 1903303 update of FIL_PAGE_LSN 91986862 1903308 mtr_t::page_checksum() calculates checksum 0xeb825f4 91986912 1903315 update of FIL_PAGE_LSN … 93515647 1928020 trx_purge_truncate_history() invokes buf_page_t::reset_oldest_modification() , discarding changes to this page! 93515647 1928220 mtr_t::commit_shrink() is invoked on this tablespace 116667020 2494112 the server process receives SIGKILL The results of mtr_t::page_checksum() calls will be appended to the end of the log at mtr_t::commit() , right before the FIL_PAGE_LSN is updated, so they should be consistent. At LSN=93491365 and when 1926356 we also got a call to mtr_t::commit_shrink() to shrink undo001 , a.k.a. tablespace 1. Something appears to be wrong on the recovery side. Recovery somehow misses the fact that the tablespace was shrunk, and tries to apply a log record that had been written at LSN=93515562, which was before LSN=93515647, which is when shrinking the tablespace was in progress. The recv_sys.recovered_lsn reached 116666679, which only misses 1 or 2 last 512-byte log blocks that would have been written after the SIGKILL. There was a message about completed truncation of undo002 long before that.

            On recovery, we have the following:

            (rr) p recv_sys.truncated_undo_spaces
            $13 = {{lsn = 93491365, pages = 640}, {lsn = 93515647, pages = 640}, {lsn = 0, pages = 0} <repeats 125 times>}
            

            So, we are missing a check for the LSN somewhere. The page read was initiated here:

            #5  0x000055580c887b3f in buf_page_init_for_read (mode=mode@entry=132, 
                page_id=..., zip_size=zip_size@entry=0, unzip=unzip@entry=true)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:121
            #6  0x000055580c88cf67 in buf_read_page_low (err=err@entry=0x7ffce40ae060, 
                space=space@entry=0x612000023440, sync=sync@entry=false, 
                mode=mode@entry=132, page_id=..., zip_size=zip_size@entry=0, 
                unzip=<optimized out>)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:300
            #7  0x000055580c894c95 in buf_read_recv_pages (space_id=<optimized out>, 
                page_nos=page_nos@entry=0x7ffce40ae720, n=n@entry=1)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:711
            #8  0x000055580c352be5 in recv_read_in_area (page_id=..., i=
                  {first = {m_id = 12884901888}, second = {state = page_recv_t::RECV_NOT_PROCESSED, last_offset = 1, log = {head = 0x7ff54a6d92c8, tail = 0x7ff54dd7e600}}}) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/include/buf0types.h:131
            #9  0x000055580c36ea76 in recv_sys_t::apply (this=0x55580e0966a0 <recv_sys>, 
                last_batch=last_batch@entry=true)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:3344
            

            Already at this point, the latest data about undo tablespace truncation is available, and hence we should not have initiated the page read. To be on the safe side, we might both want to avoid initiating the page read, and check in recv_recover_page() if an unnecessary read took place.

            marko Marko Mäkelä added a comment - On recovery, we have the following: (rr) p recv_sys.truncated_undo_spaces $13 = {{lsn = 93491365, pages = 640}, {lsn = 93515647, pages = 640}, {lsn = 0, pages = 0} <repeats 125 times>} So, we are missing a check for the LSN somewhere. The page read was initiated here: #5 0x000055580c887b3f in buf_page_init_for_read (mode=mode@entry=132, page_id=..., zip_size=zip_size@entry=0, unzip=unzip@entry=true) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:121 #6 0x000055580c88cf67 in buf_read_page_low (err=err@entry=0x7ffce40ae060, space=space@entry=0x612000023440, sync=sync@entry=false, mode=mode@entry=132, page_id=..., zip_size=zip_size@entry=0, unzip=<optimized out>) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:300 #7 0x000055580c894c95 in buf_read_recv_pages (space_id=<optimized out>, page_nos=page_nos@entry=0x7ffce40ae720, n=n@entry=1) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:711 #8 0x000055580c352be5 in recv_read_in_area (page_id=..., i= {first = {m_id = 12884901888}, second = {state = page_recv_t::RECV_NOT_PROCESSED, last_offset = 1, log = {head = 0x7ff54a6d92c8, tail = 0x7ff54dd7e600}}}) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/include/buf0types.h:131 #9 0x000055580c36ea76 in recv_sys_t::apply (this=0x55580e0966a0 <recv_sys>, last_batch=last_batch@entry=true) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:3344 Already at this point, the latest data about undo tablespace truncation is available, and hence we should not have initiated the page read. To be on the safe side, we might both want to avoid initiating the page read, and check in recv_recover_page() if an unnecessary read took place.

            At the time the page was added to the read queue, we did not know about the truncation yet:

            Thread 1 hit Breakpoint 9, recv_sys_t::parse (this=this@entry=0x55580e0966a0 <recv_sys>, checkpoint_lsn=<optimized out>, store=store@entry=0x7ffce40ae0f0, apply=apply@entry=true)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:2559
            2559	            cached_pages_it= pages.emplace(id, page_recv_t()).first;
            3: x/i $pc
            => 0x55580c37e457 <recv_sys_t::parse(unsigned long, store_t*, bool)+24577>:	lea    -0x1840(%r12),%rax
            4: /x id.m_id = 0x200000241
            (rr) p recv_sys.truncated_undo_spaces 
            $18 = {{lsn = 0, pages = 0} <repeats 127 times>}
            (rr) when
            Current event: 4941
            

            That became known later:

            #0  0x000055580c37c927 in recv_sys_t::parse (this=this@entry=0x55580e0966a0 <recv_sys>, checkpoint_lsn=<optimized out>, store=store@entry=0x7ffce40ae0f0, apply=apply@entry=true)
                at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:2427
            #1  0x000055580c381d92 in recv_scan_log_recs (store=store@entry=0x7ffce40ae0f0, log_block=<optimized out>, checkpoint_lsn=<optimized out>, checkpoint_lsn@entry=71646474, start_lsn=start_lsn@entry=93518848, 
                end_lsn=<optimized out>, contiguous_lsn=contiguous_lsn@entry=0x7ffce40ae220, group_scanned_lsn=<optimized out>) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:3696
            

            I think that at this point, we must remove the garbage entries from recv_sys.pages.

            marko Marko Mäkelä added a comment - At the time the page was added to the read queue, we did not know about the truncation yet: Thread 1 hit Breakpoint 9, recv_sys_t::parse (this=this@entry=0x55580e0966a0 <recv_sys>, checkpoint_lsn=<optimized out>, store=store@entry=0x7ffce40ae0f0, apply=apply@entry=true) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:2559 2559 cached_pages_it= pages.emplace(id, page_recv_t()).first; 3: x/i $pc => 0x55580c37e457 <recv_sys_t::parse(unsigned long, store_t*, bool)+24577>: lea -0x1840(%r12),%rax 4: /x id.m_id = 0x200000241 (rr) p recv_sys.truncated_undo_spaces $18 = {{lsn = 0, pages = 0} <repeats 127 times>} (rr) when Current event: 4941 That became known later: #0 0x000055580c37c927 in recv_sys_t::parse (this=this@entry=0x55580e0966a0 <recv_sys>, checkpoint_lsn=<optimized out>, store=store@entry=0x7ffce40ae0f0, apply=apply@entry=true) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:2427 #1 0x000055580c381d92 in recv_scan_log_recs (store=store@entry=0x7ffce40ae0f0, log_block=<optimized out>, checkpoint_lsn=<optimized out>, checkpoint_lsn@entry=71646474, start_lsn=start_lsn@entry=93518848, end_lsn=<optimized out>, contiguous_lsn=contiguous_lsn@entry=0x7ffce40ae220, group_scanned_lsn=<optimized out>) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:3696 I think that at this point, we must remove the garbage entries from recv_sys.pages .

            data.tar.xz is a copy of the data directory that failed to recover. This turns out to be due to an off-by-one error:

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index 2becd00cec6..30181728035 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -2086,7 +2086,7 @@ inline bool page_recv_t::trim(lsn_t start_lsn)
             {
               while (log.head)
               {
            -    if (log.head->lsn >= start_lsn) return false;
            +    if (log.head->lsn > start_lsn) return false;
                 last_offset= 1; /* the next record must not be same_page */
                 log_rec_t *next= log.head->next;
                 recv_sys.free(log.head);
            

            For this data directory, a mini-transaction that happened to end exactly at the LSN where the undo tablespace truncation starts should have been removed, and the page should not have been read at all:

            Thread 1 hit Hardware access (read/write) watchpoint 9: *$33
             
            Value = 93515647
            page_recv_t::trim (this=0x55de97c8dd38, start_lsn=93515647) at /mariadb/10.5/storage/innobase/log/log0recv.cc:2089
            2089	    if (log.head->lsn >= start_lsn) return false;
            (rr) p *this
            $34 = {state = page_recv_t::RECV_NOT_PROCESSED, last_offset = 1, log = {head = 0x7fd79c6dfec8, tail = 0x7fd79c6dff18}}
            (rr) p *(log_phys_t*)log.head
            $35 = {<log_rec_t> = {next = 0x7fd79c6dfef0, lsn = 93515647}, start_lsn = 93515562}
            

            With this simple fix (applied to 10.6 or 10.7, which implements MDEV-18976), the data directory will recover fine:

            2023-01-27 15:44:26 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=71646474,71825142
            2023-01-27 15:44:27 0 [Note] InnoDB: Opened 3 undo tablespaces
            2023-01-27 15:44:27 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
            2023-01-27 15:44:27 0 [Note] InnoDB: Starting final batch to recover 4916 pages from redo log.
            2023-01-27 15:44:28 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
            2023-01-27 15:44:28 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
            2023-01-27 15:44:28 0 [Note] InnoDB: Creating shared tablespace for temporary tables
            2023-01-27 15:44:28 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
            2023-01-27 15:44:28 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
            2023-01-27 15:44:28 0 [Note] InnoDB: 10.6.12 started; log sequence number 116666679; transaction id 6439
            

            marko Marko Mäkelä added a comment - data.tar.xz is a copy of the data directory that failed to recover. This turns out to be due to an off-by-one error: diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 2becd00cec6..30181728035 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -2086,7 +2086,7 @@ inline bool page_recv_t::trim(lsn_t start_lsn) { while (log.head) { - if (log.head->lsn >= start_lsn) return false; + if (log.head->lsn > start_lsn) return false; last_offset= 1; /* the next record must not be same_page */ log_rec_t *next= log.head->next; recv_sys.free(log.head); For this data directory, a mini-transaction that happened to end exactly at the LSN where the undo tablespace truncation starts should have been removed, and the page should not have been read at all: Thread 1 hit Hardware access (read/write) watchpoint 9: *$33   Value = 93515647 page_recv_t::trim (this=0x55de97c8dd38, start_lsn=93515647) at /mariadb/10.5/storage/innobase/log/log0recv.cc:2089 2089 if (log.head->lsn >= start_lsn) return false; (rr) p *this $34 = {state = page_recv_t::RECV_NOT_PROCESSED, last_offset = 1, log = {head = 0x7fd79c6dfec8, tail = 0x7fd79c6dff18}} (rr) p *(log_phys_t*)log.head $35 = {<log_rec_t> = {next = 0x7fd79c6dfef0, lsn = 93515647}, start_lsn = 93515562} With this simple fix (applied to 10.6 or 10.7, which implements MDEV-18976 ), the data directory will recover fine: 2023-01-27 15:44:26 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=71646474,71825142 2023-01-27 15:44:27 0 [Note] InnoDB: Opened 3 undo tablespaces 2023-01-27 15:44:27 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces 2023-01-27 15:44:27 0 [Note] InnoDB: Starting final batch to recover 4916 pages from redo log. 2023-01-27 15:44:28 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active. 2023-01-27 15:44:28 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1" 2023-01-27 15:44:28 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2023-01-27 15:44:28 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2023-01-27 15:44:28 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2023-01-27 15:44:28 0 [Note] InnoDB: 10.6.12 started; log sequence number 116666679; transaction id 6439

            The source tree
            origin/bb-10.6-MDEV-30479 09435a493eab0e4a7c12e05284c61aa0d12f8128 2023-02-01T14:12:08+02:00
            performed well in RQG testing.

            mleich Matthias Leich added a comment - The source tree origin/bb-10.6- MDEV-30479 09435a493eab0e4a7c12e05284c61aa0d12f8128 2023-02-01T14:12:08+02:00 performed well in RQG testing.

            Unfortunately, this one-character fix caused the bug MDEV-31487. I will reanalyze the trace and try to come up with a better fix.

            marko Marko Mäkelä added a comment - Unfortunately, this one-character fix caused the bug MDEV-31487 . I will reanalyze the trace and try to come up with a better fix.

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.