Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28870

InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT during crash recovery

Details

    Description

      origin/10.6 5bb90cb2ac7b93f40c8ca4672ec7e2c18f0ea238 2022-06-16T10:01:29+03:00
      Scenario:
      1. Start the server and load some initial data.
      2. Several sessions run a DDL/DML mix
      3. During 2. is ongoing the server gets intentional killed (SIGSEGV) 
      4. The restart attempt fails with
      [rr 1379888 3340]2022-06-16 13:05:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=80040121,81223111
      [rr 1379888 5048]2022-06-16 13:05:16 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 88
      [rr 1379888 5052]2022-06-16 13:05:16 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1447] with error Data structure corruption
       
      pluto:/data/results/1655382284/TBR-1547
      _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1
           # Fate of the server till the SIGKILL arrives.
      _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysql-2
          # The restart attempt
      pluto:/data/results/1655382284/TBR-1547/1/fbackup
          # File backup of the server datadirectory made after the SIGKILL
          # and before the restart attempt.
       
      RQG
      ===
      # git clone https://github.com/mleich1/rqg --branch experimental RQG
      #
      # GIT_SHOW: HEAD -> experimental, origin/experimental a32979c6d789ddf0982e6a56af687565be00590c 2022-06-14T17:21:56+02:00
      # rqg.pl  : Version 4.0.6 (2022-05)
      #
      # $RQG_HOME/rqg.pl \
      # --grammar=conf/mariadb/innodb_compression_encryption.yy \
      # --gendata=conf/mariadb/innodb_compression_encryption.zz \
      # --max_gd_duration=1800 \
      # --mysqld=--innodb-encrypt-log \
      # --mysqld=--innodb-encrypt-tables \
      # --reporters=CrashRecovery1 \
      # --redefine=conf/mariadb/redefine_innodb_undo.yy \
      # --mysqld=--innodb-immediate-scrub-data-uncompressed=1 \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --mysqld=--lock-wait-timeout=86400 \
      # --mysqld=--innodb-lock-wait-timeout=50 \
      # --no-mask \
      # --queries=10000000 \
      # --seed=random \
      # --reporters=Backtrace \
      # --reporters=ErrorLog \
      # --reporters=Deadlock1 \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --mysqld=--plugin-load-add=provider_lzo.so \
      # --mysqld=--plugin-load-add=provider_bzip2.so \
      # --mysqld=--plugin-load-add=provider_lzma.so \
      # --mysqld=--plugin-load-add=provider_snappy.so \
      # --mysqld=--plugin-load-add=provider_lz4.so \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--innodb_file_per_table=1 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--loose-innodb-sync-debug \
      # --mysqld=--innodb_stats_persistent=off \
      # --mysqld=--innodb_adaptive_hash_index=on \
      # --mysqld=--log-bin \
      # --mysqld=--sync-binlog=1 \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=on \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=33 \
      # --mysqld=--innodb-use-native-aio=0 \
      # --mysqld=--loose-gdb \
      # --mysqld=--loose-debug-gdb \
      # --rr=Extended \
      # --rr_options=--chaos --wait \
      # --mysqld=--loose_innodb_change_buffering=purges \
      # --mysqld=--innodb_rollback_on_timeout=OFF \
      # --vardir_type=slow \
      # --mysqld=--innodb_page_size=16K \
      # --mysqld=--innodb-buffer-pool-size=256M \
      # --no_mask \
      # <local settings>
      

      Attachments

        Issue Links

          Activity

            mleich Matthias Leich created issue -
            mleich Matthias Leich added a comment - - edited

            The bug is quite similar to MDEV-27711.

            mleich Matthias Leich added a comment - - edited The bug is quite similar to MDEV-27711 .

            Thank you! This might also explain intermittent failures of the test atomic.rename_table where InnoDB fails to recover.

            It turns out that a FILE_MODIFY record was parsed for the file that will be complained about, but the first page of the tablespace had not been written before the server was killed:

            ssh pluto
            od -Ax -t x1 /data/results/1655382284/TBR-1547/1/data/test/#sql-alter-11fd01-21.ibd
            000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            *
            01a000
            rr replay /data/results/1655382284/TBR-1547/1/rr/mysqld-2
            watch -l recv_sys.found_corrupt_log
            continue
            reverse-continue
            up
            watch -l rs.first
            reverse-continue
            break fil_ibd_load
            continue
            finish
            …
            Run till exit from #0  fil_ibd_load (space_id=88, 
                filename=0x55affd515394 "./test/#sql-alter-11fd01-21.ibd", 
                space=@0x7ffd55619558: 0x0)
                at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:2527
            0x000055affbb908f8 in fil_name_process (
                name=0x55affd515394 "./test/#sql-alter-11fd01-21.ibd", len=31, 
                space_id=88, deleted=false, lsn=80695284, store=0x7ffd55619cfc)
                at /data/Server/10.6X/storage/innobase/log/log0recv.cc:1253
            1253			switch (fil_ibd_load(space_id, name, space)) {
            Value returned is $1 = FIL_LOAD_DEFER
            

            This is related to MDEV-24626. thiru, can you please debug this further?

            marko Marko Mäkelä added a comment - Thank you! This might also explain intermittent failures of the test atomic.rename_table where InnoDB fails to recover. It turns out that a FILE_MODIFY record was parsed for the file that will be complained about, but the first page of the tablespace had not been written before the server was killed: ssh pluto od -Ax -t x1 /data/results/1655382284/TBR-1547/1/data/test/#sql-alter-11fd01-21.ibd 000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 01a000 rr replay /data/results/1655382284/TBR-1547/1/rr/mysqld-2 watch -l recv_sys.found_corrupt_log continue reverse-continue up watch -l rs.first reverse-continue break fil_ibd_load continue finish … Run till exit from #0 fil_ibd_load (space_id=88, filename=0x55affd515394 "./test/#sql-alter-11fd01-21.ibd", space=@0x7ffd55619558: 0x0) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:2527 0x000055affbb908f8 in fil_name_process ( name=0x55affd515394 "./test/#sql-alter-11fd01-21.ibd", len=31, space_id=88, deleted=false, lsn=80695284, store=0x7ffd55619cfc) at /data/Server/10.6X/storage/innobase/log/log0recv.cc:1253 1253 switch (fil_ibd_load(space_id, name, space)) { Value returned is $1 = FIL_LOAD_DEFER This is related to MDEV-24626 . thiru , can you please debug this further?
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Thirunarayanan Balathandayuthapani [ thiru ]
            Description
            {noformat}
            origin/10.6 5bb90cb2ac7b93f40c8ca4672ec7e2c18f0ea238 2022-06-16T10:01:29+03:00
            Scenario:
            1. Start the server and load some initial data.
            2. Several sessions run a DDL/DML mix
            3. During 2. is ongoing the server gets intentional killed (SIGSEGV)
            4. The restart attempt fails with
            [rr 1379888 3340]2022-06-16 13:05:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=80040121,81223111
            [rr 1379888 5048]2022-06-16 13:05:16 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 88
            [rr 1379888 5052]2022-06-16 13:05:16 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1447] with error Data structure corruption

            pluto:/data/results/1655382284/TBR-1547
            _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1
                 # Fate of the server till the SIGKILL arrives.
            _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysql-2
                # The restart attempt
            pluto:/data/results/1655382284/TBR-1547/1/fbackup
                # File backup of the server datadirectory made after the SIGKILL
                # and before the restart attempt.

            RQG
            ===
            # git clone https://github.com/mleich1/rqg --branch experimental RQG
            #
            # GIT_SHOW: HEAD -> experimental, origin/experimental a32979c6d789ddf0982e6a56af687565be00590c 2022-06-14T17:21:56+02:00
            # rqg.pl : Version 4.0.6 (2022-05)
            #
            # $RQG_HOME/rqg.pl \
            # --grammar=conf/mariadb/innodb_compression_encryption.yy \
            # --gendata=conf/mariadb/innodb_compression_encryption.zz \
            # --max_gd_duration=1800 \
            # --mysqld=--innodb-encrypt-log \
            # --mysqld=--innodb-encrypt-tables \
            # --reporters=CrashRecovery1 \
            # --redefine=conf/mariadb/redefine_innodb_undo.yy \
            # --mysqld=--innodb-immediate-scrub-data-uncompressed=1 \
            # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
            # --mysqld=--loose-idle_write_transaction_timeout=0 \
            # --mysqld=--loose-idle_transaction_timeout=0 \
            # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            # --mysqld=--connect_timeout=60 \
            # --mysqld=--interactive_timeout=28800 \
            # --mysqld=--slave_net_timeout=60 \
            # --mysqld=--net_read_timeout=30 \
            # --mysqld=--net_write_timeout=60 \
            # --mysqld=--loose-table_lock_wait_timeout=50 \
            # --mysqld=--wait_timeout=28800 \
            # --mysqld=--lock-wait-timeout=86400 \
            # --mysqld=--innodb-lock-wait-timeout=50 \
            # --no-mask \
            # --queries=10000000 \
            # --seed=random \
            # --reporters=Backtrace \
            # --reporters=ErrorLog \
            # --reporters=Deadlock1 \
            # --validators=None \
            # --mysqld=--log_output=none \
            # --mysqld=--log_bin_trust_function_creators=1 \
            # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            # --engine=InnoDB \
            # --restart_timeout=240 \
            # --mysqld=--plugin-load-add=file_key_management.so \
            # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
            # --mysqld=--plugin-load-add=provider_lzo.so \
            # --mysqld=--plugin-load-add=provider_bzip2.so \
            # --mysqld=--plugin-load-add=provider_lzma.so \
            # --mysqld=--plugin-load-add=provider_snappy.so \
            # --mysqld=--plugin-load-add=provider_lz4.so \
            # --duration=300 \
            # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
            # --mysqld=--innodb_file_per_table=1 \
            # --mysqld=--loose-innodb_read_only_compressed=OFF \
            # --mysqld=--loose-innodb-sync-debug \
            # --mysqld=--innodb_stats_persistent=off \
            # --mysqld=--innodb_adaptive_hash_index=on \
            # --mysqld=--log-bin \
            # --mysqld=--sync-binlog=1 \
            # --mysqld=--loose-innodb_evict_tables_on_commit_debug=on \
            # --mysqld=--loose-max-statement-time=30 \
            # --threads=33 \
            # --mysqld=--innodb-use-native-aio=0 \
            # --mysqld=--loose-gdb \
            # --mysqld=--loose-debug-gdb \
            # --rr=Extended \
            # --rr_options=--chaos --wait \
            # --mysqld=--loose_innodb_change_buffering=purges \
            # --mysqld=--innodb_rollback_on_timeout=OFF \
            # --vardir_type=slow \
            # --mysqld=--innodb_page_size=16K \
            # --mysqld=--innodb-buffer-pool-size=256M \
            # --no_mask \
            # <local settings>
            {noformat}
            {noformat}
            origin/10.6 5bb90cb2ac7b93f40c8ca4672ec7e2c18f0ea238 2022-06-16T10:01:29+03:00
            Scenario:
            1. Start the server and load some initial data.
            2. Several sessions run a DDL/DML mix
            3. During 2. is ongoing the server gets intentional killed (SIGSEGV)
            4. The restart attempt fails with
            [rr 1379888 3340]2022-06-16 13:05:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=80040121,81223111
            [rr 1379888 5048]2022-06-16 13:05:16 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 88
            [rr 1379888 5052]2022-06-16 13:05:16 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1447] with error Data structure corruption

            pluto:/data/results/1655382284/TBR-1547
            _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1
                 # Fate of the server till the SIGKILL arrives.
            _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysql-2
                # The restart attempt
            pluto:/data/results/1655382284/TBR-1547/1/fbackup
                # File backup of the server datadirectory made after the SIGKILL
                # and before the restart attempt.

            RQG
            ===
            # git clone https://github.com/mleich1/rqg --branch experimental RQG
            #
            # GIT_SHOW: HEAD -> experimental, origin/experimental a32979c6d789ddf0982e6a56af687565be00590c 2022-06-14T17:21:56+02:00
            # rqg.pl : Version 4.0.6 (2022-05)
            #
            # $RQG_HOME/rqg.pl \
            # --grammar=conf/mariadb/innodb_compression_encryption.yy \
            # --gendata=conf/mariadb/innodb_compression_encryption.zz \
            # --max_gd_duration=1800 \
            # --mysqld=--innodb-encrypt-log \
            # --mysqld=--innodb-encrypt-tables \
            # --reporters=CrashRecovery1 \
            # --redefine=conf/mariadb/redefine_innodb_undo.yy \
            # --mysqld=--innodb-immediate-scrub-data-uncompressed=1 \
            # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
            # --mysqld=--loose-idle_write_transaction_timeout=0 \
            # --mysqld=--loose-idle_transaction_timeout=0 \
            # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            # --mysqld=--connect_timeout=60 \
            # --mysqld=--interactive_timeout=28800 \
            # --mysqld=--slave_net_timeout=60 \
            # --mysqld=--net_read_timeout=30 \
            # --mysqld=--net_write_timeout=60 \
            # --mysqld=--loose-table_lock_wait_timeout=50 \
            # --mysqld=--wait_timeout=28800 \
            # --mysqld=--lock-wait-timeout=86400 \
            # --mysqld=--innodb-lock-wait-timeout=50 \
            # --no-mask \
            # --queries=10000000 \
            # --seed=random \
            # --reporters=Backtrace \
            # --reporters=ErrorLog \
            # --reporters=Deadlock1 \
            # --validators=None \
            # --mysqld=--log_output=none \
            # --mysqld=--log_bin_trust_function_creators=1 \
            # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            # --engine=InnoDB \
            # --restart_timeout=240 \
            # --mysqld=--plugin-load-add=file_key_management.so \
            # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
            # --mysqld=--plugin-load-add=provider_lzo.so \
            # --mysqld=--plugin-load-add=provider_bzip2.so \
            # --mysqld=--plugin-load-add=provider_lzma.so \
            # --mysqld=--plugin-load-add=provider_snappy.so \
            # --mysqld=--plugin-load-add=provider_lz4.so \
            # --duration=300 \
            # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
            # --mysqld=--innodb_file_per_table=1 \
            # --mysqld=--loose-innodb_read_only_compressed=OFF \
            # --mysqld=--loose-innodb-sync-debug \
            # --mysqld=--innodb_stats_persistent=off \
            # --mysqld=--innodb_adaptive_hash_index=on \
            # --mysqld=--log-bin \
            # --mysqld=--sync-binlog=1 \
            # --mysqld=--loose-innodb_evict_tables_on_commit_debug=on \
            # --mysqld=--loose-max-statement-time=30 \
            # --threads=33 \
            # --mysqld=--innodb-use-native-aio=0 \
            # --mysqld=--loose-gdb \
            # --mysqld=--loose-debug-gdb \
            # --rr=Extended \
            # --rr_options=--chaos --wait \
            # --mysqld=--loose_innodb_change_buffering=purges \
            # --mysqld=--innodb_rollback_on_timeout=OFF \
            # --vardir_type=slow \
            # --mysqld=--innodb_page_size=16K \
            # --mysqld=--innodb-buffer-pool-size=256M \
            # --no_mask \
            # <local settings>
            {noformat}

            It looks like FILE_DELETE record is eaten by FILE_CHECKPOINT. So during recovery, InnoDB encounters FILE_MODIFY for the tablespace, not FILE_DELETE.
            let me paste the stack trace:

            (rr) t 4
            [Switching to thread 4 (Thread 1178881.1179183)]
            #0  mtr_t::log_file_op (this=0x1b7635450800, type=FILE_MODIFY, space_id=88, path=0x362b30042690 "./test/#sql-alter-11fd01-21.ibd", new_path=0x0)
                at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1475
            1475	  ut_ad((new_path != nullptr) == (type == FILE_RENAME));
            (rr) where
            #0  mtr_t::log_file_op (this=0x1b7635450800, type=FILE_MODIFY, space_id=88, path=0x362b30042690 "./test/#sql-alter-11fd01-21.ibd", new_path=0x0)
                at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1475
            #1  0x000055741bff04ce in fil_name_write (space_id=88, name=0x362b30042690 "./test/#sql-alter-11fd01-21.ibd", mtr=0x1b7635450800)
                at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1572
            #2  0x000055741bff639e in fil_names_write (space=0x362b3001e2e0, mtr=0x1b7635450800) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:3121
            #3  0x000055741bff6c62 in fil_names_clear (lsn=77132331, do_write=true) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:3214
            #4  0x000055741bf828c7 in log_checkpoint_low (oldest_lsn=77132331, end_lsn=80695284) at /data/Server/10.6X/storage/innobase/buf/buf0flu.cc:1745
            #5  0x000055741bf82d6b in log_checkpoint () at /data/Server/10.6X/storage/innobase/buf/buf0flu.cc:1804
            #6  0x000055741bf85516 in buf_flush_page_cleaner () at /data/Server/10.6X/storage/innobase/buf/buf0flu.cc:2341
            #7  0x000055741bf895e5 in std::__invoke_impl<void, void (*)()> (__f=@0x705b035a4348: 0x55741bf84bb0 <buf_flush_page_cleaner()>)
                at /usr/include/c++/9/bits/invoke.h:60
            #8  0x000055741bf89591 in std::__invoke<void (*)()> (__fn=@0x705b035a4348: 0x55741bf84bb0 <buf_flush_page_cleaner()>)
                at /usr/include/c++/9/bits/invoke.h:95
            #9  0x000055741bf89532 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x705b035a4348) at /usr/include/c++/9/thread:244
            #10 0x000055741bf89504 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x705b035a4348) at /usr/include/c++/9/thread:251
            #11 0x000055741bf894e4 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x705b035a4340)
                at /usr/include/c++/9/thread:195
            #12 0x00007fb567642de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
            #13 0x0000056114927609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #14 0x000055741ab6c293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            (rr) when
            Current event: 1333838
             
             
             
             
             
            (rr) t 16
            [Switching to thread 16 (Thread 1178881.1296841)]
            #0  0x0000000070000002 in syscall_traced ()
            (rr) where
            #0  0x0000000070000002 in syscall_traced ()
            #1  0x00000ed43c97c1a4 in _raw_syscall () at /home/roc/rr/rr/src/preload/raw_syscall.S:120
            #2  0x00000ed43c9772ce in traced_raw_syscall (call=<optimized out>) at /home/roc/rr/rr/src/preload/syscallbuf.c:278
            #3  0x00000ed43c97b0c3 in sys_fcntl (call=<optimized out>) at /home/roc/rr/rr/src/preload/syscallbuf.c:1529
            #4  syscall_hook_internal (call=0x7fb561a7cfa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3293
            #5  syscall_hook (call=0x7fb561a7cfa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3454
            #6  0x00000ed43c9770b0 in _syscall_hook_trampoline () at /home/roc/rr/rr/src/preload/syscall_hook.S:313
            #7  0x00000ed43c97710f in __morestack () at /home/roc/rr/rr/src/preload/syscall_hook.S:458
            #8  0x00000ed43c977116 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff () at /home/roc/rr/rr/src/preload/syscall_hook.S:472
            #9  0x000055741ab658a3 in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
            #10 0x000055741bd85e97 in binary_semaphore::wait (this=0x7fb566aa96d0) at /data/Server/10.6X/storage/innobase/log/log0sync.cc:127
            #11 0x000055741bd86376 in group_commit_lock::acquire (this=0x55741d67bc00 <flush_lock>, num=80006576, callback=0x0)
                at /data/Server/10.6X/storage/innobase/log/log0sync.cc:273
            #12 0x000055741bd57652 in log_write_up_to (lsn=80006576, flush_to_disk=true, rotate_key=false, callback=0x0)
                at /data/Server/10.6X/storage/innobase/log/log0log.cc:828
            #13 0x000055741bff0b1a in fil_delete_tablespace (id=88) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1692
            #14 0x000055741bfe604c in trx_t::commit (this=0x368d26821780, deleted=std::vector of length 0, capacity 0)
                at /data/Server/10.6X/storage/innobase/dict/drop.cc:274
            #15 0x000055741bcc9fc4 in commit_unlock_and_unlink (trx=0x368d26821780) at /data/Server/10.6X/storage/innobase/handler/handler0alter.cc:4180
            #16 0x000055741bcf20d0 in rollback_inplace_alter_table (ha_alter_info=0x7fb566aa5330, table=0x362b3027a7e8, prebuilt=0x362b302a0f30)
                at /data/Server/10.6X/storage/innobase/handler/handler0alter.cc:8854
            #17 0x000055741bce0ad6 in ha_innobase::commit_inplace_alter_table (this=0x362b302af650, altered_table=0x7fb566aa53f0, 
                ha_alter_info=0x7fb566aa5330, commit=false) at /data/Server/10.6X/storage/innobase/handler/handler0alter.cc:10863
            #18 0x000055741b8e0dde in handler::ha_commit_inplace_alter_table (this=0x362b302af650, altered_table=0x7fb566aa53f0, 
                ha_alter_info=0x7fb566aa5330, commit=false) at /data/Server/10.6X/sql/handler.cc:5219
            #19 0x000055741b654aa0 in mysql_inplace_alter_table (thd=0x362b300206a8, table_list=0x362b3002f1f0, table=0x362b3027a7e8, 
                altered_table=0x7fb566aa53f0, ha_alter_info=0x7fb566aa5330, target_mdl_request=0x7fb566aa5c30, ddl_log_state=0x7fb566aa52f0, 
                trigger_param=0x7fb566aa57e0, alter_ctx=0x7fb566aa6790) at /data/Server/10.6X/sql/sql_table.cc:7570
            #20 0x000055741b65cdf6 in mysql_alter_table (thd=0x362b300206a8, new_db=0x362b300250a8, new_name=0x362b300254c0, create_info=0x7fb566aa7580, 
                table_list=0x362b3002f1f0, alter_info=0x7fb566aa7490, order_num=0, order=0x0, ignore=false, if_exists=false)
                at /data/Server/10.6X/sql/sql_table.cc:10278
            #21 0x000055741b706cb6 in Sql_cmd_alter_table::execute (this=0x362b3002f920, thd=0x362b300206a8) at /data/Server/10.6X/sql/sql_alter.cc:542
            #22 0x000055741b564e23 in mysql_execute_command (thd=0x362b300206a8, is_called_from_prepared_stmt=false)
                at /data/Server/10.6X/sql/sql_parse.cc:5996
            #23 0x000055741b56ab3c in mysql_parse (thd=0x362b300206a8, 
                rawbuf=0x362b3002f0c0 "ALTER TABLE `t3` ENCRYPTED=YES /* E_R Thread7 QNO 14 CON_ID 33 */", length=65, parser_state=0x7fb566aa8580)
            --Type <RET> for more, q to quit, c to continue without paging--
                at /data/Server/10.6X/sql/sql_parse.cc:8029
            #24 0x000055741b557741 in dispatch_command (command=COM_QUERY, thd=0x362b300206a8, 
                packet=0x362b30027049 "ALTER TABLE `t3` ENCRYPTED=YES /* E_R Thread7 QNO 14 CON_ID 33 */ ", packet_length=66, blocking=true)
                at /data/Server/10.6X/sql/sql_parse.cc:1896
            #25 0x000055741b556290 in do_command (thd=0x362b300206a8, blocking=true) at /data/Server/10.6X/sql/sql_parse.cc:1409
            #26 0x000055741b6fc792 in do_handle_one_connection (connect=0x48fc7a172168, put_in_cache=true) at /data/Server/10.6X/sql/sql_connect.cc:1418
            #27 0x000055741b6fc427 in handle_one_connection (arg=0x48fc7a172168) at /data/Server/10.6X/sql/sql_connect.cc:1312
            #28 0x0000056114927609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #29 0x000055741ab6c293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            (rr) when
            Current event: 1333838
            

            Reassigning this issue to marko like he suggested.

            thiru Thirunarayanan Balathandayuthapani added a comment - - edited It looks like FILE_DELETE record is eaten by FILE_CHECKPOINT . So during recovery, InnoDB encounters FILE_MODIFY for the tablespace, not FILE_DELETE . let me paste the stack trace: (rr) t 4 [Switching to thread 4 (Thread 1178881.1179183)] #0 mtr_t::log_file_op (this=0x1b7635450800, type=FILE_MODIFY, space_id=88, path=0x362b30042690 "./test/#sql-alter-11fd01-21.ibd", new_path=0x0) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1475 1475 ut_ad((new_path != nullptr) == (type == FILE_RENAME)); (rr) where #0 mtr_t::log_file_op (this=0x1b7635450800, type=FILE_MODIFY, space_id=88, path=0x362b30042690 "./test/#sql-alter-11fd01-21.ibd", new_path=0x0) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1475 #1 0x000055741bff04ce in fil_name_write (space_id=88, name=0x362b30042690 "./test/#sql-alter-11fd01-21.ibd", mtr=0x1b7635450800) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1572 #2 0x000055741bff639e in fil_names_write (space=0x362b3001e2e0, mtr=0x1b7635450800) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:3121 #3 0x000055741bff6c62 in fil_names_clear (lsn=77132331, do_write=true) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:3214 #4 0x000055741bf828c7 in log_checkpoint_low (oldest_lsn=77132331, end_lsn=80695284) at /data/Server/10.6X/storage/innobase/buf/buf0flu.cc:1745 #5 0x000055741bf82d6b in log_checkpoint () at /data/Server/10.6X/storage/innobase/buf/buf0flu.cc:1804 #6 0x000055741bf85516 in buf_flush_page_cleaner () at /data/Server/10.6X/storage/innobase/buf/buf0flu.cc:2341 #7 0x000055741bf895e5 in std::__invoke_impl<void, void (*)()> (__f=@0x705b035a4348: 0x55741bf84bb0 <buf_flush_page_cleaner()>) at /usr/include/c++/9/bits/invoke.h:60 #8 0x000055741bf89591 in std::__invoke<void (*)()> (__fn=@0x705b035a4348: 0x55741bf84bb0 <buf_flush_page_cleaner()>) at /usr/include/c++/9/bits/invoke.h:95 #9 0x000055741bf89532 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x705b035a4348) at /usr/include/c++/9/thread:244 #10 0x000055741bf89504 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x705b035a4348) at /usr/include/c++/9/thread:251 #11 0x000055741bf894e4 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x705b035a4340) at /usr/include/c++/9/thread:195 #12 0x00007fb567642de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6 #13 0x0000056114927609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #14 0x000055741ab6c293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (rr) when Current event: 1333838           (rr) t 16 [Switching to thread 16 (Thread 1178881.1296841)] #0 0x0000000070000002 in syscall_traced () (rr) where #0 0x0000000070000002 in syscall_traced () #1 0x00000ed43c97c1a4 in _raw_syscall () at /home/roc/rr/rr/src/preload/raw_syscall.S:120 #2 0x00000ed43c9772ce in traced_raw_syscall (call=<optimized out>) at /home/roc/rr/rr/src/preload/syscallbuf.c:278 #3 0x00000ed43c97b0c3 in sys_fcntl (call=<optimized out>) at /home/roc/rr/rr/src/preload/syscallbuf.c:1529 #4 syscall_hook_internal (call=0x7fb561a7cfa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3293 #5 syscall_hook (call=0x7fb561a7cfa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3454 #6 0x00000ed43c9770b0 in _syscall_hook_trampoline () at /home/roc/rr/rr/src/preload/syscall_hook.S:313 #7 0x00000ed43c97710f in __morestack () at /home/roc/rr/rr/src/preload/syscall_hook.S:458 #8 0x00000ed43c977116 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff () at /home/roc/rr/rr/src/preload/syscall_hook.S:472 #9 0x000055741ab658a3 in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 #10 0x000055741bd85e97 in binary_semaphore::wait (this=0x7fb566aa96d0) at /data/Server/10.6X/storage/innobase/log/log0sync.cc:127 #11 0x000055741bd86376 in group_commit_lock::acquire (this=0x55741d67bc00 <flush_lock>, num=80006576, callback=0x0) at /data/Server/10.6X/storage/innobase/log/log0sync.cc:273 #12 0x000055741bd57652 in log_write_up_to (lsn=80006576, flush_to_disk=true, rotate_key=false, callback=0x0) at /data/Server/10.6X/storage/innobase/log/log0log.cc:828 #13 0x000055741bff0b1a in fil_delete_tablespace (id=88) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1692 #14 0x000055741bfe604c in trx_t::commit (this=0x368d26821780, deleted=std::vector of length 0, capacity 0) at /data/Server/10.6X/storage/innobase/dict/drop.cc:274 #15 0x000055741bcc9fc4 in commit_unlock_and_unlink (trx=0x368d26821780) at /data/Server/10.6X/storage/innobase/handler/handler0alter.cc:4180 #16 0x000055741bcf20d0 in rollback_inplace_alter_table (ha_alter_info=0x7fb566aa5330, table=0x362b3027a7e8, prebuilt=0x362b302a0f30) at /data/Server/10.6X/storage/innobase/handler/handler0alter.cc:8854 #17 0x000055741bce0ad6 in ha_innobase::commit_inplace_alter_table (this=0x362b302af650, altered_table=0x7fb566aa53f0, ha_alter_info=0x7fb566aa5330, commit=false) at /data/Server/10.6X/storage/innobase/handler/handler0alter.cc:10863 #18 0x000055741b8e0dde in handler::ha_commit_inplace_alter_table (this=0x362b302af650, altered_table=0x7fb566aa53f0, ha_alter_info=0x7fb566aa5330, commit=false) at /data/Server/10.6X/sql/handler.cc:5219 #19 0x000055741b654aa0 in mysql_inplace_alter_table (thd=0x362b300206a8, table_list=0x362b3002f1f0, table=0x362b3027a7e8, altered_table=0x7fb566aa53f0, ha_alter_info=0x7fb566aa5330, target_mdl_request=0x7fb566aa5c30, ddl_log_state=0x7fb566aa52f0, trigger_param=0x7fb566aa57e0, alter_ctx=0x7fb566aa6790) at /data/Server/10.6X/sql/sql_table.cc:7570 #20 0x000055741b65cdf6 in mysql_alter_table (thd=0x362b300206a8, new_db=0x362b300250a8, new_name=0x362b300254c0, create_info=0x7fb566aa7580, table_list=0x362b3002f1f0, alter_info=0x7fb566aa7490, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/10.6X/sql/sql_table.cc:10278 #21 0x000055741b706cb6 in Sql_cmd_alter_table::execute (this=0x362b3002f920, thd=0x362b300206a8) at /data/Server/10.6X/sql/sql_alter.cc:542 #22 0x000055741b564e23 in mysql_execute_command (thd=0x362b300206a8, is_called_from_prepared_stmt=false) at /data/Server/10.6X/sql/sql_parse.cc:5996 #23 0x000055741b56ab3c in mysql_parse (thd=0x362b300206a8, rawbuf=0x362b3002f0c0 "ALTER TABLE `t3` ENCRYPTED=YES /* E_R Thread7 QNO 14 CON_ID 33 */", length=65, parser_state=0x7fb566aa8580) --Type <RET> for more, q to quit, c to continue without paging-- at /data/Server/10.6X/sql/sql_parse.cc:8029 #24 0x000055741b557741 in dispatch_command (command=COM_QUERY, thd=0x362b300206a8, packet=0x362b30027049 "ALTER TABLE `t3` ENCRYPTED=YES /* E_R Thread7 QNO 14 CON_ID 33 */ ", packet_length=66, blocking=true) at /data/Server/10.6X/sql/sql_parse.cc:1896 #25 0x000055741b556290 in do_command (thd=0x362b300206a8, blocking=true) at /data/Server/10.6X/sql/sql_parse.cc:1409 #26 0x000055741b6fc792 in do_handle_one_connection (connect=0x48fc7a172168, put_in_cache=true) at /data/Server/10.6X/sql/sql_connect.cc:1418 #27 0x000055741b6fc427 in handle_one_connection (arg=0x48fc7a172168) at /data/Server/10.6X/sql/sql_connect.cc:1312 #28 0x0000056114927609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #29 0x000055741ab6c293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (rr) when Current event: 1333838 Reassigning this issue to marko like he suggested.
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Marko Mäkelä [ marko ]

            Thanks to further debugging, the problem appears to be insufficient mutual exclusion between log checkpoint and fil_delete_tablespace(). I think that this race condition may have been caused by MDEV-21534.

            In this particular execution, the log checkpoint violated ACID by essentially deleting a FILE_DELETE record that had been written earlier. Had the server been killed before the checkpoint was updated, the FILE_DELETE record would have been processed by crash recovery and the file would have been deleted as expected.

            If such a race condition is possible also with regard to FILE_RENAME and renaming files, that should explain failures of the test atomic.rename_table (MDEV-27111).

            marko Marko Mäkelä added a comment - Thanks to further debugging, the problem appears to be insufficient mutual exclusion between log checkpoint and fil_delete_tablespace() . I think that this race condition may have been caused by MDEV-21534 . In this particular execution, the log checkpoint violated ACID by essentially deleting a FILE_DELETE record that had been written earlier. Had the server been killed before the checkpoint was updated, the FILE_DELETE record would have been processed by crash recovery and the file would have been deleted as expected. If such a race condition is possible also with regard to FILE_RENAME and renaming files, that should explain failures of the test atomic.rename_table ( MDEV-27111 ).
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Affects Version/s 10.5 [ 23123 ]
            Affects Version/s 10.6 [ 24028 ]
            Affects Version/s 10.7 [ 24805 ]
            Affects Version/s 10.8 [ 26121 ]
            Affects Version/s 10.9 [ 26905 ]
            Affects Version/s 10.10 [ 27530 ]
            Affects Version/s 10.6.9 [ 27507 ]
            Labels rr-profile corruption recovery rr-profile
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            It looks like fil_delete_tablespace() must employ some logic similar to mtr_t::commit_shrink() that is part of innodb_undo_log_truncate. Possibly something like this:

            log_write_and_flush_prepare();
            // write the FILE_DELETE record
            fil_system.named_spaces.remove(*space);
            // …
            mysql_mutex_lock(&log_sys.flush_order_mutex);
            log_write_and_flush();
            

            Something similar is necessary around fil_name_write_rename() as well, to ensure that a log checkpoint may not occur between the write of a FILE_RENAME record and the actual rename operation in the file system. Only after the file has been deleted or renamed in the file system, we may allow a log checkpoint to discard the record.

            marko Marko Mäkelä added a comment - It looks like fil_delete_tablespace() must employ some logic similar to mtr_t::commit_shrink() that is part of innodb_undo_log_truncate . Possibly something like this: log_write_and_flush_prepare(); // write the FILE_DELETE record fil_system.named_spaces. remove (*space); // … mysql_mutex_lock(&log_sys.flush_order_mutex); log_write_and_flush(); Something similar is necessary around fil_name_write_rename() as well, to ensure that a log checkpoint may not occur between the write of a FILE_RENAME record and the actual rename operation in the file system. Only after the file has been deleted or renamed in the file system, we may allow a log checkpoint to discard the record.

            In addition to being similar to MDEV-27711, this bug is also similar to MDEV-27111 in the sense that the test atomic.rename_table would still occasionally fail. I repeated a failure of that test without my tentative fix, and did not repeat any failure with the fix.

            marko Marko Mäkelä added a comment - In addition to being similar to MDEV-27711 , this bug is also similar to MDEV-27111 in the sense that the test atomic.rename_table would still occasionally fail. I repeated a failure of that test without my tentative fix, and did not repeat any failure with the fix.

            I ended up not fixing this in 10.5, because this affects the recovery DDL operations, and those operations were not crash-safe before 10.6.

            marko Marko Mäkelä added a comment - I ended up not fixing this in 10.5, because this affects the recovery DDL operations, and those operations were not crash-safe before 10.6.
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2022-06-21 14:54:26.0 2022-06-21 14:54:26.808
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.9 [ 27507 ]
            Fix Version/s 10.7.5 [ 27505 ]
            Fix Version/s 10.8.4 [ 27503 ]
            Fix Version/s 10.9.2 [ 27115 ]
            Fix Version/s 10.10.1 [ 27913 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            jwagner Jörn Wagner added a comment - - edited

            Hi,

            I'm currently affected by this bug as it seems. Running 10.6.8. After a system restart, my local MariaDB instance will not start up with the aforementioned error. Will this fix only prevent the error from happening in the future or will it also fix a broken recovery log? Do I need to rebuild my whole database?

            jwagner Jörn Wagner added a comment - - edited Hi, I'm currently affected by this bug as it seems. Running 10.6.8. After a system restart, my local MariaDB instance will not start up with the aforementioned error. Will this fix only prevent the error from happening in the future or will it also fix a broken recovery log? Do I need to rebuild my whole database?

            The recently released MariaDB Server 10.6.9 includes a number of recovery fixes, such as MDEV-28923. The bug that we reproduced internally occurred due to a race condition in the write side, and it cannot be easily worked around in recovery. If 10.6.9 cannot start up with that data directory, one thing that could still be done to work around this bug would be to remove the latest log checkpoint from the redo log file (to let the server recover from the previous log checkpoint).

            marko Marko Mäkelä added a comment - The recently released MariaDB Server 10.6.9 includes a number of recovery fixes, such as MDEV-28923 . The bug that we reproduced internally occurred due to a race condition in the write side, and it cannot be easily worked around in recovery. If 10.6.9 cannot start up with that data directory, one thing that could still be done to work around this bug would be to remove the latest log checkpoint from the redo log file (to let the server recover from the previous log checkpoint).

            Unfortunately, I do not see MariaDB 10.6.9 available for download. https://jira.mariadb.org/projects/MDEV/versions/27507 says it's unreleased but all issues are done. https://mariadb.com/kb/en/mariadb-1069-changelog/ also states "There are currently no official packages or binaries available for download which contain the features". Where can we find or when can we expect binaries for that version?

            jwagner Jörn Wagner added a comment - Unfortunately, I do not see MariaDB 10.6.9 available for download. https://jira.mariadb.org/projects/MDEV/versions/27507 says it's unreleased but all issues are done. https://mariadb.com/kb/en/mariadb-1069-changelog/ also states "There are currently no official packages or binaries available for download which contain the features". Where can we find or when can we expect binaries for that version?
            danblack Daniel Black added a comment -

            Release planning is on this jira page.

            danblack Daniel Black added a comment - Release planning is on this jira page .
            marko Marko Mäkelä made changes -
            danblack Daniel Black made changes -

            One more reason for these symptoms could be that multiple instances of InnoDB were running on the same data files. This would be a regression due to implementing MDEV-24393. This regression was fixed in MDEV-28495 and MDEV-31568, by ensuring that an advisory lock on the InnoDB system tablespace will always be held.

            marko Marko Mäkelä added a comment - One more reason for these symptoms could be that multiple instances of InnoDB were running on the same data files. This would be a regression due to implementing MDEV-24393 . This regression was fixed in MDEV-28495 and MDEV-31568 , by ensuring that an advisory lock on the InnoDB system tablespace will always be held.
            marko Marko Mäkelä made changes -
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 203055
            Zendesk active tickets 203055
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 203055 203055 203048
            Zendesk active tickets 203055 203055 203048
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk active tickets 203048 203055 203055
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk active tickets 203055

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.