[MDEV-28870] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT during crash recovery Created: 2022-06-16  Updated: 2024-01-31  Resolved: 2022-06-21

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2, 10.10.1

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, recovery, rr-profile

Issue Links:
Blocks
blocks MDEV-28923 atomic.rename_table occasionally fails Closed
Problem/Incident
causes MDEV-30209 Race condition between fil_node_open_... Closed
Relates
relates to MDEV-21534 improve locking/waiting in log_write_... Closed
relates to MDEV-24626 Remove synchronous write of page0 and... Closed
relates to MDEV-27711 [ERROR] InnoDB: Missing FILE_CREATE, ... Closed
relates to MDEV-31410 mariadb-backup prepare crash with Inn... Closed

 Description   

origin/10.6 5bb90cb2ac7b93f40c8ca4672ec7e2c18f0ea238 2022-06-16T10:01:29+03:00
Scenario:
1. Start the server and load some initial data.
2. Several sessions run a DDL/DML mix
3. During 2. is ongoing the server gets intentional killed (SIGSEGV) 
4. The restart attempt fails with
[rr 1379888 3340]2022-06-16 13:05:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=80040121,81223111
[rr 1379888 5048]2022-06-16 13:05:16 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 88
[rr 1379888 5052]2022-06-16 13:05:16 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1447] with error Data structure corruption
 
pluto:/data/results/1655382284/TBR-1547
_RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1
     # Fate of the server till the SIGKILL arrives.
_RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysql-2
    # The restart attempt
pluto:/data/results/1655382284/TBR-1547/1/fbackup
    # File backup of the server datadirectory made after the SIGKILL
    # and before the restart attempt.
 
RQG
===
# git clone https://github.com/mleich1/rqg --branch experimental RQG
#
# GIT_SHOW: HEAD -> experimental, origin/experimental a32979c6d789ddf0982e6a56af687565be00590c 2022-06-14T17:21:56+02:00
# rqg.pl  : Version 4.0.6 (2022-05)
#
# $RQG_HOME/rqg.pl \
# --grammar=conf/mariadb/innodb_compression_encryption.yy \
# --gendata=conf/mariadb/innodb_compression_encryption.zz \
# --max_gd_duration=1800 \
# --mysqld=--innodb-encrypt-log \
# --mysqld=--innodb-encrypt-tables \
# --reporters=CrashRecovery1 \
# --redefine=conf/mariadb/redefine_innodb_undo.yy \
# --mysqld=--innodb-immediate-scrub-data-uncompressed=1 \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --no-mask \
# --queries=10000000 \
# --seed=random \
# --reporters=Backtrace \
# --reporters=ErrorLog \
# --reporters=Deadlock1 \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--plugin-load-add=provider_lzo.so \
# --mysqld=--plugin-load-add=provider_bzip2.so \
# --mysqld=--plugin-load-add=provider_lzma.so \
# --mysqld=--plugin-load-add=provider_snappy.so \
# --mysqld=--plugin-load-add=provider_lz4.so \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--innodb_file_per_table=1 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose-innodb-sync-debug \
# --mysqld=--innodb_stats_persistent=off \
# --mysqld=--innodb_adaptive_hash_index=on \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=on \
# --mysqld=--loose-max-statement-time=30 \
# --threads=33 \
# --mysqld=--innodb-use-native-aio=0 \
# --mysqld=--loose-gdb \
# --mysqld=--loose-debug-gdb \
# --rr=Extended \
# --rr_options=--chaos --wait \
# --mysqld=--loose_innodb_change_buffering=purges \
# --mysqld=--innodb_rollback_on_timeout=OFF \
# --vardir_type=slow \
# --mysqld=--innodb_page_size=16K \
# --mysqld=--innodb-buffer-pool-size=256M \
# --no_mask \
# <local settings>



 Comments   
Comment by Matthias Leich [ 2022-06-16 ]

The bug is quite similar to MDEV-27711.

Comment by Marko Mäkelä [ 2022-06-17 ]

Thank you! This might also explain intermittent failures of the test atomic.rename_table where InnoDB fails to recover.

It turns out that a FILE_MODIFY record was parsed for the file that will be complained about, but the first page of the tablespace had not been written before the server was killed:

ssh pluto
od -Ax -t x1 /data/results/1655382284/TBR-1547/1/data/test/#sql-alter-11fd01-21.ibd
000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
01a000
rr replay /data/results/1655382284/TBR-1547/1/rr/mysqld-2
watch -l recv_sys.found_corrupt_log
continue
reverse-continue
up
watch -l rs.first
reverse-continue
break fil_ibd_load
continue
finish
Run till exit from #0  fil_ibd_load (space_id=88, 
    filename=0x55affd515394 "./test/#sql-alter-11fd01-21.ibd", 
    space=@0x7ffd55619558: 0x0)
    at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:2527
0x000055affbb908f8 in fil_name_process (
    name=0x55affd515394 "./test/#sql-alter-11fd01-21.ibd", len=31, 
    space_id=88, deleted=false, lsn=80695284, store=0x7ffd55619cfc)
    at /data/Server/10.6X/storage/innobase/log/log0recv.cc:1253
1253			switch (fil_ibd_load(space_id, name, space)) {
Value returned is $1 = FIL_LOAD_DEFER

This is related to MDEV-24626. thiru, can you please debug this further?

Comment by Thirunarayanan Balathandayuthapani [ 2022-06-17 ]

It looks like FILE_DELETE record is eaten by FILE_CHECKPOINT. So during recovery, InnoDB encounters FILE_MODIFY for the tablespace, not FILE_DELETE.
let me paste the stack trace:

(rr) t 4
[Switching to thread 4 (Thread 1178881.1179183)]
#0  mtr_t::log_file_op (this=0x1b7635450800, type=FILE_MODIFY, space_id=88, path=0x362b30042690 "./test/#sql-alter-11fd01-21.ibd", new_path=0x0)
    at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1475
1475	  ut_ad((new_path != nullptr) == (type == FILE_RENAME));
(rr) where
#0  mtr_t::log_file_op (this=0x1b7635450800, type=FILE_MODIFY, space_id=88, path=0x362b30042690 "./test/#sql-alter-11fd01-21.ibd", new_path=0x0)
    at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1475
#1  0x000055741bff04ce in fil_name_write (space_id=88, name=0x362b30042690 "./test/#sql-alter-11fd01-21.ibd", mtr=0x1b7635450800)
    at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1572
#2  0x000055741bff639e in fil_names_write (space=0x362b3001e2e0, mtr=0x1b7635450800) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:3121
#3  0x000055741bff6c62 in fil_names_clear (lsn=77132331, do_write=true) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:3214
#4  0x000055741bf828c7 in log_checkpoint_low (oldest_lsn=77132331, end_lsn=80695284) at /data/Server/10.6X/storage/innobase/buf/buf0flu.cc:1745
#5  0x000055741bf82d6b in log_checkpoint () at /data/Server/10.6X/storage/innobase/buf/buf0flu.cc:1804
#6  0x000055741bf85516 in buf_flush_page_cleaner () at /data/Server/10.6X/storage/innobase/buf/buf0flu.cc:2341
#7  0x000055741bf895e5 in std::__invoke_impl<void, void (*)()> (__f=@0x705b035a4348: 0x55741bf84bb0 <buf_flush_page_cleaner()>)
    at /usr/include/c++/9/bits/invoke.h:60
#8  0x000055741bf89591 in std::__invoke<void (*)()> (__fn=@0x705b035a4348: 0x55741bf84bb0 <buf_flush_page_cleaner()>)
    at /usr/include/c++/9/bits/invoke.h:95
#9  0x000055741bf89532 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x705b035a4348) at /usr/include/c++/9/thread:244
#10 0x000055741bf89504 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x705b035a4348) at /usr/include/c++/9/thread:251
#11 0x000055741bf894e4 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x705b035a4340)
    at /usr/include/c++/9/thread:195
#12 0x00007fb567642de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x0000056114927609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x000055741ab6c293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) when
Current event: 1333838
 
 
 
 
 
(rr) t 16
[Switching to thread 16 (Thread 1178881.1296841)]
#0  0x0000000070000002 in syscall_traced ()
(rr) where
#0  0x0000000070000002 in syscall_traced ()
#1  0x00000ed43c97c1a4 in _raw_syscall () at /home/roc/rr/rr/src/preload/raw_syscall.S:120
#2  0x00000ed43c9772ce in traced_raw_syscall (call=<optimized out>) at /home/roc/rr/rr/src/preload/syscallbuf.c:278
#3  0x00000ed43c97b0c3 in sys_fcntl (call=<optimized out>) at /home/roc/rr/rr/src/preload/syscallbuf.c:1529
#4  syscall_hook_internal (call=0x7fb561a7cfa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3293
#5  syscall_hook (call=0x7fb561a7cfa0) at /home/roc/rr/rr/src/preload/syscallbuf.c:3454
#6  0x00000ed43c9770b0 in _syscall_hook_trampoline () at /home/roc/rr/rr/src/preload/syscall_hook.S:313
#7  0x00000ed43c97710f in __morestack () at /home/roc/rr/rr/src/preload/syscall_hook.S:458
#8  0x00000ed43c977116 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff () at /home/roc/rr/rr/src/preload/syscall_hook.S:472
#9  0x000055741ab658a3 in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#10 0x000055741bd85e97 in binary_semaphore::wait (this=0x7fb566aa96d0) at /data/Server/10.6X/storage/innobase/log/log0sync.cc:127
#11 0x000055741bd86376 in group_commit_lock::acquire (this=0x55741d67bc00 <flush_lock>, num=80006576, callback=0x0)
    at /data/Server/10.6X/storage/innobase/log/log0sync.cc:273
#12 0x000055741bd57652 in log_write_up_to (lsn=80006576, flush_to_disk=true, rotate_key=false, callback=0x0)
    at /data/Server/10.6X/storage/innobase/log/log0log.cc:828
#13 0x000055741bff0b1a in fil_delete_tablespace (id=88) at /data/Server/10.6X/storage/innobase/fil/fil0fil.cc:1692
#14 0x000055741bfe604c in trx_t::commit (this=0x368d26821780, deleted=std::vector of length 0, capacity 0)
    at /data/Server/10.6X/storage/innobase/dict/drop.cc:274
#15 0x000055741bcc9fc4 in commit_unlock_and_unlink (trx=0x368d26821780) at /data/Server/10.6X/storage/innobase/handler/handler0alter.cc:4180
#16 0x000055741bcf20d0 in rollback_inplace_alter_table (ha_alter_info=0x7fb566aa5330, table=0x362b3027a7e8, prebuilt=0x362b302a0f30)
    at /data/Server/10.6X/storage/innobase/handler/handler0alter.cc:8854
#17 0x000055741bce0ad6 in ha_innobase::commit_inplace_alter_table (this=0x362b302af650, altered_table=0x7fb566aa53f0, 
    ha_alter_info=0x7fb566aa5330, commit=false) at /data/Server/10.6X/storage/innobase/handler/handler0alter.cc:10863
#18 0x000055741b8e0dde in handler::ha_commit_inplace_alter_table (this=0x362b302af650, altered_table=0x7fb566aa53f0, 
    ha_alter_info=0x7fb566aa5330, commit=false) at /data/Server/10.6X/sql/handler.cc:5219
#19 0x000055741b654aa0 in mysql_inplace_alter_table (thd=0x362b300206a8, table_list=0x362b3002f1f0, table=0x362b3027a7e8, 
    altered_table=0x7fb566aa53f0, ha_alter_info=0x7fb566aa5330, target_mdl_request=0x7fb566aa5c30, ddl_log_state=0x7fb566aa52f0, 
    trigger_param=0x7fb566aa57e0, alter_ctx=0x7fb566aa6790) at /data/Server/10.6X/sql/sql_table.cc:7570
#20 0x000055741b65cdf6 in mysql_alter_table (thd=0x362b300206a8, new_db=0x362b300250a8, new_name=0x362b300254c0, create_info=0x7fb566aa7580, 
    table_list=0x362b3002f1f0, alter_info=0x7fb566aa7490, order_num=0, order=0x0, ignore=false, if_exists=false)
    at /data/Server/10.6X/sql/sql_table.cc:10278
#21 0x000055741b706cb6 in Sql_cmd_alter_table::execute (this=0x362b3002f920, thd=0x362b300206a8) at /data/Server/10.6X/sql/sql_alter.cc:542
#22 0x000055741b564e23 in mysql_execute_command (thd=0x362b300206a8, is_called_from_prepared_stmt=false)
    at /data/Server/10.6X/sql/sql_parse.cc:5996
#23 0x000055741b56ab3c in mysql_parse (thd=0x362b300206a8, 
    rawbuf=0x362b3002f0c0 "ALTER TABLE `t3` ENCRYPTED=YES /* E_R Thread7 QNO 14 CON_ID 33 */", length=65, parser_state=0x7fb566aa8580)
--Type <RET> for more, q to quit, c to continue without paging--
    at /data/Server/10.6X/sql/sql_parse.cc:8029
#24 0x000055741b557741 in dispatch_command (command=COM_QUERY, thd=0x362b300206a8, 
    packet=0x362b30027049 "ALTER TABLE `t3` ENCRYPTED=YES /* E_R Thread7 QNO 14 CON_ID 33 */ ", packet_length=66, blocking=true)
    at /data/Server/10.6X/sql/sql_parse.cc:1896
#25 0x000055741b556290 in do_command (thd=0x362b300206a8, blocking=true) at /data/Server/10.6X/sql/sql_parse.cc:1409
#26 0x000055741b6fc792 in do_handle_one_connection (connect=0x48fc7a172168, put_in_cache=true) at /data/Server/10.6X/sql/sql_connect.cc:1418
#27 0x000055741b6fc427 in handle_one_connection (arg=0x48fc7a172168) at /data/Server/10.6X/sql/sql_connect.cc:1312
#28 0x0000056114927609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x000055741ab6c293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) when
Current event: 1333838

Reassigning this issue to marko like he suggested.

Comment by Marko Mäkelä [ 2022-06-17 ]

Thanks to further debugging, the problem appears to be insufficient mutual exclusion between log checkpoint and fil_delete_tablespace(). I think that this race condition may have been caused by MDEV-21534.

In this particular execution, the log checkpoint violated ACID by essentially deleting a FILE_DELETE record that had been written earlier. Had the server been killed before the checkpoint was updated, the FILE_DELETE record would have been processed by crash recovery and the file would have been deleted as expected.

If such a race condition is possible also with regard to FILE_RENAME and renaming files, that should explain failures of the test atomic.rename_table (MDEV-27111).

Comment by Marko Mäkelä [ 2022-06-20 ]

It looks like fil_delete_tablespace() must employ some logic similar to mtr_t::commit_shrink() that is part of innodb_undo_log_truncate. Possibly something like this:

log_write_and_flush_prepare();
// write the FILE_DELETE record
fil_system.named_spaces.remove(*space);
// …
mysql_mutex_lock(&log_sys.flush_order_mutex);
log_write_and_flush();

Something similar is necessary around fil_name_write_rename() as well, to ensure that a log checkpoint may not occur between the write of a FILE_RENAME record and the actual rename operation in the file system. Only after the file has been deleted or renamed in the file system, we may allow a log checkpoint to discard the record.

Comment by Marko Mäkelä [ 2022-06-20 ]

In addition to being similar to MDEV-27711, this bug is also similar to MDEV-27111 in the sense that the test atomic.rename_table would still occasionally fail. I repeated a failure of that test without my tentative fix, and did not repeat any failure with the fix.

Comment by Marko Mäkelä [ 2022-06-21 ]

I ended up not fixing this in 10.5, because this affects the recovery DDL operations, and those operations were not crash-safe before 10.6.

Comment by Jörn Wagner [ 2022-08-15 ]

Hi,

I'm currently affected by this bug as it seems. Running 10.6.8. After a system restart, my local MariaDB instance will not start up with the aforementioned error. Will this fix only prevent the error from happening in the future or will it also fix a broken recovery log? Do I need to rebuild my whole database?

Comment by Marko Mäkelä [ 2022-08-15 ]

The recently released MariaDB Server 10.6.9 includes a number of recovery fixes, such as MDEV-28923. The bug that we reproduced internally occurred due to a race condition in the write side, and it cannot be easily worked around in recovery. If 10.6.9 cannot start up with that data directory, one thing that could still be done to work around this bug would be to remove the latest log checkpoint from the redo log file (to let the server recover from the previous log checkpoint).

Comment by Jörn Wagner [ 2022-08-15 ]

Unfortunately, I do not see MariaDB 10.6.9 available for download. https://jira.mariadb.org/projects/MDEV/versions/27507 says it's unreleased but all issues are done. https://mariadb.com/kb/en/mariadb-1069-changelog/ also states "There are currently no official packages or binaries available for download which contain the features". Where can we find or when can we expect binaries for that version?

Comment by Daniel Black [ 2022-09-18 ]

Release planning is on this jira page.

Comment by Marko Mäkelä [ 2023-09-11 ]

One more reason for these symptoms could be that multiple instances of InnoDB were running on the same data files. This would be a regression due to implementing MDEV-24393. This regression was fixed in MDEV-28495 and MDEV-31568, by ensuring that an advisory lock on the InnoDB system tablespace will always be held.

Generated at Thu Feb 08 10:04:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.