Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28870

InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT during crash recovery

Details

    Description

      origin/10.6 5bb90cb2ac7b93f40c8ca4672ec7e2c18f0ea238 2022-06-16T10:01:29+03:00
      Scenario:
      1. Start the server and load some initial data.
      2. Several sessions run a DDL/DML mix
      3. During 2. is ongoing the server gets intentional killed (SIGSEGV) 
      4. The restart attempt fails with
      [rr 1379888 3340]2022-06-16 13:05:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=80040121,81223111
      [rr 1379888 5048]2022-06-16 13:05:16 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 88
      [rr 1379888 5052]2022-06-16 13:05:16 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1447] with error Data structure corruption
       
      pluto:/data/results/1655382284/TBR-1547
      _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysqld-1
           # Fate of the server till the SIGKILL arrives.
      _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio mysql-2
          # The restart attempt
      pluto:/data/results/1655382284/TBR-1547/1/fbackup
          # File backup of the server datadirectory made after the SIGKILL
          # and before the restart attempt.
       
      RQG
      ===
      # git clone https://github.com/mleich1/rqg --branch experimental RQG
      #
      # GIT_SHOW: HEAD -> experimental, origin/experimental a32979c6d789ddf0982e6a56af687565be00590c 2022-06-14T17:21:56+02:00
      # rqg.pl  : Version 4.0.6 (2022-05)
      #
      # $RQG_HOME/rqg.pl \
      # --grammar=conf/mariadb/innodb_compression_encryption.yy \
      # --gendata=conf/mariadb/innodb_compression_encryption.zz \
      # --max_gd_duration=1800 \
      # --mysqld=--innodb-encrypt-log \
      # --mysqld=--innodb-encrypt-tables \
      # --reporters=CrashRecovery1 \
      # --redefine=conf/mariadb/redefine_innodb_undo.yy \
      # --mysqld=--innodb-immediate-scrub-data-uncompressed=1 \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --mysqld=--lock-wait-timeout=86400 \
      # --mysqld=--innodb-lock-wait-timeout=50 \
      # --no-mask \
      # --queries=10000000 \
      # --seed=random \
      # --reporters=Backtrace \
      # --reporters=ErrorLog \
      # --reporters=Deadlock1 \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --mysqld=--plugin-load-add=provider_lzo.so \
      # --mysqld=--plugin-load-add=provider_bzip2.so \
      # --mysqld=--plugin-load-add=provider_lzma.so \
      # --mysqld=--plugin-load-add=provider_snappy.so \
      # --mysqld=--plugin-load-add=provider_lz4.so \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--innodb_file_per_table=1 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--loose-innodb-sync-debug \
      # --mysqld=--innodb_stats_persistent=off \
      # --mysqld=--innodb_adaptive_hash_index=on \
      # --mysqld=--log-bin \
      # --mysqld=--sync-binlog=1 \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=on \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=33 \
      # --mysqld=--innodb-use-native-aio=0 \
      # --mysqld=--loose-gdb \
      # --mysqld=--loose-debug-gdb \
      # --rr=Extended \
      # --rr_options=--chaos --wait \
      # --mysqld=--loose_innodb_change_buffering=purges \
      # --mysqld=--innodb_rollback_on_timeout=OFF \
      # --vardir_type=slow \
      # --mysqld=--innodb_page_size=16K \
      # --mysqld=--innodb-buffer-pool-size=256M \
      # --no_mask \
      # <local settings>
      

      Attachments

        Issue Links

          Activity

            jwagner Jörn Wagner added a comment - - edited

            Hi,

            I'm currently affected by this bug as it seems. Running 10.6.8. After a system restart, my local MariaDB instance will not start up with the aforementioned error. Will this fix only prevent the error from happening in the future or will it also fix a broken recovery log? Do I need to rebuild my whole database?

            jwagner Jörn Wagner added a comment - - edited Hi, I'm currently affected by this bug as it seems. Running 10.6.8. After a system restart, my local MariaDB instance will not start up with the aforementioned error. Will this fix only prevent the error from happening in the future or will it also fix a broken recovery log? Do I need to rebuild my whole database?

            The recently released MariaDB Server 10.6.9 includes a number of recovery fixes, such as MDEV-28923. The bug that we reproduced internally occurred due to a race condition in the write side, and it cannot be easily worked around in recovery. If 10.6.9 cannot start up with that data directory, one thing that could still be done to work around this bug would be to remove the latest log checkpoint from the redo log file (to let the server recover from the previous log checkpoint).

            marko Marko Mäkelä added a comment - The recently released MariaDB Server 10.6.9 includes a number of recovery fixes, such as MDEV-28923 . The bug that we reproduced internally occurred due to a race condition in the write side, and it cannot be easily worked around in recovery. If 10.6.9 cannot start up with that data directory, one thing that could still be done to work around this bug would be to remove the latest log checkpoint from the redo log file (to let the server recover from the previous log checkpoint).
            jwagner Jörn Wagner added a comment -

            Unfortunately, I do not see MariaDB 10.6.9 available for download. https://jira.mariadb.org/projects/MDEV/versions/27507 says it's unreleased but all issues are done. https://mariadb.com/kb/en/mariadb-1069-changelog/ also states "There are currently no official packages or binaries available for download which contain the features". Where can we find or when can we expect binaries for that version?

            jwagner Jörn Wagner added a comment - Unfortunately, I do not see MariaDB 10.6.9 available for download. https://jira.mariadb.org/projects/MDEV/versions/27507 says it's unreleased but all issues are done. https://mariadb.com/kb/en/mariadb-1069-changelog/ also states "There are currently no official packages or binaries available for download which contain the features". Where can we find or when can we expect binaries for that version?
            danblack Daniel Black added a comment -

            Release planning is on this jira page.

            danblack Daniel Black added a comment - Release planning is on this jira page .

            One more reason for these symptoms could be that multiple instances of InnoDB were running on the same data files. This would be a regression due to implementing MDEV-24393. This regression was fixed in MDEV-28495 and MDEV-31568, by ensuring that an advisory lock on the InnoDB system tablespace will always be held.

            marko Marko Mäkelä added a comment - One more reason for these symptoms could be that multiple instances of InnoDB were running on the same data files. This would be a regression due to implementing MDEV-24393 . This regression was fixed in MDEV-28495 and MDEV-31568 , by ensuring that an advisory lock on the InnoDB system tablespace will always be held.

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.