Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16866

InnoDB fails to start upon crash recovery with "[ERROR] InnoDB: Redo log crypto: failed to decrypt log block"

Details

    Description

      The initial test flow was this:

      • start 10.1.34 server with the following options:

        `
        --no-defaults --file-key-management --file-key-management-filename=keys.txt --plugin-load-add=file_key_management.so --innodb-encrypt-tables --innodb-encrypt-log --innodb-encryption-threads=4 --aria-encrypt-tables=1 --encrypt-tmp-disk-tables=1 --encrypt-binlog --innodb-page-size=8K --innodb-compression-algorithm=none --loose-innodb-file-format=Barracuda --loose-innodb-file-per-table=1 --loose-max-statement-time=20 --loose-lock-wait-timeout=20 --loose-innodb-lock-wait-timeout=10 --loose-innodb_log_compressed_pages=on --loose-innodb-change-buffering=none
        

      • run a mix of DDL/DML on the server for a few minutes;
      • SIGKILL the server;
      • start the same server with the exact same options and additionally --innodb-force-recovery=3

      InnoDB doesn't start:

      10.1.34

      2018-07-31 16:08:31 140279075214848 [Note] InnoDB:  innodb-page-size has been changed from the default value 16384 to 8192 .
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: The InnoDB memory heap is disabled
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: Compressed tables use zlib 1.2.3
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: Using Linux native AIO
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: Using SSE crc32 instructions
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: Initializing buffer pool, size = 128.0M
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: Completed initialization of buffer pool
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: Highest supported file format is Barracuda.
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: Starting crash recovery from checkpoint LSN=23790689
      2018-07-31 16:08:31 140279075214848 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
      InnoDB: redo log checkpoint: 0 [ chk key ]: [ 4 1 ] 
      2018-07-31 16:08:32 140279075214848 [ERROR] InnoDB: Redo log crypto: failed to decrypt log block. Reason could be that requested key version is not found, required encryption key management plugin is not found or configured encryption algorithm and/or method does not match.
      2018-07-31 16:08:32 140279075214848 [ERROR] Plugin 'InnoDB' init function returned error.
      2018-07-31 16:08:32 140279075214848 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
      2018-07-31 16:08:32 140279075214848 [Note] Plugin 'FEEDBACK' is disabled.
      2018-07-31 16:08:32 140279075214848 [ERROR] Unknown/unsupported storage engine: InnoDB
      2018-07-31 16:08:32 140279075214848 [ERROR] Aborting
       
      Error in my_thread_global_end(): 1 threads didn't exit
      

      The key file is present and is the same for both starts.

      The datadir and the key file are available, nothing is confidential, the datadir just doesn't fit into JIRA.

      Attachments

        Activity

          The following has been observed on 10.2.16, might be related?

          10.2.16

          2018-08-05 18:20:13 139628292781952 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
          2018-08-05 18:20:13 139628292781952 [Note] InnoDB: Completed initialization of buffer pool
          2018-08-05 18:20:13 139627563095808 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
          2018-08-05 18:20:13 139628292781952 [Note] InnoDB: Highest supported file format is Barracuda.
          2018-08-05 18:20:13 139628292781952 [Note] InnoDB: Starting crash recovery from checkpoint LSN=34668996
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=16] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=17] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=18] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=19] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=20] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=21] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=22] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=23] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=24] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=25] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=26] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=27] from the doublewrite buffer.
          2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Starting final batch to recover 1576 pages from redo log.
          2018-08-05 18:20:14 139628036220672 [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=14, page number=1], should be [page id: space=16, page number=16]
          2018-08-05 18:20:14 139628036220672 [ERROR] InnoDB: The page [page id: space=16, page number=16] in file './test/t004.ibd' cannot be decrypted.
          2018-08-05 18:20:14 139628036220672 [Note] InnoDB: However key management plugin or used key_version 1 is not found or used encryption algorithm or method does not match.
          2018-08-05 18:20:14 139628036220672 [Note] InnoDB: Marking tablespace as missing. You may drop this table or install correct key management plugin and key file.
          2018-08-05 18:20:14 0x7efdaf7fe700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/rem/rem0rec.cc line 580
          InnoDB: We intentionally generate a memory trap.
          InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
          InnoDB: If you get repeated assertion failures or crashes, even
          InnoDB: immediately after the mysqld startup, there may be
          InnoDB: corruption in the InnoDB tablespace. Please refer to
          InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/
          InnoDB: about forcing recovery.
          180805 18:20:14 [ERROR] mysqld got signal 6 ;
          This could be because you hit a bug. It is also possible that this binary
          or one of the libraries it was linked against is corrupt, improperly built,
          or misconfigured. This error can also be caused by malfunctioning hardware.
           
          To report this bug, see https://mariadb.com/kb/en/reporting-bugs
           
          We will try our best to scrape up some info that will hopefully help
          diagnose the problem, but since we have already crashed, 
          something is definitely wrong and this may fail.
           
          Server version: 10.2.16-MariaDB-log
          key_buffer_size=134217728
          read_buffer_size=131072
          max_used_connections=0
          max_threads=153
          thread_count=0
          It is possible that mysqld could use up to 
          key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467246 K  bytes of memory
          Hope that's ok; if not, decrease some variables in the equation.
           
          Thread pointer: 0x0
          Attempting backtrace. You can use the following information to find out
          where mysqld died. If you see no messages after this, something went
          terribly wrong...
          stack_bottom = 0x0 thread_stack 0x49000
          mysys/stacktrace.c:268(my_print_stacktrace)[0x557774f9c9ab]
          sql/signal_handler.cc:168(handle_fatal_signal)[0x557774a6df65]
          /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7efdbde42330]
          /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7efdbd495c37]
          /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7efdbd499028]
          ut/ut0rbt.cc:461(rbt_eject_node)[0x55777483d88a]
          rem/rem0rec.cc:580(rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**))[0x557774cb1686]
          page/page0cur.cc:2267(page_cur_parse_delete_rec(unsigned char*, unsigned char*, buf_block_t*, dict_index_t*, mtr_t*))[0x557774c915d9]
          log/log0recv.cc:1480(recv_parse_or_apply_log_rec_body)[0x557774c70912]
          log/log0recv.cc:1900(recv_recover_page(bool, buf_block_t*))[0x557774c7290d]
          buf/buf0buf.cc:6198(buf_page_io_complete(buf_page_t*, bool, bool))[0x557774db095c]
          fil/fil0fil.cc:5291(fil_aio_wait(unsigned long))[0x557774e1fc62]
          srv/srv0start.cc:334(io_handler_thread)[0x557774d21a98]
          /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7efdbde3a184]
          /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7efdbd55cffd]
          The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
          information that should help you find out what is causing the crash.
          Writing a core file at /home/travis/logs/current1_1/data/
          Aborted (core dumped)
          

          elenst Elena Stepanova added a comment - The following has been observed on 10.2.16, might be related? 10.2.16 2018-08-05 18:20:13 139628292781952 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M 2018-08-05 18:20:13 139628292781952 [Note] InnoDB: Completed initialization of buffer pool 2018-08-05 18:20:13 139627563095808 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-08-05 18:20:13 139628292781952 [Note] InnoDB: Highest supported file format is Barracuda. 2018-08-05 18:20:13 139628292781952 [Note] InnoDB: Starting crash recovery from checkpoint LSN=34668996 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=16] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=17] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=18] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=19] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=20] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=21] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=22] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=23] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=24] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=25] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=26] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Recovered page [page id: space=16, page number=27] from the doublewrite buffer. 2018-08-05 18:20:14 139628292781952 [Note] InnoDB: Starting final batch to recover 1576 pages from redo log. 2018-08-05 18:20:14 139628036220672 [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=14, page number=1], should be [page id: space=16, page number=16] 2018-08-05 18:20:14 139628036220672 [ERROR] InnoDB: The page [page id: space=16, page number=16] in file './test/t004.ibd' cannot be decrypted. 2018-08-05 18:20:14 139628036220672 [Note] InnoDB: However key management plugin or used key_version 1 is not found or used encryption algorithm or method does not match. 2018-08-05 18:20:14 139628036220672 [Note] InnoDB: Marking tablespace as missing. You may drop this table or install correct key management plugin and key file. 2018-08-05 18:20:14 0x7efdaf7fe700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/rem/rem0rec.cc line 580 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/ InnoDB: about forcing recovery. 180805 18:20:14 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.   To report this bug, see https://mariadb.com/kb/en/reporting-bugs   We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.   Server version: 10.2.16-MariaDB-log key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=153 thread_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467246 K bytes of memory Hope that's ok; if not, decrease some variables in the equation.   Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x49000 mysys/stacktrace.c:268(my_print_stacktrace)[0x557774f9c9ab] sql/signal_handler.cc:168(handle_fatal_signal)[0x557774a6df65] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7efdbde42330] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7efdbd495c37] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7efdbd499028] ut/ut0rbt.cc:461(rbt_eject_node)[0x55777483d88a] rem/rem0rec.cc:580(rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**))[0x557774cb1686] page/page0cur.cc:2267(page_cur_parse_delete_rec(unsigned char*, unsigned char*, buf_block_t*, dict_index_t*, mtr_t*))[0x557774c915d9] log/log0recv.cc:1480(recv_parse_or_apply_log_rec_body)[0x557774c70912] log/log0recv.cc:1900(recv_recover_page(bool, buf_block_t*))[0x557774c7290d] buf/buf0buf.cc:6198(buf_page_io_complete(buf_page_t*, bool, bool))[0x557774db095c] fil/fil0fil.cc:5291(fil_aio_wait(unsigned long))[0x557774e1fc62] srv/srv0start.cc:334(io_handler_thread)[0x557774d21a98] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7efdbde3a184] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7efdbd55cffd] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. Writing a core file at /home/travis/logs/current1_1/data/ Aborted (core dumped)

          https://travis-ci.org/elenst/travis-tests/jobs/446626672

          10.2 76318d55aa0f5cb2ef5282538d8997b308c48204

          2018-10-26 11:27:50 139718894253824 [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=48, page number=1], should be [page id: space=46, page number=30]
          

          elenst Elena Stepanova added a comment - https://travis-ci.org/elenst/travis-tests/jobs/446626672 10.2 76318d55aa0f5cb2ef5282538d8997b308c48204 2018-10-26 11:27:50 139718894253824 [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=48, page number=1], should be [page id: space=46, page number=30]

          The problem that was initially reported for 10.1 appears to be that decrypting redo log fails.
          Originally, the redo log encryption key rotation was broken, and it was disabled in 10.1.14 by MDEV-9840 and finally enabled in 10.4 by MDEV-12041.

          The 10.2 problem could be duplicating MDEV-12699.

          I would like to see the data set for 10.1.

          marko Marko Mäkelä added a comment - The problem that was initially reported for 10.1 appears to be that decrypting redo log fails. Originally, the redo log encryption key rotation was broken, and it was disabled in 10.1.14 by MDEV-9840 and finally enabled in 10.4 by MDEV-12041 . The 10.2 problem could be duplicating MDEV-12699 . I would like to see the data set for 10.1.

          The datadir for the original 10.1 report is here: ftp://ftp.askmonty.org/public/mdev16866.tar.gz

          The archive contains:

          • mdev16866/data – the datadir
          • mdev16866/keys.txt – the key file
          • cmd_options – command line options (for the reference)

          Unpack into the basedir and start the current 10.1 server with

          --datadir=`pwd`/mdev16866/data --file-key-management-filename=`pwd`/mdev16866/keys.txt --general-log --server-id=111 --file-key-management --plugin-load-add=file_key_management.so --innodb-encrypt-tables --innodb-encrypt-log --innodb-encryption-threads=4 --aria-encrypt-tables=1 --encrypt-tmp-disk-tables=1 --encrypt-binlog --innodb-page-size=8K --innodb-compression-algorithm=none --loose-innodb-file-format=Barracuda --loose-innodb-file-per-table=1 --loose-max-statement-time=20 --loose-lock-wait-timeout=20 --loose-innodb-lock-wait-timeout=10 --loose-innodb_log_compressed_pages=on --loose-innodb-change-buffering=none --innodb-force-recovery=3
          

          Observe the error:

          2018-10-29 13:42:06 139971190631360 [Note] InnoDB: Starting crash recovery from checkpoint LSN=23790689
          2018-10-29 13:42:06 139971190631360 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
          InnoDB: redo log checkpoint: 0 [ chk key ]: [ 4 1 ] 
          2018-10-29 13:42:08 139971190631360 [ERROR] InnoDB: Redo log crypto: failed to decrypt log block. Reason could be that requested key version is not found, required encryption key management plugin is not found or configured encryption algorithm and/or method does not match.
          2018-10-29 13:42:08 139971190631360 [ERROR] Plugin 'InnoDB' init function returned error.
          2018-10-29 13:42:08 139971190631360 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
          2018-10-29 13:42:08 139971190631360 [Note] Plugin 'FEEDBACK' is disabled.
          2018-10-29 13:42:08 139971190631360 [ERROR] Unknown/unsupported storage engine: InnoDB
          2018-10-29 13:42:08 139971190631360 [ERROR] Aborting
          

          elenst Elena Stepanova added a comment - The datadir for the original 10.1 report is here: ftp://ftp.askmonty.org/public/mdev16866.tar.gz The archive contains: mdev16866/data – the datadir mdev16866/keys.txt – the key file cmd_options – command line options (for the reference) Unpack into the basedir and start the current 10.1 server with --datadir=`pwd`/mdev16866/data --file-key-management-filename=`pwd`/mdev16866/keys.txt --general-log --server-id=111 --file-key-management --plugin-load-add=file_key_management.so --innodb-encrypt-tables --innodb-encrypt-log --innodb-encryption-threads=4 --aria-encrypt-tables=1 --encrypt-tmp-disk-tables=1 --encrypt-binlog --innodb-page-size=8K --innodb-compression-algorithm=none --loose-innodb-file-format=Barracuda --loose-innodb-file-per-table=1 --loose-max-statement-time=20 --loose-lock-wait-timeout=20 --loose-innodb-lock-wait-timeout=10 --loose-innodb_log_compressed_pages=on --loose-innodb-change-buffering=none --innodb-force-recovery=3 Observe the error: 2018-10-29 13:42:06 139971190631360 [Note] InnoDB: Starting crash recovery from checkpoint LSN=23790689 2018-10-29 13:42:06 139971190631360 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer... InnoDB: redo log checkpoint: 0 [ chk key ]: [ 4 1 ] 2018-10-29 13:42:08 139971190631360 [ERROR] InnoDB: Redo log crypto: failed to decrypt log block. Reason could be that requested key version is not found, required encryption key management plugin is not found or configured encryption algorithm and/or method does not match. 2018-10-29 13:42:08 139971190631360 [ERROR] Plugin 'InnoDB' init function returned error. 2018-10-29 13:42:08 139971190631360 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2018-10-29 13:42:08 139971190631360 [Note] Plugin 'FEEDBACK' is disabled. 2018-10-29 13:42:08 139971190631360 [ERROR] Unknown/unsupported storage engine: InnoDB 2018-10-29 13:42:08 139971190631360 [ERROR] Aborting

          I can able to repeat the issue from the given data directory. It looks like the recovery finds a zero-filled block and fails to decrypt it.
          In 10.2, log_group_read_log_seg() has the condition check to whether the log block number matches with corresponding LSN.
          That could happen when the server was killed while writing the redo log.

          thiru Thirunarayanan Balathandayuthapani added a comment - I can able to repeat the issue from the given data directory. It looks like the recovery finds a zero-filled block and fails to decrypt it. In 10.2, log_group_read_log_seg() has the condition check to whether the log block number matches with corresponding LSN. That could happen when the server was killed while writing the redo log.

          Looks OK to me.

          marko Marko Mäkelä added a comment - Looks OK to me.

          To be clear, this bug only affected 10.1, and not 10.2+.

          ccalender Chris Calender (Inactive) added a comment - To be clear, this bug only affected 10.1, and not 10.2+.

          People

            thiru Thirunarayanan Balathandayuthapani
            elenst Elena Stepanova
            Votes:
            1 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.