MariaDB Server: MDEV-18529

InnoDB wrongly skips decryption of encrypted page if unencrypted and post-encrypted checksums match

    Details

      Description

      When data-at-rest encryption is in use, InnoDB currently writes two checksums to the page: a checksum for the contents of the unencrypted page and a checksum for the contents of the post-encrypted page. There have been cases where these two checksums for a single page have had the same value. In those cases, InnoDB would wrongly assume that the page could not be encrypted, so it would not decrypt the page. If the page actually was encrypted, then the server would likely crash and the error log would contain error messages like the following:

      2018-11-20 1:21:42 139881111836416 [ERROR] InnoDB: Page 66856:5253952 may be corrupted. Post encryption checksum 474116392 stored [474116392:474116392] key_version 1
      2018-11-20 01:21:42 0x7f389bfb5700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/rem/rem0rec.cc line 580
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
      InnoDB: about forcing recovery.
      181120 1:21:42 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.2.12-MariaDB-log
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=520
      max_threads=3102
      thread_count=389
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3832659 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f369421b9f8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f389bfb4e18 thread_stack 0x49000
      /usr/bin/mysqld(my_print_stacktrace+0x2e)[0xde610e]
      /usr/bin/mysqld(handle_fatal_signal+0x471)[0x7de601]
      /lib64/libpthread.so.0(+0xf7e0)[0x7f5bba7577e0]
      /lib64/libc.so.6(gsignal+0x35)[0x7f5bb9618495]
      /lib64/libc.so.6(abort+0x175)[0x7f5bb9619c75]
      /usr/bin/mysqld[0xbafc13]
      /usr/bin/mysqld[0xadf400]
      /usr/bin/mysqld[0xabb063]
      /usr/bin/mysqld[0xbe14a6]
      /usr/bin/mysqld[0xb337d0]
      /usr/bin/mysqld[0xa43956]
      /usr/bin/mysqld(_ZN7handler17ha_index_read_mapEPhPKhm16ha_rkey_function+0x99)[0x7ea149]
      /usr/bin/mysqld(_ZN7handler16read_range_firstEPK12st_key_rangeS2_bb+0xbb)[0x7ea3fb]
      /usr/bin/mysqld(_ZN7handler21multi_range_read_nextEPPv+0x95)[0x7216b5]
      /usr/bin/mysqld(_ZN23Mrr_simple_index_reader8get_nextEPPv+0x47)[0x721767]
      /usr/bin/mysqld(_ZN10DsMrr_impl10dsmrr_nextEPPv+0x2f)[0x72112f]
      /usr/bin/mysqld(_ZN18QUICK_RANGE_SELECT8get_nextEv+0x32)[0x93d6b2]
      /usr/bin/mysqld[0x960653]
      /usr/bin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x147)[0x6368c7]
      /usr/bin/mysqld(_ZN4JOIN10exec_innerEv+0xd0f)[0x64ee7f]
      /usr/bin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x1fc)[0x64d67c]
      /usr/bin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x254)[0x64f294]
      /usr/bin/mysqld[0x5e71c2]
      /usr/bin/mysqld(_Z21mysql_execute_commandP3THD+0x187c)[0x5eab4c]
      /usr/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x3a2)[0x5f4272]
      /usr/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x26f0)[0x5f7190]
      /usr/bin/mysqld(_Z10do_commandP3THD+0x15d)[0x5f84cd]
      /usr/bin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x235)[0x6dc865]
      /usr/bin/mysqld(handle_one_connection+0x3f)[0x6dca2f]
      /usr/bin/mysqld[0xa054a9]
      /lib64/libpthread.so.0(+0x7aa1)[0x7f5bba74faa1]
      

      Here is an analysis from Marko Mäkelä:


      The error message was originally introduced in MDEV-11759 (MariaDB 10.1.22, 10.2.5) and slightly changed in MDEV-11939 (MariaDB 10.1.26, 10.2.8). (The original wording was "Page %lu in space %s (%lu) maybe corrupted.")

      The presence of this error message in an error log spells big trouble. The logic that was introduced in MDEV-11759 was based on the wrong assumption that a data page whose contents appear to carry a valid ‘before encryption’ checksum cannot have been encrypted. MariaDB would then wrongly skip the page decryption and hand over the garbled page contents to the caller of buf_page_get_gen(), which would then crash basically anywhere. My debugging session in MariaDB Server 10.1 showed that buf_page_get_gen() would return NULL in this case. Some of its callers are prepared for that error situation.

      To make matters worse, there are several checksum algorithm implementations in InnoDB. It sufficed for one of the various algorithms to produce a matching checksum. MDEV-17957 and MDEV-17958 improved innodb_checksum_algorithm=strict_crc32 so that only one checksum will be attempted.

      The data-at-rest encryption in MariaDB was developed before I joined the company. I have identified and fixed several problems with it in the past. This issue revealed more problems, and I believe that there is not much more that we could fix without changing the design and the file format.

      I believe that the biggest challenge is the design that allows encryption to be enabled or disabled on data files, one page at a time. Because of this, a data file can contain both encrypted and unencrypted pages.

      To make things even more challenging, InnoDB did not initialize unused data fields in buffer pool pages until MySQL 5.1.48, where I finally added the initialization due to an external bug report. I had identified and pointed out this problem years earlier, but Heikki Tuuri (the author of InnoDB) thought that initializing memory could reduce performance.

      In MariaDB data-at-rest encryption, the unused (and possibly uninitialized) fields were reused for encryption key version and post-encryption checksum. These fields would usually be zero in unencrypted data files, but they could sometimes be nonzero due to the missing initialization, if an unencrypted page was originally created before MySQL 5.1.48.

      I think that MariaDB should use only one checksum variant in the future, and should not allow mixing unencrypted and encrypted pages in a data file. This will be complicated a little by the need to support upgrades from older versions. For each data file separately, we should keep track of which scheme is being used. New files would be created in the clean way. Perhaps at the same time, we could improve the way key rotation works.

      Today, I debugged what exactly is going on. My test creates a fake encrypted page (mostly initialized with NUL bytes, and containing a valid checksum). I wrote the same checksum value to both of the two "before encryption" fields and the one "after encryption" field. This caused fil_space_verify_crypt_checksum() to identify the page as not encrypted:

      if (encrypted && valid) {
              /* [snip "may be corrupted" message] */
              encrypted = false;
      }
      

      I started with a recent 10.2 revision that is before any of the fixes:

      git checkout 447e4931795a0ae9525005e8fb37bb7347d8ae52
      git checkout 610e4034d7ba34c6b4cba1e1b5faf45eba835498 \
      mysql-test/suite/innodb/include/crc32.pl \
      mysql-test/suite/mariabackup/encrypted_page_corruption.{result,test}
       
      patch -p1<<'EOF'
      diff --git a/mysql-test/suite/mariabackup/encrypted_page_corruption.test b/mysql-test/suite/mariabackup/encrypted_page_corruption.test
      index 12aabc5cfd9..479281285e9 100644
      --- a/mysql-test/suite/mariabackup/encrypted_page_corruption.test
      +++ b/mysql-test/suite/mariabackup/encrypted_page_corruption.test
      @@ -49,7 +49,7 @@
      let $backuplog=$MYSQLTEST_VARDIR/tmp/backup.log;
       
      --disable_result_log
      ---error 1
      +--error 0,1
      exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf --backup --target-dir=$targetdir > $backuplog;
      --enable_result_log
      EOF
      

      The test is primarily checking that mariabackup will detect the corruption. Because I am running it against older revisions, that part of the test has to be relaxed. The interesting part is what happens when the MariaDB server is restarted and DROP TABLE is executed.

      With commit 447e4931795a0ae9525005e8fb37bb7347d8ae52, there is no attempt to decrypt the page. fil_space_verify_crypt_checksum() and buf_page_decrypt_after_read() will return false. buf_page_io_complete() will notice this, set the flag table->file_unreadable, and return DB_DECRYPTION_FAILED. Then, after buf_page_get_gen() has exhausted its 100 retries to read the page, it will return NULL to the caller. The caller in DROP TABLE is prepared for that return value (not all code paths are; that will eventually be fixed in MDEV-13542), and the DROP TABLE operation on the corrupted table succeeds. My test corrupted the root page of the table.

      Then, I moved to test a later commit 7d245083a43e34d94822e580037727bdbb50b6f0, which I believe fixed this issue. It made fil_space_verify_crypt_checksum() only check the post-encryption checksum. The callers would ensure that the page carries the correct checksum after decryption. With this change, fil_space_verify_crypt_checksum() will return true (because the post-encryption checksum matches) without ever issuing any "may be corrupted" message, and buf_page_decrypt_after_read() will invoke fil_space_decrypt() and buf_page_check_corrupt(). In my test case, the decrypted data will be gibberish, and an error message will be output:

      2018-12-20 10:06:38 140737288554240 [ERROR] InnoDB: The page [page id: space=4, page number=3] in file './test/t1.ibd' cannot be decrypted.
      

      After this, DB_DECRYPTION_FAILED will be returned all the way up to buf_page_get_gen(), which will retry the reads, and finally return NULL to the caller.

      Until my fix of MDEV-13103 (MariaDB 10.2.16 and later), the function buf_page_io_complete() ignored the return value of buf_page_decrypt_after_read(). Let us next see what would happen there:

      git checkout 2ca904f0ca981666b90151ff3fcdad4307b96a4d~
      

      In this version, buf_page_decrypt_after_read() invokes fil_space_verify_crypt_checksum(), which will wrongly issue the "may be corrupted" message and return false, as noted above. Because of this, buf_page_decrypt_after_read() will return false without invoking fil_space_decrypt() to decrypt the page.

      Because no decryption was performed, the page contents will still match the "before encryption" checksum, and DB_SUCCESS will be returned to the caller, even though we had, in effect, decided that the page is corrupted.

      Finally, I tested the version that you are running, and saw exactly the same behavior:

      git checkout mariadb-10.2.12
      

      I think that with this, we have an exact explanation of what must have happened with the problematic page 5253952. The MariaDB 10.2.12 server never attempted to decrypt it, and it returned the gibberish page contents to the caller of buf_page_get_gen().

      Some corruption could also be attributed to the sloppy checksum validation in Mariabackup, which may have caused it to copy inconsistent pages as they were being simultaneously written by the running MariaDB server.

      For this particular page, I think that it is possible that the page is actually valid (after all, the new Mariabackup did not report any checksum mismatch after decrypting the page), and the problem was that the server wrongly skipped the decryption step, and returned the gibberish page to the caller without flagging any corruption. This should now work in MariaDB 10.2.20.

        People

        • Assignee: marko Marko Mäkelä
        • Reporter: GeoffMontee Geoff Montee