MariaDB Server / MDEV-32746

SIGSEGV on recovery when using innodb_encrypt_log and PMEM

Details

    Description

      mleich provided an rr replay trace where encryption_crypt() hits SIGSEGV because it is being invoked with *dlen==0:

      #0  0x000055d2158ee6b8 in log_decrypt_buf (iv=iv@entry=0x7ffcb3d01710 "", 
          buf=buf@entry=0x7ffcb3cfd645 "\037\002\347", ' ' <repeats 110 times>, "qul", ' ' <repeats 57 times>, "\220\b\\\b(\a\364\a\300\a\214\aX\a$\006\360\006\274\006\210\006T\006 ", <incomplete sequence \354>..., 
          data=data@entry=0x7f9c1bffc5db "\241\350G\200Y\202\005\341,~\263\321%\371\071\200Y\202\005\341$\201\214\365\262\277>\260\200@v\214P;\302\373\n!.>\270\207\345]\344\301(\333\327NE\032\360\227z\027\215\256}4\005\236F\362\217\220\036\311?\272", len=len@entry=0) at /data/Server/bb-11.2-MDEV-32452/storage/innobase/log/log0crypt.cc:473
      #1  0x000055d2158d8ec3 in recv_ring::copy_if_needed (this=this@entry=0x7ffcb3d01908, iv=iv@entry=0x7ffcb3d01710 "", 
          tmp=tmp@entry=0x7ffcb3cfd640 "\024\200Y\202\005\037\002\347", ' ' <repeats 110 times>, "qul", ' ' <repeats 57 times>, "\220\b\\\b(\a\364\a\300\a\214\aX\a$\006\360\006\274\006\210\006"..., start=..., 
          start@entry=..., len=len@entry=0) at /data/Server/bb-11.2-MDEV-32452/storage/innobase/log/log0recv.cc:2334
      #2  0x000055d2158ebcc0 in recv_sys_t::parse<recv_ring, true> (this=this@entry=0x55d2167df780 <recv_sys>, l=..., if_exists=if_exists@entry=false) at /usr/include/c++/9/bits/stl_tree.h:348
      #3  0x000055d2158ed421 in recv_sys_t::parse_pmem<true> (if_exists=if_exists@entry=false) at /data/Server/bb-11.2-MDEV-32452/storage/innobase/log/log0recv.cc:2211
      #4  0x000055d2158d5f36 in recv_scan_log (last_phase=last_phase@entry=false) at /data/Server/bb-11.2-MDEV-32452/storage/innobase/log/log0recv.cc:4060
      #5  0x000055d2158d6c88 in recv_recovery_from_checkpoint_start () at /data/Server/bb-11.2-MDEV-32452/storage/innobase/log/log0recv.cc:4489
      #6  0x000055d215a30590 in srv_start (create_new_db=<optimized out>) at /data/Server/bb-11.2-MDEV-32452/storage/innobase/srv/srv0start.cc:1523
      #7  0x000055d215836ccf in innodb_init (p=<optimized out>) at /data/Server/bb-11.2-MDEV-32452/storage/innobase/handler/ha_innodb.cc:4166
      

      I was not able to reproduce this crash with the two copies of a data directory that I found in the environment. However, I think that the following patch should fix it:

      diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
      index f479428d987..ac4a68a0569 100644
      --- a/storage/innobase/log/log0recv.cc
      +++ b/storage/innobase/log/log0recv.cc
      @@ -2409,7 +2409,7 @@ struct recv_ring : public recv_buf
         {
           const size_t s(*this - start);
           ut_ad(s + len <= srv_page_size);
      -    if (!log_sys.is_encrypted())
      +    if (!len || !log_sys.is_encrypted())
           {
             if (start.ptr + s == ptr && ptr + len <= end())
               return ptr;
      

      A corresponding condition already exists in recv_buf::copy_if_needed(). That is, if an MDEV-14425 redo log record carries no actual payload, there is nothing to encrypt or decrypt. The page numbers and file names are never encrypted. For an INIT_PAGE or FREE_PAGE record, we only need to know the page identifier, nothing else.
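The effect of the patched condition can be illustrated with a simplified, self-contained model. This sketch is hypothetical: RingSketch, its fields, and the XOR "decryption" are illustrative stand-ins, not InnoDB's recv_ring, log_sys, or the real log_decrypt_buf()/encryption_crypt() interface. It only models the control flow: a zero-length payload returns a pointer without ever reaching the cipher, while a non-empty encrypted payload is copied (handling wrap-around at the end of the ring) and then decrypted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a circular redo log buffer; NOT InnoDB's recv_ring.
struct RingSketch {
  const uint8_t *buf;  // start of the circular buffer
  size_t size;         // capacity of the circular buffer in bytes
  bool encrypted;      // stands in for log_sys.is_encrypted()

  // Hypothetical decryption hook: a toy XOR so the sketch runs.
  // Like the real cipher call in this bug, it must never see len == 0.
  static void decrypt(uint8_t *dst, const uint8_t *src, size_t len) {
    assert(len > 0);
    for (size_t i = 0; i < len; i++) dst[i] = src[i] ^ 0x5a;
  }

  // Return a pointer to len contiguous plaintext bytes starting at offset off.
  const uint8_t *copy_if_needed(size_t off, size_t len, uint8_t *tmp) const {
    // The guard added by the patch: an empty payload needs no decryption,
    // so it takes the unencrypted path and never calls decrypt().
    if (!len || !encrypted) {
      if (off + len <= size) return buf + off;  // contiguous: no copy needed
    }
    // Copy into tmp, handling wrap-around at the end of the ring.
    size_t first = off + len <= size ? len : size - off;
    memcpy(tmp, buf + off, first);
    memcpy(tmp + first, buf, len - first);
    if (encrypted) decrypt(tmp, tmp, len);  // only reached when len > 0
    return tmp;
  }
};
```

In the reported crash the zero-length call reached encryption_crypt(); in this model, the assert in decrypt() plays that role, and the `!len` check keeps it from firing.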

          Activity

            Marko Mäkelä added a comment: mleich, can you please try to reproduce this bug (on the same branch that you were using so far) and test the fix?
            Matthias Leich added a comment (edited):

            Reproduction of the problem with the optimized test battery:

            • original tree: reproduced 4 times in 218 finished RQG tests
            • original tree + patch: never reproduced in 1134 finished tests

            Hence I assume the problem is fixed.
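These numbers can be put in perspective with a back-of-the-envelope calculation. Assuming (as an illustration, not something stated in the report) that each RQG run is an independent trial that reproduces the crash at the rate observed on the original tree, 4/218 or about 1.8%, the chance of seeing no reproduction at all in 1134 runs is (1 - 4/218)^1134, which is below 10^-9. The helper name prob_no_repro is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: probability of zero reproductions in new_runs trials,
// assuming independent runs that each reproduce at the rate hits/runs
// observed before the patch. This independence model is an assumption.
double prob_no_repro(int hits, int runs, int new_runs) {
  assert(runs > 0 && hits >= 0 && hits <= runs);
  const double p = static_cast<double>(hits) / runs;  // e.g. 4/218
  return std::pow(1.0 - p, new_runs);                 // (1 - p)^new_runs
}
```

A tiny result only says that the unpatched failure rate is implausible for the patched tree; it supports, but does not prove, the conclusion that the fix is correct.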

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes: 0
              Watchers: 3
