Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15764

InnoDB may write uninitialized garbage to redo log

Details

    Description

      According to Valgrind, InnoDB is writing uninitialized data to redo log:

      ./mtr --valgrind sql_sequence.next
      

      Stack traces vary, but they have log_write_up_to() in common:

      10.3 19bb7fdcd6e68aca6e41a2e30ffb3c2ad1d14cb6

      sql_sequence.next 'innodb'               [ fail ]  Found warnings/errors in server log file!
              Test ended at 2018-04-03 10:15:24
      line
      ==21186== Thread 10:
      ==21186== Syscall param pwrite64(buf) points to uninitialised byte(s)
      ==21186==    at 0x4E4A6CF: pwrite (pwrite64.c:29)
      ==21186==    by 0xD8FBEE: SyncFileIO::execute(IORequest const&) (os0file.cc:1596)
      ==21186==    by 0xD93BF6: os_file_io(IORequest const&, int, void*, unsigned long, unsigned long, dberr_t*) (os0file.cc:4847)
      ==21186==    by 0xD9413E: os_file_pwrite(IORequest const&, int, unsigned char const*, unsigned long, unsigned long, dberr_t*) (os0file.cc:4931)
      ==21186==    by 0xD942B1: os_file_write_func(IORequest const&, char const*, int, void const*, unsigned long, unsigned long) (os0file.cc:4962)
      ==21186==    by 0xD974CF: os_aio_func(IORequest&, unsigned long, char const*, pfs_os_file_t, void*, unsigned long, unsigned long, bool, fil_node_t*, void*) (os0file.cc:6606)
      ==21186==    by 0x1017530: pfs_os_aio_func(IORequest&, unsigned long, char const*, pfs_os_file_t, void*, unsigned long, unsigned long, bool, fil_node_t*, void*, char const*, unsigned int) (os0file.ic:255)
      ==21186==    by 0x102692A: fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*, bool) (fil0fil.cc:4632)
      ==21186==    by 0xD6B841: log_group_write_buf(log_group_t*, unsigned char*, unsigned long, unsigned long, unsigned long, unsigned long) (log0log.cc:1023)
      ==21186==    by 0xD6C459: log_write_up_to(unsigned long, bool) (log0log.cc:1270)
      

      The reason is that the tail of the first buffer is not being initialized.
      This should also affect MariaDB 10.2, even though I was unable to repeat the problem there. By default, innodb_log_write_ahead_size=8192 already in 10.2.
      The following patch fixes the problem:

      diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
      index 7da66c04c18..68d15665b04 100644
      --- a/storage/innobase/log/log0log.cc
      +++ b/storage/innobase/log/log0log.cc
      @@ -1235,6 +1235,9 @@ log_write_up_to(
       	log_group_set_fields(&log_sys->log, log_sys->write_lsn);
       
       	log_mutex_exit();
      +	/* Erase the end of the last log block. */
      +	memset(write_buf + end_offset, 0, ~end_offset & OS_FILE_LOG_BLOCK_SIZE);
       	/* Calculate pad_size if needed. */
       	pad_size = 0;
       	if (write_ahead_size > OS_FILE_LOG_BLOCK_SIZE) {
      

      Attachments

        Activity

          My suggested patch is obviously wrong: it should have been ~end_offset & (OS_FILE_LOG_BLOCK_SIZE - 1), or otherwise the result could only be 0 or 512.
          The problem cannot be repeated deterministically. @monty noted that it does repeat randomly in sql_sequence.create.

          To better catch this even outside Valgrind, I would add also non-Valgrind instrumentation, so that crash recovery would have a better chance of failing when the log has been written by debug builds. Often the allocated memory would be zero-initialized by the runtime environment, and the NUL byte is a dummy padding byte in the InnoDB redo log:

          diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
          index af2a51dbd65..b6e4e487696 100644
          --- a/storage/innobase/log/log0log.cc
          +++ b/storage/innobase/log/log0log.cc
          @@ -609,6 +609,7 @@ void log_t::create()
             ut_ad(srv_log_buffer_size >= 4U << srv_page_size_shift);
           
             buf= static_cast<byte*>(ut_malloc_dontdump(srv_log_buffer_size * 2));
          +  TRASH_ALLOC(buf, srv_log_buffer_size * 2);
           
             first_in_use= true;
           
          @@ -1069,7 +1070,8 @@ log_write_up_to(
           
           	log_mutex_exit();
           	/* Erase the end of the last log block. */
          -	memset(write_buf + end_offset, 0, ~end_offset & OS_FILE_LOG_BLOCK_SIZE);
          +	memset(write_buf + end_offset, 0,
          +	       ~end_offset & (OS_FILE_LOG_BLOCK_SIZE - 1));
           
           	/* Calculate pad_size if needed. */
           	pad_size = 0;
          

          marko Marko Mäkelä added a comment - My suggested patch is obviously wrong: it should have been ~end_offset & (OS_FILE_LOG_BLOCK_SIZE - 1) , or otherwise the result could only be 0 or 512. The problem cannot be repeated deterministically. @monty noted that it does repeat randomly in sql_sequence.create . To better catch this even outside Valgrind, I would add also non-Valgrind instrumentation, so that crash recovery would have a better chance of failing when the log has been written by debug builds. Often the allocated memory would be zero-initialized by the runtime environment, and the NUL byte is a dummy padding byte in the InnoDB redo log: diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc index af2a51dbd65..b6e4e487696 100644 --- a/storage/innobase/log/log0log.cc +++ b/storage/innobase/log/log0log.cc @@ -609,6 +609,7 @@ void log_t::create() ut_ad(srv_log_buffer_size >= 4U << srv_page_size_shift); buf= static_cast<byte*>(ut_malloc_dontdump(srv_log_buffer_size * 2)); + TRASH_ALLOC(buf, srv_log_buffer_size * 2); first_in_use= true; @@ -1069,7 +1070,8 @@ log_write_up_to( log_mutex_exit(); /* Erase the end of the last log block. */ - memset(write_buf + end_offset, 0, ~end_offset & OS_FILE_LOG_BLOCK_SIZE); + memset(write_buf + end_offset, 0, + ~end_offset & (OS_FILE_LOG_BLOCK_SIZE - 1)); /* Calculate pad_size if needed. */ pad_size = 0;

          People

            marko Marko Mäkelä
            marko Marko Mäkelä
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.