[MDEV-15764] InnoDB may write uninitialized garbage to redo log Created: 2018-04-03  Updated: 2018-05-04  Resolved: 2018-05-04

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.2, 10.3.0
Fix Version/s: 10.2.15, 10.3.7

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: recovery, valgrind


 Description   

According to Valgrind, InnoDB is writing uninitialized data to redo log:

./mtr --valgrind sql_sequence.next

Stack traces vary, but they have log_write_up_to() in common:

10.3 19bb7fdcd6e68aca6e41a2e30ffb3c2ad1d14cb6

sql_sequence.next 'innodb'               [ fail ]  Found warnings/errors in server log file!
        Test ended at 2018-04-03 10:15:24
line
==21186== Thread 10:
==21186== Syscall param pwrite64(buf) points to uninitialised byte(s)
==21186==    at 0x4E4A6CF: pwrite (pwrite64.c:29)
==21186==    by 0xD8FBEE: SyncFileIO::execute(IORequest const&) (os0file.cc:1596)
==21186==    by 0xD93BF6: os_file_io(IORequest const&, int, void*, unsigned long, unsigned long, dberr_t*) (os0file.cc:4847)
==21186==    by 0xD9413E: os_file_pwrite(IORequest const&, int, unsigned char const*, unsigned long, unsigned long, dberr_t*) (os0file.cc:4931)
==21186==    by 0xD942B1: os_file_write_func(IORequest const&, char const*, int, void const*, unsigned long, unsigned long) (os0file.cc:4962)
==21186==    by 0xD974CF: os_aio_func(IORequest&, unsigned long, char const*, pfs_os_file_t, void*, unsigned long, unsigned long, bool, fil_node_t*, void*) (os0file.cc:6606)
==21186==    by 0x1017530: pfs_os_aio_func(IORequest&, unsigned long, char const*, pfs_os_file_t, void*, unsigned long, unsigned long, bool, fil_node_t*, void*, char const*, unsigned int) (os0file.ic:255)
==21186==    by 0x102692A: fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*, bool) (fil0fil.cc:4632)
==21186==    by 0xD6B841: log_group_write_buf(log_group_t*, unsigned char*, unsigned long, unsigned long, unsigned long, unsigned long) (log0log.cc:1023)
==21186==    by 0xD6C459: log_write_up_to(unsigned long, bool) (log0log.cc:1270)

The reason is that the tail of the first buffer is not being initialized.
This should also affect MariaDB 10.2, even though I was unable to repeat the problem there. By default, innodb_log_write_ahead_size=8192 already in 10.2.
The following patch fixes the problem:

diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
index 7da66c04c18..68d15665b04 100644
--- a/storage/innobase/log/log0log.cc
+++ b/storage/innobase/log/log0log.cc
@@ -1235,6 +1235,9 @@ log_write_up_to(
 	log_group_set_fields(&log_sys->log, log_sys->write_lsn);
 
 	log_mutex_exit();
+	/* Erase the end of the last log block. */
+	memset(write_buf + end_offset, 0, ~end_offset & OS_FILE_LOG_BLOCK_SIZE);
 	/* Calculate pad_size if needed. */
 	pad_size = 0;
 	if (write_ahead_size > OS_FILE_LOG_BLOCK_SIZE) {



 Comments   
Comment by Marko Mäkelä [ 2018-05-04 ]

My suggested patch is obviously wrong: it should have been ~end_offset & (OS_FILE_LOG_BLOCK_SIZE - 1), or otherwise the result could only be 0 or 512.
The problem cannot be repeated deterministically. @monty noted that it does repeat randomly in sql_sequence.create.

To better catch this even outside Valgrind, I would add also non-Valgrind instrumentation, so that crash recovery would have a better chance of failing when the log has been written by debug builds. Often the allocated memory would be zero-initialized by the runtime environment, and the NUL byte is a dummy padding byte in the InnoDB redo log:

diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
index af2a51dbd65..b6e4e487696 100644
--- a/storage/innobase/log/log0log.cc
+++ b/storage/innobase/log/log0log.cc
@@ -609,6 +609,7 @@ void log_t::create()
   ut_ad(srv_log_buffer_size >= 4U << srv_page_size_shift);
 
   buf= static_cast<byte*>(ut_malloc_dontdump(srv_log_buffer_size * 2));
+  TRASH_ALLOC(buf, srv_log_buffer_size * 2);
 
   first_in_use= true;
 
@@ -1069,7 +1070,8 @@ log_write_up_to(
 
 	log_mutex_exit();
 	/* Erase the end of the last log block. */
-	memset(write_buf + end_offset, 0, ~end_offset & OS_FILE_LOG_BLOCK_SIZE);
+	memset(write_buf + end_offset, 0,
+	       ~end_offset & (OS_FILE_LOG_BLOCK_SIZE - 1));
 
 	/* Calculate pad_size if needed. */
 	pad_size = 0;

Generated at Thu Feb 08 08:23:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.