[MDEV-20309] InnoDB encryption accesses memory outside of allocated block Created: 2019-08-10  Updated: 2020-01-24  Resolved: 2020-01-24

Status: Closed
Project: MariaDB Server
Component/s: Encryption, Storage Engine - InnoDB
Affects Version/s: 10.4.7, 10.5.0
Fix Version/s: 10.4.9, 10.5.0

Type: Bug Priority: Major
Reporter: Michael Widenius Assignee: Vladislav Vaintroub
Resolution: Fixed Votes: 0
Labels: None
Environment:

Server built with/BUILD/compile-pentium64-valgrind-max


Issue Links:
PartOf
is part of MDEV-20310 valgrind bugs found in 10.5 Open
Relates
relates to MDEV-20312 InnoDB encryption accesses memory out... Closed
relates to MDEV-20377 Make WITH_MSAN more usable Closed

 Description   

mtr --valgrind encryption.innodb_encryption

This is log is from MariaDB 10.5 compiled with
./BUILD/compile-pentium64-valgrind-max
We get similar errors (and more) in 10.4)

encryption.innodb_encryption 'innodb,undo0' [ fail ] Found warnings/errors in server log file!
Test ended at 2019-08-10 16:21:55
line
==12768== Invalid read of size 2
==12768== at 0x4C356A7: memmove (vg_replace_strmem.c:1271)
==12768== by 0x15763C7: wolfSSL_EVP_CipherUpdate (evp.c:361)
==12768== by 0x1437725: MyCTX::update(unsigned char const*, unsigned int, unsigned char*, unsigned int*) (my_crypt.cc:85)
==12768== by 0x143796B: MyCTX_nopad::update(unsigned char const*, unsigned int, unsigned char*, unsigned int*) (my_crypt.cc:143)
==12768== by 0x14372ED: my_aes_crypt_update (my_crypt.cc:310)
==12768== by 0xF2CFDF1: ctx_update(void*, unsigned char const*, unsigned int, unsigned char*, unsigned int*) (example_key_management_plugin.cc:126)
==12768== by 0xADF5FA: encryption_crypt (service_encryption.h:119)
==12768== by 0xADFF2F: do_crypt(unsigned char const*, unsigned int, unsigned char*, unsigned int*, st_encryption_scheme*, unsigned int, unsigned int, unsigned int, unsigned long long, int) (encryption.cc:223)
==12768== by 0xADFFD1: encryption_scheme_decrypt (encryption.cc:244)
==12768== by 0x12CDC06: fil_space_decrypt_full_crc32(unsigned long, fil_space_crypt_t*, unsigned char*, unsigned char*, dberr_t*) (fil0crypt.cc:842)
==12768== by 0x12CE1A7: fil_space_decrypt(unsigned long, fil_space_crypt_t*, unsigned char*, unsigned long, unsigned long, unsigned char*, dberr_t*) (fil0crypt.cc:976)
==12768== by 0x12CE30D: fil_space_decrypt(fil_space_t const*, unsigned char*, unsigned char*, bool*) (fil0crypt.cc:1011)
==12768== by 0x122565E: buf_page_decrypt_after_read(buf_page_t*, fil_space_t*) (buf0buf.cc:555)
==12768== by 0x1236C14: buf_page_io_complete(buf_page_t*, bool, bool) (buf0buf.cc:6019)
==12768== by 0x1261339: buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, page_id_t, unsigned long, bool, bool) (buf0rea.cc:203)
==12768== by 0x1261AE3: buf_read_page(page_id_t, unsigned long) (buf0rea.cc:411)
==12768== Address 0x10324000 is 0 bytes after a block of size 16,384 alloc'd
==12768== at 0x4C30833: memalign (vg_replace_malloc.c:908)
==12768== by 0x4C30936: posix_memalign (vg_replace_malloc.c:1072)
==12768== by 0x123B9B2: aligned_malloc(unsigned long, unsigned long) (buf0buf.cc:130)
==12768== by 0x12250BD: buf_tmp_reserve_crypt_buf(buf_tmp_buffer_t*) (buf0buf.cc:441)
==12768== by 0x1225622: buf_page_decrypt_after_read(buf_page_t*, fil_space_t*) (buf0buf.cc:550)
==12768== by 0x1236C14: buf_page_io_complete(buf_page_t*, bool, bool) (buf0buf.cc:6019)
==12768== by 0x1261339: buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, page_id_t, unsigned long, bool, bool) (buf0rea.cc:203)
==12768== by 0x1261AE3: buf_read_page(page_id_t, unsigned long) (buf0rea.cc:411)
==12768== by 0x1231BA4: buf_page_get_gen(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned int, mtr_t*, dberr_t*) (buf0buf.cc:4380)
==12768== by 0x126825F: dict_hdr_get(mtr_t*) (dict0boot.cc:49)
==12768== by 0x1268936: dict_boot() (dict0boot.cc:276)
==12768== by 0x116D1E6: srv_start(bool) (srv0start.cc:1881)
==12768== by 0xF83B68: innodb_init(void*) (ha_innodb.cc:4127)
==12768== by 0xBA2FC4: ha_initialize_handlerton(st_plugin_int*) (handler.cc:550)
==12768== by 0x8A3030: plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) (sql_plugin.cc:1437)
==12768== by 0x8A3C2C: plugin_init(int*, char**, int) (sql_plugin.cc:1719)
==12768== Invalid read of size 2
==12768== at 0x4C35698: memmove (vg_replace_strmem.c:1271)
==12768== by 0x15763C7: wolfSSL_EVP_CipherUpdate (evp.c:361)
==12768== by 0x1437725: MyCTX::update(unsigned char const*, unsigned int, unsigned char*, unsigned int*) (my_crypt.cc:85)
==12768== by 0x143796B: MyCTX_nopad::update(unsigned char const*, unsigned int, unsigned char*, unsigned int*) (my_crypt.cc:143)
==12768== by 0x14372ED: my_aes_crypt_update (my_crypt.cc:310)
==12768== by 0xF2CFDF1: ctx_update(void*, unsigned char const*, unsigned int, unsigned char*, unsigned int*) (example_key_management_plugin.cc:126)
==12768== by 0xADF5FA: encryption_crypt (service_encryption.h:119)
==12768== by 0xADFF2F: do_crypt(unsigned char const*, unsigned int, unsigned char*, unsigned int*, st_encryption_scheme*, unsigned int, unsigned int, unsigned int, unsigned long long, int) (encryption.cc:223)
==12768== by 0xADFFD1: encryption_scheme_decrypt (encryption.cc:244)
==12768== by 0x12CDC06: fil_space_decrypt_full_crc32(unsigned long, fil_space_crypt_t*, unsigned char*, unsigned char*, dberr_t*) (fil0crypt.cc:842)
==12768== by 0x12CE1A7: fil_space_decrypt(unsigned long, fil_space_crypt_t*, unsigned char*, unsigned long, unsigned long, unsigned char*, dberr_t*) (fil0crypt.cc:976)
==12768== by 0x12CE30D: fil_space_decrypt(fil_space_t const*, unsigned char*, unsigned char*, bool*) (fil0crypt.cc:1011)
==12768== by 0x122565E: buf_page_decrypt_after_read(buf_page_t*, fil_space_t*) (buf0buf.cc:555)
==12768== by 0x1236C14: buf_page_io_complete(buf_page_t*, bool, bool) (buf0buf.cc:6019)
==12768== by 0x1261339: buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, page_id_t, unsigned long, bool, bool) (buf0rea.cc:203)
==12768== by 0x1261AE3: buf_read_page(page_id_t, unsigned long) (buf0rea.cc:411)
==12768== Address 0x10324004 is 4 bytes after a block of size 16,384 alloc'd
==12768== at 0x4C30833: memalign (vg_replace_malloc.c:908)
==12768== by 0x4C30936: posix_memalign (vg_replace_malloc.c:1072)
==12768== by 0x123B9B2: aligned_malloc(unsigned long, unsigned long) (buf0buf.cc:130)
==12768== by 0x12250BD: buf_tmp_reserve_crypt_buf(buf_tmp_buffer_t*) (buf0buf.cc:441)
==12768== by 0x1225622: buf_page_decrypt_after_read(buf_page_t*, fil_space_t*) (buf0buf.cc:550)
==12768== by 0x1236C14: buf_page_io_complete(buf_page_t*, bool, bool) (buf0buf.cc:6019)
==12768== by 0x1261339: buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, page_id_t, unsigned long, bool, bool) (buf0rea.cc:203)
==12768== by 0x1261AE3: buf_read_page(page_id_t, unsigned long) (buf0rea.cc:411)
==12768== by 0x1231BA4: buf_page_get_gen(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned int, mtr_t*, dberr_t*) (buf0buf.cc:4380)
==12768== by 0x126825F: dict_hdr_get(mtr_t*) (dict0boot.cc:49)
==12768== by 0x1268936: dict_boot() (dict0boot.cc:276)
==12768== by 0x116D1E6: srv_start(bool) (srv0start.cc:1881)
==12768== by 0xF83B68: innodb_init(void*) (ha_innodb.cc:4127)
==12768== by 0xBA2FC4: ha_initialize_handlerton(st_plugin_int*) (handler.cc:550)
==12768== by 0x8A3030: plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) (sql_plugin.cc:1437)
==12768== by 0x8A3C2C: plugin_init(int*, char**, int) (sql_plugin.cc:1719)



 Comments   
Comment by Marko Mäkelä [ 2019-08-19 ]

I can repeat this with MSAN (enabled as described in MDEV-20377):

10.5 da53fb6d7de906fd8bd73d5f244bac4d77b687aa

==11249==WARNING: MemorySanitizer: use-of-uninitialized-value
2019-08-19 13:15:46 0 [Note] Server socket created on IP: '127.0.0.1'.
2019-08-19 13:15:46 0 [Note] Reading of all Master_info entries succeeded
2019-08-19 13:15:46 0 [Note] Added new Master_info '' to hash table
2019-08-19 13:15:46 0 [Note] /dev/shm/10.5/sql/mysqld: ready for connections.
Version: '10.5.0-MariaDB-debug-log'  socket: '/dev/shm/10.5/mysql-test/var/tmp/2/mysqld.1.sock'  port: 16020  Source distribution
    #0 0x33350be in fil_space_encrypt(fil_space_t const*, unsigned long, unsigned long, unsigned char*, unsigned char*) /mariadb/10.5/storage/innobase/fil/fil0crypt.cc:756:4
    #1 0x30f35c6 in buf_page_encrypt(fil_space_t*, buf_page_t*, unsigned char*) /mariadb/10.5/storage/innobase/buf/buf0buf.cc:7547:10
    #2 0x312f71d in buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1026:11
    #3 0x312f71d in buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1211
    #4 0x3136b54 in buf_flush_try_neighbors(page_id_t, buf_flush_t, unsigned long, unsigned long) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1437:9
    #5 0x3136b54 in buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1511
    #6 0x3139f7c in buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1769:3
    #7 0x3139f7c in buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1840
    #8 0x3139f7c in buf_flush_do_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1998
    #9 0x313d030 in buf_flush_lists(unsigned long, unsigned long, unsigned long*) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:2100:8
    #10 0x33413ee in fil_crypt_flush_space(rotate_thread_t*) /mariadb/10.5/storage/innobase/fil/fil0crypt.cc:2179:14
    #11 0x33413ee in fil_crypt_complete_rotate_space(rotate_thread_t*) /mariadb/10.5/storage/innobase/fil/fil0crypt.cc:2290
    #12 0x333e598 in fil_crypt_thread /mariadb/10.5/storage/innobase/fil/fil0crypt.cc:2393:5

The error is reported for the following comparison:

ut_ad(!memcmp(src, tmp_mem,
	      (size - FIL_PAGE_FCRC32_CHECKSUM)));

We have size=16384 and FIL_PAGE_FCRC32_CHECKSUM=4. According to the following, the entire src is initialized, but tmp_mem is only partly initialized:

call (void)__msan_print_shadow(src,16384)
call (void)__msan_print_shadow(tmp_mem,16384)

Only the first 26 and last 12 bytes of tmp_mem are initialized.

Comment by Marko Mäkelä [ 2019-08-19 ]

The bulk of tmp_mem should be initialized by the decryption call:

			ut_ad(fil_space_decrypt(space->id, crypt_data, tmp_mem,
						size, space->flags, tmp,
						&err));

Also tmp is fully initialized. For me, the error is reported for page 1 in the system tablespace. We have innodb_checksum_algorithm=full_crc32 in this case (it is the default on 10.5). The function fil_space_decrypt_full_crc32() does get invoked, and as far as I can tell, it is invoking encryption_scheme_decrypt() a.k.a. do_crypt() correctly. (If it were not, that assertion on memcmp() would likely fail.)

It looks like the issue is an allegedly uninitialized key.key in do_crypt(). call (void)__msan_print_shadow(&key,sizeof key) produces the following:

Shadow map of [0x2fffe06edd18, 0x2fffe06edd2c), 20 bytes:
0x2fffe06edd18: 00000000 ffffffff ffffffff ffffffff  |. A A A|
0x2fffe06edd28: ffffffff ........ ........ ........  |A . . .|

The following work-around makes the report go away:

diff --git a/sql/encryption.cc b/sql/encryption.cc
index fc07623883e..ca31569de38 100644
--- a/sql/encryption.cc
+++ b/sql/encryption.cc
@@ -186,6 +186,7 @@ static uint scheme_get_key(st_encryption_scheme *scheme,
   return rc;
 }
 
+#include <sanitizer/msan_interface.h>
 int do_crypt(const unsigned char* src, unsigned int slen,
              unsigned char* dst, unsigned int* dlen,
              struct st_encryption_scheme *scheme,
@@ -213,14 +214,18 @@ int do_crypt(const unsigned char* src, unsigned int slen,
   uint rc= scheme_get_key(scheme, &key);
   if (rc)
     return (int)rc;
+  __msan_unpoison(key.key, sizeof key.key);
 
   unsigned char iv[4 + 4 + 8];
   int4store(iv + 0, i32_1);
   int4store(iv + 4, i32_2);
   int8store(iv + 8, i64);
 
-  return encryption_crypt(src, slen, dst, dlen, key.key, sizeof(key.key),
+  int r= encryption_crypt(src, slen, dst, dlen, key.key, sizeof(key.key),
                           iv, sizeof(iv), flag, scheme->key_id, key_version);
+  __msan_check_mem_is_initialized(src, slen);
+  __msan_unpoison(dst, *dlen);
+  return r;
 }
 
 int encryption_scheme_encrypt(const unsigned char* src, unsigned int slen,

I think that wlad should study this code and find out a proper way to silence the error. (As noted in MDEV-20377, I was using WITH_SSL=bundled, hence WolfSSL.)

With the above work-around in place, and compiled with -O2, the test will pass. Without -O2 I get a stack overflow in MYSQLparse() during parsing SHOW SLAVE STATUS.

I still get crashes on shutdown due to another (possibly related) error: Note: this is with the __builtin_ia32_crc32qi() patch of MDEV-20377 that avoids false MSAN positives in ut_crc32_hw:

10.5 da53fb6d7de906fd8bd73d5f244bac4d77b687aa

==8278==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x2eed1bc in ut_crc32_8_hw(unsigned int*, unsigned char const**, unsigned long*) /mariadb/10.5/storage/innobase/ut/ut0crc32.cc:223:9
    #1 0x2eed1bc in ut_crc32_hw(unsigned char const*, unsigned long) /mariadb/10.5/storage/innobase/ut/ut0crc32.cc:381
    #2 0x293d940 in log_block_calc_checksum_crc32(unsigned char const*) /mariadb/10.5/storage/innobase/include/log0log.ic:227:9
    #3 0x293d940 in log_block_store_checksum(unsigned char*) /mariadb/10.5/storage/innobase/log/log0log.cc:609
    #4 0x293d940 in log_write_buf(unsigned char*, unsigned long, unsigned long, unsigned long, unsigned long) /mariadb/10.5/storage/innobase/log/log0log.cc:751
    #5 0x293d940 in log_write_up_to(unsigned long, bool, bool) /mariadb/10.5/storage/innobase/log/log0log.cc:1012
    #6 0x312f640 in buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:997:3
    #7 0x312f640 in buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1211
    #8 0x3136c04 in buf_flush_try_neighbors(page_id_t, buf_flush_t, unsigned long, unsigned long) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1437:9
    #9 0x3136c04 in buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1511
    #10 0x313a02c in buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1769:3
    #11 0x313a02c in buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1840
    #12 0x313a02c in buf_flush_do_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1998
    #13 0x314af8f in pc_flush_slot() /mariadb/10.5/storage/innobase/buf/buf0flu.cc:2797:27
    #14 0x314795c in buf_flush_page_cleaner_coordinator /mariadb/10.5/storage/innobase/buf/buf0flu.cc:3301:10
    #15 0x7ff72bb05fa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)
    #16 0x7ff72b0ed4ce in clone (/lib/x86_64-linux-gnu/libc.so.6+0xf94ce)

Comment by Marko Mäkelä [ 2019-08-19 ]

Note: What I repeated with MSAN is something else: alleged use of uninitialized data. Valgrind complained about accessing unaddressible memory, not about the use of uninitialized data. Unaddressible memory should be caught with ASAN, but it does not report anything for me.

Comment by Marko Mäkelä [ 2020-01-24 ]

For me, WITH_ASAN on a recent 10.4, I cannot see any encryption trouble.

monty, it seems possible that this problem was fixed in an update of WolfSSL in MariaDB 10.4.9 (and 10.5.0).

I also tried WITH_MSAN on 10.5 with the recipe in MDEV-20377. No matter if I include the above work-around to do_crypt(), it looks like any tests that use encryption would fail.

Comment by Vladislav Vaintroub [ 2020-01-24 ]

I reported it back in the day as https://github.com/wolfSSL/wolfssl/issues/2264 , with self-contained test-case, and a analysis.

I tested their fix, too

Generated at Thu Feb 08 08:58:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.