Details
-
Bug
-
Status: Closed (View Workflow)
-
Minor
-
Resolution: Fixed
-
10.1(EOL), 10.2(EOL)
-
None
-
10.2.4-1, 10.2.4-2
Description
The new valgrind builder is still experimental, but the failure is reproducible locally for me, so I don't think it's the builder's fault.
10.1 23cc1be270c7304963643947d8e5ab88f4e312ee |
encryption.innodb_encryption_is 'cbc,xtradb' [ fail ] Found warnings/errors in server log file!
|
Test ended at 2016-12-30 00:17:28
|
line
|
==24755== Thread 16:
|
==24755== Conditional jump or move depends on uninitialised value(s)
|
==24755== at 0xC2A7FD: buf_page_is_checksum_valid_innodb(unsigned char const*, unsigned long, unsigned long) (buf0buf.cc:653)
|
==24755== by 0xC2AC60: buf_page_is_corrupted(bool, unsigned char const*, unsigned long) (buf0buf.cc:859)
|
==24755== by 0xCB74C3: fil_space_encrypt(unsigned long, unsigned long, unsigned long, unsigned char*, unsigned long, unsigned char*) (fil0crypt.cc:697)
|
==24755== by 0xC37059: buf_page_encrypt_before_write(buf_page_t*, unsigned char*, unsigned long) (buf0buf.cc:6366)
|
==24755== by 0xC4304E: buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) (buf0flu.cc:950)
|
==24755== by 0xC435B7: buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) (buf0flu.cc:1109)
|
==24755== by 0xC43B89: buf_flush_try_neighbors(unsigned long, unsigned long, buf_flush_t, unsigned long, unsigned long) (buf0flu.cc:1324)
|
==24755== by 0xC43EAE: buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) (buf0flu.cc:1412)
|
==24755== by 0xC4499B: buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) (buf0flu.cc:1741)
|
==24755== by 0xC44D35: buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, bool, flush_counters_t*) (buf0flu.cc:1817)
|
==24755== by 0xC45461: buf_flush_list(unsigned long, unsigned long, unsigned long*) (buf0flu.cc:2097)
|
==24755== by 0xC460F4: page_cleaner_do_flush_batch(unsigned long, unsigned long) (buf0flu.cc:2410)
|
==24755== by 0xC47376: buf_flush_page_cleaner_thread (buf0flu.cc:2792)
|
==24755== by 0x4E3D0A3: start_thread (pthread_create.c:309)
|
==24755== by 0x6CB787C: clone (clone.S:111)
|
==24755== Conditional jump or move depends on uninitialised value(s)
|
==24755== at 0x4C2ED52: __memcmp_sse4_1 (vg_replace_strmem.c:972)
|
==24755== by 0xCB74E3: fil_space_encrypt(unsigned long, unsigned long, unsigned long, unsigned char*, unsigned long, unsigned char*) (fil0crypt.cc:698)
|
==24755== by 0xC37059: buf_page_encrypt_before_write(buf_page_t*, unsigned char*, unsigned long) (buf0buf.cc:6366)
|
==24755== by 0xC4304E: buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) (buf0flu.cc:950)
|
==24755== by 0xC435B7: buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) (buf0flu.cc:1109)
|
==24755== by 0xC43B89: buf_flush_try_neighbors(unsigned long, unsigned long, buf_flush_t, unsigned long, unsigned long) (buf0flu.cc:1324)
|
==24755== by 0xC43EAE: buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) (buf0flu.cc:1412)
|
==24755== by 0xC4499B: buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) (buf0flu.cc:1741)
|
==24755== by 0xC44D35: buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, bool, flush_counters_t*) (buf0flu.cc:1817)
|
==24755== by 0xC45461: buf_flush_list(unsigned long, unsigned long, unsigned long*) (buf0flu.cc:2097)
|
==24755== by 0xC460F4: page_cleaner_do_flush_batch(unsigned long, unsigned long) (buf0flu.cc:2410)
|
==24755== by 0xC47376: buf_flush_page_cleaner_thread (buf0flu.cc:2792)
|
==24755== by 0x4E3D0A3: start_thread (pthread_create.c:309)
|
==24755== by 0x6CB787C: clone (clone.S:111)
|
Attachments
Issue Links
- relates to
-
MDEV-7069 Fix buildbot failures in main server trees
-
- Closed
-
-
MDEV-22650 Dirty compressed page checksum validation fails
-
- Closed
-
elenst, indeed, if I try to start the test with ./mtr --valgrind (which I would never use when debugging), then valgrind 3.7.0 will indeed complain about the unrecognized option --soname-synonyms which was added already in 2015.
It looks like I used an almost year-old revision of 10.1.14 where I was unable to repeat the problem.
I can repeat the problem with 10.1-mdev11686 on perro, when running --manual-gdb and starting Valgrind+gdb instead of gdb.
Just like with the older revision that I tested, there is a fault at startup, which might be suppressed in --valgrind (I did not use any suppressions). And according to Valgrind, the crypt_data.iv is fully set.
…
2017-03-06 19:17:54 366200576 [Note] InnoDB: Dumping buffer pool(s) not yet started
==57931== Conditional jump or move depends on uninitialised value(s)
==57931== at 0x5C385BC: ASN1_STRING_set (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==57931== by 0x5C262AC: ASN1_mbstring_ncopy (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==57931== by 0x5C264A3: ASN1_mbstring_copy (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==57931== by 0x5C2740C: ASN1_STRING_to_UTF8 (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
…
==57931== Continuing ...
2017-03-06 19:18:15 67314496 [Note] Server socket created on IP: '127.0.0.1'.
2017-03-06 19:18:15 67314496 [Note] /home/mariadb/git/10.1-mdev11686/sql/mysqld: ready for connections.
Version: '10.1.22-MariaDB-debug' socket: '/home/mariadb/git/10.1-mdev11686/mysql-test/var/tmp/mysqld.1.sock' port: 16000 Source distribution
2017-03-06 19:18:16 68254464 [Note] InnoDB: Created tablespace for space 4 name test/t1 key_id 1 encryption 1.
==57931== Thread 10:
==57931== Conditional jump or move depends on uninitialised value(s)
==57931== at 0xC519F40: buf_page_is_checksum_valid_innodb(unsigned char const*, unsigned long, unsigned long) (buf0buf.cc:572)
…
Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 58112]
0x000000000c519f40 in buf_page_is_checksum_valid_innodb (
read_buf=0xd61a780 "!\265\224\222", checksum_field1=565548178,
checksum_field2=4272144276)
at /home/mariadb/git/10.1-mdev11686/storage/innobase/buf/buf0buf.cc:572
572 && checksum_field1 != buf_calc_page_new_checksum(read_buf)) {
(gdb) up
#1 0x000000000c51a440 in buf_page_is_corrupted (check_lsn=true,
read_buf=0xd61a780 "!\265\224\222", zip_size=0, space=0xd60d468)
at /home/mariadb/git/10.1-mdev11686/storage/innobase/buf/buf0buf.cc:780
780 if (buf_page_is_checksum_valid_innodb(read_buf,
(gdb) up
#2 0x000000000c5a86c0 in fil_space_encrypt (space=4, offset=1, lsn=1629842,
src_frame=0xe7b0000 "!\265\224\222", zip_size=0,
dst_frame=0xd614000 "!\265\224\222")
at /home/mariadb/git/10.1-mdev11686/storage/innobase/fil/fil0crypt.cc:700
700 bool corrupted = buf_page_is_corrupted(true, tmp_mem, zip_size, tspace);
(gdb) p crypt_data.iv
$1 = "\245\373\302\n&!=\202MWL\243\357)8}"
(gdb) p crypt_data
$2 = (fil_space_crypt_t *) 0x11dbe308
(gdb) p &crypt_data.iv
$3 = (unsigned char (*)[16]) 0x11dbe308
(gdb) monitor get_vbits 0x11dbe308 16
00000000 00000000 00000000 00000000
(gdb) monitor get_vbits 0xe7b000 16384
[snip an output of 16384 zeros, indicating that src_frame in fil_space_encrypt() is fully initialized]
(gdb) p/x *src_frame@16384
$5 = {0x21, 0xb5, 0x94, 0x92, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 13 times>,
0x18, 0xde, 0x92, 0x0, 0x5, 0x0 <repeats 11 times>, 0x4,
0x0 <repeats 16338 times>, 0xfe, 0xa3, 0xbf, 0x94, 0x0, 0x18, 0xde, 0x92}
(gdb) p/x *tmp_mem@16384
$6 = {0x21, 0xb5, 0x94, 0x92, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 13 times>,
0x18, 0xde, 0x92, 0x0, 0x5, 0x0 <repeats 11 times>, 0x4,
0x0 <repeats 16338 times>, 0xfe, 0xa3, 0xbf, 0x94, 0x0, 0x18, 0xde, 0x92}
# Note that the two buffers above are identical!
(gdb) p tmp_mem
$7 = (unsigned char *) 0xd61a780 "!\265\224\222"
(gdb) monitor get_vbits 0xd61a780 16384
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 0000ffff ffffffff ffffffff ffffffff ffff0000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
…
This still looks like a wrong alarm, just like last time. This time I only confirmed that the cause cannot be that Valgrind would think that the initialization vector contains uninitialized bits.
I believe that some arithmetic operations in the libssl 1.0.1 AES implementation could be confusing the V-bit bookkeeping of Valgrind. The bookkeeping is not foolproof, and some work-arounds could have been optimized away by recent compilers; see for example MDEV-11349 commit 2/2.
Still, it is worth noting that I did not repeat the issue with the 10.1 commit from April 2016 nor with a July 2016 revision. It turns out that some debug code to decrypt a copy of the page immediately after encryption was added in
MDEV-9931, September 22, 2016.I suspect that Valgrind would complain about decryption even with earlier versions, but that would require a different type of a test:
Do we want to track this down further? Do we want to add some VALGRIND_MAKE_MEM_DEFINED() to MariaDB, conditional on the libssl1.0.0 version, to suppress this? (I would definitely not want to suppress anything for the 1.0.2 and later versions of libssl1.0.0.)