[MDEV-11686] Multiple encryption tests fail in buildbot with valgrind warnings (Conditional jump or move depends on uninitialised value) - Jira

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 10.1(EOL), 10.2(EOL)
Fix Version/s: 10.5.1, 10.1.45, 10.2.32, 10.3.23, 10.4.13
Component/s: Encryption, Storage Engine - InnoDB
Labels:
None

Sprint:
10.2.4-1, 10.2.4-2

Description

The new valgrind builder is still experimental, but the failure is reproducible locally for me, so I don't think it's the builder's fault.

10.1 23cc1be270c7304963643947d8e5ab88f4e312ee
encryption.innodb_encryption_is 'cbc,xtradb' [ fail ] Found warnings/errors in server log file!
Test ended at 2016-12-30 00:17:28
line
==24755== Thread 16:
==24755== Conditional jump or move depends on uninitialised value(s)
==24755== at 0xC2A7FD: buf_page_is_checksum_valid_innodb(unsigned char const*, unsigned long, unsigned long) (buf0buf.cc:653)
==24755== by 0xC2AC60: buf_page_is_corrupted(bool, unsigned char const*, unsigned long) (buf0buf.cc:859)
==24755== by 0xCB74C3: fil_space_encrypt(unsigned long, unsigned long, unsigned long, unsigned char*, unsigned long, unsigned char*) (fil0crypt.cc:697)
==24755== by 0xC37059: buf_page_encrypt_before_write(buf_page_t*, unsigned char*, unsigned long) (buf0buf.cc:6366)
==24755== by 0xC4304E: buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) (buf0flu.cc:950)
==24755== by 0xC435B7: buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) (buf0flu.cc:1109)
==24755== by 0xC43B89: buf_flush_try_neighbors(unsigned long, unsigned long, buf_flush_t, unsigned long, unsigned long) (buf0flu.cc:1324)
==24755== by 0xC43EAE: buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) (buf0flu.cc:1412)
==24755== by 0xC4499B: buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) (buf0flu.cc:1741)
==24755== by 0xC44D35: buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, bool, flush_counters_t*) (buf0flu.cc:1817)
==24755== by 0xC45461: buf_flush_list(unsigned long, unsigned long, unsigned long*) (buf0flu.cc:2097)
==24755== by 0xC460F4: page_cleaner_do_flush_batch(unsigned long, unsigned long) (buf0flu.cc:2410)
==24755== by 0xC47376: buf_flush_page_cleaner_thread (buf0flu.cc:2792)
==24755== by 0x4E3D0A3: start_thread (pthread_create.c:309)
==24755== by 0x6CB787C: clone (clone.S:111)
==24755== Conditional jump or move depends on uninitialised value(s)
==24755== at 0x4C2ED52: __memcmp_sse4_1 (vg_replace_strmem.c:972)
==24755== by 0xCB74E3: fil_space_encrypt(unsigned long, unsigned long, unsigned long, unsigned char*, unsigned long, unsigned char*) (fil0crypt.cc:698)
==24755== by 0xC37059: buf_page_encrypt_before_write(buf_page_t*, unsigned char*, unsigned long) (buf0buf.cc:6366)
==24755== by 0xC4304E: buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) (buf0flu.cc:950)
==24755== by 0xC435B7: buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) (buf0flu.cc:1109)
==24755== by 0xC43B89: buf_flush_try_neighbors(unsigned long, unsigned long, buf_flush_t, unsigned long, unsigned long) (buf0flu.cc:1324)
==24755== by 0xC43EAE: buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) (buf0flu.cc:1412)
==24755== by 0xC4499B: buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) (buf0flu.cc:1741)
==24755== by 0xC44D35: buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, bool, flush_counters_t*) (buf0flu.cc:1817)
==24755== by 0xC45461: buf_flush_list(unsigned long, unsigned long, unsigned long*) (buf0flu.cc:2097)
==24755== by 0xC460F4: page_cleaner_do_flush_batch(unsigned long, unsigned long) (buf0flu.cc:2410)
==24755== by 0xC47376: buf_flush_page_cleaner_thread (buf0flu.cc:2792)
==24755== by 0x4E3D0A3: start_thread (pthread_create.c:309)
==24755== by 0x6CB787C: clone (clone.S:111)

Issue Links

relates to

MDEV-7069 Fix buildbot failures in main server trees

Closed

MDEV-22650 Dirty compressed page checksum validation fails

Closed

Activity

Marko Mäkelä added a comment - 2017-03-06 18:19

elenst, indeed, if I try to start the test with ./mtr --valgrind (which I would never use when debugging), then valgrind 3.7.0 complains about the unrecognized option --soname-synonyms, which was added back in 2015.

It looks like I used an almost year-old revision of 10.1.14 where I was unable to repeat the problem.
I can repeat the problem with 10.1-mdev11686 on perro, when running --manual-gdb and starting Valgrind+gdb instead of gdb.
Just like with the older revision that I tested, there is a fault at startup, which might be suppressed in --valgrind (I did not use any suppressions). And according to Valgrind, the crypt_data.iv is fully set.

…

2017-03-06 19:17:54 366200576 [Note] InnoDB: Dumping buffer pool(s) not yet started
==57931== Conditional jump or move depends on uninitialised value(s)
==57931==    at 0x5C385BC: ASN1_STRING_set (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==57931==    by 0x5C262AC: ASN1_mbstring_ncopy (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==57931==    by 0x5C264A3: ASN1_mbstring_copy (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==57931==    by 0x5C2740C: ASN1_STRING_to_UTF8 (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
…
==57931== Continuing ...
2017-03-06 19:18:15 67314496 [Note] Server socket created on IP: '127.0.0.1'.
2017-03-06 19:18:15 67314496 [Note] /home/mariadb/git/10.1-mdev11686/sql/mysqld: ready for connections.
Version: '10.1.22-MariaDB-debug'  socket: '/home/mariadb/git/10.1-mdev11686/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
2017-03-06 19:18:16 68254464 [Note] InnoDB: Created tablespace for space 4 name test/t1 key_id 1 encryption 1.
==57931== Thread 10:
==57931== Conditional jump or move depends on uninitialised value(s)
==57931==    at 0xC519F40: buf_page_is_checksum_valid_innodb(unsigned char const*, unsigned long, unsigned long) (buf0buf.cc:572)
…
Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 58112]
0x000000000c519f40 in buf_page_is_checksum_valid_innodb (
    read_buf=0xd61a780 "!\265\224\222", checksum_field1=565548178,
    checksum_field2=4272144276)
    at /home/mariadb/git/10.1-mdev11686/storage/innobase/buf/buf0buf.cc:572
572		    && checksum_field1 != buf_calc_page_new_checksum(read_buf)) {
(gdb) up
#1  0x000000000c51a440 in buf_page_is_corrupted (check_lsn=true,
    read_buf=0xd61a780 "!\265\224\222", zip_size=0, space=0xd60d468)
    at /home/mariadb/git/10.1-mdev11686/storage/innobase/buf/buf0buf.cc:780
780			if (buf_page_is_checksum_valid_innodb(read_buf,
(gdb) up
#2  0x000000000c5a86c0 in fil_space_encrypt (space=4, offset=1, lsn=1629842,
    src_frame=0xe7b0000 "!\265\224\222", zip_size=0,
    dst_frame=0xd614000 "!\265\224\222")
    at /home/mariadb/git/10.1-mdev11686/storage/innobase/fil/fil0crypt.cc:700
700			bool corrupted = buf_page_is_corrupted(true, tmp_mem, zip_size, tspace);
(gdb) p crypt_data.iv
$1 = "\245\373\302\n&!=\202MWL\243\357)8}"
(gdb) p crypt_data
$2 = (fil_space_crypt_t *) 0x11dbe308
(gdb) p &crypt_data.iv
$3 = (unsigned char (*)[16]) 0x11dbe308
(gdb) monitor get_vbits 0x11dbe308 16
00000000 00000000 00000000 00000000
(gdb) monitor get_vbits 0xe7b000 16384
[snip an output of 16384 zeros, indicating that src_frame in fil_space_encrypt() is fully initialized]
(gdb) p/x *src_frame@16384
$5 = {0x21, 0xb5, 0x94, 0x92, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 13 times>,
  0x18, 0xde, 0x92, 0x0, 0x5, 0x0 <repeats 11 times>, 0x4,
  0x0 <repeats 16338 times>, 0xfe, 0xa3, 0xbf, 0x94, 0x0, 0x18, 0xde, 0x92}
(gdb) p/x *tmp_mem@16384
$6 = {0x21, 0xb5, 0x94, 0x92, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 13 times>,
  0x18, 0xde, 0x92, 0x0, 0x5, 0x0 <repeats 11 times>, 0x4,
  0x0 <repeats 16338 times>, 0xfe, 0xa3, 0xbf, 0x94, 0x0, 0x18, 0xde, 0x92}
# Note that the two buffers above are identical!
(gdb) p tmp_mem
$7 = (unsigned char *) 0xd61a780 "!\265\224\222"
(gdb) monitor get_vbits 0xd61a780 16384
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 0000ffff ffffffff ffffffff ffffffff ffff0000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
…

This still looks like a false alarm, just like last time. This time I only confirmed that the cause cannot be Valgrind believing that the initialization vector contains uninitialized bits.
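
The last get_vbits dump above can also be decoded mechanically rather than by eye. The following is only a sketch: `undefined_ranges` is a hypothetical helper, not part of MariaDB or Valgrind. It assumes the Memcheck output format, in which every two hexadecimal characters give the V bits of one byte (0 bits meaning defined) and `__` marks an unaddressable byte.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of MariaDB or Valgrind): scans the text
// printed by "monitor get_vbits" and returns inclusive [first, last] byte
// offsets whose V bits are not all zero.
std::vector<std::pair<std::size_t, std::size_t>>
undefined_ranges(const std::string& dump)
{
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::string digits;

    // Drop the spaces and newlines that only group the output visually.
    for (unsigned char c : dump) {
        if (std::isspace(c)) {
            continue;
        }
        // Expect hex digits, or '_' as printed for unaddressable bytes.
        assert(std::isxdigit(c) || c == '_');
        digits.push_back(static_cast<char>(c));
    }

    // Two characters describe one byte; "00" means every bit is defined.
    // Anything else (including "__") is reported as not defined.
    for (std::size_t i = 0; i + 1 < digits.size(); i += 2) {
        const std::size_t byte = i / 2;
        if (digits[i] == '0' && digits[i + 1] == '0') {
            continue;
        }
        if (!ranges.empty() && ranges.back().second + 1 == byte) {
            ranges.back().second = byte;      // extend the current run
        } else {
            ranges.emplace_back(byte, byte);  // start a new run
        }
    }
    return ranges;
}
```

Fed the first two rows of the tmp_mem dump, this returns a single range covering byte offsets 38 through 53, that is, 16 bytes that Valgrind considers undefined even though their contents match src_frame.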

I believe that some arithmetic operations in the libssl 1.0.1 AES implementation could be confusing the V-bit bookkeeping of Valgrind. The bookkeeping is not foolproof, and some work-arounds could have been optimized away by recent compilers; see for example MDEV-11349 commit 2/2.

Still, it is worth noting that I could not reproduce the issue with the 10.1 commit from April 2016, nor with a July 2016 revision. It turns out that some debug code to decrypt a copy of the page immediately after encryption was added in MDEV-9931 on September 22, 2016.
I suspect that Valgrind would complain about decryption even with earlier versions, but that would require a different type of test:

start the server with encryption
create and populate an encrypted table
restart the server
read from the encrypted table

Do we want to track this down further? Do we want to add some VALGRIND_MAKE_MEM_DEFINED() to MariaDB, conditional on the libssl1.0.0 version, to suppress this? (I would definitely not want to suppress anything for the 1.0.2 and later versions of libssl1.0.0.)
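
A version-conditional suppression of the kind asked about above could look roughly like the following. This is a sketch under stated assumptions, not a proposed patch: `mark_crypt_output_defined` and `SUPPRESS_OLD_OPENSSL_VBITS` are hypothetical names, the check uses the compile-time `OPENSSL_VERSION_NUMBER` from the headers (which may differ from the library actually loaded at runtime), and a no-op fallback is defined when the Valgrind header is not installed.

```cpp
#include <cassert>
#include <cstddef>

// Pull in the real Memcheck client request and the OpenSSL version macro
// when the headers are available.
#ifdef __has_include
# if __has_include(<valgrind/memcheck.h>)
#  include <valgrind/memcheck.h>
# endif
# if __has_include(<openssl/opensslv.h>)
#  include <openssl/opensslv.h>
# endif
#endif

// Fallback so that the sketch still compiles without Valgrind headers.
#ifndef VALGRIND_MAKE_MEM_DEFINED
# define VALGRIND_MAKE_MEM_DEFINED(addr, len) 0
#endif

// OpenSSL 1.0.2 corresponds to version numbers from 0x10002000L upward,
// so only older headers enable the suppression (hypothetical macro name).
#if defined(OPENSSL_VERSION_NUMBER) && OPENSSL_VERSION_NUMBER < 0x10002000L
# define SUPPRESS_OLD_OPENSSL_VBITS 1
#else
# define SUPPRESS_OLD_OPENSSL_VBITS 0
#endif

// Hypothetical helper: tells Memcheck to treat the output of the cipher as
// defined, but only for pre-1.0.2 OpenSSL. Returns whether it did so.
// The buffer contents are never modified.
inline bool mark_crypt_output_defined(unsigned char* buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);
#if SUPPRESS_OLD_OPENSSL_VBITS
    (void) VALGRIND_MAKE_MEM_DEFINED(buf, len);
    return true;
#else
    (void) buf;
    (void) len;
    return false;
#endif
}
```

The trade-off is the usual one for such annotations: marking the buffer as defined hides a suspected false positive, but it would equally hide a genuine uninitialized read in the same bytes, which is why the sketch leaves 1.0.2 and later untouched.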

Elena Stepanova added a comment - 2017-04-23 16:27 - edited

Indeed, I cannot reproduce it on Xenial with libssl 1.0.2g.

I can still reproduce it on Jessie with libssl 1.0.1t and valgrind 3.12.0 (and 3.10.0). It happens reliably when I run the test on disk, and much less reliably when I run it in shm – maybe it just does not flush in time?

I'm still not sure what to do about it – try to add a suppression, or just tolerate it since it does not show up in valgrind tests on buildbot – but since it's clearly not a 10.2 problem and, according to Marko's comment above, not a critical problem, I'm demoting it from Critical to Minor and removing the 10.2-ga label.

Marko Mäkelä added a comment - 2020-01-24 08:24

I wonder if this one could be closed.

Elena Stepanova added a comment - 2020-02-16 21:31

The failure disappeared in two steps.
First, after this commit in 10.1.34,

commit f5eb37129f24893ab095e78c6fd2ef87e2c460cf
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date:   Wed Jun 13 16:15:21 2018 +0300

    MDEV-13103 Deal with page_compressed page corruption

the first part went away and only the second part was left:

==15757== Thread 16:
==15757== Conditional jump or move depends on uninitialised value(s)
==15757==    at 0x4C2ED52: __memcmp_sse4_1 (vg_replace_strmem.c:972)
==15757==    by 0xC62C79: fil_space_encrypt(fil_space_t const*, unsigned long, unsigned long, unsigned char*, unsigned char*) (fil0crypt.cc:745)
==15757==    by 0xBEA6E2: buf_page_encrypt_before_write(fil_space_t*, buf_page_t*, unsigned char*) (buf0buf.cc:6413)
==15757==    by 0xBF611B: buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) (buf0flu.cc:964)
==15757==    by 0xBF6727: buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) (buf0flu.cc:1140)
==15757==    by 0xBF6CF9: buf_flush_try_neighbors(unsigned long, unsigned long, buf_flush_t, unsigned long, unsigned long) (buf0flu.cc:1355)
==15757==    by 0xBF701E: buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) (buf0flu.cc:1443)
==15757==    by 0xBF7B08: buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) (buf0flu.cc:1772)
==15757==    by 0xBF7E9F: buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, bool, flush_counters_t*) (buf0flu.cc:1848)
==15757==    by 0xBF85CB: buf_flush_list(unsigned long, unsigned long, unsigned long*) (buf0flu.cc:2128)
==15757==    by 0xBF9281: page_cleaner_do_flush_batch(unsigned long, unsigned long) (buf0flu.cc:2443)
==15757==    by 0xBFA562: buf_flush_page_cleaner_thread (buf0flu.cc:2842)
==15757==    by 0x4E3D0A3: start_thread (pthread_create.c:309)
==15757==    by 0x6E7262C: clone (clone.S:111)

Then very recently, after this commit,

commit 0b36c27e0c06b798b7322ab07d8464b69a7b716c
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date:   Fri Jan 31 10:06:55 2020 +0200

    MDEV-20307: Remove a useless debug check to save stack space

the remaining part disappeared as well, and the test now passes.

marko, if you're okay with both – that is, if it's an expected result and not just a masking effect – please feel free to close it.

Marko Mäkelä added a comment - 2021-04-26 12:47

elenst, thank you for the observation. In MDEV-20307 I indeed removed the debug check from fil_space_encrypt() that would ensure that the page decompresses correctly. The check was useless, because we do have enough test coverage where an encrypted page will be read and decrypted from a data file. Any test that involves server restart or recovery or backup of encrypted tables should do that.

Valgrind was not happy about that debug check, either because its V bits tracking got confused by the encryption code, or because there indeed was something wrong with that check. Either way, the code has been removed now.
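
For readers unfamiliar with the removed check, its general shape (encrypt, decrypt a scratch copy, compare with the source) can be sketched as follows. This is an illustration only and not the removed InnoDB code: `toy_crypt` is a hypothetical XOR stand-in for the real cipher, and `encrypt_with_roundtrip_check` is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy stand-in for the page cipher (assumption for illustration only):
// XOR with a repeating key, so applying it twice restores the input.
static void toy_crypt(const unsigned char* src, unsigned char* dst,
                      std::size_t len,
                      const unsigned char* key, std::size_t key_len)
{
    assert(key_len > 0);
    for (std::size_t i = 0; i < len; i++) {
        dst[i] = static_cast<unsigned char>(src[i] ^ key[i % key_len]);
    }
}

// Encrypts src into dst, then decrypts dst into a scratch buffer and
// compares the result with src. Returns true if the round trip matches.
bool encrypt_with_roundtrip_check(const unsigned char* src,
                                  unsigned char* dst, std::size_t len,
                                  const unsigned char* key,
                                  std::size_t key_len)
{
    toy_crypt(src, dst, len, key, key_len);

    // Debug-only verification: this comparison is where Memcheck reports
    // an error if it believes any byte of the scratch copy is undefined.
    std::vector<unsigned char> tmp(len);
    toy_crypt(dst, tmp.data(), len, key, key_len);
    return std::memcmp(src, tmp.data(), len) == 0;
}
```

A check of this shape costs a second pass over every page written, and its comparison step corresponds to the memcmp frame in the traces above.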

I know that Valgrind has correctness problems with some bitwise operations, and my attempts at working around those problems only seem to work on GCC, not recent versions of clang. An example of attempting to please Valgrind when using clang is MDEV-11349.
