MDEV-11686 (MariaDB Server)

Multiple encryption tests fail in buildbot with valgrind warnings (Conditional jump or move depends on uninitialised value)

Details

    • 10.2.4-1, 10.2.4-2

    Description

      The new valgrind builder is still experimental, but the failure is reproducible locally for me, so I don't think it's the builder's fault.

      10.1 23cc1be270c7304963643947d8e5ab88f4e312ee

      encryption.innodb_encryption_is 'cbc,xtradb' [ fail ]  Found warnings/errors in server log file!
              Test ended at 2016-12-30 00:17:28
      line
      ==24755== Thread 16:
      ==24755== Conditional jump or move depends on uninitialised value(s)
      ==24755==    at 0xC2A7FD: buf_page_is_checksum_valid_innodb(unsigned char const*, unsigned long, unsigned long) (buf0buf.cc:653)
      ==24755==    by 0xC2AC60: buf_page_is_corrupted(bool, unsigned char const*, unsigned long) (buf0buf.cc:859)
      ==24755==    by 0xCB74C3: fil_space_encrypt(unsigned long, unsigned long, unsigned long, unsigned char*, unsigned long, unsigned char*) (fil0crypt.cc:697)
      ==24755==    by 0xC37059: buf_page_encrypt_before_write(buf_page_t*, unsigned char*, unsigned long) (buf0buf.cc:6366)
      ==24755==    by 0xC4304E: buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) (buf0flu.cc:950)
      ==24755==    by 0xC435B7: buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) (buf0flu.cc:1109)
      ==24755==    by 0xC43B89: buf_flush_try_neighbors(unsigned long, unsigned long, buf_flush_t, unsigned long, unsigned long) (buf0flu.cc:1324)
      ==24755==    by 0xC43EAE: buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) (buf0flu.cc:1412)
      ==24755==    by 0xC4499B: buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) (buf0flu.cc:1741)
      ==24755==    by 0xC44D35: buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, bool, flush_counters_t*) (buf0flu.cc:1817)
      ==24755==    by 0xC45461: buf_flush_list(unsigned long, unsigned long, unsigned long*) (buf0flu.cc:2097)
      ==24755==    by 0xC460F4: page_cleaner_do_flush_batch(unsigned long, unsigned long) (buf0flu.cc:2410)
      ==24755==    by 0xC47376: buf_flush_page_cleaner_thread (buf0flu.cc:2792)
      ==24755==    by 0x4E3D0A3: start_thread (pthread_create.c:309)
      ==24755==    by 0x6CB787C: clone (clone.S:111)
      ==24755== Conditional jump or move depends on uninitialised value(s)
      ==24755==    at 0x4C2ED52: __memcmp_sse4_1 (vg_replace_strmem.c:972)
      ==24755==    by 0xCB74E3: fil_space_encrypt(unsigned long, unsigned long, unsigned long, unsigned char*, unsigned long, unsigned char*) (fil0crypt.cc:698)
      ==24755==    by 0xC37059: buf_page_encrypt_before_write(buf_page_t*, unsigned char*, unsigned long) (buf0buf.cc:6366)
      ==24755==    by 0xC4304E: buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) (buf0flu.cc:950)
      ==24755==    by 0xC435B7: buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) (buf0flu.cc:1109)
      ==24755==    by 0xC43B89: buf_flush_try_neighbors(unsigned long, unsigned long, buf_flush_t, unsigned long, unsigned long) (buf0flu.cc:1324)
      ==24755==    by 0xC43EAE: buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) (buf0flu.cc:1412)
      ==24755==    by 0xC4499B: buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) (buf0flu.cc:1741)
      ==24755==    by 0xC44D35: buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, bool, flush_counters_t*) (buf0flu.cc:1817)
      ==24755==    by 0xC45461: buf_flush_list(unsigned long, unsigned long, unsigned long*) (buf0flu.cc:2097)
      ==24755==    by 0xC460F4: page_cleaner_do_flush_batch(unsigned long, unsigned long) (buf0flu.cc:2410)
      ==24755==    by 0xC47376: buf_flush_page_cleaner_thread (buf0flu.cc:2792)
      ==24755==    by 0x4E3D0A3: start_thread (pthread_create.c:309)
      ==24755==    by 0x6CB787C: clone (clone.S:111)
      

          Activity

            elenst, indeed, if I try to start the test with ./mtr --valgrind (which I would never use when debugging), then valgrind 3.7.0 complains about the unrecognized option --soname-synonyms, which was already added in 2015.

            It looks like I used an almost year-old revision of 10.1.14 where I was unable to repeat the problem.
            I can repeat the problem with 10.1-mdev11686 on perro, when running --manual-gdb and starting Valgrind+gdb instead of gdb.
            Just like with the older revision that I tested, there is a fault at startup, which might be suppressed in --valgrind (I did not use any suppressions). And according to Valgrind, the crypt_data.iv is fully set.

            …
            2017-03-06 19:17:54 366200576 [Note] InnoDB: Dumping buffer pool(s) not yet started
            ==57931== Conditional jump or move depends on uninitialised value(s)
            ==57931==    at 0x5C385BC: ASN1_STRING_set (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
            ==57931==    by 0x5C262AC: ASN1_mbstring_ncopy (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
            ==57931==    by 0x5C264A3: ASN1_mbstring_copy (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
            ==57931==    by 0x5C2740C: ASN1_STRING_to_UTF8 (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
            …
            ==57931== Continuing ...
            2017-03-06 19:18:15 67314496 [Note] Server socket created on IP: '127.0.0.1'.
            2017-03-06 19:18:15 67314496 [Note] /home/mariadb/git/10.1-mdev11686/sql/mysqld: ready for connections.
            Version: '10.1.22-MariaDB-debug'  socket: '/home/mariadb/git/10.1-mdev11686/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
            2017-03-06 19:18:16 68254464 [Note] InnoDB: Created tablespace for space 4 name test/t1 key_id 1 encryption 1.
            ==57931== Thread 10:
            ==57931== Conditional jump or move depends on uninitialised value(s)
            ==57931==    at 0xC519F40: buf_page_is_checksum_valid_innodb(unsigned char const*, unsigned long, unsigned long) (buf0buf.cc:572)
            …
            Program received signal SIGTRAP, Trace/breakpoint trap.
            [Switching to Thread 58112]
            0x000000000c519f40 in buf_page_is_checksum_valid_innodb (
                read_buf=0xd61a780 "!\265\224\222", checksum_field1=565548178, 
                checksum_field2=4272144276)
                at /home/mariadb/git/10.1-mdev11686/storage/innobase/buf/buf0buf.cc:572
            572		    && checksum_field1 != buf_calc_page_new_checksum(read_buf)) {
            (gdb) up
            #1  0x000000000c51a440 in buf_page_is_corrupted (check_lsn=true, 
                read_buf=0xd61a780 "!\265\224\222", zip_size=0, space=0xd60d468)
                at /home/mariadb/git/10.1-mdev11686/storage/innobase/buf/buf0buf.cc:780
            780			if (buf_page_is_checksum_valid_innodb(read_buf,
            (gdb) up 
            #2  0x000000000c5a86c0 in fil_space_encrypt (space=4, offset=1, lsn=1629842, 
                src_frame=0xe7b0000 "!\265\224\222", zip_size=0, 
                dst_frame=0xd614000 "!\265\224\222")
                at /home/mariadb/git/10.1-mdev11686/storage/innobase/fil/fil0crypt.cc:700
            700			bool corrupted = buf_page_is_corrupted(true, tmp_mem, zip_size, tspace);
            (gdb) p crypt_data.iv
            $1 = "\245\373\302\n&!=\202MWL\243\357)8}"
            (gdb) p crypt_data   
            $2 = (fil_space_crypt_t *) 0x11dbe308
            (gdb) p &crypt_data.iv
            $3 = (unsigned char (*)[16]) 0x11dbe308
            (gdb) monitor get_vbits 0x11dbe308 16
            00000000 00000000 00000000 00000000
            (gdb) monitor get_vbits 0xe7b000 16384
            [snip an output of 16384 zeros, indicating that src_frame in fil_space_encrypt() is fully initialized]
            (gdb) p/x *src_frame@16384
            $5 = {0x21, 0xb5, 0x94, 0x92, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 13 times>, 
              0x18, 0xde, 0x92, 0x0, 0x5, 0x0 <repeats 11 times>, 0x4, 
              0x0 <repeats 16338 times>, 0xfe, 0xa3, 0xbf, 0x94, 0x0, 0x18, 0xde, 0x92}
            (gdb) p/x *tmp_mem@16384
            $6 = {0x21, 0xb5, 0x94, 0x92, 0x0, 0x0, 0x0, 0x1, 0x0 <repeats 13 times>, 
              0x18, 0xde, 0x92, 0x0, 0x5, 0x0 <repeats 11 times>, 0x4, 
              0x0 <repeats 16338 times>, 0xfe, 0xa3, 0xbf, 0x94, 0x0, 0x18, 0xde, 0x92}
            # Note that the two buffers above are identical!
            (gdb) p tmp_mem
            $7 = (unsigned char *) 0xd61a780 "!\265\224\222"
            (gdb) monitor get_vbits 0xd61a780 16384
            00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
            00000000 0000ffff ffffffff ffffffff ffffffff ffff0000 00000000 00000000
            00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
            00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
            …
            

            This still looks like a false alarm, just like last time. This time I only confirmed that the cause cannot be that Valgrind would think that the initialization vector contains uninitialized bits.
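To see exactly which bytes of tmp_mem Valgrind considers undefined, the get_vbits dump can be decoded mechanically. Below is a minimal sketch, assuming the documented get_vbits format of two hexadecimal digits per byte, where a set bit means "undefined". The name undefined_ranges is hypothetical and not part of Valgrind or MariaDB. The parser stops at the first token that is not a hex group, such as the "…" above or the "__" that Valgrind prints for unaddressable bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Half-open range [first, second) of byte offsets that have at least one
// undefined bit according to the V-bit dump.
using ByteRange = std::pair<std::size_t, std::size_t>;

// Decode the text printed by "monitor get_vbits <addr> <len>".
// Each byte is shown as two hex digits; "00" means all 8 bits are defined.
std::vector<ByteRange> undefined_ranges(const std::string& dump)
{
    std::vector<ByteRange> ranges;
    std::istringstream in(dump);
    std::string group;
    std::size_t offset = 0;  // byte offset from the dumped address

    while (in >> group) {
        // Stop at anything that is not a whole number of hex-encoded bytes.
        if (group.size() % 2 != 0
            || group.find_first_not_of("0123456789abcdefABCDEF")
               != std::string::npos) {
            break;
        }
        for (std::size_t i = 0; i < group.size(); i += 2, ++offset) {
            if (group.compare(i, 2, "00") == 0) {
                continue;  // fully defined byte
            }
            if (!ranges.empty() && ranges.back().second == offset) {
                ++ranges.back().second;  // extend the current run
            } else {
                ranges.emplace_back(offset, offset + 1);  // start a new run
            }
        }
    }
    return ranges;
}
```

Applied to the first two rows of the dump above, this yields a single range [38, 54): 16 consecutive bytes starting at offset 38 of tmp_mem are flagged as undefined. Sixteen bytes is the AES block size and 38 is the value of InnoDB's FIL_PAGE_DATA, but whether that is significant here is only a guess.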

            I believe that some arithmetic operations in the libssl 1.0.1 AES implementation could be confusing the V-bit bookkeeping of Valgrind. The bookkeeping is not foolproof, and some work-arounds could have been optimized away by recent compilers; see for example MDEV-11349 commit 2/2.

            Still, it is worth noting that I did not repeat the issue with the 10.1 commit from April 2016 nor with a July 2016 revision. It turns out that some debug code to decrypt a copy of the page immediately after encryption was added in MDEV-9931, September 22, 2016.
            I suspect that Valgrind would complain about decryption even with earlier versions, but that would require a different type of test:

            1. start the server with encryption
            2. create and populate an encrypted table
            3. restart the server
            4. read from the encrypted table

            Do we want to track this down further? Do we want to add some VALGRIND_MAKE_MEM_DEFINED() to MariaDB, conditional on the libssl1.0.0 version, to suppress this? (I would definitely not want to suppress anything for the 1.0.2 and later versions of libssl1.0.0.)
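If such a suppression were wanted, it could look roughly like the sketch below. This is only an illustration under stated assumptions: the helper mark_defined_for_old_libssl and the constant kOldLibssl are hypothetical names, the check uses the OpenSSL header version seen at compile time (not the library loaded at run time), and the Valgrind request compiles to a no-op when valgrind/memcheck.h is not available.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__has_include)
# if __has_include(<valgrind/memcheck.h>)
#  include <valgrind/memcheck.h>
# endif
# if __has_include(<openssl/opensslv.h>)
#  include <openssl/opensslv.h>
# endif
#endif

#ifndef VALGRIND_MAKE_MEM_DEFINED
// Valgrind headers not found: compile the client request away.
# define VALGRIND_MAKE_MEM_DEFINED(addr, len) ((void) (addr), (void) (len), 0)
#endif

// True only when the build sees OpenSSL headers older than 1.0.2.
// Newer versions are deliberately left unsuppressed.
#if defined(OPENSSL_VERSION_NUMBER) && OPENSSL_VERSION_NUMBER < 0x10002000L
constexpr bool kOldLibssl = true;
#else
constexpr bool kOldLibssl = false;
#endif

// Hypothetical helper: tell Valgrind that buf[0..len) is defined, but only
// for the old libssl builds. Returns whether the suppression applied.
inline bool mark_defined_for_old_libssl(unsigned char* buf, std::size_t len)
{
    if (kOldLibssl) {
        (void) VALGRIND_MAKE_MEM_DEFINED(buf, len);
    }
    return kOldLibssl;
}
```

A call such as mark_defined_for_old_libssl(tmp_mem, page_size) would sit right after the decryption step. The drawback is the one raised above: it hides the warning without explaining it.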

            marko Marko Mäkelä added the comment above.
            elenst Elena Stepanova added a comment (edited)

            Indeed, I cannot reproduce it on Xenial with libssl 1.0.2g.

            I can still reproduce it on Jessie with libssl 1.0.1t and valgrind 3.12.0 (and 3.10.0). It happens reliably when I run the test on disk, and much less reliably when I run it in shm – maybe it just does not flush in time?

            I'm still not sure what to do about it – whether to try to add a suppression, or just tolerate it since it does not show up in valgrind tests on buildbot – but since it's clearly not a 10.2 problem and, according to Marko's comment above, not a critical problem, I'm demoting it from Critical to Minor and removing the 10.2-ga label.


            I wonder if this one could be closed.

            marko Marko Mäkelä added the comment above.

            The failure disappeared in two steps.
            First, after this commit in 10.1.34,

            commit f5eb37129f24893ab095e78c6fd2ef87e2c460cf
            Author: Marko Mäkelä <marko.makela@mariadb.com>
            Date:   Wed Jun 13 16:15:21 2018 +0300
             
                MDEV-13103 Deal with page_compressed page corruption
            

            the first part went away and only the second part was left:

            ==15757== Thread 16:
            ==15757== Conditional jump or move depends on uninitialised value(s)
            ==15757==    at 0x4C2ED52: __memcmp_sse4_1 (vg_replace_strmem.c:972)
            ==15757==    by 0xC62C79: fil_space_encrypt(fil_space_t const*, unsigned long, unsigned long, unsigned char*, unsigned char*) (fil0crypt.cc:745)
            ==15757==    by 0xBEA6E2: buf_page_encrypt_before_write(fil_space_t*, buf_page_t*, unsigned char*) (buf0buf.cc:6413)
            ==15757==    by 0xBF611B: buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) (buf0flu.cc:964)
            ==15757==    by 0xBF6727: buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) (buf0flu.cc:1140)
            ==15757==    by 0xBF6CF9: buf_flush_try_neighbors(unsigned long, unsigned long, buf_flush_t, unsigned long, unsigned long) (buf0flu.cc:1355)
            ==15757==    by 0xBF701E: buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) (buf0flu.cc:1443)
            ==15757==    by 0xBF7B08: buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) (buf0flu.cc:1772)
            ==15757==    by 0xBF7E9F: buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, bool, flush_counters_t*) (buf0flu.cc:1848)
            ==15757==    by 0xBF85CB: buf_flush_list(unsigned long, unsigned long, unsigned long*) (buf0flu.cc:2128)
            ==15757==    by 0xBF9281: page_cleaner_do_flush_batch(unsigned long, unsigned long) (buf0flu.cc:2443)
            ==15757==    by 0xBFA562: buf_flush_page_cleaner_thread (buf0flu.cc:2842)
            ==15757==    by 0x4E3D0A3: start_thread (pthread_create.c:309)
            ==15757==    by 0x6E7262C: clone (clone.S:111)
            

            Then very recently, after this commit,

            commit 0b36c27e0c06b798b7322ab07d8464b69a7b716c
            Author: Marko Mäkelä <marko.makela@mariadb.com>
            Date:   Fri Jan 31 10:06:55 2020 +0200
             
                MDEV-20307: Remove a useless debug check to save stack space
            

            the remaining part disappeared as well, and the test now passes.

            marko, if you're okay with both – that is, if it's an expected result and not just a masking effect – please feel free to close it.

            elenst Elena Stepanova added the comment above.

            elenst, thank you for the observation. In MDEV-20307 I indeed removed the debug check from fil_space_encrypt() that would ensure that the page decrypts correctly. The check was useless, because we do have enough test coverage where an encrypted page will be read and decrypted from a data file. Any test that involves server restart or recovery or backup of encrypted tables should do that.

            Valgrind was not happy about that debug check, either because its V bits tracking got confused by the encryption code, or because there indeed was something wrong with that check. Either way, the code has been removed now.
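For readers unfamiliar with the removed check, its general shape was: encrypt the page, decrypt the result into a scratch buffer, and compare the scratch buffer with the original. The sketch below illustrates that pattern only. It is not the MariaDB code: toy_crypt is a hypothetical XOR stand-in for the real AES routines, and encrypt_and_verify is a made-up name. It also shows where the memcmp() frame in the traces above fits in: the comparison reads the scratch buffer, so any byte that Valgrind believes to be undefined after decryption is reported there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy stand-in for the page cipher (NOT AES): XOR with a repeating key.
// Applying it twice with the same key restores the input. key_len must be > 0.
void toy_crypt(const unsigned char* src, unsigned char* dst, std::size_t len,
               const unsigned char* key, std::size_t key_len)
{
    for (std::size_t i = 0; i < len; ++i) {
        dst[i] = static_cast<unsigned char>(src[i] ^ key[i % key_len]);
    }
}

// Encrypt src into dst, then verify the result by decrypting a scratch copy
// and comparing it with src. Returns true when the round trip matches.
bool encrypt_and_verify(const unsigned char* src, unsigned char* dst,
                        std::size_t len,
                        const unsigned char* key, std::size_t key_len)
{
    toy_crypt(src, dst, len, key, key_len);

    // Debug-only round trip: decrypt into a temporary buffer.
    std::vector<unsigned char> tmp(len);
    toy_crypt(dst, tmp.data(), len, key, key_len);

    // Under Valgrind, this read of tmp is where an "uninitialised value"
    // warning would surface if the decryption output had undefined V bits.
    return std::memcmp(src, tmp.data(), len) == 0;
}
```

With ordinary test coverage that reads encrypted pages back from a data file, such a round trip on every write adds cost without adding much assurance, which matches the reasoning given for removing it.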

            I know that Valgrind has correctness problems with some bitwise operations, and my attempts at working around those problems only seem to work on GCC, not recent versions of clang. An example of attempting to please Valgrind when using clang is MDEV-11349.

            marko Marko Mäkelä added the comment above.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova