[MDEV-12737] tokudb_mariadb.mdev6657 fails in buildbot with different plan, and outside with valgrind warnings Created: 2017-05-08  Updated: 2018-03-20  Resolved: 2018-03-20

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - TokuDB, Tests
Affects Version/s: 10.1, 10.2
Fix Version/s: 10.1.32

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Sprint: 10.1.32

 Description   

This happens on the valgrind builder in buildbot:
http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/10030/steps/test/logs/stdio

tokudb_mariadb.mdev6657                  w2 [ fail ]
        Test ended at 2017-05-02 16:47:21
 
CURRENT_TEST: tokudb_mariadb.mdev6657
--- /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/storage/tokudb/mysql-test/tokudb_mariadb/r/mdev6657.result	2017-04-30 22:23:54.359096283 +0300
+++ /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/storage/tokudb/mysql-test/tokudb_mariadb/r/mdev6657.reject	2017-05-02 16:47:21.529139117 +0300
@@ -34,7 +34,7 @@
 where col1 <= 1410799999 
 order by col1 desc,col2 desc,col3 desc limit 1;
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t3	range	pk_ersatz,key1	pk_ersatz	4	NULL	2001	Using where; Using index
+1	SIMPLE	t3	range	pk_ersatz,key1	pk_ersatz	4	NULL	2797	Using where; Using index
 # The same query but the constant is bigger. 
 # The query should use range(PRIMARY), not full index scan:
 explain 
@@ -43,5 +43,5 @@
 where col1 <= 1412199999 
 order by col1 desc, col2 desc, col3 desc limit 1;
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t3	range	pk_ersatz,key1	pk_ersatz	4	NULL	15001	Using where; Using index
+1	SIMPLE	t3	range	pk_ersatz,key1	pk_ersatz	4	NULL	19777	Using where; Using index
 drop table t1,t2,t3;
 
mysqltest: Result content mismatch

We had a similar problem with this test before, MDEV-6846, but it was fixed long time ago.

When I run this test with valgrind locally, I get valgrind warnings instead:

8773f14677f492bcde06647a111732baf226f5ba

==29663== Thread 22:
==29663== Conditional jump or move depends on uninitialised value(s)
==29663==    at 0xBFACE0D: lz_encoder_prepare (lz_encoder.c:222)
==29663==    by 0xBFAD4CA: lzma_lz_encoder_init (lz_encoder.c:516)
==29663==    by 0xBFA420E: lzma_raw_coder_init (filter_common.c:212)
==29663==    by 0xBFA16E4: block_encode_normal (block_buffer_encoder.c:192)
==29663==    by 0xBFA16E4: lzma_block_buffer_encode (block_buffer_encoder.c:258)
==29663==    by 0xBF86878: lzma_stream_buffer_encode (stream_buffer_encoder.c:93)
==29663==    by 0xBF86523: lzma_easy_buffer_encode (easy_buffer_encoder.c:27)
==29663==    by 0xBF68428: toku_compress(toku_compression_method, unsigned char*, unsigned long*, unsigned char const*, unsigned long) (compress.cc:141)
==29663==    by 0xBF735E6: compress_nocrc_sub_block(sub_block*, void*, unsigned int, toku_compression_method) (sub_block.cc:204)
==29663==    by 0xBEF617E: compress_ftnode_sub_block(sub_block*, toku_compression_method) (ft_node-serialize.cc:387)
==29663==    by 0xBEF6528: serialize_and_compress_partition(ftnode*, int, toku_compression_method, sub_block*, serialize_times*) (ft_node-serialize.cc:494)
==29663==    by 0xBEF66E2: serialize_and_compress_serially(ftnode*, int, toku_compression_method, sub_block*, serialize_times*) (ft_node-serialize.cc:542)
==29663==    by 0xBEF719A: toku_serialize_ftnode_to_memory(ftnode*, ftnode_disk_data**, unsigned int, toku_compression_method, bool, bool, unsigned long*, unsigned long*, char**) (ft_node-serialize.cc:686)
==29663==    by 0xBEF77D2: toku_serialize_ftnode_to(int, blocknum_s, ftnode*, ftnode_disk_data**, bool, ft*, bool) (ft_node-serialize.cc:807)
==29663==    by 0xBF54443: toku_ftnode_flush_callback(cachefile*, int, blocknum_s, void*, void**, void*, pair_attr_s, pair_attr_s*, bool, bool, bool, bool) (ft-ops.cc:694)
==29663==    by 0xBF2994C: cachetable_only_write_locked_data(evictor*, ctpair*, bool, pair_attr_s*, bool) (cachetable.cc:658)
==29663==    by 0xBF2D03D: cachetable_flush_pair_for_close(void*) (cachetable.cc:2283)
==29663== Conditional jump or move depends on uninitialised value(s)
==29663==    at 0xBFACF1D: lz_encoder_prepare (lz_encoder.c:344)
==29663==    by 0xBFAD4CA: lzma_lz_encoder_init (lz_encoder.c:516)
==29663==    by 0xBFA420E: lzma_raw_coder_init (filter_common.c:212)
==29663==    by 0xBFA16E4: block_encode_normal (block_buffer_encoder.c:192)
==29663==    by 0xBFA16E4: lzma_block_buffer_encode (block_buffer_encoder.c:258)
==29663==    by 0xBF86878: lzma_stream_buffer_encode (stream_buffer_encoder.c:93)
==29663==    by 0xBF86523: lzma_easy_buffer_encode (easy_buffer_encoder.c:27)
==29663==    by 0xBF68428: toku_compress(toku_compression_method, unsigned char*, unsigned long*, unsigned char const*, unsigned long) (compress.cc:141)
==29663==    by 0xBF735E6: compress_nocrc_sub_block(sub_block*, void*, unsigned int, toku_compression_method) (sub_block.cc:204)
==29663==    by 0xBEF617E: compress_ftnode_sub_block(sub_block*, toku_compression_method) (ft_node-serialize.cc:387)
==29663==    by 0xBEF6528: serialize_and_compress_partition(ftnode*, int, toku_compression_method, sub_block*, serialize_times*) (ft_node-serialize.cc:494)
==29663==    by 0xBEF66E2: serialize_and_compress_serially(ftnode*, int, toku_compression_method, sub_block*, serialize_times*) (ft_node-serialize.cc:542)
==29663==    by 0xBEF719A: toku_serialize_ftnode_to_memory(ftnode*, ftnode_disk_data**, unsigned int, toku_compression_method, bool, bool, unsigned long*, unsigned long*, char**) (ft_node-serialize.cc:686)
==29663==    by 0xBEF77D2: toku_serialize_ftnode_to(int, blocknum_s, ftnode*, ftnode_disk_data**, bool, ft*, bool) (ft_node-serialize.cc:807)
==29663==    by 0xBF54443: toku_ftnode_flush_callback(cachefile*, int, blocknum_s, void*, void**, void*, pair_attr_s, pair_attr_s*, bool, bool, bool, bool) (ft-ops.cc:694)
==29663==    by 0xBF2994C: cachetable_only_write_locked_data(evictor*, ctpair*, bool, pair_attr_s*, bool) (cachetable.cc:658)
==29663==    by 0xBF2D03D: cachetable_flush_pair_for_close(void*) (cachetable.cc:2283)



 Comments   
Comment by Elena Stepanova [ 2017-05-24 ]

Occurrence in 10.1:
http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/10064/steps/test/logs/stdio

tokudb_mariadb.mdev6657                  w4 [ fail ]
        Test ended at 2017-05-23 07:23:00
 
CURRENT_TEST: tokudb_mariadb.mdev6657
--- /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/storage/tokudb/mysql-test/tokudb_mariadb/r/mdev6657.result	2017-05-21 05:40:31.933504724 +0300
+++ /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/storage/tokudb/mysql-test/tokudb_mariadb/r/mdev6657.reject	2017-05-23 07:22:59.988631453 +0300
@@ -34,7 +34,7 @@
 where col1 <= 1410799999 
 order by col1 desc,col2 desc,col3 desc limit 1;
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t3	range	pk_ersatz,key1	pk_ersatz	4	NULL	2001	Using where; Using index
+1	SIMPLE	t3	range	pk_ersatz,key1	pk_ersatz	4	NULL	8512	Using where; Using index
 # The same query but the constant is bigger. 
 # The query should use range(PRIMARY), not full index scan:
 explain 
@@ -43,5 +43,5 @@
 where col1 <= 1412199999 
 order by col1 desc, col2 desc, col3 desc limit 1;
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t3	range	pk_ersatz,key1	pk_ersatz	4	NULL	15001	Using where; Using index
+1	SIMPLE	t3	range	pk_ersatz,key1	pk_ersatz	4	NULL	21512	Using where; Using index
 drop table t1,t2,t3;
 
mysqltest: Result content mismatch

Comment by Sergei Petrunia [ 2017-06-27 ]
  • I am not sure whether a difference in rows estimate for range access is a symptom of a bug for TokuDB. Anybody knows how records_in_range works in TokuDB? (Do they have --replace_column before all EXPLAINs like InnoDB does?)
  • For the valgrind warning part - can we get a full valgrind error output and try to guess whether this is a genuine problem or just some "end-of-buffer" garbage being packed? (Did Percona Server have a similar bug?)
Comment by Varun Gupta (Inactive) [ 2017-07-09 ]

For the valgrind warning, I see the same bug open in percona

https://jira.percona.com/browse/TDB-48

Comment by Varun Gupta (Inactive) [ 2017-09-15 ]

the commit stated in the description is
commit 90c52e5291b3ad0935df7da56ec0fcbf530733b4
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date:   Sat May 20 21:49:35 2017 +0300
 
    MDEV-12615: InnoDB page compression method snappy mostly does not compress pages

The warnings that I currently see for the above case

==21170== Invalid read of size 8
==21170==    at 0xEB205B: my_thread_dbug_id (my_thr_init.c:414)
==21170==    by 0xEC0504: default_sf_malloc_dbug_id (safemalloc.c:88)
==21170==    by 0xEC05B4: sf_malloc (safemalloc.c:129)
==21170==    by 0xEADC70: my_malloc (my_malloc.c:101)
==21170==    by 0xE8590D: init_dynamic_array2 (array.c:70)
==21170==    by 0x7D00A6: Dynamic_array<wsrep_member_info>::init(unsigned int, unsigned int) (sql_array.h:117)
==21170==    by 0x7D000D: Dynamic_array<wsrep_member_info>::Dynamic_array(unsigned int, unsigned int) (sql_array.h:105)
==21170==    by 0x7CFDDB: wsp::Config_state::Config_state() (wsrep_utils.h:173)
==21170==    by 0x7CF891: __static_initialization_and_destruction_0(int, int) (wsrep_mysqld.cc:208)
==21170==    by 0x7CF8C6: _GLOBAL__sub_I_wsrep_mysqld.cc (wsrep_mysqld.cc:2653)
==21170==    by 0xF1258C: __libc_csu_init (elf-init.c:88)
==21170==    by 0x6A63B74: (below main) (in /lib64/libc-2.18.so)
==21170==  Address 0x6e5c148 is 8 bytes inside a block of size 33 free'd
==21170==    at 0x4C291ED: free (vg_replace_malloc.c:530)
==21170==    by 0x5D53CD7: ??? (in /lib64/libcrypto.so.1.0.0)
==21170==    by 0x5D53DB4: ??? (in /lib64/libcrypto.so.1.0.0)
==21170==    by 0x5D54393: FIPS_module_mode_set (in /lib64/libcrypto.so.1.0.0)
==21170==    by 0x5BF775C: FIPS_mode_set (in /lib64/libcrypto.so.1.0.0)
==21170==    by 0x5BF3B19: OPENSSL_init_library (in /lib64/libcrypto.so.1.0.0)
==21170==    by 0x400E859: call_init.part.0 (in /lib64/ld-2.18.so)
==21170==    by 0x400E942: _dl_init (in /lib64/ld-2.18.so)
==21170==    by 0x40011C9: ??? (in /lib64/ld-2.18.so)
==21170==    by 0x9: ???
==21170==    by 0xFFEFFE2FA: ???
==21170==    by 0xFFEFFE31E: ???
==21170==    by 0xFFEFFE339: ???
==21170==    by 0xFFEFFE378: ???
==21170==    by 0xFFEFFE38A: ???
==21170==    by 0xFFEFFE393: ???
==21170==  Block was alloc'd at
==21170==    at 0x4C280F3: malloc (vg_replace_malloc.c:299)
==21170==    by 0x5D53AF9: ??? (in /lib64/libcrypto.so.1.0.0)
==21170==    by 0x5D53DB4: ??? (in /lib64/libcrypto.so.1.0.0)
==21170==    by 0x5D54393: FIPS_module_mode_set (in /lib64/libcrypto.so.1.0.0)
==21170==    by 0x5BF775C: FIPS_mode_set (in /lib64/libcrypto.so.1.0.0)
==21170==    by 0x5BF3B19: OPENSSL_init_library (in /lib64/libcrypto.so.1.0.0)
==21170==    by 0x400E859: call_init.part.0 (in /lib64/ld-2.18.so)
==21170==    by 0x400E942: _dl_init (in /lib64/ld-2.18.so)
==21170==    by 0x40011C9: ??? (in /lib64/ld-2.18.so)
==21170==    by 0x9: ???
==21170==    by 0xFFEFFE2FA: ???
==21170==    by 0xFFEFFE31E: ???
==21170==    by 0xFFEFFE339: ???
==21170==    by 0xFFEFFE378: ???
==21170==    by 0xFFEFFE38A: ???
==21170==    by 0xFFEFFE393: ???

Comment by Varun Gupta (Inactive) [ 2018-03-19 ]

Regarding this issue, the discussion with psergey resulted in that we should fix the test by using replace_column for the rows column in the explain.
The optimizer can give differens estimated and this depends on varying factors.

For the valgrind warnings I don't see any valgrind warning now.

Generated at Thu Feb 08 08:00:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.