[MDEV-14241] Server crash in key_copy / get_matching_chain_by_join_key or valgrind warnings Created: 2017-11-01  Updated: 2018-06-28  Resolved: 2018-01-19

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1, 10.2
Fix Version/s: 5.5.59

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-10397 Server crashes in key_copy with join_... Closed
Sprint: 10.1.30

 Description   

The stack trace is very similar to what we have in MDEV-10397, but there are behavioral differences, so I'm filing it separately to make sure both are fixed:

  • no BIT fields in this test case
  • no LEFT JOIN
  • at least for me, this one only crashes on a non-debug build, and produces valgring warnings on a valgrind build; no crash on debug build.

CREATE TABLE t1 (a VARCHAR(10)) ENGINE=MyISAM;
CREATE OR REPLACE ALGORITHM=TEMPTABLE VIEW v1 AS SELECT * FROM t1;
INSERT INTO t1 VALUES ('foo'),('bar');
 
CREATE TABLE t2 (b integer auto_increment primary key) ENGINE=MyISAM;
INSERT INTO t2 VALUES (NULL),(NULL);
 
CREATE TABLE t3 (c VARCHAR(1024) CHARACTER SET utf8, d INT) ENGINE=MyISAM;
CREATE OR REPLACE ALGORITHM=TEMPTABLE VIEW v3 AS SELECT * FROM t3;
INSERT INTO t3 VALUES ('abc',NULL),('def',4);
 
SET join_cache_level= 8;
SELECT * FROM v1, t2, v3 WHERE a = c AND b = d;
 
# Cleanup
DROP VIEW v1, v3;
DROP TABLE t1, t2, t3;

5.5.56 release

#2  <signal handler called>
#3  key_copy (to_key=0x7f1931dcda25 "", from_record=0x7f1931cfa018 "\375\003", key_info=<optimized out>, key_length=3070, with_zerofill=with_zerofill@entry=true) at /home/buildbot/buildbot/build/sql/key.cc:155
#4  0x0000000000679c8c in get_matching_chain_by_join_key (this=0x7f1931dcd870) at /home/buildbot/buildbot/build/sql/sql_join_cache.cc:3640
#5  JOIN_CACHE_BNLH::prepare_look_for_matches (this=0x7f1931dcd870, skip_last=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_join_cache.cc:3677
#6  0x0000000000678e69 in JOIN_CACHE::join_matching_records (this=0x7f1931dcd870, skip_last=false) at /home/buildbot/buildbot/build/sql/sql_join_cache.cc:2277
#7  0x0000000000678948 in JOIN_CACHE::join_records (this=this@entry=0x7f1931dcd870, skip_last=skip_last@entry=false) at /home/buildbot/buildbot/build/sql/sql_join_cache.cc:2091
#8  0x00000000005c381a in sub_select_cache (join=0x7f1931cf2018, join_tab=0x7f1931d63918, end_of_records=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_select.cc:16768
#9  0x00000000005da513 in sub_select (join_tab=0x7f1931d635f8, join=0x7f1931cf2018, end_of_records=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_select.cc:16938
#10 do_select (join=join@entry=0x7f1931cf2018, fields=fields@entry=0x7f1932603980, table=table@entry=0x0, procedure=0x0) at /home/buildbot/buildbot/build/sql/sql_select.cc:16649
#11 0x00000000005efd94 in JOIN::exec (this=this@entry=0x7f1931cf2018) at /home/buildbot/buildbot/build/sql/sql_select.cc:2878
#12 0x00000000005eb0a5 in mysql_select (thd=thd@entry=0x7f1932600000, rref_pointer_array=rref_pointer_array@entry=0x7f1932603ad8, tables=0x7f1931d02238, wild_num=<optimized out>, fields=..., conds=<optimized out>, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=result@entry=0x7f1931cf7f00, unit=0x7f1932603188, select_lex=select_lex@entry=0x7f1932603868) at /home/buildbot/buildbot/build/sql/sql_select.cc:3099
#13 0x00000000005f1cc3 in handle_select (thd=thd@entry=0x7f1932600000, lex=lex@entry=0x7f19326030d8, result=result@entry=0x7f1931cf7f00, setup_tables_done_option=setup_tables_done_option@entry=0) at /home/buildbot/buildbot/build/sql/sql_select.cc:319
#14 0x000000000059bd18 in execute_sqlcom_select (thd=thd@entry=0x7f1932600000, all_tables=0x7f1931d02238) at /home/buildbot/buildbot/build/sql/sql_parse.cc:4686
#15 0x00000000005a3ba3 in mysql_execute_command (thd=thd@entry=0x7f1932600000) at /home/buildbot/buildbot/build/sql/sql_parse.cc:2234
#16 0x00000000005a74c4 in mysql_parse (thd=thd@entry=0x7f1932600000, parser_state=parser_state@entry=0x7f19384bc6e0, length=<optimized out>, rawbuf=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_parse.cc:5931
#17 0x00000000005a9983 in dispatch_command (command=COM_QUERY, thd=0x7f1932600000, packet=<optimized out>, packet_length=46) at /home/buildbot/buildbot/build/sql/sql_parse.cc:1245
#18 0x00000000005a9afe in do_command (thd=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_parse.cc:793
#19 0x000000000065c58b in do_handle_one_connection (thd_arg=thd_arg@entry=0x7f1932600000) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1268
#20 0x000000000065c60c in handle_one_connection (arg=arg@entry=0x7f1932600000) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1184
#21 0x0000000000979828 in pfs_spawn_thread (arg=0x7f1936441d30) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1015
#22 0x00007f1937aaa494 in start_thread (arg=0x7f19384bd700) at pthread_create.c:333
#23 0x00007f193716793f in clone () from /lib/x86_64-linux-gnu/libc.so.6

5.5 93aadda513d8b5 valgrind build

==29579== Thread 4:
==29579== Invalid read of size 1
==29579==    at 0x87E2CA: key_copy(unsigned char*, unsigned char*, st_key*, unsigned int, bool) (key.cc:124)
==29579==    by 0x73B9A2: JOIN_CACHE_BNLH::get_matching_chain_by_join_key() (sql_join_cache.cc:3640)
==29579==    by 0x73BA33: JOIN_CACHE_BNLH::prepare_look_for_matches(bool) (sql_join_cache.cc:3677)
==29579==    by 0x739A68: JOIN_CACHE::join_matching_records(bool) (sql_join_cache.cc:2277)
==29579==    by 0x739532: JOIN_CACHE::join_records(bool) (sql_join_cache.cc:2091)
==29579==    by 0x660045: sub_select_cache(JOIN*, st_join_table*, bool) (sql_select.cc:16843)
==29579==    by 0x660250: sub_select(JOIN*, st_join_table*, bool) (sql_select.cc:17013)
==29579==    by 0x65FD39: do_select(JOIN*, List<Item>*, TABLE*, Procedure*) (sql_select.cc:16724)
==29579==    by 0x63DF57: JOIN::exec() (sql_select.cc:2882)
==29579==    by 0x63E735: mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) (sql_select.cc:3103)
==29579==    by 0x634DD9: handle_select(THD*, LEX*, select_result*, unsigned long) (sql_select.cc:319)
==29579==    by 0x60D4D0: execute_sqlcom_select(THD*, TABLE_LIST*) (sql_parse.cc:4685)
==29579==    by 0x606843: mysql_execute_command(THD*) (sql_parse.cc:2233)
==29579==    by 0x6100B9: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5930)
==29579==    by 0x603D9E: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1075)
==29579==    by 0x602FAA: do_command(THD*) (sql_parse.cc:793)
==29579==  Address 0xa957353 is 19 bytes after a block of size 64 in arena "client"



 Comments   
Comment by Varun Gupta (Inactive) [ 2017-12-04 ]

explain
SELECT * FROM v1, t2, v3 WHERE a = c AND b = d;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	<derived2>	ALL	NULL	NULL	NULL	NULL	2	Using where
1	PRIMARY	<derived3>	hash_ALL	key0	#hash#key0	1027	v1.a	2	Using where; Using join buffer (flat, BNLH join)
1	PRIMARY	t2	eq_ref	PRIMARY	PRIMARY	4	v3.d	1	Using index
3	DERIVED	t3	ALL	NULL	NULL	NULL	NULL	2	
2	DERIVED	t1	ALL	NULL	NULL	NULL	NULL	2

Comment by Varun Gupta (Inactive) [ 2017-12-04 ]

In the function key_copy

(lldb) p key_info->key_parts
(uint) $6 = 1
(lldb) p key_length
(uint) $3 = 1027
(lldb) p (*key_part->field).field_name
(const char *) $2 = 0x0000000102849f88 "d"
(lldb) p key_info->key_part->store_length
(uint16) $8 = 4

So we get a crash here because the key_length= 1027 and there is only one key_part but the length of that key_part is 5, so we start accessing an illegal memory address in the next iteration
 
Actually according to the key_length, the field should actually be *c* instead of *d*
Also explain also tells that we have a ref on v1.a , so the field should actually be *c*

Comment by Varun Gupta (Inactive) [ 2017-12-04 ]

In the function JOIN_CACHE_BNLH::get_matching_chain_by_join_key()
we have

KEY *keyinfo= join_tab->get_keyinfo_by_key_no(ref->key);
 
KEY *get_keyinfo_by_key_no(uint key) 
{
  return (is_hash_join_key_no(key) ? hj_key : table->key_info+key);
}

(lldb) p ref->key
(int) $11 = 0

so with the ref->key set to 0 here we are not able to return the hash join key for the BNLH join
if i set the ref->key to MAX_KEY then there is no crash

Comment by Varun Gupta (Inactive) [ 2017-12-04 ]

In JOIN::drop_unused_derived_keys()
we see that tab->ref.key is set to 0

if (tab->ref.key >= 0)
    tab->ref.key= 0;

Comment by Sergei Petrunia [ 2017-12-05 ]

The suggestion

if i set the ref->key to MAX_KEY then there is no crash

looks logical, and I follow it (by setting tab->ref.key back to MAX_KEY in the debugger). Then, I observe another odd thing:

(gdb) wher
  #0  rr_sequential (info=0x7fff84051ac0) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/records.cc:485
  #1  0x0000555555c8f6d4 in JOIN_TAB_SCAN::next (this=0x7fff84054b50) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_join_cache.cc:3375
  #2  0x0000555555c8dd65 in JOIN_CACHE::join_matching_records (this=0x7fff84054998, skip_last=false) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_join_cache.cc:2259
  #3  0x0000555555c8d6f2 in JOIN_CACHE::join_records (this=0x7fff84054998, skip_last=false) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_join_cache.cc:2087
  #4  0x0000555555b7090e in sub_select_cache (join=0x7fff84044f10, join_tab=0x7fff840519f8, end_of_records=true) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:18417
  #5  0x0000555555b70b20 in sub_select (join=0x7fff84044f10, join_tab=0x7fff84051648, end_of_records=true) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:18589
  #6  0x0000555555b70336 in do_select (join=0x7fff84044f10, procedure=0x0) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:18184
  #7  0x0000555555b4a9e7 in JOIN::exec_inner (this=0x7fff84044f10) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:3514
  #8  0x0000555555b49e96 in JOIN::exec (this=0x7fff84044f10) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:3309
  #9  0x0000555555b4b05f in mysql_select (thd=0x7fff84000b00, tables=0x7fff84011198, wild_num=1, fields=..., conds=0x7fff84012b50, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fff84016d90, unit=0x7fff840046a0, select_lex=0x7fff84004dd8) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:3709
  #10 0x0000555555b3f8c0 in handle_select (thd=0x7fff84000b00, lex=0x7fff840045d8, result=0x7fff84016d90, setup_tables_done_option=0) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:373
  #11 0x0000555555b0b9b8 in execute_sqlcom_select (thd=0x7fff84000b00, all_tables=0x7fff84011198) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_parse.cc:6446
  #12 0x0000555555b019a3 in mysql_execute_command (thd=0x7fff84000b00) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_parse.cc:3462
  #13 0x0000555555b0f37a in mysql_parse (thd=0x7fff84000b00, rawbuf=0x7fff84010f88 "SELECT * FROM v1, t2, v3 WHERE a = c AND b = d", length=46, parser_state=0x7ffff40f7250, is_com_multi=false, is_next_command=false) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_parse.cc:7887

We see from the stacktrace that the hash join code is reading the temp.table sequentially.
However the table has an index:

(gdb) p info->table->alias.Ptr
  $136 = 0x7fff84011e50 "v3"
(gdb) p info->table->s->keys
  $137 = 1
(gdb) p info->table->file
  $138 = (ha_heap *) 0x7fff8404a5e8
(gdb) p info->table->file->active_index
  $139 = 64

Why would the optimizer create an index on a temp table if then the temp. table is only read sequentially?

Comment by Varun Gupta (Inactive) [ 2017-12-13 ]

After discussion during the optimizer call , it was decided that when we have derived table keys we should not use hash join key

Comment by Sergei Petrunia [ 2018-01-18 ]

So, with the patch I get:

+------+-------------+------------+----------+---------------+-----------+---------+------+------+--------------------------------------------------+
| id   | select_type | table      | type     | possible_keys | key       | key_len | ref  | rows | Extra                                            |
+------+-------------+------------+----------+---------------+-----------+---------+------+------+--------------------------------------------------+
|    1 | PRIMARY     | <derived2> | ALL      | NULL          | NULL      | NULL    | NULL |    2 |                                                  |
|    1 | PRIMARY     | <derived3> | hash_ALL | NULL          | #hash#$hj | 3075    | func |    2 | Using where; Using join buffer (flat, BNLH join) |
|    1 | PRIMARY     | t2         | eq_ref   | PRIMARY       | PRIMARY   | 4       | v3.d |    1 | Using index                                      |
|    3 | DERIVED     | t3         | ALL      | NULL          | NULL      | NULL    | NULL |    2 |                                                  |
|    2 | DERIVED     | t1         | ALL      | NULL          | NULL      | NULL    | NULL |    2 |                                                  |
+------+-------------+------------+----------+---------------+-----------+---------+------+------+--------------------------------------------------+

there's no key0 for table=<derived3>, good.

Comment by Sergei Petrunia [ 2018-01-18 ]

Ok to push.

Generated at Thu Feb 08 08:12:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.