Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14241

Server crash in key_copy / get_matching_chain_by_join_key or valgrind warnings

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL)
    • 5.5.59
    • Optimizer
    • None
    • 10.1.30

    Description

      The stack trace is very similar to what we have in MDEV-10397, but there are behavioral differences, so I'm filing it separately to make sure both are fixed:

      • no BIT fields in this test case
      • no LEFT JOIN
      • at least for me, this one only crashes on a non-debug build, and produces valgring warnings on a valgrind build; no crash on debug build.

      CREATE TABLE t1 (a VARCHAR(10)) ENGINE=MyISAM;
      CREATE OR REPLACE ALGORITHM=TEMPTABLE VIEW v1 AS SELECT * FROM t1;
      INSERT INTO t1 VALUES ('foo'),('bar');
       
      CREATE TABLE t2 (b integer auto_increment primary key) ENGINE=MyISAM;
      INSERT INTO t2 VALUES (NULL),(NULL);
       
      CREATE TABLE t3 (c VARCHAR(1024) CHARACTER SET utf8, d INT) ENGINE=MyISAM;
      CREATE OR REPLACE ALGORITHM=TEMPTABLE VIEW v3 AS SELECT * FROM t3;
      INSERT INTO t3 VALUES ('abc',NULL),('def',4);
       
      SET join_cache_level= 8;
      SELECT * FROM v1, t2, v3 WHERE a = c AND b = d;
       
      # Cleanup
      DROP VIEW v1, v3;
      DROP TABLE t1, t2, t3;
      

      5.5.56 release

      #2  <signal handler called>
      #3  key_copy (to_key=0x7f1931dcda25 "", from_record=0x7f1931cfa018 "\375\003", key_info=<optimized out>, key_length=3070, with_zerofill=with_zerofill@entry=true) at /home/buildbot/buildbot/build/sql/key.cc:155
      #4  0x0000000000679c8c in get_matching_chain_by_join_key (this=0x7f1931dcd870) at /home/buildbot/buildbot/build/sql/sql_join_cache.cc:3640
      #5  JOIN_CACHE_BNLH::prepare_look_for_matches (this=0x7f1931dcd870, skip_last=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_join_cache.cc:3677
      #6  0x0000000000678e69 in JOIN_CACHE::join_matching_records (this=0x7f1931dcd870, skip_last=false) at /home/buildbot/buildbot/build/sql/sql_join_cache.cc:2277
      #7  0x0000000000678948 in JOIN_CACHE::join_records (this=this@entry=0x7f1931dcd870, skip_last=skip_last@entry=false) at /home/buildbot/buildbot/build/sql/sql_join_cache.cc:2091
      #8  0x00000000005c381a in sub_select_cache (join=0x7f1931cf2018, join_tab=0x7f1931d63918, end_of_records=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_select.cc:16768
      #9  0x00000000005da513 in sub_select (join_tab=0x7f1931d635f8, join=0x7f1931cf2018, end_of_records=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_select.cc:16938
      #10 do_select (join=join@entry=0x7f1931cf2018, fields=fields@entry=0x7f1932603980, table=table@entry=0x0, procedure=0x0) at /home/buildbot/buildbot/build/sql/sql_select.cc:16649
      #11 0x00000000005efd94 in JOIN::exec (this=this@entry=0x7f1931cf2018) at /home/buildbot/buildbot/build/sql/sql_select.cc:2878
      #12 0x00000000005eb0a5 in mysql_select (thd=thd@entry=0x7f1932600000, rref_pointer_array=rref_pointer_array@entry=0x7f1932603ad8, tables=0x7f1931d02238, wild_num=<optimized out>, fields=..., conds=<optimized out>, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=result@entry=0x7f1931cf7f00, unit=0x7f1932603188, select_lex=select_lex@entry=0x7f1932603868) at /home/buildbot/buildbot/build/sql/sql_select.cc:3099
      #13 0x00000000005f1cc3 in handle_select (thd=thd@entry=0x7f1932600000, lex=lex@entry=0x7f19326030d8, result=result@entry=0x7f1931cf7f00, setup_tables_done_option=setup_tables_done_option@entry=0) at /home/buildbot/buildbot/build/sql/sql_select.cc:319
      #14 0x000000000059bd18 in execute_sqlcom_select (thd=thd@entry=0x7f1932600000, all_tables=0x7f1931d02238) at /home/buildbot/buildbot/build/sql/sql_parse.cc:4686
      #15 0x00000000005a3ba3 in mysql_execute_command (thd=thd@entry=0x7f1932600000) at /home/buildbot/buildbot/build/sql/sql_parse.cc:2234
      #16 0x00000000005a74c4 in mysql_parse (thd=thd@entry=0x7f1932600000, parser_state=parser_state@entry=0x7f19384bc6e0, length=<optimized out>, rawbuf=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_parse.cc:5931
      #17 0x00000000005a9983 in dispatch_command (command=COM_QUERY, thd=0x7f1932600000, packet=<optimized out>, packet_length=46) at /home/buildbot/buildbot/build/sql/sql_parse.cc:1245
      #18 0x00000000005a9afe in do_command (thd=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_parse.cc:793
      #19 0x000000000065c58b in do_handle_one_connection (thd_arg=thd_arg@entry=0x7f1932600000) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1268
      #20 0x000000000065c60c in handle_one_connection (arg=arg@entry=0x7f1932600000) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1184
      #21 0x0000000000979828 in pfs_spawn_thread (arg=0x7f1936441d30) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1015
      #22 0x00007f1937aaa494 in start_thread (arg=0x7f19384bd700) at pthread_create.c:333
      #23 0x00007f193716793f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      

      5.5 93aadda513d8b5 valgrind build

      ==29579== Thread 4:
      ==29579== Invalid read of size 1
      ==29579==    at 0x87E2CA: key_copy(unsigned char*, unsigned char*, st_key*, unsigned int, bool) (key.cc:124)
      ==29579==    by 0x73B9A2: JOIN_CACHE_BNLH::get_matching_chain_by_join_key() (sql_join_cache.cc:3640)
      ==29579==    by 0x73BA33: JOIN_CACHE_BNLH::prepare_look_for_matches(bool) (sql_join_cache.cc:3677)
      ==29579==    by 0x739A68: JOIN_CACHE::join_matching_records(bool) (sql_join_cache.cc:2277)
      ==29579==    by 0x739532: JOIN_CACHE::join_records(bool) (sql_join_cache.cc:2091)
      ==29579==    by 0x660045: sub_select_cache(JOIN*, st_join_table*, bool) (sql_select.cc:16843)
      ==29579==    by 0x660250: sub_select(JOIN*, st_join_table*, bool) (sql_select.cc:17013)
      ==29579==    by 0x65FD39: do_select(JOIN*, List<Item>*, TABLE*, Procedure*) (sql_select.cc:16724)
      ==29579==    by 0x63DF57: JOIN::exec() (sql_select.cc:2882)
      ==29579==    by 0x63E735: mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) (sql_select.cc:3103)
      ==29579==    by 0x634DD9: handle_select(THD*, LEX*, select_result*, unsigned long) (sql_select.cc:319)
      ==29579==    by 0x60D4D0: execute_sqlcom_select(THD*, TABLE_LIST*) (sql_parse.cc:4685)
      ==29579==    by 0x606843: mysql_execute_command(THD*) (sql_parse.cc:2233)
      ==29579==    by 0x6100B9: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5930)
      ==29579==    by 0x603D9E: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1075)
      ==29579==    by 0x602FAA: do_command(THD*) (sql_parse.cc:793)
      ==29579==  Address 0xa957353 is 19 bytes after a block of size 64 in arena "client"
      

      Attachments

        Issue Links

          Activity

            explain
            SELECT * FROM v1, t2, v3 WHERE a = c AND b = d;
            id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
            1	PRIMARY	<derived2>	ALL	NULL	NULL	NULL	NULL	2	Using where
            1	PRIMARY	<derived3>	hash_ALL	key0	#hash#key0	1027	v1.a	2	Using where; Using join buffer (flat, BNLH join)
            1	PRIMARY	t2	eq_ref	PRIMARY	PRIMARY	4	v3.d	1	Using index
            3	DERIVED	t3	ALL	NULL	NULL	NULL	NULL	2	
            2	DERIVED	t1	ALL	NULL	NULL	NULL	NULL	2
            

            varun Varun Gupta (Inactive) added a comment - explain SELECT * FROM v1, t2, v3 WHERE a = c AND b = d; id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY <derived2> ALL NULL NULL NULL NULL 2 Using where 1 PRIMARY <derived3> hash_ALL key0 #hash#key0 1027 v1.a 2 Using where; Using join buffer (flat, BNLH join) 1 PRIMARY t2 eq_ref PRIMARY PRIMARY 4 v3.d 1 Using index 3 DERIVED t3 ALL NULL NULL NULL NULL 2 2 DERIVED t1 ALL NULL NULL NULL NULL 2
            varun Varun Gupta (Inactive) added a comment - - edited

            In the function key_copy

            (lldb) p key_info->key_parts
            (uint) $6 = 1
            (lldb) p key_length
            (uint) $3 = 1027
            (lldb) p (*key_part->field).field_name
            (const char *) $2 = 0x0000000102849f88 "d"
            (lldb) p key_info->key_part->store_length
            (uint16) $8 = 4
            

            So we get a crash here because the key_length= 1027 and there is only one key_part but the length of that key_part is 5, so we start accessing an illegal memory address in the next iteration
             
            Actually according to the key_length, the field should actually be *c* instead of *d*
            Also explain also tells that we have a ref on v1.a , so the field should actually be *c*
            

            varun Varun Gupta (Inactive) added a comment - - edited In the function key_copy (lldb) p key_info->key_parts (uint) $6 = 1 (lldb) p key_length (uint) $3 = 1027 (lldb) p (*key_part->field).field_name ( const char *) $2 = 0x0000000102849f88 "d" (lldb) p key_info->key_part->store_length (uint16) $8 = 4 So we get a crash here because the key_length= 1027 and there is only one key_part but the length of that key_part is 5, so we start accessing an illegal memory address in the next iteration   Actually according to the key_length, the field should actually be *c* instead of *d* Also explain also tells that we have a ref on v1.a , so the field should actually be *c*

            In the function JOIN_CACHE_BNLH::get_matching_chain_by_join_key()
            we have

            KEY *keyinfo= join_tab->get_keyinfo_by_key_no(ref->key);
             
            KEY *get_keyinfo_by_key_no(uint key) 
            {
              return (is_hash_join_key_no(key) ? hj_key : table->key_info+key);
            }
            

            (lldb) p ref->key
            (int) $11 = 0

            so with the ref->key set to 0 here we are not able to return the hash join key for the BNLH join
            if i set the ref->key to MAX_KEY then there is no crash

            varun Varun Gupta (Inactive) added a comment - In the function JOIN_CACHE_BNLH::get_matching_chain_by_join_key() we have KEY *keyinfo= join_tab->get_keyinfo_by_key_no( ref ->key);   KEY *get_keyinfo_by_key_no(uint key) { return (is_hash_join_key_no(key) ? hj_key : table->key_info+key); } (lldb) p ref->key (int) $11 = 0 so with the ref->key set to 0 here we are not able to return the hash join key for the BNLH join if i set the ref->key to MAX_KEY then there is no crash

            In JOIN::drop_unused_derived_keys()
            we see that tab->ref.key is set to 0

            if (tab->ref.key >= 0)
                tab->ref.key= 0;
            
            

            varun Varun Gupta (Inactive) added a comment - In JOIN::drop_unused_derived_keys() we see that tab->ref.key is set to 0 if (tab-> ref .key >= 0) tab-> ref .key= 0;

            The suggestion

            if i set the ref->key to MAX_KEY then there is no crash

            looks logical, and I follow it (by setting tab->ref.key back to MAX_KEY in the debugger). Then, I observe another odd thing:

            (gdb) wher
              #0  rr_sequential (info=0x7fff84051ac0) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/records.cc:485
              #1  0x0000555555c8f6d4 in JOIN_TAB_SCAN::next (this=0x7fff84054b50) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_join_cache.cc:3375
              #2  0x0000555555c8dd65 in JOIN_CACHE::join_matching_records (this=0x7fff84054998, skip_last=false) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_join_cache.cc:2259
              #3  0x0000555555c8d6f2 in JOIN_CACHE::join_records (this=0x7fff84054998, skip_last=false) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_join_cache.cc:2087
              #4  0x0000555555b7090e in sub_select_cache (join=0x7fff84044f10, join_tab=0x7fff840519f8, end_of_records=true) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:18417
              #5  0x0000555555b70b20 in sub_select (join=0x7fff84044f10, join_tab=0x7fff84051648, end_of_records=true) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:18589
              #6  0x0000555555b70336 in do_select (join=0x7fff84044f10, procedure=0x0) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:18184
              #7  0x0000555555b4a9e7 in JOIN::exec_inner (this=0x7fff84044f10) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:3514
              #8  0x0000555555b49e96 in JOIN::exec (this=0x7fff84044f10) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:3309
              #9  0x0000555555b4b05f in mysql_select (thd=0x7fff84000b00, tables=0x7fff84011198, wild_num=1, fields=..., conds=0x7fff84012b50, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fff84016d90, unit=0x7fff840046a0, select_lex=0x7fff84004dd8) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:3709
              #10 0x0000555555b3f8c0 in handle_select (thd=0x7fff84000b00, lex=0x7fff840045d8, result=0x7fff84016d90, setup_tables_done_option=0) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:373
              #11 0x0000555555b0b9b8 in execute_sqlcom_select (thd=0x7fff84000b00, all_tables=0x7fff84011198) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_parse.cc:6446
              #12 0x0000555555b019a3 in mysql_execute_command (thd=0x7fff84000b00) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_parse.cc:3462
              #13 0x0000555555b0f37a in mysql_parse (thd=0x7fff84000b00, rawbuf=0x7fff84010f88 "SELECT * FROM v1, t2, v3 WHERE a = c AND b = d", length=46, parser_state=0x7ffff40f7250, is_com_multi=false, is_next_command=false) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_parse.cc:7887
            

            We see from the stacktrace that the hash join code is reading the temp.table sequentially.
            However the table has an index:

            (gdb) p info->table->alias.Ptr
              $136 = 0x7fff84011e50 "v3"
            (gdb) p info->table->s->keys
              $137 = 1
            (gdb) p info->table->file
              $138 = (ha_heap *) 0x7fff8404a5e8
            (gdb) p info->table->file->active_index
              $139 = 64
            

            Why would the optimizer create an index on a temp table if then the temp. table is only read sequentially?

            psergei Sergei Petrunia added a comment - The suggestion if i set the ref->key to MAX_KEY then there is no crash looks logical, and I follow it (by setting tab->ref.key back to MAX_KEY in the debugger). Then, I observe another odd thing: (gdb) wher #0 rr_sequential (info=0x7fff84051ac0) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/records.cc:485 #1 0x0000555555c8f6d4 in JOIN_TAB_SCAN::next (this=0x7fff84054b50) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_join_cache.cc:3375 #2 0x0000555555c8dd65 in JOIN_CACHE::join_matching_records (this=0x7fff84054998, skip_last=false) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_join_cache.cc:2259 #3 0x0000555555c8d6f2 in JOIN_CACHE::join_records (this=0x7fff84054998, skip_last=false) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_join_cache.cc:2087 #4 0x0000555555b7090e in sub_select_cache (join=0x7fff84044f10, join_tab=0x7fff840519f8, end_of_records=true) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:18417 #5 0x0000555555b70b20 in sub_select (join=0x7fff84044f10, join_tab=0x7fff84051648, end_of_records=true) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:18589 #6 0x0000555555b70336 in do_select (join=0x7fff84044f10, procedure=0x0) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:18184 #7 0x0000555555b4a9e7 in JOIN::exec_inner (this=0x7fff84044f10) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:3514 #8 0x0000555555b49e96 in JOIN::exec (this=0x7fff84044f10) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:3309 #9 0x0000555555b4b05f in mysql_select (thd=0x7fff84000b00, tables=0x7fff84011198, wild_num=1, fields=..., conds=0x7fff84012b50, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fff84016d90, unit=0x7fff840046a0, select_lex=0x7fff84004dd8) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:3709 #10 0x0000555555b3f8c0 in handle_select (thd=0x7fff84000b00, lex=0x7fff840045d8, result=0x7fff84016d90, setup_tables_done_option=0) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_select.cc:373 #11 0x0000555555b0b9b8 in execute_sqlcom_select (thd=0x7fff84000b00, all_tables=0x7fff84011198) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_parse.cc:6446 #12 0x0000555555b019a3 in mysql_execute_command (thd=0x7fff84000b00) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_parse.cc:3462 #13 0x0000555555b0f37a in mysql_parse (thd=0x7fff84000b00, rawbuf=0x7fff84010f88 "SELECT * FROM v1, t2, v3 WHERE a = c AND b = d", length=46, parser_state=0x7ffff40f7250, is_com_multi=false, is_next_command=false) at /home/psergey/dev-git/10.2-mariarocks-r4/sql/sql_parse.cc:7887 We see from the stacktrace that the hash join code is reading the temp.table sequentially. However the table has an index: (gdb) p info->table->alias.Ptr $136 = 0x7fff84011e50 "v3" (gdb) p info->table->s->keys $137 = 1 (gdb) p info->table->file $138 = (ha_heap *) 0x7fff8404a5e8 (gdb) p info->table->file->active_index $139 = 64 Why would the optimizer create an index on a temp table if then the temp. table is only read sequentially?

            After discussion during the optimizer call , it was decided that when we have derived table keys we should not use hash join key

            varun Varun Gupta (Inactive) added a comment - After discussion during the optimizer call , it was decided that when we have derived table keys we should not use hash join key
            psergei Sergei Petrunia added a comment - - edited

            So, with the patch I get:

            +------+-------------+------------+----------+---------------+-----------+---------+------+------+--------------------------------------------------+
            | id   | select_type | table      | type     | possible_keys | key       | key_len | ref  | rows | Extra                                            |
            +------+-------------+------------+----------+---------------+-----------+---------+------+------+--------------------------------------------------+
            |    1 | PRIMARY     | <derived2> | ALL      | NULL          | NULL      | NULL    | NULL |    2 |                                                  |
            |    1 | PRIMARY     | <derived3> | hash_ALL | NULL          | #hash#$hj | 3075    | func |    2 | Using where; Using join buffer (flat, BNLH join) |
            |    1 | PRIMARY     | t2         | eq_ref   | PRIMARY       | PRIMARY   | 4       | v3.d |    1 | Using index                                      |
            |    3 | DERIVED     | t3         | ALL      | NULL          | NULL      | NULL    | NULL |    2 |                                                  |
            |    2 | DERIVED     | t1         | ALL      | NULL          | NULL      | NULL    | NULL |    2 |                                                  |
            +------+-------------+------------+----------+---------------+-----------+---------+------+------+--------------------------------------------------+
            

            there's no key0 for table=<derived3>, good.

            psergei Sergei Petrunia added a comment - - edited So, with the patch I get: +------+-------------+------------+----------+---------------+-----------+---------+------+------+--------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+------------+----------+---------------+-----------+---------+------+------+--------------------------------------------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | | | 1 | PRIMARY | <derived3> | hash_ALL | NULL | #hash#$hj | 3075 | func | 2 | Using where; Using join buffer (flat, BNLH join) | | 1 | PRIMARY | t2 | eq_ref | PRIMARY | PRIMARY | 4 | v3.d | 1 | Using index | | 3 | DERIVED | t3 | ALL | NULL | NULL | NULL | NULL | 2 | | | 2 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 2 | | +------+-------------+------------+----------+---------------+-----------+---------+------+------+--------------------------------------------------+ there's no key0 for table=<derived3>, good.

            Ok to push.

            psergei Sergei Petrunia added a comment - Ok to push.

            People

              varun Varun Gupta (Inactive)
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.