[MDEV-10397] Server crashes in key_copy with join_cache_level > 2 and join on BIT fields Created: 2016-07-19  Updated: 2017-11-30  Resolved: 2017-11-30

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1
Fix Version/s: 5.5.59, 10.0.34, 10.1.30

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-14241 Server crash in key_copy / get_matchi... Closed
Sprint: 10.1.29

 Description   

CREATE TABLE t1 (b1 BIT NOT NULL);
INSERT INTO t1 VALUES (0),(1);
 
CREATE TABLE t2 (b2 BIT NOT NULL);
INSERT INTO t2 VALUES (0),(1);
 
SET SESSION JOIN_CACHE_LEVEL = 3;
SELECT * FROM t1 LEFT JOIN t2 ON b1 = b2;

Stack trace from 5.5 commit 5cf49cdf92

#3  <signal handler called>
#4  0x00000000008696bc in key_copy (to_key=0x7f3fddc94009 "\245\245\245\245\245\245\245\020", from_record=0x7f3fddc83a78 "\375\245\245\245\245\245\245\245\377\245\245\245\245\245\245\245\230:\310\335?\177", key_info=0x7f3fddc93b68, key_length=1, with_zerofill=true) at /data/src/5.5/sql/key.cc:126
#5  0x000000000072cbcf in JOIN_CACHE_BNLH::get_matching_chain_by_join_key (this=0x7f3fddca6518) at /data/src/5.5/sql/sql_join_cache.cc:3635
#6  0x000000000072cc60 in JOIN_CACHE_BNLH::prepare_look_for_matches (this=0x7f3fddca6518, skip_last=false) at /data/src/5.5/sql/sql_join_cache.cc:3672
#7  0x000000000072aca2 in JOIN_CACHE::join_matching_records (this=0x7f3fddca6518, skip_last=false) at /data/src/5.5/sql/sql_join_cache.cc:2272
#8  0x000000000072a771 in JOIN_CACHE::join_records (this=0x7f3fddca6518, skip_last=false) at /data/src/5.5/sql/sql_join_cache.cc:2086
#9  0x000000000065774a in sub_select_cache (join=0x7f3fddc469c0, join_tab=0x7f3fddc93828, end_of_records=true) at /data/src/5.5/sql/sql_select.cc:16734
#10 0x0000000000657955 in sub_select (join=0x7f3fddc469c0, join_tab=0x7f3fddc93508, end_of_records=true) at /data/src/5.5/sql/sql_select.cc:16904
#11 0x000000000065743e in do_select (join=0x7f3fddc469c0, fields=0x7f3fe1b4cb88, table=0x0, procedure=0x0) at /data/src/5.5/sql/sql_select.cc:16615
#12 0x0000000000635c63 in JOIN::exec (this=0x7f3fddc469c0) at /data/src/5.5/sql/sql_select.cc:2873
#13 0x000000000063642e in mysql_select (thd=0x7f3fe1b49060, rref_pointer_array=0x7f3fe1b4cce0, tables=0x7f3fddc45290, wild_num=1, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7f3fddc469a0, unit=0x7f3fe1b4c390, select_lex=0x7f3fe1b4ca70) at /data/src/5.5/sql/sql_select.cc:3094
#14 0x000000000062cd14 in handle_select (thd=0x7f3fe1b49060, lex=0x7f3fe1b4c2e0, result=0x7f3fddc469a0, setup_tables_done_option=0) at /data/src/5.5/sql/sql_select.cc:319
#15 0x0000000000606175 in execute_sqlcom_select (thd=0x7f3fe1b49060, all_tables=0x7f3fddc45290) at /data/src/5.5/sql/sql_parse.cc:4689
#16 0x00000000005ff4c6 in mysql_execute_command (thd=0x7f3fe1b49060) at /data/src/5.5/sql/sql_parse.cc:2234
#17 0x0000000000608d40 in mysql_parse (thd=0x7f3fe1b49060, rawbuf=0x7f3fddc45078 "SELECT * FROM t1 LEFT JOIN t2 ON b1 = b2", length=40, parser_state=0x7f3fe4b59650) at /data/src/5.5/sql/sql_parse.cc:5934
#18 0x00000000005fca55 in dispatch_command (command=COM_QUERY, thd=0x7f3fe1b49060, packet=0x7f3fdee53061 "", packet_length=40) at /data/src/5.5/sql/sql_parse.cc:1079
#19 0x00000000005fbc0f in do_command (thd=0x7f3fe1b49060) at /data/src/5.5/sql/sql_parse.cc:793
#20 0x00000000006fde45 in do_handle_one_connection (thd_arg=0x7f3fe1b49060) at /data/src/5.5/sql/sql_connect.cc:1270
#21 0x00000000006fdbd2 in handle_one_connection (arg=0x7f3fe1b49060) at /data/src/5.5/sql/sql_connect.cc:1186
#22 0x000000000093dc53 in pfs_spawn_thread (arg=0x7f3fdfbed3e0) at /data/src/5.5/storage/perfschema/pfs.cc:1015
#23 0x00007f3fe47920a4 in start_thread (arg=0x7f3fe4b5a700) at pthread_create.c:309
#24 0x00007f3fe321587d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111



 Comments   
Comment by Varun Gupta (Inactive) [ 2017-11-03 ]

Some investigation results

In the function TABLE::create_key_part_by_field
we have

  key_part_info->null_offset= (uint) (field->null_ptr - (uchar*) record[0]);

So for the above select query as the field b1 and b2 are defined NOT NULL due to which field->null_ptr is NULL

(gdb) p field->null_ptr
$5 = (uchar *) 0x0
(gdb) p field->null_ptr - (uchar*) record[0]
$4 = -140736951630024
(gdb) p (uint)(field->null_ptr - (uchar*) record[0])
$6 = 536725304

This definitely does not look correct, we need to only do the above ptr difference only when field->null_ptr is defined

Comment by Varun Gupta (Inactive) [ 2017-11-03 ]

In function key_copy

(gdb) p key_part->null_offset
$7 = 536725304
(gdb) p from_record[key_part->null_offset]
Cannot access memory at address 0x800000000000

So due to the incorrect pointer difference pointed above we are having the server crash in the key_copy function

Comment by Sergei Petrunia [ 2017-11-06 ]

(for the record, I needed to use MyISAM to reproduce).

Confirm the findings.
key_copy treats the field as null-able ( if (key_part->null_bit) branch is taken) while it is not.
The value of key_part_info->null_bit is copied by TABLE::create_key_part_by_field from field->null_bit.
field->null_bit is not null for t2.b2, even if the field is defined as NOT NULL. This is because t2 is on an inner side of an outer join.

Comment by Sergei Petrunia [ 2017-11-06 ]

The crash goes away if I set key_part_info->null_bit to 0 in TABLE::create_key_part_by_field. But is this the right solution?

If one looks at the code in TABLE::create_key_part_by_field below, one can see:

  if (field->real_maybe_null())
  {
    key_part_info->store_length+= HA_KEY_NULL_LENGTH;
  }

This branch is not taken. The use of real_maybe_null implies that TABLE::create_key_part_by_field should use field NULL-ability from the underlying table, ignoring the NULL-ability that comes from outer joins.

Comment by Igor Babaev [ 2017-11-06 ]

I wonder whether the type of the fields (BIT) is essential here.

Comment by Sergei Petrunia [ 2017-11-08 ]

igor yes, it seems to be.

I've tried debugging with both original table definition and this one:

CREATE TABLE t1 (b1 TINYINT NOT NULL);
INSERT INTO t1 VALUES (0),(1);

then for BIT field , immediately after return from open_table we have:

tables->table->field[0]->null_bit==1

while for TINYINT it is

tables->table->field[0]->null_bit==0

.

I am not sure which one is right:

  • null_bit should be zero
  • or, the code TABLE::create_key_part_by_field should not trust its value
Comment by Sergei Petrunia [ 2017-11-11 ]

varun, when you're back please check where does this null_bit difference come from. Is it intentional, is there any handling of this in other code?

Comment by Varun Gupta (Inactive) [ 2017-11-28 ]

In the function make_field we have this piece of code for BIT field

  if (field_type == MYSQL_TYPE_BIT && !f_bit_as_char(pack_flag))
  {
    bit_ptr= null_pos;
    *bit_offset= null_bit;*
  }

(9431) (lldb) p bit_offset
(uchar) $1 = '\x01'

if (!null_ptr_arg)
  null_bit= bit_ofs_arg;

So this is where we get the null_bit set for BIT fields

Comment by Sergei Petrunia [ 2017-11-30 ]

Should check the affected .result files and ok to push after that.

Generated at Thu Feb 08 07:41:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.