[MDEV-5600] Wrong result on 2nd execution of PS depending on the length of the query Created: 2014-02-02  Updated: 2014-02-19  Resolved: 2014-02-19

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.3.12, 5.5.35, 10.0.8
Fix Version/s: 5.5.36, 10.0.9, 5.3.13

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Attachments: File mdev5600-2.show_status_increment.1st_exec.bad     File mdev5600-2.show_status_increment.1st_exec.good     File mdev5600-2.show_status_increment.2nd_exec.bad     File mdev5600-2.show_status_increment.2nd_exec.good     File mdev5600-2.test     File mdev5600_bad.test     File mdev5600_bad.trace.gz     File mdev5600_good.test     File mdev5600_good.trace.gz     HTML File mdev5600_trace_diff_between_bad_and_good    
Issue Links:
Relates

 Description   

This is the problem briefly discussed with Monty on IRC, when the result of the 2nd execution depends on seemingly unimportant factors. I will collect all information that I have so that it's not forgotten, but I suggest to put it aside till more pressing tasks are finished. It would be good to investigate it though, as it's one of things which are impossible to analyze when it happens in external users' environment.

Assorted notes and observations:

  • the result depends even on insignificant blank spaces in the query (a few less of those and the failure is gone), name of the basedir, etc.
  • I could reproduce it on at least two different machines (Ubuntu Precise 64-bit and Debian Wheezy 64-bit), so it's not limited to the machine or particular OS flavor;
  • I could reproduce it via MySQL client, although it required some additional seemingly useless actions, so it's not limited to MTR either;
  • the result seems persistent within the same machine, same build, but it might be different on different machines (the query that causes a failure on one machine did not do the same on the other at first, I had to revert to a little longer query to make the bug re-appear on the 2nd machine);
  • the problem appears at least with at least BUILD/compile-pentium-debug-max-no-ndb builds;
  • the problem is reproducible with -mysqld=-debug;
  • the problem stops appearing if the test is run with --valgrind-mysqld;
  • I could reproduce it on 5.3 (current tree) and on 5.3.12; could not reproduce on 5.5 so far, but due to the fragility of the test case it does not mean that it does not exist in 5.5.

Two complete test cases with the data are attached, mdev5600_bad.test and mdev5600_good.test. They only differ by one space in the query (where 3 right brackets come in a row):

SELECT alias2 . `col_time_key` AS field1, MIN(alias1 . `col_int_key`) AS field2, alias1.`col_int_nokey` AS field3, alias1.`col_varchar_key` AS field4 , MIN(alias1.`col_varchar_nokey` ) AS field5, alias1.`col_time_key` AS field6,  ( SELECT MAX( DISTINCT SQ1_alias1.`col_varchar_nokey` ) AS SQ1_field1 FROM CC AS SQ1_alias1 ) AS field7 FROM ( CC AS alias1, C AS alias2, ( SELECT SQL_SMALL_RESULT SQ2_alias1.* FROM ( C AS SQ2_alias1 STRAIGHT_JOIN D AS SQ2_alias2 ON (SQ2_alias2.`pk` = SQ2_alias1.`col_int_nokey` ) ) ) AS alias3 ) WHERE ( alias3.`col_int_key`, alias3 . `col_int_key` ) IN ( SELECT   SQ3_alias1 . `col_int_nokey` AS SQ3_field1 , SQ3_alias1 . `pk` AS SQ3_field2 FROM ( C AS SQ3_alias1 INNER JOIN C AS SQ3_alias2 ON (SQ3_alias2 . `col_int_key` = SQ3_alias1 . `col_int_key`  ) )    )  GROUP BY field1, field3, field4, field6, field7  ORDER BY alias1 . `col_time_key` ASC , field1, field2, field3, field4, field5, field6, field7
 
SELECT alias2 . `col_time_key` AS field1, MIN(alias1 . `col_int_key`) AS field2, alias1.`col_int_nokey` AS field3, alias1.`col_varchar_key` AS field4 , MIN(alias1.`col_varchar_nokey` ) AS field5, alias1.`col_time_key` AS field6,  ( SELECT MAX( DISTINCT SQ1_alias1.`col_varchar_nokey` ) AS SQ1_field1 FROM CC AS SQ1_alias1 ) AS field7 FROM ( CC AS alias1, C AS alias2, ( SELECT SQL_SMALL_RESULT SQ2_alias1.* FROM ( C AS SQ2_alias1 STRAIGHT_JOIN D AS SQ2_alias2 ON (SQ2_alias2.`pk` = SQ2_alias1.`col_int_nokey` ) ) ) AS alias3 ) WHERE ( alias3.`col_int_key`, alias3 . `col_int_key` ) IN ( SELECT   SQ3_alias1 . `col_int_nokey` AS SQ3_field1 , SQ3_alias1 . `pk` AS SQ3_field2 FROM ( C AS SQ3_alias1 INNER JOIN C AS SQ3_alias2 ON (SQ3_alias2 . `col_int_key` = SQ3_alias1 . `col_int_key`  ) )   )  GROUP BY field1, field3, field4, field6, field7  ORDER BY alias1 . `col_time_key` ASC , field1, field2, field3, field4, field5, field6, field7

The first query on the 2nd execution as a PS returns an empty set. The second one returns a result set.

The test is very unclean, with ugly names and probably excessive data, but I cannot safely clean it up due to the nature of the problem.

I ran the bad and good tests with debug, extracted the fragments that belong to the 2nd execution, converted them using convert-dbug-for-diff and attached the diff as mdev5600_trace_diff_between_bad_and_good. In case it did not go right there are also compressed full trace files, mdev5600_bad.trace.gz and mdev5600_bad.trace.gz (also uploaded to hasky:/tmp).

I have also set up the test on perro under mdev5600 folder.

cd mdev5600/mysql-test

Indication of the good result (it's the tail of the result set from the 2nd execution, followed by DROP):

perl ./mtr mdev5600_good
...
18:56:33	8	NULL	x	x	22:55:23	y
19:11:10	8	NULL	x	x	22:55:23	y
22:34:09	8	NULL	x	x	22:55:23	y
DROP TABLE IF EXISTS C, D, CC;

Indication of the bad result (it's the tail of the result set from the 1nd execution, followed by the second EXECUTE with an empty result set, followed by DROP):

perl ./mtr mdev5600_bad
...
18:56:33	8	NULL	x	x	22:55:23	y
19:11:10	8	NULL	x	x	22:55:23	y
22:34:09	8	NULL	x	x	22:55:23	y
EXECUTE stmt;
field1	field2	field3	field4	field5	field6	field7
DROP TABLE IF EXISTS C, D, CC;



 Comments   
Comment by Elena Stepanova [ 2014-02-02 ]

psergey, monty might have hints on where to start looking for fixing this.

Comment by Elena Stepanova [ 2014-02-16 ]

Attached another test case, mdev5600-2.test, just to point that the example in the description is not unique.
The test contains both "good" and "bad" queries. First, a PS is prepared from a "bad" query and executed twice, the first result is 3 and the second result is 0. Then, another PS is prepared from a "good" query, which differs from the first one by two spaces after the first SELECT word, and again executed twice, both results are 3.

Comment by Elena Stepanova [ 2014-02-17 ]

Attached complete SHOW STATUS increments for each PS execution from mdev5600-2.test. There are 4 files total, 2 for the bad query (1st and 2nd execution), and 2 for the good query.

With the good PS, Handler* increments for the 1st and the 2nd execution are the same:

> Handler_read_first    1
> Handler_read_key      472
> Handler_read_next     63
> Handler_read_rnd_next 22
> Handler_tmp_write     196

With the bad PS, Handler* increments differ.

1st execution of the bad PS (same as for the good one):

> Handler_read_first    1
> Handler_read_key      472
> Handler_read_next     63
> Handler_read_rnd_next 22
> Handler_tmp_write     196

2nd execution of the bad PS (differs):

> Handler_read_first    1
> Handler_read_key      521
> Handler_read_next     20
> Handler_read_rnd_next 22
> Handler_tmp_write     211

Comment by Elena Stepanova [ 2014-02-17 ]

Further statistics for mdev5600-2.test:

For both executions of the good PS, and for the 1st execution of the bad PS, TABLE_STATISTICS and INDEX_STATISTICS is the same:

SHOW TABLE_STATISTICS;
Table_schema	Table_name	Rows_read	Rows_changed	Rows_changed_x_#indexes
test	CC	57	0	0
test	BB	2	0	0
test	C	46	0	0
SHOW INDEX_STATISTICS;
Table_schema	Table_name	Index_name	Rows_read
test	CC	col_varchar_key	20
test	CC	PRIMARY	17
test	C	col_int_key	46

For the 2nd execution of the bad query, the values are different:

SHOW TABLE_STATISTICS;
Table_schema	Table_name	Rows_read	Rows_changed	Rows_changed_x_#indexes
test	CC	58	0	0
test	BB	2	0	0
SHOW INDEX_STATISTICS;
Table_schema	Table_name	Index_name	Rows_read
test	CC	col_varchar_key	20
test	CC	PRIMARY	18

(there is indeed no values for C table, it's not an overlook).

Comment by Elena Stepanova [ 2014-02-17 ]

With the second test case, mdev5600-2, the problem is reproducible on 5.5 and 10.0. On 10.0 in my build it is more persistent, it requires more blank spaces to be remove to convert the "bad" query into the "good" one, but otherwise it is the same.

Comment by Elena Stepanova [ 2014-02-17 ]

Execution plans as printed in the slow log on 10.0.

For both executions of the good PS and the 1st execution of the bad PS:

# explain: id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
# explain: 1	PRIMARY	table1	index	col_varchar_key	col_varchar_key	9	NULL	20	Using index
# explain: 1	PRIMARY	SUBQUERY1_t1	ALL	PRIMARY	NULL	NULL	NULL	20	Using where; Using join buffer (flat, BNL join)
# explain: 1	PRIMARY	table3	eq_ref	PRIMARY,col_varchar_key	PRIMARY	4	test.SUBQUERY1_t1.pk	1	Using where
# explain: 1	PRIMARY	SUBQUERY2_t1	ref	col_int_key	col_int_key	5	test.SUBQUERY1_t1.col_int_nokey	3	Using where; FirstMatch(table3)
# explain: 4	MATERIALIZED	SUBQUERY3_t1	system	NULL	NULL	NULL	NULL	1	

For the 2nd execution of the bad PS:

# explain: id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
# explain: 1	PRIMARY	table1	index	col_varchar_key	col_varchar_key	9	NULL	20	Using index
# explain: 1	PRIMARY	SUBQUERY1_t1	ALL	PRIMARY	NULL	NULL	NULL	20	Using where; Using join buffer (flat, BNL join)
# explain: 1	PRIMARY	table3	eq_ref	PRIMARY,col_varchar_key	PRIMARY	4	test.SUBQUERY1_t1.pk	1	Using where
# explain: 1	PRIMARY	SUBQUERY2_t1	ref	col_int_key	col_int_key	5	test.SUBQUERY1_t1.col_varchar_nokey	3	Using index condition; Using where; FirstMatch(table3)
# explain: 4	MATERIALIZED	SUBQUERY3_t1	system	NULL	NULL	NULL	NULL	1	

Comment by Elena Stepanova [ 2014-02-17 ]

(back on 5.5)

If I do PREPARE .. "EXPLAIN .. instead of PREPARE .. "SELECT ..., the plan returned by all 4 executions is the same:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	table1	index	col_varchar_key	col_varchar_key	9	NULL	20	Using index
1	PRIMARY	SUBQUERY1_t1	ALL	PRIMARY	NULL	NULL	NULL	20	Using where; Using join buffer (flat, BNL join)
1	PRIMARY	table3	eq_ref	PRIMARY,col_varchar_key	PRIMARY	4	test.SUBQUERY1_t1.pk	1	Using where
1	PRIMARY	SUBQUERY2_t1	ref	col_int_key	col_int_key	5	test.SUBQUERY1_t1.col_int_nokey	3	Using where; FirstMatch(table3)
4	MATERIALIZED	SUBQUERY3_t1	system	NULL	NULL	NULL	NULL	1	

Comment by Sergei Petrunia [ 2014-02-18 ]

I am tracing handler calls for good and bad executions. The difference starts
to be seen here:

==== Good ====

Breakpoint 5, handler::ha_rnd_next
"SUBQUERY3_t1(col_varchar_nokey)=(g)"

Breakpoint 3, handler::ha_index_next
"table1(col_varchar_key)=(NULL)"

Breakpoint 3, handler::ha_index_next
"table1(col_varchar_key)=(c)"

... more reads from table1 ...

Start of difference:
Breakpoint 5, handler::ha_rnd_next
"SUBQUERY1_t1(pk,col_int_nokey,col_varchar_nokey)=(10,NULL,x)"

Breakpoint 5, handler::ha_rnd_next
"SUBQUERY1_t1(pk,col_int_nokey,col_varchar_nokey)=(11,8,d)"

Breakpoint 1, handler::ha_index_read_map
"table3(pk,col_varchar_key)=(11,d)"

...

==== Bad ====
Breakpoint 5, handler::ha_rnd_next
"SUBQUERY3_t1(col_varchar_nokey)=(g)"

Breakpoint 3, handler::ha_index_next
"table1(col_varchar_key)=(NULL)"

Breakpoint 3, handler::ha_index_next
"table1(col_varchar_key)=(c)"

... more reads from table1 ...

Start of difference:
Breakpoint 5, handler::ha_rnd_next
"SUBQUERY1_t1(pk,col_int_nokey,col_int_key,col_date_key,col_date_nokey,col_time_key,col_time_nokey,col_datetime_key,col_datetime_nokey,col_varchar_key,col_varchar_nokey)=(10,NULL,8,2000-12-03,2000-12-0"...

Breakpoint 1, handler::ha_index_read_map
"table3(pk,col_varchar_key)=(10,x)"

==== Comparison ====

For some reason, Bad execution reads all columns from SUBQUERY1_t1 instead of
three columns that Good execution uses. The both read the same row with pk=10,
though.

Then, Good execution proceeds to read a row from SUBQUERY1_t1 with pk=11, while
Bad execution reads row from table3 with pk=10.

Looking at EXPLAIN:

----------------------------------------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

----------------------------------------------------------------------------------------------------------------------------------------+

1 PRIMARY table1 index col_varchar_key col_varchar_key 9 NULL 20 Using index
1 PRIMARY SUBQUERY1_t1 ALL PRIMARY NULL NULL NULL 20 Using where; Using join buffer (flat, BNL join)
1 PRIMARY table3 eq_ref PRIMARY,col_varchar_key PRIMARY 4 SUBQUERY1_t1.pk 1 Using where
1 PRIMARY SUBQUERY2_t1 ref col_int_key col_int_key 5 SUBQUERY1_t1.col_int_nokey 3 Using where; FirstMatch(table3)
4 MATERIALIZED SUBQUERY3_t1 system NULL NULL NULL NULL 1  

----------------------------------------------------------------------------------------------------------------------------------------+

Comment by Sergei Petrunia [ 2014-02-18 ]

I've found where the first difference happens. It happens here:

(gdb) wher
#0 SQL_SELECT::skip_record (this=0x7fffcdd38638, thd=0x7fffd3e7c060) at /home/psergey/dev2/5.5-cp/sql/opt_range.h:1013
#1 0x000000000074d248 in JOIN_CACHE::check_match (this=0x7fffcdd39028, rec_ptr=...) at /home/psergey/dev2/5.5-cp/sql/sql_join_cache.cc:2446
#2 0x0000000000748f89 in JOIN_CACHE::generate_full_extensions (this=0x7fffcdd39028, rec_ptr=...) at /home/psergey/dev2/5.5-cp/sql/sql_join_cache.cc:2389
#3 0x0000000000748d87 in JOIN_CACHE::join_matching_records (this=0x7fffcdd39028, skip_last=false) at /home/psergey/dev2/5.5-cp/sql/sql_join_cache.cc:2289
#4 0x00000000007487dd in JOIN_CACHE::join_records (this=0x7fffcdd39028, skip_last=false) at /home/psergey/dev2/5.5-cp/sql/sql_join_cache.cc:2086
#5 0x000000000066a8f2 in sub_select_cache (join=0x7fffcdc478f0, join_tab=0x7fffcdd33c70, end_of_records=true) at /home/psergey/dev2/5.5-cp/sql/sql_select.cc:16568
#6 0x000000000066aaf6 in sub_select (join=0x7fffcdc478f0, join_tab=0x7fffcdd33950, end_of_records=true) at /home/psergey/dev2/5.5-cp/sql/sql_select.cc:16738
#7 0x000000000066a5df in do_select (join=0x7fffcdc478f0, fields=0x7fffcdc47cb0, table=0x0, procedure=0x0) at /home/psergey/dev2/5.5-cp/sql/sql_select.cc:16449
#8 0x0000000000649253 in JOIN::exec (this=0x7fffcdc478f0) at /home/psergey/dev2/5.5-cp/sql/sql_select.cc:2859
#9 0x0000000000649a5f in mysql_select (thd=0x7fffd3e7c060, rref_pointer_array=0x7fffcdc86a78, tables=0x7fffcdc62078, wild_num=0, fields=..., conds=0x7fffcdc475d8, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2416184064, result=0x7fffcdd27fe8, unit=0x7fffcdc86130, select_lex=0x7fffcdc86808) at /home/psergey/dev2/5.5-cp/sql/sql_select.cc:3079
#10 0x0000000000640460 in handle_select (thd=0x7fffd3e7c060, lex=0x7fffcdc86080, result=0x7fffcdd27fe8, setup_tables_done_option=0) at /home/psergey/dev2/5.5-cp/sql/sql_select.cc:319
#11 0x0000000000618df3 in execute_sqlcom_select (thd=0x7fffd3e7c060, all_tables=0x7fffcdc62078) at /home/psergey/dev2/5.5-cp/sql/sql_parse.cc:4688
#12 0x00000000006117f6 in mysql_execute_command (thd=0x7fffd3e7c060) at /home/psergey/dev2/5.5-cp/sql/sql_parse.cc:2232
#13 0x00000000006332c4 in Prepared_statement::execute (this=0x7fffcdd16460, expanded_query=0x7ffff7eeba40, open_cursor=false) at /home/psergey/dev2/5.5-cp/sql/sql_prepare.cc:3928
#14 0x0000000000632390 in Prepared_statement::execute_loop (this=0x7fffcdd16460, expanded_query=0x7ffff7eeba40, open_cursor=false, packet=0x0, packet_end=0x0) at /home/psergey/dev2/5.5-cp/sql/sql_prepare.cc:3587
#15 0x0000000000630420 in mysql_sql_stmt_execute (thd=0x7fffd3e7c060) at /home/psergey/dev2/5.5-cp/sql/sql_prepare.cc:2737
#16 0x0000000000611827 in mysql_execute_command (thd=0x7fffd3e7c060) at /home/psergey/dev2/5.5-cp/sql/sql_parse.cc:2242
#17 0x000000000061b5e8 in mysql_parse (thd=0x7fffd3e7c060, rawbuf=0x7fffcdc47078 "execute stmt", length=12, parser_state=0x7ffff7eec510) at /home/psergey/dev2/5.5-cp/sql/sql_parse.cc:5799
#18 0x000000000060ec48 in dispatch_command (command=COM_QUERY, thd=0x7fffd3e7c060, packet=0x7fffd3f91061 "", packet_length=12) at /home/psergey/dev2/5.5-cp/sql/sql_parse.cc:1078
#19 0x000000000060dda3 in do_command (thd=0x7fffd3e7c060) at /home/psergey/dev2/5.5-cp/sql/sql_parse.cc:793
#20 0x0000000000716cd1 in do_handle_one_connection (thd_arg=0x7fffd3e7c060) at /home/psergey/dev2/5.5-cp/sql/sql_connect.cc:1266
#21 0x00000000007166b8 in handle_one_connection (arg=0x7fffd3e7c060) at /home/psergey/dev2/5.5-cp/sql/sql_connect.cc:1181
#22 0x00007ffff737de9a in start_thread (arg=0x7ffff7eed700) at pthread_create.c:308
#23 0x00007ffff6aae3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

Here, the condition is:
(gdb) p dbug_print_item(cond)
$1202 = 0x153ba20 "(`j44`.`SUBQUERY1_t1`.`col_varchar_nokey` is not null)"

and for the second (bad) execution the value is:
(gdb) p cond->val_int()
$1203 = 1

while for the first (good one) it is:
(gdb) p cond->val_int()
$658 = 0

Comment by Sergei Petrunia [ 2014-02-18 ]

On a more careful examination, the condition attached to SUBQUERY1_t1 is
different:

First execution:

(gdb) p dbug_print_item(cond)
$706 = 0x153ca20 "(`j23`.`SUBQUERY1_t1`.`col_int_nokey` is not null)"

Second execution:
(gdb) p dbug_print_item(cond)
$1225 = 0x153ba20 "(`j44`.`SUBQUERY1_t1`.`col_varchar_nokey` is not null)"

Since we're looking at this row:
(gdb) p dbug_print_table_row(join_tab->table)
$688 = 0x153bea0 "SUBQUERY1_t1(pk,col_int_nokey,col_varchar_nokey)=(10,NULL,x)"

cond->val_int() is apparently different.

Comment by Sergei Petrunia [ 2014-02-19 ]

The condition is generated by make_join_select()/add_not_null_conds().

There, we have (for both executions):
tab= "SUBQUERY2_t1"
referred_tab=SUBQUERY1_t1

However, tab->ref.items[keypart] are different:
$1271 = 0x153ba20 "`j44`.`SUBQUERY1_t1`.`col_varchar_nokey`"
$750 = 0x153ca20 "`j23`.`SUBQUERY1_t1`.`col_int_nokey`"

Keyuse arrays seem to be the same:
(gdb) p ((KEYUSE*)this->keyuse.buffer)[0].table->alias.Ptr
$723 = 0x7fffcd0452e0 "SUBQUERY1_t1"
(gdb) p ((KEYUSE*)this->keyuse.buffer)[1].table->alias.Ptr
$725 = 0x7fffcd01d460 "table3"
(gdb) p ((KEYUSE*)this->keyuse.buffer)[2].table->alias.Ptr
$727 = 0x7fffcd01d460 "table3"
(gdb) p ((KEYUSE*)this->keyuse.buffer)[3].table->alias.Ptr
$729 = 0x7fffcd045360 "SUBQUERY2_t1"

(gdb) p dbug_print_item(((KEYUSE*)this->keyuse.buffer)[0].val)
$731 = 0x153ca20 "`j23`.`table3`.`pk`"
(gdb) p dbug_print_item(((KEYUSE*)this->keyuse.buffer)[0].val)
$735 = 0x153ca20 "`j23`.`table3`.`pk`"
(gdb) p dbug_print_item(((KEYUSE*)this->keyuse.buffer)[1].val)
$737 = 0x153ca20 "`j23`.`SUBQUERY1_t1`.`pk`"
(gdb) p dbug_print_item(((KEYUSE*)this->keyuse.buffer)[2].val)
$739 = 0x153ca20 "`j23`.`SUBQUERY1_t1`.`col_varchar_nokey`"
(gdb) p dbug_print_item(((KEYUSE*)this->keyuse.buffer)[3].val)
$741 = 0x153ca20 "`j23`.`SUBQUERY1_t1`.`col_int_nokey`"

Comment by Sergei Petrunia [ 2014-02-19 ]

In both cases, we have:

(gdb) p join->best_positions[3].table->table->alias.Ptr
$755 = 0x7fffcd045360 "SUBQUERY2_t1"

(gdb) p dbug_print_item(join->best_positions[3].key.val)
$760 = 0x153ca20 "`j23`.`SUBQUERY1_t1`.`col_int_nokey`"

So, it's either equality propagation or create_ref_for_key().

Comment by Sergei Petrunia [ 2014-02-19 ]

It's equality propagation code. When trying to do equality substitution for
(gdb) p dbug_print_item(ref_item)
$769 = 0x153ca20 "`j23`.`SUBQUERY1_t1`.`col_int_nokey`"

the second execution sees and makes use of:

multiple equal(`j44`.`SUBQUERY1_t1`.`col_varchar_nokey`, `j44`.`table3`.`col_varchar_key`)

while the first execution doesn't see it.

Hmm.. how come this multi-quality is used for substituting col_int_nokey? It doesn't include col_int_nokey...

Comment by Sergei Petrunia [ 2014-02-19 ]

Hardware watchpoint 35: *$a4
Old value = (Item_equal *) 0x0
New value = (Item_equal *) 0x7fffcd130078

field_item->set_item_equal(item_equal);

(gdb) p dbug_print_item(field_item)
$1407 = 0x153ba20 "`j44`.`SUBQUERY1_t1`.`col_int_nokey`"
(gdb) p dbug_print_item(item_equal)
$1409 = 0x153ba20 "multiple equal(`j44`.`SUBQUERY1_t1`.`col_varchar_nokey`, `j44`.`table3`.`col_varchar_key`)"

So, an Item_field gets a wrong Item_equal. It happens here:

#0 0x00000000007f8917 in Item_field::set_item_equal
#1 0x00000000007f24f4 in Item_direct_view_ref::replace_equal_field
#2 0x00000000007ef53a in Item_ref::transform
#3 0x000000000066089e in substitute_for_best_equal_field
#4 0x0000000000643895 in JOIN::optimize
#5 0x00000000006499cd in mysql_select

Comment by Sergei Petrunia [ 2014-02-19 ]

So, we are at
#1 0x000000000066089e in substitute_for_best_equal_field (context_tab=0x7fffcd1340c8, cond=0x7fffcd053db8, cond_equal=0x7fffcd047568, table_join_idx=0x7fffcd130ea0) at /home/psergey/dev2/5.5/sql/sql_select.cc:12615
(gdb) wher
#0 Item_ref::transform (this=0x7fffcd053db8, transformer=&virtual Item::replace_equal_field(unsigned char*), arg=0x7ffff7eea710 "\330\213\004\315\377\177") at /home/psergey/dev2/5.5/sql/item.cc:7104
#1 0x000000000066089e in substitute_for_best_equal_field (context_tab=0x7fffcd1340c8, cond=0x7fffcd053db8, cond_equal=0x7fffcd047568, table_join_idx=0x7fffcd130ea0) at /home/psergey/dev2/5.5/sql/sql_select.cc:12615
#2 0x0000000000643895 in JOIN::optimize (this=0x7fffcd0477e8) at /home/psergey/dev2/5.5/sql/sql_select.cc:1362

(gdb) p dbug_print_item(cond)
$1446 = 0x153ba20 "`j44`.`SUBQUERY1_t1`.`col_int_nokey`"
(gdb) p (Item*)cond
$1447 = (Item_direct_view_ref *) 0x7fffcd053db8

(gdb) p dbug_print_item(item_equal)
$1448 = 0x153ba20 "multiple equal(`j44`.`SUBQUERY1_t1`.`col_int_nokey`, `j44`.`SUBQUERY2_t1`.`col_int_key`)"

We enter Item_ref::transform. We execute this:

. Item *new_item= (*ref)->transform(transformer, arg);

which calls Item::transform() which calls Item_field::replace_equal_field().
The condition

. if (item_equal && item_equal == param->item_equal)

fails because item_equal==NULL, and no substitution takes place.
Item_ref::transform() invokes this:

. return (this->*transformer)(arg);

which calls Item_direct_view_ref::replace_equal_field, which runs

field_item->set_item_equal(item_equal);

Comment by Sergei Petrunia [ 2014-02-19 ]

Maybe, it's Item_direct_view_ref object that is participating in the wrong Item_equal.

The problematic Item_direct_view_ref can be found as follows: put a
breakpoint on Item_direct_view_ref::fix_fields, and note the first object
for which the breakpoint is hit on the second EXECUTE statement.

When the breakpoint is hit, Item_direct_view_ref::item_equal has non-zero
value, but I suspect it is invalid data.

Comment by Sergei Petrunia [ 2014-02-19 ]

Ok, here it is how it goes:
mysql> source q2.sql
mysql> execute stmt;

Breakpoint 29, Item_direct_view_ref::Item_direct_view_ref (this=0x7fffcd453db8, context_arg=0x7d8, item=0x7fffcd452120, table_name_arg=0x7fffcd463ff0 "table2", field_name_arg=0x7fffcd4513c8 "col_int_nokey", view_arg=0x7fffcd4e4d98) at /home/psergey/dev2/5.5/sql/item.h:3301

Breakpoint 29, Item_direct_view_ref::Item_direct_view_ref ...

Breakpoint 29, Item_direct_view_ref::Item_direct_view_ref ...

Breakpoint 38, Item_equal::Item_equal (this=0x7fffcd530078, f1=0x7fffcd453db8, f2=0x7fffcd4e8a80, with_const_item=false) at /home/psergey/dev2/5.5/sql/item_cmpfunc.cc:5602
// Note the f1, it matches the Irem_direct_view_ref above. This is a multi-equality that Item_ref will participate in.
// Let's keep track of when this Item_equal is deleted:
(gdb) b Item::delete_self
Breakpoint 40 at 0x5e222c: file /home/psergey/dev2/5.5/sql/item.h, line 1396.
(gdb) cond 40 (this ==0x7fffcd530078)

// Let's also put a breakpoint somewhere in JOIN::optimize after equality propagation is done.
(gdb) break "/home/psergey/dev2/5.5/sql/sql_select.cc:1149"
Breakpoint 41 at 0x642bd7: file /home/psergey/dev2/5.5/sql/sql_select.cc, line 1149.

(gdb) cont

Breakpoint 38, Item_equal::Item_equal( .... // ignore this
Breakpoint 38, Item_equal::Item_equal (... // ignore this

Breakpoint 41, JOIN::optimize (this=0x7fffcd4478f0) at /home/psergey/dev2/5.5/sql/sql_select.cc:1149

// Ok, let's see how our objects are doing

(gdb) p ((Item_direct_view_ref*)0x7fffcd453db8)->item_equal
$1622 = (Item_equal *) 0x7fffcd530078
(gdb) p dbug_print_item(((Item_direct_view_ref*)0x7fffcd453db8)->item_equal)
$1623 = 0x153ba20 "multiple equal(`j44`.`SUBQUERY1_t1`.`col_int_nokey`, `j44`.`SUBQUERY2_t1`.`col_int_key`)"
(gdb) p dbug_print_item(((Item_direct_view_ref*)0x7fffcd453db8))
$1624 = 0x153ba20 "`j44`.`SUBQUERY1_t1`.`col_int_nokey`"

// Item_..._ref has pointer to Item_equal, and this seems to be correct ...

(gdb) cont

Breakpoint 40, Item::delete_self (this=0x7fffcd530078) at /home/psergey/dev2/5.5/sql/item.h:1396
// Ok, this is Item_equal deleting himself...

(gdb) p ((Item_direct_view_ref*)0x7fffcd453db8)->item_equal
$1629 = (Item_equal *) 0x7fffcd530078
// ^^^Note that Item_direct_view_ref object still has the wrong pointer.

// Let's see if the item is created at that location:
(gdb) b Item::Item
Breakpoint 42 at 0x7dd866: Item::Item. (2 locations)
(gdb) cond 42 (this == 0x7fffcd530078)
(gdb) c

(gdb) cont

----------

COUNT

----------

3

----------
// First EXECUTE has finished.
// Call it once again:
mysql> execute stmt;

...
Breakpoint 27, Item_direct_view_ref::fix_fields (this=0x7fffcd453db8, thd=0x7fffd3e7c060, reference=0x7fffcd52c058) at /home/psergey/dev2/5.5/sql/item.cc:7898
// This is our Item_direct_view_ref...

Breakpoint 38, Item_equal::Item_equal (this=0x7fffcd448bd8, f1=0x7fffcd453db8, f2=0x7fffcd4e8a80, with_const_item=false) at /home/psergey/dev2/5.5/sql/item_cmpfunc.cc:5602
// This is the Item_equal that's made from for our Item_direct_view_ref

Breakpoint 42, Item::Item (this=0x7fffcd530078) at /home/psergey/dev2/5.5/sql/item.cc:499
// And this is another item that is created in the location that Item_direct_view_ref has pointer to...

(gdb) up
#1 0x00000000005752e4 in Item_result_field::Item_result_field (this=0x7fffcd530078) at /home/psergey/dev2/5.5/sql/item.h:2869
(gdb) up
#2 0x00000000005ec7f8 in Item_func::Item_func (this=0x7fffcd530078) at /home/psergey/dev2/5.5/sql/item_func.h:67
(gdb) up
#3 0x00000000005ecbc8 in Item_int_func::Item_int_func (this=0x7fffcd530078) at /home/psergey/dev2/5.5/sql/item_func.h:577
(gdb) up
#4 0x00000000005ecca6 in Item_bool_func::Item_bool_func (this=0x7fffcd530078) at /home/psergey/dev2/5.5/sql/item_cmpfunc.h:121
(gdb) up
#5 0x0000000000810155 in Item_equal::Item_equal (this=0x7fffcd530078, f1=0x7fffcd4e5a10, f2=0x7fffcd448940, with_const_item=false) at /home/psergey/dev2/5.5/sql/item_cmpfunc.cc:5602
(gdb) p dbug_print_item(f1)
$1642 = 0x153ba20 "`j44`.`table3`.`col_varchar_key`"
(gdb) p dbug_print_item(f2)
$1644 = 0x153ba20 "`j44`.`SUBQUERY1_t1`.`col_varchar_nokey`"

THis is how the Item_direct_view_ref, which represents a reference to col_int_nokey column gets connected to col_varchar*key column.

Generated at Thu Feb 08 07:05:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.