[MDEV-16529] [Draft] Server crashes in cleanup_empty_jtbm_semi_joins on 2nd execution of SP Created: 2018-06-19  Updated: 2023-11-28  Resolved: 2023-11-28

Status: Closed
Project: MariaDB Server
Component/s: Stored routines
Affects Version/s: 10.3
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 1
Labels: None
Environment:

https://travis-ci.org/elenst/travis-tests/jobs/392965630


Attachments: File test_case.sql     HTML File threads     HTML File threads_full    
Issue Links:
Relates
relates to MDEV-21315 Server 10.4 sporadically crashes when... Closed

 Description   

10.3 f2c418079deff5fc0b460961094d8b833b4e30b4

#3  <signal handler called>
#4  0x00005636a71ce5c9 in cleanup_empty_jtbm_semi_joins (join=0x7f89945b5b40, join_list=0x7f89943217e0) at /home/travis/src/sql/opt_subselect.cc:5609
#5  0x00005636a706c9f5 in JOIN::cleanup (this=0x7f89945b5b40, full=true) at /home/travis/src/sql/sql_select.cc:12749
#6  0x00005636a7055733 in JOIN::destroy (this=0x7f89945b5b40) at /home/travis/src/sql/sql_select.cc:4035
#7  0x00005636a70fdbe5 in st_select_lex::cleanup (this=0x7f8994321640) at /home/travis/src/sql/sql_union.cc:1952
#8  0x00005636a7055db3 in mysql_select (thd=0x7f8994000c70, tables=0x7f89942c7a58, wild_num=0, fields=..., conds=0x7f8994327010, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=551903563520, result=0x7f8994328e60, unit=0x7f8994320ed0, select_lex=0x7f8994321640) at /home/travis/src/sql/sql_select.cc:4220
#9  0x00005636a7047e53 in handle_select (thd=0x7f8994000c70, lex=0x7f8994320e08, result=0x7f8994328e60, setup_tables_done_option=0) at /home/travis/src/sql/sql_select.cc:382
#10 0x00005636a701282b in execute_sqlcom_select (thd=0x7f8994000c70, all_tables=0x7f89942c7a58) at /home/travis/src/sql/sql_parse.cc:6541
#11 0x00005636a7008f45 in mysql_execute_command (thd=0x7f8994000c70) at /home/travis/src/sql/sql_parse.cc:3764
#12 0x00005636a6f34fdc in sp_instr_stmt::exec_core (this=0x7f8994322ce8, thd=0x7f8994000c70, nextp=0x7f89f0d186c4) at /home/travis/src/sql/sp_head.cc:3593
#13 0x00005636a6f34441 in sp_lex_keeper::reset_lex_and_exec_core (this=0x7f8994322d38, thd=0x7f8994000c70, nextp=0x7f89f0d186c4, open_tables=false, instr=0x7f8994322ce8) at /home/travis/src/sql/sp_head.cc:3321
#14 0x00005636a6f34bc2 in sp_instr_stmt::execute (this=0x7f8994322ce8, thd=0x7f8994000c70, nextp=0x7f89f0d186c4) at /home/travis/src/sql/sp_head.cc:3499
#15 0x00005636a6f2e9ba in sp_head::execute (this=0x7f89942c69e8, thd=0x7f8994000c70, merge_da_on_success=true) at /home/travis/src/sql/sp_head.cc:1353
#16 0x00005636a6f31355 in sp_head::execute_procedure (this=0x7f89942c69e8, thd=0x7f8994000c70, args=0x7f8994005880) at /home/travis/src/sql/sp_head.cc:2293
#17 0x00005636a700697d in do_execute_sp (thd=0x7f8994000c70, sp=0x7f89942c69e8) at /home/travis/src/sql/sql_parse.cc:2945
#18 0x00005636a70074f2 in Sql_cmd_call::execute (this=0x7f8994013b88, thd=0x7f8994000c70) at /home/travis/src/sql/sql_parse.cc:3187
#19 0x00005636a7011719 in mysql_execute_command (thd=0x7f8994000c70) at /home/travis/src/sql/sql_parse.cc:6279
#20 0x00005636a701673a in mysql_parse (thd=0x7f8994000c70, rawbuf=0x7f8994013a18 "CALL stored_proc_23496 /* TRANSFORM_OUTCOME_UNORDERED_MATCH */ /* QNO 61671 CON_ID 15 */", length=88, parser_state=0x7f89f0d1a600, is_com_multi=false, is_next_command=false) at /home/travis/src/sql/sql_parse.cc:8076
#21 0x00005636a7003a4f in dispatch_command (command=COM_QUERY, thd=0x7f8994000c70, packet=0x7f899400b251 "CALL stored_proc_23496 /* TRANSFORM_OUTCOME_UNORDERED_MATCH */ /* QNO 61671 CON_ID 15 */ ", packet_length=89, is_com_multi=false, is_next_command=false) at /home/travis/src/sql/sql_parse.cc:1847
#22 0x00005636a7002480 in do_command (thd=0x7f8994000c70) at /home/travis/src/sql/sql_parse.cc:1392
#23 0x00005636a71689d7 in do_handle_one_connection (connect=0x5636aa228530) at /home/travis/src/sql/sql_connect.cc:1402
#24 0x00005636a716875b in handle_one_connection (arg=0x5636aa228530) at /home/travis/src/sql/sql_connect.cc:1308
#25 0x00007f89f4f95184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#26 0x00007f89f44a1ffd in clone () from /lib/x86_64-linux-gnu/libc.so.6

travis-workarounds c1710043b35269d4bae46f892bfccc02cd7cf2e2

perl /home/travis/rqg/runall-new.pl --vardir=/home/travis/logs/vardir --basedir=/home/travis/server --duration=350 --threads=6 --seed=1529134669 --reporters=Backtrace,ErrorLog,Deadlock --validators=TransformerNoComparator --views --redefine=conf/mariadb/versioning.yy --redefine=conf/mariadb/alter_table.yy --redefine=conf/mariadb/bulk_insert.yy --redefine=conf/mariadb/sequences.yy --mysqld=--log_output=FILE --mysqld=--max-statement-time=30 --mysqld=--lock-wait-timeout=10 --mysqld=--loose-innodb-lock-wait-timeout=5 --mysqld=--loose-debug_assert_on_not_freed_memory=0 --grammar=conf/replication/replication.yy --gendata=conf/replication/replication-5.1.zz --skip-gendata --gendata-advanced --vcols --transformers=ExecuteAsCTE,ExecuteAsDeleteReturning,ExecuteAsExcept,ExecuteAsExecuteImmediate,ExecuteAsInsertSelect,ExecuteAsIntersect,ExecuteAsUnion,ExecuteAsUpdateDelete,ExecuteAsView,ExecuteAsPreparedTwice,ExecuteAsSPTwice

Not reproducible so far.
Stack traces are attached.
Coredump, datadir etc. are available.



 Comments   
Comment by Michael Widenius [ 2018-06-19 ]

The crash happens in this code:

void cleanup_empty_jtbm_semi_joins(JOIN *join, List<TABLE_LIST> *join_list)
...
    if ((table->jtbm_subselect && table->jtbm_subselect->is_jtbm_const_tab))

Here is what I found out from the core dump:

#4 0x00005636a71ce5c9 in cleanup_empty_jtbm_semi_joins (join=0x7f89945b5b40,
join_list=0x7f89943217e0) at /home/travis/src/sql/opt_subselect.cc:5609
5609 in /home/travis/src/sql/opt_subselect.cc
(gdb) p join_list[0].first[0]
$5 = {<Sql_alloc> = {<No data fields>}, next = 0x7f8994324a88, info = 0x7f89942c7a58}
(gdb) p join_list[0].first[0].next[0]
$6 = {<Sql_alloc> = {<No data fields>}, next = 0x5636a8736d00 <end_of_list>, info = 0x7f8994328f98}

# join_list looks OK!

(gdb) p table
$1 = (TABLE_LIST *) 0x7f8994328f98

# Second table in join_list:
(gdb) p table->table_name
$14 = {str = 0x8f8f8f8f8f8f8f8f <error: Cannot access memory at address 0x8f8f8f8f8f8f8f8f>, length = 10344644715844964239}

(gdb) p ((TABLE_LIST*) join_list[0].first[0].info)->table_name
$8 = {str = 0x7f89942c7a20 "t8", length = 2}

(gdb) p ((TABLE_LIST*) join_list[0].first[0].next[0].info)->table_name
$7 = {str = 0x8f8f8f8f8f8f8f8f <error: Cannot access memory at address 0x8f8f8f8f8f8f8f8f>, length = 10344644715844964239}

(gdb) p ((TABLE_LIST*) join_list[0].first[0].next[0].info)->table
$10 = (TABLE *) 0x8f8f8f8f8f8f8f8f
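Here is a quick arithmetic check on the gdb output above. The decimal `length` (10344644715844964239) and the hexadecimal `str` and `table` values are the same 64-bit pattern: the byte 0x8f repeated eight times. The C++ sketch below verifies this at compile time and adds a small byte-pattern helper. The names `all_bytes_equal`, `kTableNameLength` and `kTableNamePtr` are hypothetical. The reading that 0x8f is the fill byte MariaDB debug builds write over freed memory (the `TRASH_FREE` macro) is an assumption worth confirming against the server headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if every byte of `value` equals `fill`.
// Byte order does not matter because all bytes are checked alike.
bool all_bytes_equal(std::uint64_t value, unsigned char fill) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    for (unsigned char b : bytes) {
        if (b != fill) return false;
    }
    return true;
}

// Values copied from the gdb session: table_name.length was printed in
// decimal, table_name.str and ->table in hexadecimal.
constexpr std::uint64_t kTableNameLength = 10344644715844964239ULL;
constexpr std::uint64_t kTableNamePtr    = 0x8f8f8f8f8f8f8f8fULL;

// The decimal length is the very same bit pattern as the bogus pointer.
static_assert(kTableNameLength == kTableNamePtr,
              "length and pointer carry the same 0x8f fill pattern");
```

For comparison, the valid first entry's name pointer (0x7f89942c7a20, which gdb printed as "t8") does not match the pattern.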

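So the bug is that the second table in join_list was allocated on the
wrong mem_root (not the one belonging to the SP), and that mem_root was
freed at the end of the first execution. Both the pointer fields and the
length of the second TABLE_LIST read as 0x8f8f8f8f8f8f8f8f. That means the
whole structure has been overwritten with a fill pattern, which is what
freed memory looks like in a debug build.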
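The failure mode described above can be modelled outside the server. In this C++ sketch, `Arena`, `TableRef`, `ProcedureState`, `add_table` and `second_table_length_after_reexecution` are hypothetical stand-ins, not MariaDB's `MEM_ROOT`/`TABLE_LIST` API:

- A per-procedure arena plays the SP's mem_root.
- A second arena plays the per-execution mem_root.
- Clearing the per-execution arena overwrites its memory with 0x8f, imitating a debug build that trashes freed memory.

The name is stored as address/length integers so that reading the stale node stays well-defined C++. In the server, the equivalent access dereferences freed memory and crashes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical bump allocator standing in for a MEM_ROOT. clear() keeps the
// buffer but overwrites every byte handed out with 0x8f, imitating a debug
// build that trashes freed memory.
class Arena {
public:
    explicit Arena(std::size_t capacity) : buf_(capacity) {}

    void* alloc(std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start + n > buf_.size()) return nullptr;
        used_ = start + n;
        return buf_.data() + start;
    }

    void clear() {
        std::memset(buf_.data(), 0x8f, used_);
        used_ = 0;
    }

private:
    std::vector<unsigned char> buf_;
    std::size_t used_ = 0;
};

// Hypothetical, simplified TABLE_LIST: the name is kept as integers so that
// reading a trashed node is still well-defined in this model.
struct TableRef {
    std::uintptr_t name_addr;
    std::uint64_t name_length;
};

// State that lives as long as the stored procedure itself.
struct ProcedureState {
    Arena stmt_arena{4096};            // plays the SP's mem_root
    std::vector<TableRef*> join_list;  // plays the persistent join_list
};

// Allocates a node on `where` and links it into the persistent join_list.
TableRef* add_table(ProcedureState& sp, Arena& where, const char* name) {
    void* mem = where.alloc(sizeof(TableRef));
    assert(mem != nullptr);
    TableRef* t = new (mem) TableRef{reinterpret_cast<std::uintptr_t>(name),
                                     std::strlen(name)};
    sp.join_list.push_back(t);
    return t;
}

// The first execution adds a second table (hypothetical name "subq_mat"),
// then its execution arena is freed. Returns what the second execution's
// cleanup reads as that table's name length.
std::uint64_t second_table_length_after_reexecution(bool use_stmt_arena) {
    ProcedureState sp;
    Arena exec_arena(4096);            // plays the per-execution mem_root
    add_table(sp, sp.stmt_arena, "t8");
    add_table(sp, use_stmt_arena ? sp.stmt_arena : exec_arena, "subq_mat");
    exec_arena.clear();                // end of the first execution
    return sp.join_list[1]->name_length;  // second execution walks the list
}
```

With `use_stmt_arena == true`, the second table keeps its length (8). With `false`, the next execution reads 0x8f8f8f8f8f8f8f8f, the same `length` value gdb printed for the crashed server.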

The memory could have been allocated here (or anywhere else where a table is added to join_list):
convert_subq_to_jtbm()
[embedded excerpt not preserved in the export; surviving fragment: "this is not a join nest"]

I ran the test case that Elena believes was executed, to find out how
the memory is allocated. At first I could not get the code to reach
convert_subq_to_jtbm(). I assume this is due to some optimizer_switch
setting.

I added the following trivial patch to ensure that the subquery would
not be flattened away:

+++ b/sql/opt_subselect.cc
@@ -676,6 +676,7 @@ int check_and_do_in_subquery_rewrites(JOIN *join)
       !((join->select_options |                                     // 10
          select_lex->outer_select()->join->select_options)          // 10
         & SELECT_STRAIGHT_JOIN) &&                                  // 10
+      1 == 2 &&
       select_lex->first_cond_optimization)                          // 11
   {
     DBUG_PRINT("info", ("Subquery is semi-join conversion candidate"));

When I then ran the example code from the bug report, I could verify
that convert_subq_to_jtbm() used the SP's mem_root (which is correct)
to allocate the JTBM table.

So the bug is that in some cases convert_subq_to_jtbm() uses the wrong
mem_root, but I have not been able to figure out how or when this happens.
Perhaps we could add an assert here to check that the right mem_root is used?
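One way to implement the suggested check is sketched below with a hypothetical arena that can report whether it owns a pointer. MariaDB's `MEM_ROOT` is not assumed to offer such a query, and `Arena`, `TableRef` and `link_persistent` are illustrative names only. The idea is that before a node is linked into a structure that must survive re-execution, an assert confirms it was allocated on the persistent arena. `std::less` gives a total order over pointers, so the range test is well-defined even for a pointer into another arena.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical bump allocator that can report whether it owns a pointer.
class Arena {
public:
    explicit Arena(std::size_t capacity) : buf_(capacity) {}

    void* alloc(std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start + n > buf_.size()) return nullptr;
        used_ = start + n;
        return buf_.data() + start;
    }

    // True if p points into memory already handed out by this arena.
    bool owns(const void* p) const {
        const auto* q = static_cast<const unsigned char*>(p);
        const std::less<const unsigned char*> lt{};
        return !lt(q, buf_.data()) && lt(q, buf_.data() + used_);
    }

private:
    std::vector<unsigned char> buf_;
    std::size_t used_ = 0;
};

struct TableRef { int id; };  // hypothetical stand-in for TABLE_LIST

// Links t into a list that must survive re-execution. Anything reachable
// from persistent state must live on the persistent arena.
void link_persistent(std::vector<TableRef*>& list, const Arena& persistent,
                     TableRef* t) {
    assert(persistent.owns(t) && "TableRef allocated on a short-lived arena");
    list.push_back(t);
}
```

In a debug build, passing a node from the per-execution arena aborts when the node is linked. The stale pointer would otherwise surface only at the next execution's cleanup.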

Other things:
The optimizer_switch value for the crashed thread is:
(gdb) p /x this->thd->variables.optimizer_switch
$13 = 0xfdf1ffcf
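To see which optimizer_switch flags were off in the crashed thread, the value can be split into its cleared bit positions. The sketch assumes 32 significant bits because gdb printed a 32-bit value. Translating positions into switch names requires the OPTIMIZER_SWITCH_* bit definitions of that exact 10.3 build, which are not reproduced here. `cleared_bits` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: positions of the 0 bits among the low `width` bits
// of `mask`, in ascending order (width must be at most 64).
std::vector<int> cleared_bits(std::uint64_t mask, int width) {
    std::vector<int> out;
    for (int bit = 0; bit < width; ++bit) {
        if (((mask >> bit) & 1u) == 0) out.push_back(bit);
    }
    return out;
}
```

For 0xfdf1ffcf it returns positions 4, 5, 17, 18, 19 and 25. These can then be looked up in the server's flag definitions to see whether any semi-join or materialization switch was disabled.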

Comment by Oleksandr Byelkin [ 2018-12-21 ]

It looks like this is already fixed (several semi-join bugs have been fixed).

Generated at Thu Feb 08 08:29:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.