[MDEV-8087] Server crashed in Time_and_counter_tracker::incr_loops Created: 2015-05-01  Updated: 2015-10-12  Resolved: 2015-10-12

Status: Closed
Project: MariaDB Server
Component/s: Parser
Affects Version/s: 10.1
Fix Version/s: 10.1.8

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Oleksandr Byelkin
Resolution: Fixed Votes: 0
Labels: None

Sprint: 10.1.7-1, 10.1.8-1, 10.1.8-3, 10.1.8-4

 Description   

Note: unfortunately, I only have a concurrent test case. Run it with --repeat=N.

CREATE PROCEDURE proc() SELECT * FROM v2; 
 
--connect (con1,localhost,root,,test)
CREATE ALGORITHM = UNDEFINED VIEW v1 AS SELECT 1;
CREATE ALGORITHM = TEMPTABLE VIEW v2 AS SELECT 3 FROM v1;
DROP VIEW v1;
 
--error ER_VIEW_INVALID
CALL proc();
 
--connection default
--send CREATE ALGORITHM = TEMPTABLE VIEW v1 AS SELECT 2
 
--connection con1
CALL proc(); 

Stack trace from 1a3321b6496dcdbac47efb48e7b66aa23fd8e0f7

#3  <signal handler called>
#4  0x00007fb8fc286e44 in Time_and_counter_tracker::incr_loops (this=0x68) at 10.1/sql/sql_analyze_stmt.h:89
#5  0x00007fb8fc3024c5 in JOIN::exec (this=0x7fb8f0c8bec0) at 10.1/sql/sql_select.cc:2391
#6  0x00007fb8fc305b03 in mysql_select (thd=0x7fb8f2ab2070, rref_pointer_array=0x7fb8f0c833c8, tables=0x7fb8f0c8e088, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2416185088, result=0x7fb8f0c8bdc8, unit=0x7fb8f0c82a60, select_lex=0x7fb8f0c83150) at 10.1/sql/sql_select.cc:3323
#7  0x00007fb8fc289302 in mysql_derived_fill (thd=0x7fb8f2ab2070, lex=0x7fb8f0c7bb68, derived=0x7fb8f0c82188) at 10.1/sql/sql_derived.cc:938
#8  0x00007fb8fc288e94 in mysql_derived_optimize (thd=0x7fb8f2ab2070, lex=0x7fb8f0c7bb68, derived=0x7fb8f0c82188) at 10.1/sql/sql_derived.cc:827
#9  0x00007fb8fc287bd7 in mysql_handle_single_derived (lex=0x7fb8f0c7bb68, derived=0x7fb8f0c82188, phases=4) at 10.1/sql/sql_derived.cc:195
#10 0x00007fb8fc3b6620 in TABLE_LIST::handle_derived (this=0x7fb8f0c82188, lex=0x7fb8f0c7bb68, phases=4) at 10.1/sql/table.cc:7090
#11 0x00007fb8fc2a5008 in st_select_lex::handle_derived (this=0x7fb8f0c7c320, lex=0x7fb8f0c7bb68, phases=4) at 10.1/sql/sql_lex.cc:3591
#12 0x00007fb8fc2fdbbd in JOIN::optimize_inner (this=0x7fb8f0c8b1c0) at 10.1/sql/sql_select.cc:1082
#13 0x00007fb8fc2fd95e in JOIN::optimize (this=0x7fb8f0c8b1c0) at 10.1/sql/sql_select.cc:1021
#14 0x00007fb8fc305a70 in mysql_select (thd=0x7fb8f2ab2070, rref_pointer_array=0x7fb8f0c7c598, tables=0x7fb8f0c82188, wild_num=1, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147749632, result=0x7fb8f0c8b1a0, unit=0x7fb8f0c7bc30, select_lex=0x7fb8f0c7c320) at 10.1/sql/sql_select.cc:3309
#15 0x00007fb8fc2fbbbf in handle_select (thd=0x7fb8f2ab2070, lex=0x7fb8f0c7bb68, result=0x7fb8f0c8b1a0, setup_tables_done_option=0) at 10.1/sql/sql_select.cc:371
#16 0x00007fb8fc2bc071 in execute_sqlcom_select (thd=0x7fb8f2ab2070, all_tables=0x7fb8f0c82188) at 10.1/sql/sql_parse.cc:5805
#17 0x00007fb8fc2b24aa in mysql_execute_command (thd=0x7fb8f2ab2070) at 10.1/sql/sql_parse.cc:2937
#18 0x00007fb8fc61368c in sp_instr_stmt::exec_core (this=0x7fb8f0c82768, thd=0x7fb8f2ab2070, nextp=0x7fb8fbd19124) at 10.1/sql/sp_head.cc:3135
#19 0x00007fb8fc612dcb in sp_lex_keeper::reset_lex_and_exec_core (this=0x7fb8f0c827a8, thd=0x7fb8f2ab2070, nextp=0x7fb8fbd19124, open_tables=false, instr=0x7fb8f0c82768) at 10.1/sql/sp_head.cc:2901
#20 0x00007fb8fc613394 in sp_instr_stmt::execute (this=0x7fb8f0c82768, thd=0x7fb8f2ab2070, nextp=0x7fb8fbd19124) at 10.1/sql/sp_head.cc:3051
#21 0x00007fb8fc60efca in sp_head::execute (this=0x7fb8f0c7b088, thd=0x7fb8f2ab2070, merge_da_on_success=true) at 10.1/sql/sp_head.cc:1316
#22 0x00007fb8fc610e62 in sp_head::execute_procedure (this=0x7fb8f0c7b088, thd=0x7fb8f2ab2070, args=0x7fb8f2ab66b0) at 10.1/sql/sp_head.cc:2103
#23 0x00007fb8fc2b0c9d in do_execute_sp (thd=0x7fb8f2ab2070, sp=0x7fb8f0c7b088) at 10.1/sql/sql_parse.cc:2383
#24 0x00007fb8fc2b9d0e in mysql_execute_command (thd=0x7fb8f2ab2070) at 10.1/sql/sql_parse.cc:5180
#25 0x00007fb8fc2bf5de in mysql_parse (thd=0x7fb8f2ab2070, rawbuf=0x7fb8f0c22088 "CALL proc()", length=11, parser_state=0x7fb8fbd1a180) at 10.1/sql/sql_parse.cc:7181
#26 0x00007fb8fc2ae772 in dispatch_command (command=COM_QUERY, thd=0x7fb8f2ab2070, packet=0x7fb8f2ab8071 "CALL proc()", packet_length=11) at 10.1/sql/sql_parse.cc:1470
#27 0x00007fb8fc2ad49d in do_command (thd=0x7fb8f2ab2070) at 10.1/sql/sql_parse.cc:1093
#28 0x00007fb8fc3ee1f5 in do_handle_one_connection (thd_arg=0x7fb8f2ab2070) at 10.1/sql/sql_connect.cc:1350
#29 0x00007fb8fc3edf59 in handle_one_connection (arg=0x7fb8f2ab2070) at 10.1/sql/sql_connect.cc:1262
#30 0x00007fb8fcaa4f7c in pfs_spawn_thread (arg=0x7fb8f9023ff0) at 10.1/storage/perfschema/pfs.cc:1860
#31 0x00007fb8fa6bee9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#32 0x00007fb8f9debcbd in clone () from /lib/x86_64-linux-gnu/libc.so.6



 Comments   
Comment by Elena Stepanova [ 2015-06-30 ]

Please re-assign if needed.

Comment by Sergei Petrunia [ 2015-08-07 ]

Ok I was able to repeat.

It crashes when

"CALL proc()" executes

SELECT * FROM v2

which calls handle_derived() for v2

We reach JOIN::exec_inner() which has

  join->optimized=true
  join->have_query_plan == QEP_AVAILABLE
  join->join_tab[0] points to a heap table with alias="v1".

but
join->explain=NULL.

Comment by Sergei Petrunia [ 2015-08-07 ]

There is no crash when the queries are run serially. derived subqueries have degenerate joins but the code handles that situation.

Comment by Sergei Petrunia [ 2015-08-07 ]

It's interesting that Explain data structure actually exists:

(gdb) p thd->lex->explain->get_node(2)
$17 = (Explain_select *) 0x7fffe001dde0
(gdb) p $17->message
$18 = 0x55555636d4dc "No tables used"

but it didn't get saved in join->explain.

Comment by Sergei Petrunia [ 2015-08-07 ]

Added Explain_select::my_join to check how is it possible that Explain_select exists but join->explain==NULL.

(gdb) p thd->lex->explain->get_node(2)
$3 = (Explain_node *) 0x7fffe0030c50
(gdb) p $3->my_join
$4 = (void *) 0x7fffe002fdf8
(gdb) p this
$5 = (JOIN * const) 0x7fffe00304d0

So, there is a different JOIN object!

(gdb) p (class JOIN*) 0x7fffe002fdf8  ##this is $4
$6 = (JOIN *) 0x7fffe002fdf8
 
(gdb) p $6->select_lex
$7 = (SELECT_LEX *) 0x7fffe00329a8
(gdb) p this->select_lex
$8 = (SELECT_LEX *) 0x7fffe002a8c0

it has a different select_lex.

(gdb) p $6->select_lex->select_number
$9 = 2
(gdb) p this->select_lex->select_number
$10 = 2

both select_lex objects have id=2!

Comment by Oleksandr Byelkin [ 2015-09-07 ]

1) I just lost moment when select number in the statement became ID. Different statement and the same number are possible
2) where the select belong to have to be important.
3) how we reached select of VIEW?

Comment by Oleksandr Byelkin [ 2015-09-07 ]

Probably SELECT should be identified by LEX and its number in the request...

Comment by Oleksandr Byelkin [ 2015-09-08 ]

As result with talking with Sergey Petrunia we've found that mysql_make_view renumbers SELECTs...

Ways:
1) make parser number views from mlast high number
2) find how it avoid renumbering

Comment by Oleksandr Byelkin [ 2015-09-08 ]

OK. Renumbering work as it should during parsing in mysql_make_view().

So problem probably in caching statement by SP and re-parsing views...

Comment by Oleksandr Byelkin [ 2015-09-22 ]

Problem is that in case VIEW is already processed it do not advance select_number counter in THD. Also the counter is not stored properly in the statement. So we get two selects #2 in case if ALTER is in time to fix the view.

Comment by Oleksandr Byelkin [ 2015-09-22 ]

Sequential execution works because it makes full re-prepare of the procedure including views re-parsing, when in parallel execution one view is used as it was stored and other is parsed (and counter set wrongly).

Comment by Oleksandr Byelkin [ 2015-09-22 ]

revision-id: a3c33026bc1f3a8efe3a165661786e7bc08bea34 (mariadb-10.1.6-108-ga3c3302)
parent(s): 20291639994beaa070b7228eafa7be31eb1d7ff8
committer: Oleksandr Byelkin
timestamp: 2015-09-22 21:59:18 +0200
message:

MDEV-8087: Server crashed in Time_and_counter_tracker::incr_loops

Problem:
Procedure which uses stack of views first executed without most deep view. It fails but one view cached.
Then simultaniusely create the view we lack and execute the procedure.
In the beginning of procedure execution views are not yet changes so procedure used as it was cached.
But by the time we are trying to use most deep view it is already created.
The problem with the view is that thd->select_number (first view was not parsed) so second view will get the same number.

The fix is in keeping the thd->select_number correct even if we use cached views.
In the proposed solution (to keep it simple) counter can be bigger then should but it should not create problem because numbers are still unique and situation is very rare.

Comment by Oleksandr Byelkin [ 2015-09-22 ]

Also the question is should we allow the situation when view changes and the procedure which uses it is not invalidated.

Situation is possible if we had problem with the view before and only when creating/altering view made in the same time with the procedure call but still can possibly bring some other problems.

Comment by Oleksandr Byelkin [ 2015-10-06 ]

fix with test suite:

revision-id: ccb6b0dd1fb603673dbcfe7d1763bd3fe5d3bcaf (mariadb-10.1.6-108-gccb6b0d)
parent(s): 20291639994beaa070b7228eafa7be31eb1d7ff8
committer: Oleksandr Byelkin
timestamp: 2015-10-06 19:54:56 +0200
message:

MDEV-8087: Server crashed in Time_and_counter_tracker::incr_loops

Problem:
Procedure which uses stack of views first executed without most deep view.
It fails but one view cached (as well as whole procedure).
Then simultaniusely create the second view we lack and execute the procedure.
In the beginning of procedure execution the view is not yet created so
procedure used as it was cached (cache was not invalidated).
But by the time we are trying to use most deep view it is already created.
The problem with the view is that thd->select_number (first view was not parsed) so second view will get the same number.

The fix is in keeping the thd->select_number correct even if we use cached views.
In the proposed solution (to keep it simple) counter can be bigger then should but it should not create problem because numbers are still unique and situation is very rare.

Comment by Sergei Petrunia [ 2015-10-11 ]

Ok to push

Generated at Thu Feb 08 07:24:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.