[MDEV-14857] problem with 10.2.11 server crashing when executing stored procedure Created: 2018-01-02  Updated: 2018-02-19  Resolved: 2018-02-01

Status: Closed
Project: MariaDB Server
Component/s: Stored routines
Affects Version/s: 10.1, 10.2.11, 10.2
Fix Version/s: 10.1.31, 10.2.13, 10.3.5

Type: Bug Priority: Blocker
Reporter: Nicholas Denning Assignee: Oleksandr Byelkin
Resolution: Fixed Votes: 6
Labels: regression
Environment:

centos fully patched running java connector 2.1.2


Attachments: Text File bug_error.log     Text File bug_error_2.log     Text File error_mariadb.log     File ip_error_gen.sql     File server.cnf     Text File stacktrace.txt    
Issue Links:
Duplicate
is duplicated by MDEV-14858 MariaDB 10.1.30 Segfault Closed
is duplicated by MDEV-14914 MariaDB 10.2.12 crashes after midnight Closed
is duplicated by MDEV-15037 JOIN::optimize() crash in case of in ... Closed
is duplicated by MDEV-15186 Server Crash on Subquery / Union / Te... Closed
is duplicated by MDEV-15187 MariaDB crashes when running a stored... Closed
Relates
relates to MDEV-15347 Valgrind or ASAN errors in mysql_make... Closed
Sprint: 10.1.31, 10.2.13

 Description   

Our solution follows a model which maintains most of the database manipulation logic in stored procedures.

When I upgraded from 10.2.9 to 10.2.11 I immediately got a failure in existing established code. This code also continues to function on 10.1.28 without problem.

I have prepared a database dump and a script which create a repeatable error.

to avoid shipping all of the code in the database I cut down the number of procedures shipped to a minimum, and the problem is now occurring in a different procedure so it looks like a general error rather than something that is specific to a partiiuclar error in our code.

I dont want to load up the database dump onto a public web site so can I provide that separately when this bug is picked up.

I have attached below the server.cnf file, the error log associated with the current repeatable crash plus a couple of other error logs from previous tests, plus the script to generate this problem.

To recreate:

  • create a database ipswichdb;
  • create a user ipswichdba and essentially grant them super user administrator status to the database.
  • mysql -u ipswichdb -p <ip_2018_01_02_togo.sql
  • mysql -u ipswichdba -p <ip_error_gen.sql

the error files below I generated by using first deleting my current error log and then doing a systemctl restart mariadb after loading the database and before running the test script.

Also I actually run the test script in Heidi, but I get the same effect whether its hedi or a more complex java app that I am running that normally triggers the problem.

Using the 2.1.2 java connector



 Comments   
Comment by Elena Stepanova [ 2018-01-02 ]

You can upload the dump to ftp.askmonty.org/private.

Comment by Nicholas Denning [ 2018-01-02 ]

Elena, sorry but having a bit of trouble connecting to the ftp site. What precisely do I need to do.

Comment by Nicholas Denning [ 2018-01-03 ]

Have worked it out and uploaded all the above files plus a gzip'ed copy of the sql dump to the ftp site. That file is ip_2018_01_02_togo.sql.zip

Comment by Nicholas Denning [ 2018-01-03 ]

Incidentally, I have a standard set of .sql build scripts which we run to create all the tables, views, triggers and procedures. After that we have a data load PoJo java application which reads files and loads data in. Its this data load applicaiton which first exhibits the failure. the stack dump I have provided is that java application. when it fails the bug_error and bug_error_2 are generated.

To prepare the database dump I first did a standard dump without --events --routines. I then dropped and recreated the database, reloaded from the dump then applied just the procedures that I think are needed for the test procedure to work, so that I dont have to provide the entire data layer code base.

The script executing the failing procedure fails in a different procedure from the one running the standalone application and you see that error in the error_mariadb.log

In the server.cnf you will see a load of parameters commented out. I was backing out these parameters to see if there was some sort of resource issue, and of course to turn off encryption so that I could export the limited data loaded at this crash point for you in clear.

the one difference between normal data load and the test database is that there was about 60MB of data in the DOCUMENT_DATA blob field in DOCUMENT_VERSIONS which I set to null while prepareing the dump to reduce the size of the database to be sent as its not touched by the test procedure.

If you get any strange effects and think there are any procedures missing shout and i can upload them.

Comment by Nicholas Denning [ 2018-01-03 ]

The first error logs bug_error and bug_error_2 were generated with the the original server.cnf and the standard java application. You can see different procedures reported being executed and generating the errors. this was only because I was changing the code to try to isolate the problem down to some of my code.

However the procedures that were causing the errors work if i ran the application to the point just before the failing stored procedure call, then committed and exited, then ran the procedure directly.

However when run within the context of the existing session they result in a crash.

Comment by Nicholas Denning [ 2018-01-03 ]

Sorry, there is a material code difference between the original java application runs that resulted in the bug_error* log files and the error_mariadb version. I realized my coding style was sub-optimal and that i was naming every cursor in the stored procedures CUR1 so i spent some time changing the material cursor names to give them distinct cursor names. That is potentially likely to cause a problem if I have nested cursors using the same name, though the documentation describes how these scenarios are handled so should not have generated a crash, just an error! I may not have completed that task universally.

However regardless, this all worked in 10.2.9 and fails in 10.2.11 (my yum update from the repo jumped straight from .9 to .11 so never tested a .10 release).

All still works on 10.1.28 as previously mentioned.

Comment by Elena Stepanova [ 2018-01-03 ]

nick.denning@diegesis.co.uk, thanks a lot for the provided data, analysis and test case, I was able to reproduce the crash as you described.

Comment by Nicholas Denning [ 2018-01-04 ]

Many thanks for the prompt action to verify this. If it can be given a priority that would be very helpful as it is preventing us upgrading to 10.2 in production and as we are using the community edition we cannot start to use the AWS key managment plug in on 10.1.

One thing that i forgot to mention was that when I found the first query that was causing this problem and I moved it further up the calling order, it succeeded, so I suspect that this is nothing to do with the specific code but is indeed a resources issues were a resource limit is not being handled correctly.

Its strange that this bug hit us immediately - very first run of our application post 10.2.11 upgrade so why has it not also affected other people if it kills us dead. So I also remain suspicious that our use of views in an EXISTS() in stored procedures may have something to do with this. It was a bug in this area that the team fixed early last year for us. That bug generated a very similar type of failure.

Comment by Elena Stepanova [ 2018-01-05 ]

Simplified and obfuscated test case:

SET max_sp_recursion_depth=10;
 
CREATE TABLE t1 (a INT);
CREATE TABLE t2 (b INT);
 
--delimiter ;;
 
CREATE PROCEDURE proc_0()
BEGIN
  CALL empty_1();
  CALL proc_1();
END ;;
 
CREATE PROCEDURE proc_1()
BEGIN
  CALL proc_2();
  CALL proc_3();
  CALL proc_4();
  CALL proc_5();
END ;;
 
CREATE PROCEDURE proc_2()
  CALL proc_6();
;;
 
CREATE PROCEDURE proc_3()
BEGIN
  CALL empty_2();
  CALL empty_3();
END ;;
 
CREATE PROCEDURE proc_4()
  CALL proc_7();
;;
 
CREATE PROCEDURE proc_5()
  CALL proc_select();
;;
 
CREATE PROCEDURE proc_6()
BEGIN
  CALL empty_4();
  CALL empty_5();
  CALL empty_6();
  CALL empty_7();
  CALL proc_8();
END ;;
 
CREATE PROCEDURE proc_7()
  CALL proc_9('foo');
;;
 
CREATE PROCEDURE proc_8()
  CALL proc_10();
;;
 
CREATE PROCEDURE proc_9(IN opt VARCHAR(40))
  IF LEFT(opt,1) <> '_' THEN
    CALL proc_11();
  END IF;
;;
 
CREATE PROCEDURE proc_10()
  CALL proc_12();
;;
 
CREATE PROCEDURE proc_11()
BEGIN
  CALL empty_8();
  CALL empty_9();
  CALL empty_10();
  CALL proc_13();
END ;;
 
CREATE PROCEDURE proc_12()
BEGIN
  CALL empty_11();
  CALL empty_12();
  CALL empty_13();
END ;;
 
CREATE PROCEDURE proc_13()
BEGIN
  CALL proc_9('_bar');
  CALL empty_14();
END ;;
 
CREATE PROCEDURE empty_1() BEGIN END ;;
CREATE PROCEDURE empty_2() BEGIN END ;;
CREATE PROCEDURE empty_3() BEGIN END ;;
CREATE PROCEDURE empty_4() BEGIN END ;;
CREATE PROCEDURE empty_5() BEGIN END ;;
CREATE PROCEDURE empty_6() BEGIN END ;;
CREATE PROCEDURE empty_7() BEGIN END ;;
CREATE PROCEDURE empty_8() BEGIN END ;;
CREATE PROCEDURE empty_9() BEGIN END ;;
CREATE PROCEDURE empty_10() BEGIN END ;;
CREATE PROCEDURE empty_11() BEGIN END ;;
CREATE PROCEDURE empty_12() BEGIN END ;;
CREATE PROCEDURE empty_13() BEGIN END ;;
CREATE PROCEDURE empty_14() BEGIN END ;;
 
CREATE PROCEDURE proc_select()
  SELECT * FROM t1 WHERE NOT EXISTS ( SELECT * FROM t2)
;;
 
--delimiter ;
 
CALL proc_0();
 
 
# Cleanup
DROP PROCEDURE empty_1;
DROP PROCEDURE empty_2;
DROP PROCEDURE empty_3;
DROP PROCEDURE empty_4;
DROP PROCEDURE empty_5;
DROP PROCEDURE empty_6;
DROP PROCEDURE empty_7;
DROP PROCEDURE empty_8;
DROP PROCEDURE empty_9;
DROP PROCEDURE empty_10;
DROP PROCEDURE empty_11;
DROP PROCEDURE empty_12;
DROP PROCEDURE empty_13;
DROP PROCEDURE empty_14;
DROP PROCEDURE proc_0;
DROP PROCEDURE proc_1;
DROP PROCEDURE proc_2;
DROP PROCEDURE proc_3;
DROP PROCEDURE proc_4;
DROP PROCEDURE proc_5;
DROP PROCEDURE proc_6;
DROP PROCEDURE proc_7;
DROP PROCEDURE proc_8;
DROP PROCEDURE proc_9;
DROP PROCEDURE proc_10;
DROP PROCEDURE proc_11;
DROP PROCEDURE proc_12;
DROP PROCEDURE proc_13;
DROP PROCEDURE proc_select;
DROP TABLE t1, t2;

Please use it for debugging (and for the test suite if you don't come up with anything better), but after fixing please also make sure to verify the fix on the original full dataset.
Additionally, please check the test case provided in MDEV-15037.

10.1 db3bdca7c2982ae

#3  <signal handler called>
#4  0x0000564484ecd1e2 in Explain_query::get_select (this=0x7f4c2ff1fb98, select_id=536870913) at /data/src/10.1/sql/sql_explain.cc:85
#5  0x0000564484ecd3f1 in Explain_query::add_node (this=0x7f4c2ff1fb98, node=0x7f4c2ff1fd08) at /data/src/10.1/sql/sql_explain.cc:121
#6  0x0000564484dde996 in JOIN::save_explain_data_intern (this=0x7f4c2ff1e7b8, output=0x7f4c2ff1fb98, need_tmp_table=false, need_order=false, distinct=false, message=0x5644856bfb50 "no matching row in const table") at /data/src/10.1/sql/sql_select.cc:24615
#7  0x0000564484da6316 in JOIN::save_explain_data (this=0x7f4c2ff1e7b8, output=0x7f4c2ff1fb98, can_overwrite=false, need_tmp_table=false, need_order=false, distinct=false) at /data/src/10.1/sql/sql_select.cc:2481
#8  0x0000564484da140f in JOIN::optimize (this=0x7f4c2ff1e7b8) at /data/src/10.1/sql/sql_select.cc:1053
#9  0x0000564484d57fb3 in st_select_lex::optimize_unflattened_subqueries (this=0x7f4c2ff15850, const_only=true) at /data/src/10.1/sql/sql_lex.cc:3792
#10 0x0000564484ef2ca3 in JOIN::optimize_constant_subqueries (this=0x7f4c2ff1e220) at /data/src/10.1/sql/opt_subselect.cc:5236
#11 0x0000564484da172a in JOIN::optimize_inner (this=0x7f4c2ff1e220) at /data/src/10.1/sql/sql_select.cc:1138
#12 0x0000564484da131a in JOIN::optimize (this=0x7f4c2ff1e220) at /data/src/10.1/sql/sql_select.cc:1045
#13 0x0000564484da99b2 in mysql_select (thd=0x7f4c318d5070, rref_pointer_array=0x7f4c2ff15af8, tables=0x7f4c2ff03eb0, wild_num=1, fields=..., conds=0x7f4c2ff16d40, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147749632, result=0x7f4c2ff1e200, unit=0x7f4c2ff15150, select_lex=0x7f4c2ff15850) at /data/src/10.1/sql/sql_select.cc:3435
#14 0x0000564484d9f44e in handle_select (thd=0x7f4c318d5070, lex=0x7f4c2ff15088, result=0x7f4c2ff1e200, setup_tables_done_option=0) at /data/src/10.1/sql/sql_select.cc:384
#15 0x0000564484d6f4e8 in execute_sqlcom_select (thd=0x7f4c318d5070, all_tables=0x7f4c2ff03eb0) at /data/src/10.1/sql/sql_parse.cc:5926
#16 0x0000564484d659be in mysql_execute_command (thd=0x7f4c318d5070) at /data/src/10.1/sql/sql_parse.cc:2976
#17 0x00005644850f634e in sp_instr_stmt::exec_core (this=0x7f4c2ff16e00, thd=0x7f4c318d5070, nextp=0x7f4c3aa9c794) at /data/src/10.1/sql/sp_head.cc:3241
:3007
#19 0x00005644850f6014 in sp_instr_stmt::execute (this=0x7f4c2ff16e00, thd=0x7f4c318d5070, nextp=0x7f4c3aa9c794) at /data/src/10.1/sql/sp_head.cc:3157
#20 0x00005644850f185f in sp_head::execute (this=0x7f4c2ff03088, thd=0x7f4c318d5070, merge_da_on_success=true) at /data/src/10.1/sql/sp_head.cc:1328
#21 0x00005644850f3720 in sp_head::execute_procedure (this=0x7f4c2ff03088, thd=0x7f4c318d5070, args=0x7f4c2febddd8) at /data/src/10.1/sql/sp_head.cc:2125
#22 0x0000564484d6401b in do_execute_sp (thd=0x7f4c318d5070, sp=0x7f4c2ff03088) at /data/src/10.1/sql/sql_parse.cc:2407
#23 0x0000564484d6d1a3 in mysql_execute_command (thd=0x7f4c318d5070) at /data/src/10.1/sql/sql_parse.cc:5282
#24 0x00005644850f634e in sp_instr_stmt::exec_core (this=0x7f4c2feb1ec0, thd=0x7f4c318d5070, nextp=0x7f4c3aa9d714) at /data/src/10.1/sql/sp_head.cc:3241
#25 0x00005644850f5a40 in sp_lex_keeper::reset_lex_and_exec_core (this=0x7f4c2feb1f00, thd=0x7f4c318d5070, nextp=0x7f4c3aa9d714, open_tables=false, instr=0x7f4c2feb1ec0) at /data/src/10.1/sql/sp_head.cc:3007
#26 0x00005644850f6014 in sp_instr_stmt::execute (this=0x7f4c2feb1ec0, thd=0x7f4c318d5070, nextp=0x7f4c3aa9d714) at /data/src/10.1/sql/sp_head.cc:3157
#27 0x00005644850f185f in sp_head::execute (this=0x7f4c2feb1088, thd=0x7f4c318d5070, merge_da_on_success=true) at /data/src/10.1/sql/sp_head.cc:1328
#28 0x00005644850f3720 in sp_head::execute_procedure (this=0x7f4c2feb1088, thd=0x7f4c318d5070, args=0x7f4c2fe22dd8) at /data/src/10.1/sql/sp_head.cc:2125
#29 0x0000564484d6401b in do_execute_sp (thd=0x7f4c318d5070, sp=0x7f4c2feb1088) at /data/src/10.1/sql/sql_parse.cc:2407
#30 0x0000564484d6d1a3 in mysql_execute_command (thd=0x7f4c318d5070) at /data/src/10.1/sql/sql_parse.cc:5282
#31 0x00005644850f634e in sp_instr_stmt::exec_core (this=0x7f4c2fd5d960, thd=0x7f4c318d5070, nextp=0x7f4c3aa9e694) at /data/src/10.1/sql/sp_head.cc:3241
#32 0x00005644850f5a40 in sp_lex_keeper::reset_lex_and_exec_core (this=0x7f4c2fd5d9a0, thd=0x7f4c318d5070, nextp=0x7f4c3aa9e694, open_tables=false, instr=0x7f4c2fd5d960) at /data/src/10.1/sql/sp_head.cc:3007
#33 0x00005644850f6014 in sp_instr_stmt::execute (this=0x7f4c2fd5d960, thd=0x7f4c318d5070, nextp=0x7f4c3aa9e694) at /data/src/10.1/sql/sp_head.cc:3157
#34 0x00005644850f185f in sp_head::execute (this=0x7f4c2fd5c088, thd=0x7f4c318d5070, merge_da_on_success=true) at /data/src/10.1/sql/sp_head.cc:1328
#35 0x00005644850f3720 in sp_head::execute_procedure (this=0x7f4c2fd5c088, thd=0x7f4c318d5070, args=0x7f4c2fdaadd8) at /data/src/10.1/sql/sp_head.cc:2125
#36 0x0000564484d6401b in do_execute_sp (thd=0x7f4c318d5070, sp=0x7f4c2fd5c088) at /data/src/10.1/sql/sql_parse.cc:2407
#37 0x0000564484d6d1a3 in mysql_execute_command (thd=0x7f4c318d5070) at /data/src/10.1/sql/sql_parse.cc:5282
#38 0x00005644850f634e in sp_instr_stmt::exec_core (this=0x7f4c2fdcf320, thd=0x7f4c318d5070, nextp=0x7f4c3aa9f614) at /data/src/10.1/sql/sp_head.cc:3241
#39 0x00005644850f5a40 in sp_lex_keeper::reset_lex_and_exec_core (this=0x7f4c2fdcf360, thd=0x7f4c318d5070, nextp=0x7f4c3aa9f614, open_tables=false, instr=0x7f4c2fdcf320) at /data/src/10.1/sql/sp_head.cc:3007
#40 0x00005644850f6014 in sp_instr_stmt::execute (this=0x7f4c2fdcf320, thd=0x7f4c318d5070, nextp=0x7f4c3aa9f614) at /data/src/10.1/sql/sp_head.cc:3157
#41 0x00005644850f185f in sp_head::execute (this=0x7f4c2fdce088, thd=0x7f4c318d5070, merge_da_on_success=true) at /data/src/10.1/sql/sp_head.cc:1328
#42 0x00005644850f3720 in sp_head::execute_procedure (this=0x7f4c2fdce088, thd=0x7f4c318d5070, args=0x7f4c318d97f8) at /data/src/10.1/sql/sp_head.cc:2125
#43 0x0000564484d6401b in do_execute_sp (thd=0x7f4c318d5070, sp=0x7f4c2fdce088) at /data/src/10.1/sql/sql_parse.cc:2407
#44 0x0000564484d6d1a3 in mysql_execute_command (thd=0x7f4c318d5070) at /data/src/10.1/sql/sql_parse.cc:5282
#45 0x0000564484d72bf5 in mysql_parse (thd=0x7f4c318d5070, rawbuf=0x7f4c2fc43088 "CALL proc_0()", length=13, parser_state=0x7f4c3aaa05e0) at /data/src/10.1/sql/sql_parse.cc:7347
#46 0x0000564484d61aba in dispatch_command (command=COM_QUERY, thd=0x7f4c318d5070, packet=0x7f4c343f9071 "CALL proc_0()", packet_length=13) at /data/src/10.1/sql/sql_parse.cc:1477
#47 0x0000564484d6083f in do_command (thd=0x7f4c318d5070) at /data/src/10.1/sql/sql_parse.cc:1106
#48 0x0000564484e99325 in do_handle_one_connection (thd_arg=0x7f4c318d5070) at /data/src/10.1/sql/sql_connect.cc:1330
#49 0x0000564484e99089 in handle_one_connection (arg=0x7f4c318d5070) at /data/src/10.1/sql/sql_connect.cc:1242
#50 0x000056448525375a in pfs_spawn_thread (arg=0x7f4c38039ef0) at /data/src/10.1/storage/perfschema/pfs.cc:1861
#51 0x00007f4c3a720494 in start_thread (arg=0x7f4c3aaa1b00) at pthread_create.c:333
#52 0x00007f4c38ad993f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Reproducible on 10.1-10.3, recently appeared in 10.1 with this commit:

commit 0f43279cc472d82859960f5057e0ec98428459ee
Author: Oleksandr Byelkin <sanja@mariadb.com>
Date:   Mon Nov 6 14:35:58 2017 +0100
 
    MDEV-13936: Server crashes in Time_and_counter_tracker::incr_loops
    
    Repeat reworked solution of procedures for all posible Sp (functions &
    triggers).

Comment by Nicholas Denning [ 2018-01-08 ]

By the way this is bug has now arrived in 10.1.30 not 10.1.29 as I previously said in this comment..

Comment by Jérôme Brauge [ 2018-01-08 ]

Hi,
I've also report this issue to the MariaDB Developpers mail list on 2017-12-12, but I haven't be able to produce a small test case.
Thank you very much Nicholas !

Comment by Nicholas Denning [ 2018-01-09 ]

My test case wasnt small!!!! Elena had by all accounts a huge amount of work to unpick what i sent so rather than me, is the one to be congratulated/thanked.

Comment by Nicholas Denning [ 2018-01-09 ]

Elena and Oleksandr, when you are ready if you want to send me a candidate release for testing I can do that very quickly, 10.1 or 10.2 and probably 10.3 as well. I have a series of VMs and can just snapshot and install on top of those. Nick

Comment by Jérôme Brauge [ 2018-01-10 ]

Hi,
I confirm, 10.3.4 is also concerned.

Comment by Elena Stepanova [ 2018-01-10 ]

Given the number of similar complaints and the fact that it's a recent regression, I'm raising it to a Blocker for the next 10.1 (and consequently 10.2) release.

Comment by Martin Vit [ 2018-01-10 ]

We have the same issue always reproducible on all 10.1.X 10.2.X and 10.3.X versions. We had to roll back 10.2.9

Comment by Uwe Beierlein [ 2018-01-17 ]

We have the same issue with version 10.1.30 on
"CentOS Linux release 7.4.1708 (Core) - 3.10.0-693.11.6.el7.x86_64" (fully patched 2018-01-13).

Comment by Lasana Murray [ 2018-01-17 ]

Because of this I had to downgrade (or upgrade) to 10.3.0 which throws strange errors at random but the SPs work.

Sadly we probably can't use this in production so technically we do not have a stable MariaDB target we can deploy on since we are making heavy use of JSON.

Comment by Michael Widenius [ 2018-01-22 ]

The crash happens while MariaDB is trying to store an explain of the query:

sql/sql_explain.cc:86(Explain_query::get_select(unsigned int))[0x55cba00eddf8]
/usr/sbin/mysqld(_ZN4JOIN14optimize_innerEv+0x1cfb)[0x55cba014a5bb]
sql/sql_select.cc:24874(JOIN::save_explain_data_intern(Explain_query*, bool, bool, bool, char const*))[0x55cba014b45f]

So it's not really related to stored procedures, but to the particulate query used:

INSERT INTO ENTITIES_TO_COPY ( ENTITY_UID, ENTITY_TYPE, ENTITY_SUB_TYPE, COPY_INSTANCE_UID ) SELECT ECD.DOCUMENT_VERSION_UID, 'document', 'event', NAME_CONST('V_COPY_INSTANCE_UID',59831) FROM EVENT_COLLECTION_DOCUMENTS ECD JOIN EVENT_DETAILS ED ON ED.EVENT_COLLECTION_UID = ECD.EVENT_COLLECTION_UID WHERE ED.EVENT_UID = NAME_CONST('V_EVENT_UID',34116) AND ED.ORGANISATION_UID = NAME_CONST('V_SOURCE_ORG_UID',26) AND NOT EXISTS ( SELECT * FROM ENTITIES_TO_COPY ETC WHERE ETC.ENTITY_UID = ECD.DOCUMENT_VERSION_UID AND ETC.COPY_INSTANCE_UID = NAME_CONST('V_COPY_INSTANCE_UID',59831) )

Comment by Oleksandr Byelkin [ 2018-01-23 ]

So if it is not SP but EXPLAIN, so I reassign it.

Comment by Sergei Petrunia [ 2018-01-24 ]

Debugging Elena's testcase. Interesting frames in the stack trace of the crash:

#4: JOIN::optimize(), select_lex->select_number=536870913= 0x20000001 ????
#8: JOIN::optimize(), select_lex->select_number=1
#13: sp_instr_stmt::exec_core: SELECT * FROM t1 WHERE NOT EXISTS ( SELECT * FROM t2)
#16: sp_head::execute, m_name=proc_select
#23: sp_head::execute, m_name=proc_5
#30: sp_head::execute, m_name=proc_1
#37: sp_head::execute, m_name=proc_0

The incorrect value of select_lex->select_number comes from the parser:

  
 #0  mysql_new_select (lex=0x7fff800ef6e8, move_down=true) at /home/psergey/dev-git/10.2-r5/sql/sql_parse.cc:7553
  #1  0x0000555555d20acb in MYSQLparse (thd=0x7fff80000b00) at /home/psergey/dev-git/10.2-r5/sql/sql_yacc.yy:16367
  #2  0x0000555555b14c21 in parse_sql (thd=0x7fff80000b00, parser_state=0x7ffff40a7290, creation_ctx=0x7fff800df500, do_pfs_digest=false) at /home/psergey/dev-git/10.2-r5/sql/sql_parse.cc:9919
  #3  0x0000555555ee771f in sp_compile (thd=0x7fff80000b00, defstr=0x7ffff40a74c0, sql_mode=1411383296, creation_ctx=0x7fff800df500) at /home/psergey/dev-git/10.2-r5/sql/sp.cc:754
  #4  0x0000555555ee7bab in db_load_routine (thd=0x7fff80000b00, type=TYPE_ENUM_PROCEDURE, name=0x7ffff40a9270, sphp=0x7ffff40a9498, sql_mode=1411383296, params=0x555556691f38 "", returns=0x555556691f38 "", body=0x7fff800df468 "SELECT * FROM t1 WHERE NOT EXISTS ( SELECT * FROM t2)", chistics=..., definer_user_name=0x7ffff40a8d20, definer_host_name=0x7ffff40a8d30, created=20180124135908, modified=20180124135908, creation_ctx=0x7fff800df500) at /home/psergey/dev-git/10.2-r5/sql/sp.cc:882
  #5  0x0000555555ee7486 in db_find_routine (thd=0x7fff80000b00, type=TYPE_ENUM_PROCEDURE, name=0x7ffff40a9270, sphp=0x7ffff40a9498) at /home/psergey/dev-git/10.2-r5/sql/sp.cc:670
  #6  0x0000555555eeb8ba in sp_cache_routine (thd=0x7fff80000b00, type=TYPE_ENUM_PROCEDURE, name=0x7ffff40a9270, lookup_only=false, sp=0x7ffff40a9498) at /home/psergey/dev-git/10.2-r5/sql/sp.cc:2153
...

There, we have

  select_lex->select_number= ++thd->select_number;

Comment by Sergei Petrunia [ 2018-01-24 ]

The large value of thd->select_number is not garbage. It comes from sp_head::execute in sp_head.cc:

  /*
     Normally the counter is not reset between parsing and first execution,
     but it is possible in case of error to have parsing on one CALL and
     first execution (where VIEW will be parsed and added). So we store the
     counter after parsing and restore it before execution just to avoid
     repeating SELECT numbers.
 
     Other problem is that it can be more SELECTs parsed in case of fixing
     error causes previous interruption of the SP. So it is save not just
     assign old value but add it.
   */
  thd->select_number+= m_select_number;

This was introduced in https://github.com/MariaDB/server/commit/0f43279cc472d82859960f5057e0ec98428459ee .

EXPLAIN data structure code expects select_number values to be continuous and small.

Comment by Oleksandr Byelkin [ 2018-01-26 ]

revision-id: 26dcadf8af745edebbeff44d831380f34d54c3e3 (mariadb-10.1.30-26-g26dcadf8af7)
parent(s): 524221e7a34d42214e82dd868348b453f2ef1591
author: Oleksandr Byelkin
committer: Oleksandr Byelkin
timestamp: 2018-01-26 17:49:36 +0100
message:

MDEV-14857: problem with 10.2.11 server crashing when executing stored procedure

Counter for select numbering made stored with the statement (before was global)
So now it does have always accurate value which does not depend on
interruption of statement prepare by errors like lack of table in
a view definition.

Comment by Oleksandr Byelkin [ 2018-01-27 ]

revision-id: a0ea3a5822486fa1702d6615f1be30684c3d412f (mariadb-10.1.30-26-ga0ea3a58224)
parent(s): 524221e7a34d42214e82dd868348b453f2ef1591
author: Oleksandr Byelkin
committer: Oleksandr Byelkin
timestamp: 2018-01-27 11:38:38 +0100
message:

MDEV-14857: problem with 10.2.11 server crashing when executing stored procedure

Counter for select numbering made stored with the statement (before was global)
So now it does have always accurate value which does not depend on
interruption of statement prepare by errors like lack of table in
a view definition.

Comment by Sergei Petrunia [ 2018-01-31 ]

Review comments https://lists.launchpad.net/maria-developers/msg11098.html

Comment by Nicholas Denning [ 2018-02-01 ]

Very interesting to see the development of this fix.

We have been having persistent problems over the last year when we referenced VIEWS in queries within an EXISTS() statement in stored procedures.

Hopefully this fix will get to the bottom of this fundamental issue.

Many thanks.

Generated at Thu Feb 08 08:16:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.