[MDEV-26009] Server crash when calling twice procedure using FOR-loop Created: 2021-06-24  Updated: 2022-07-19  Resolved: 2022-03-21

Status: Closed
Project: MariaDB Server
Component/s: Stored routines
Affects Version/s: 10.5.11, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.3.35, 10.4.25, 10.5.16, 10.6.8, 10.7.4, 10.8.3

Type: Bug Priority: Blocker
Reporter: Paulus Limma Assignee: Oleksandr Byelkin
Resolution: Fixed Votes: 0
Labels: crash, procedure
Environment:

Docker: mariadb:10.5.11
AWS RDS: mariadb 10.5.8

Client: ConnectorJ 2.6.0


Attachments: File server_crash.sql     File table_does_not_exists.sql    
Issue Links:
Duplicate

 Description   

Steps to reproduce:

1. Create a function, which selects data from any table. It is not enough to just return some constant value.

2. Create a view, which uses the function created in step 1.

3. Create a procedure, which runs for-loop. This for loop must query view created on step 2. However, it should not use any columns returned from the query, but a constant, eg. 1, for each row.

4. Call procedure twice from the same connection. First run success normally. The second one crashes the server without any explanation.

Minimal Reproducible Example is on attachment server_crash.sql.

All these details seem to be required to cause the server to crash. Interestingly enough, dropping some of them causes a different issue. If I select any value on cursor definition or skip the view and call the function directly, the first run results an error Table 'test_table' doesn't exist. Example of this is on file table_does_not_exist.sql.



 Comments   
Comment by Paulus Limma [ 2021-06-24 ]

Also reproducible on 10.6.2 (Docker).

Comment by Alice Sherepa [ 2021-06-24 ]

Thank you! I reproduced as described on 10.3-10.6:

10.3 1deb630484caf572c9

Version: '10.3.31-MariaDB-debug-log'  
210624 13:52:15 [ERROR] mysqld got signal 11 ;
 
Server version: 10.3.31-MariaDB-debug-log
 
sql/signal_handler.cc:221(handle_fatal_signal)[0x557597af10f3]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f63284ee730]
sql/sql_acl.cc:7610(check_grant(THD*, unsigned long, TABLE_LIST*, bool, unsigned int, bool))[0x55759717fb4f]
sql/sql_parse.cc:6981(check_table_access(THD*, unsigned long, TABLE_LIST*, bool, unsigned int, bool))[0x557597360385]
sql/sql_parse.cc:3863(mysql_execute_command(THD*))[0x55759734aa22]
sql/sql_cursor.cc:150(mysql_open_cursor(THD*, select_result*, Server_side_cursor**))[0x55759726d9f2]
sql/sp_rcontext.cc:745(sp_cursor::open(THD*))[0x55759714032a]
sql/sp_head.cc:4391(sp_instr_copen::exec_core(THD*, unsigned int*))[0x55759711ff1e]
sql/sp_head.cc:3433(sp_lex_keeper::reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*))[0x557597119459]
sql/sp_head.cc:3528(sp_lex_keeper::cursor_reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*))[0x557597119c53]
sql/sp_head.cc:4380(sp_instr_copen::execute(THD*, unsigned int*))[0x55759711fd25]
sql/sp_head.cc:1377(sp_head::execute(THD*, bool))[0x55759710b7ab]
sql/sp_head.cc:2404(sp_head::execute_procedure(THD*, List<Item>*))[0x557597111c4f]
sql/sql_parse.cc:3019(do_execute_sp(THD*, sp_head*))[0x5575973443c8]
sql/sql_parse.cc:3259(Sql_cmd_call::execute(THD*))[0x557597346023]
sql/sql_parse.cc:6075(mysql_execute_command(THD*))[0x55759735a55d]
sql/sql_parse.cc:7870(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x557597366bab]
sql/sql_parse.cc:1855(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55759733d342]
sql/sql_parse.cc:1398(do_command(THD*))[0x557597339a89]
sql/sql_connect.cc:1403(do_handle_one_connection(CONNECT*))[0x55759772e270]
sql/sql_connect.cc:1309(handle_one_connection)[0x55759772db28]
perfschema/pfs.cc:1871(pfs_spawn_thread)[0x557598eab823]
nptl/pthread_create.c:487(start_thread)[0x7f63284e3fa3]
x86_64/clone.S:97(clone)[0x7f63284144cf]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x62b000000290): call test_proc()

wrapped for mtr test:

CREATE TABLE t1 (id int, name varchar(24));
INSERT INTO t1 (id, name) VALUES (1, 'x'),(2, 'y'),(3, 'z');
 
create function get_name(_id int) returns varchar(24) 
	return (select name from t1 where id = _id);
 
create view v1 as select get_name(id) from t1;
 
delimiter $$;
create  procedure test_proc()
begin
    declare _cur cursor for select 1 from v1;   
    for row in _cur do select 1; end for;
end$$
delimiter ;$$
 
call test_proc();
call test_proc();

the second case- without view - returns error, on the 2nd execution - crash:

 
CREATE TABLE t1 ( id int, name varchar(24));
INSERT INTO t1 values (1, 'x'), (2, 'y'), (3, 'z');
 
create function get_name(_id int) returns varchar(24)
    return (select name from t1 where id = _id);
 
--delimiter ^^
 
create  procedure test_proc()
begin
    declare _cur cursor for select get_name(id) from t1;
	for row in _cur do select 1; end for;
end;
^^
 
--delimiter ;
--error 1146
call test_proc();
call test_proc();

210624 16:13:14 [ERROR] mysqld got signal 11 ;
Server version: 10.3.30-MariaDB-debug-log
 
sql/signal_handler.cc:221(handle_fatal_signal)[0x561a0674e732]
sigaction.c:0(__restore_rt)[0x7fe2b73a43c0]
sql/sql_acl.cc:7610(check_grant(THD*, unsigned long, TABLE_LIST*, bool, unsigned int, bool))[0x561a05e21211]
sql/sql_parse.cc:6981(check_table_access(THD*, unsigned long, TABLE_LIST*, bool, unsigned int, bool))[0x561a05ff27fc]
sql/sql_parse.cc:3863(mysql_execute_command(THD*))[0x561a05fdd151]
sql/sql_cursor.cc:150(mysql_open_cursor(THD*, select_result*, Server_side_cursor**))[0x561a05f07796]
sql/sp_rcontext.cc:745(sp_cursor::open(THD*))[0x561a05de6502]
sql/sp_head.cc:4391(sp_instr_copen::exec_core(THD*, unsigned int*))[0x561a05dc5cec]
sql/sp_head.cc:3433(sp_lex_keeper::reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*))[0x561a05dbf545]
sql/sp_head.cc:3528(sp_lex_keeper::cursor_reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*))[0x561a05dbfd71]
sql/sp_head.cc:4380(sp_instr_copen::execute(THD*, unsigned int*))[0x561a05dc5b49]
sql/sp_head.cc:1377(sp_head::execute(THD*, bool))[0x561a05db1d86]
sql/sp_head.cc:2404(sp_head::execute_procedure(THD*, List<Item>*))[0x561a05db7d73]
sql/sql_parse.cc:3019(do_execute_sp(THD*, sp_head*))[0x561a05fd6c5c]
sql/sql_parse.cc:3259(Sql_cmd_call::execute(THD*))[0x561a05fd8838]
sql/sql_parse.cc:6075(mysql_execute_command(THD*))[0x561a05fecbe7]
sql/sql_parse.cc:7870(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x561a05ff8f12]
sql/sql_parse.cc:1855(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x561a05fcfde3]
sql/sql_parse.cc:1398(do_command(THD*))[0x561a05fcc926]
sql/sql_connect.cc:1403(do_handle_one_connection(CONNECT*))[0x561a0639a8eb]
sql/sql_connect.cc:1309(handle_one_connection)[0x561a0639a1a5]
perfschema/pfs.cc:1871(pfs_spawn_thread)[0x561a079cdf3d]
nptl/pthread_create.c:478(start_thread)[0x7fe2b7398609]
x86_64/clone.S:97(__GI___clone)[0x7fe2b72bf293]
 
Query (0x62b000000290): call test_proc()

Comment by Oleksandr Byelkin [ 2021-09-20 ]

The problem is that we try to iterarte tables till last "own" table (thd->lex->first_not_own_table()) which was created in sp_head::add_used_tables_to_table_list but somehow missed on the second execution in the list of all tables (removed? replaced?).

Comment by Oleksandr Byelkin [ 2022-03-14 ]

it looks like open cursor do not do correct prelocking to make prelock slots (the error in first run and crash in the second)

Comment by Oleksandr Byelkin [ 2022-03-15 ]

sp_instr_cursor_copy_struct removes prelocking tables from list and then sp_instr_copen trys to use prelocking tables.

Comment by Oleksandr Byelkin [ 2022-03-16 ]

the difference of second time is processing procedure from the cache, and it miss prelocked tables to restore

Comment by Oleksandr Byelkin [ 2022-03-17 ]

Usually when we execute instruction, prelock-tables added to the tail of table list and then removed (with savimg its address to add them back and remove on re-execution)

the specific of the cursor is that it use 2 instructions instead of 1:

  1. sp_instr_cursor_copy_struct
  2. sp_instr_copen

sp_instr_cursor_copy_struct uses temporary cursor to fill some data, it add prelocking tables and ten remove them which should roll back everything, but the last table add there left in Query_tables_list structure.

When sp_instr_copen add prelock-tables to the list it uses adress of the last table in Query_tables_list structure changed by sp_instr_cursor_copy_struct, and so add the tables in the wrong place. and real table list is left without added tables which lead to all problem we have including crash on attempt to access NULL pointer on unexisting "prelock-tables-tail".

Comment by Oleksandr Byelkin [ 2022-03-17 ]

This fixes suite without view, but test suite with a view crashes by other cause (maybe other bug or this patch should be impruved):

diff --git a/sql/sp_head.cc b/sql/sp_head.cc
index 57ab31d9edf..7f9e15d8dc9 100644
--- a/sql/sp_head.cc
+++ b/sql/sp_head.cc
@@ -4584,6 +4584,7 @@ sp_instr_cursor_copy_struct::exec_core(THD *thd, uint *nextp)
   if (!row->arguments())
   {
     sp_cursor tmp(thd, &m_lex_keeper, true);
+    TABLE_LIST **query_tables_last_save= m_lex_keeper.get_query_tables_last();
     // Open the cursor without copying data
     if (!(ret= tmp.open(thd)))
     {
@@ -4605,6 +4606,7 @@ sp_instr_cursor_copy_struct::exec_core(THD *thd, uint *nextp)
       thd->restore_active_arena(thd->spcont->callers_arena, &current_arena);
       tmp.close(thd);
     }
+    m_lex_keeper.set_query_tables_last(query_tables_last_save);
   }
   *nextp= m_ip + 1;
   DBUG_RETURN(ret);
diff --git a/sql/sp_head.h b/sql/sp_head.h
index e1cfbb484ad..1fa9232b0d9 100644
--- a/sql/sp_head.h
+++ b/sql/sp_head.h
@@ -1214,6 +1214,17 @@ class sp_lex_keeper
     m_lex->safe_to_cache_query= 0;
   }
 
+  TABLE_LIST **get_query_tables_last()
+  {
+    return m_lex->query_tables_last;
+  }
+
+  void set_query_tables_last(TABLE_LIST ** t)
+  {
+    m_lex->query_tables_last= t;
+  }
+
+
 private:
 
   LEX *m_lex;

Comment by Elena Stepanova [ 2022-03-17 ]

With the patch the test case with views

  • on debug build: fails a debug assertion on the first execution of the SP, which didn't happen before,
  • on non-debug build: fails same way as it did before, crashing in check_grant.

Also, it's not exactly specific to a view – as often happens, the view in that test case can be replaced with a subquery with the same effect, as in

declare _cur cursor for select 1 from (select get_name(id) from t1) sq;

So the patch may indeed need further improvement.

Comment by Oleksandr Byelkin [ 2022-03-18 ]

JFYI rewritten without FOR it still cause error where should not (but does not crash)

CREATE TABLE t1 ( id int, name varchar(24));
INSERT INTO t1 values (1, 'x'), (2, 'y'), (3, 'z');
 
create function get_name(_id int) returns varchar(24)
    return (select name from t1 where id = _id);
 
select get_name(id) from t1;
 
delimiter ^^;
 
create  procedure test_proc()
begin
    DECLARE done INT DEFAULT FALSE;
    DECLARE rec ROW TYPE OF _cur;
    declare _cur cursor for select get_name(id) from t1;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
    open _cur;
    read_loop: LOOP
      fetch _cur into rec;
      IF done THEN
        LEAVE read_loop;
      END IF;
      select 1;
    end LOOP;
end;
^^
delimiter ;^^
 
--error 1146
call test_proc();
--error 1146
call test_proc();

Comment by Oleksandr Byelkin [ 2022-03-18 ]

My first patch does not work with views because view also add tables. So I come up with one lined fix which work with views.

Comment by Oleksandr Byelkin [ 2022-03-18 ]

commit d8e915a88fcdebd0b42f10b2f6d2165296056dd1 (HEAD -> bb-10.3-MDEV-26009, origin/bb-10.3-MDEV-26009)
Author: Oleksandr Byelkin <sanja@mariadb.com>
Date:   Fri Mar 18 11:13:09 2022 +0100
 
    MDEV-26009 Server crash when calling twice procedure using FOR-loop
    
    The problem was that instructions sp_instr_cursor_copy_struct and
    sp_instr_copen uses the same lex, adding and removing "tail" of
    prelocked tables and forgetting that tail of all tables is kept in
    LEX::query_tables_last. If the LEX used only by one instruction
    or the query do not have prelocked tables it is not important.
    But to work correctly in all cases LEX::query_tables_last should
    be reset to make new tables added in the correct list (after last
    table in the LEX instead after last table of the prelocking "tail"
    which was cut).

Generated at Thu Feb 08 09:42:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.