[MDEV-24910] Crash with SELECT that uses table value constructor as a subselect Created: 2021-02-18  Updated: 2021-03-02  Resolved: 2021-02-25

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.3.27
Fix Version/s: 10.3.29, 10.4.19, 10.5.10

Type: Bug Priority: Major
Reporter: Igor Babaev Assignee: Igor Babaev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by MDEV-24918 Server crash after the query with tab... Closed

 Description   

The following test case causes a crash of the server in 10.3

create table t1 (a int) engine=myisam;
insert into t1 values (3), (7), (1);
create table t2 (b int) engine=myisam;
insert into t2 values (1), (2);
 
select (values ((select * from t1 where a > 10))) from t2; 
 
drop table t1;

Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f8401ffce70 thread_stack 0x49000
mysys/stacktrace.c:174(my_print_stacktrace)[0x5564f638ad87]
sql/signal_handler.cc:209(handle_fatal_signal)[0x5564f5b11f67]
/lib64/libpthread.so.0(+0xf890)[0x7f8409d66890]
sql/sql_lex.h:1196(st_select_lex::next_select())[0x5564f579f9ee]
sql/sql_lex.h:1427(st_select_lex_unit::is_unit_op())[0x5564f579fa86]
sql/sql_select.cc:1044(JOIN::prepare(TABLE_LIST*, unsigned int, Item*, unsigned int, st_order*, bool, st_order*, Item*, st_order*, st_select_lex*, st_select_lex_unit*))[0x5564f5831668]
sql/item_subselect.cc:3707(subselect_single_select_engine::prepare(THD*))[0x5564f5bfc15b]
sql/item_subselect.cc:280(Item_subselect::fix_fields(THD*, Item**))[0x5564f5beedc8]
sql/item.h:830(Item::fix_fields_if_needed(THD*, Item**))[0x5564f56e56d0]
sql/item.h:833(Item::fix_fields_if_needed_for_scalar(THD*, Item**))[0x5564f56e56fd]
sql/sql_tvc.cc:62(fix_fields_for_tvc(THD*, List_iterator_fast<List<Item> >&))[0x5564f5a16de6]
sql/sql_tvc.cc:238(table_value_constr::prepare(THD*, st_select_lex*, select_result*, st_select_lex_unit*))[0x5564f5a174d9]
sql/sql_union.cc:1029(st_select_lex_unit::prepare(TABLE_LIST*, select_result*, unsigned long))[0x5564f58ec794]
sql/sql_derived.cc:771(mysql_derived_prepare(THD*, LEX*, TABLE_LIST*))[0x5564f579d6ee]
sql/sql_derived.cc:199(mysql_handle_single_derived(LEX*, TABLE_LIST*, unsigned int))[0x5564f579c271]
sql/table.cc:8362(TABLE_LIST::handle_derived(LEX*, unsigned int))[0x5564f591d152]
sql/sql_lex.h:4004(LEX::handle_list_of_derived(TABLE_LIST*, unsigned int))[0x5564f57b649d]
sql/sql_lex.cc:4164(st_select_lex::handle_derived(LEX*, unsigned int))[0x5564f57c26aa]
sql/sql_select.cc:1049(JOIN::prepare(TABLE_LIST*, unsigned int, Item*, unsigned int, st_order*, bool, st_order*, Item*, st_order*, st_select_lex*, st_select_lex_unit*))[0x5564f58316bb]
sql/item_subselect.cc:3707(subselect_single_select_engine::prepare(THD*))[0x5564f5bfc15b]
sql/item_subselect.cc:280(Item_subselect::fix_fields(THD*, Item**))[0x5564f5beedc8]
sql/item.h:830(Item::fix_fields_if_needed(THD*, Item**))[0x5564f56e56d0]
sql/item.h:833(Item::fix_fields_if_needed_for_scalar(THD*, Item**))[0x5564f56e56fd]
sql/sql_base.cc:7562(setup_fields(THD*, Bounds_checked_array<Item*>, List<Item>&, enum_column_usage, List<Item>*, List<Item>*, bool))[0x5564f575ff8f]
sql/sql_select.cc:1151(JOIN::prepare(TABLE_LIST*, unsigned int, Item*, unsigned int, st_order*, bool, st_order*, Item*, st_order*, st_select_lex*, st_select_lex_unit*))[0x5564f5831ce9]
sql/sql_select.cc:4301(mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x5564f583d574]
sql/sql_select.cc:382(handle_select(THD*, LEX*, select_result*, unsigned long))[0x5564f582eb2b]
sql/sql_parse.cc:6316(execute_sqlcom_select(THD*, TABLE_LIST*))[0x5564f57f5d34]
sql/sql_parse.cc:3847(mysql_execute_command(THD*))[0x5564f57ec7fc]
sql/sql_parse.cc:7840(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x5564f57fa073]
sql/sql_parse.cc:1855(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x5564f57e6b24]
sql/sql_parse.cc:1399(do_command(THD*))[0x5564f57e5416]
sql/sql_connect.cc:1403(do_handle_one_connection(CONNECT*))[0x5564f5961dcc]
sql/sql_connect.cc:1309(handle_one_connection)[0x5564f5961b3b]
perfschema/pfs.cc:1871(pfs_spawn_thread)[0x5564f631becd]
/lib64/libpthread.so.0(+0x80a4)[0x7f8409d5f0a4]
/lib64/libc.so.6(clone+0x6d)[0x7f8407ec204d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f83f0012818): select (values ((select * from t1 where a > 10))) from t2



 Comments   
Comment by Igor Babaev [ 2021-02-18 ]

Most probably the bug affects also 10.4,10.5.

Comment by Igor Babaev [ 2021-02-18 ]

The following query

select (values ((select 2))) from t2;

causes a crash of the server of the same kind: the states of the stack are quite similar

Comment by Igor Babaev [ 2021-02-18 ]

Let's look how the query

select (values ((select 2))) from t2;

is processed.
When fix_fields is called for Item_subselect
(values ((select 2)))
the function Item_subselect::wrap_tvc_into_select() is called to wrap values ((select 2)) into a select:
values ((select 2)) => select * from ( values ((select 2))) tvc_0.
This transformation is done by the helper function wrap_tvc(). At some moment this function calls the method exclude() for the item values ((select 2)). This method removes both selects 'values ((select 2))' and
'select 2' from global select list. The first select is also removed from the select tree and and the call also destroys underlying slave structure. Before the call of the exclude() method select for values ((select 2)) is printed as
<dbug_item_print_buf> "values ((subquery#3))"
and after as
<dbug_item_print_buf> "values ((subquery#NULL))".
Before the call tvc_sl->first_inner_unit() points to the unit for select 2. After the call it becomes 0.
After wrapping is done table_value_constr::prepare is called for tvc_sl->tvc that calls fix_fields_for_tvc(). The latter calls the method fix_fields for each element of the TVC. So it calls Item_subselect::fix_fields for the subselect (select 2). In a general case this subselect may be a union. The function tries to go through the
selects of the corresponding unit and discovers that there is no selects attached to this unit because all slave structure was destroyed by the above mentioned call of the exclude method. Here we get sigenv.

Comment by Igor Babaev [ 2021-02-18 ]

If to substitute the above mentioned call of the exclude method for the call of the exclude_from_tree method that does does not destroy the slave structure the failing queries return correct result sets:

MariaDB [test]> select (values ((select 2))) from t2;
+-----------------------+
| (values ((select 2))) |
+-----------------------+
|                     2 |
|                     2 |
+-----------------------+
 
MariaDB [test]> select (values ((select * from t1 where a > 10))) from t2; 
+--------------------------------------------+
| (values ((select * from t1 where a > 10))) |
+--------------------------------------------+
|                                       NULL |
|                                       NULL |
+--------------------------------------------+

Yet if we look at the output of the EXPLAIN for the query

select (values ((select * from t1 where a = 1))) from t2;

we see that it lacks any lines related to the scan of the table t1

MariaDB [test]> explain extended select (values ((select * from t1 where a = 1))) from t2; 
+------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id   | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------+
|    1 | PRIMARY     | t2         | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 |                |
|    4 | SUBQUERY    | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 |                |
|    2 | DERIVED     | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | No tables used |
+------+-------------+------------+------+---------------+------+---------+------+------+----------+----------------+

although the query itself returns the right result set:

MariaDB [test]> select (values ((select * from t1 where a = 1))) from t2; 
+-------------------------------------------+
| (values ((select * from t1 where a = 1))) |
+-------------------------------------------+
|                                         1 |
|                                         1 |
+-------------------------------------------+

Valgrind does not report any problems either for all queries considered above or for explains for them.

Comment by Igor Babaev [ 2021-02-18 ]

Another test case where the query returns the right result set, yet leads a memory leek

create table t1 (a int);
insert into t1 values
 (3), (7), (7), (1), (3), (9), (7), (9), (8), (7), (8);
 
create view v1 as select count(a) as c from t1 group by a;
 
select (values ((select * from t1 where a > 10 and a in (select * from v1))));
 
drop view v1;
drop table t1;

MariaDB [test]> select (values ((select * from t1 where a > 10 and a in (select * from v1))));
+------------------------------------------------------------------------+
| (values ((select * from t1 where a > 10 and a in (select * from v1)))) |
+------------------------------------------------------------------------+
|                                                                   NULL |
+------------------------------------------------------------------------+

Valgrind reports the following for the above above test:

...
==13576== 5,148 bytes in 1 blocks are still reachable in loss record 11 of 12
==13576==    at 0x4C29110: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==13576==    by 0x13D765D: sf_malloc (safemalloc.c:118)
==13576==    by 0x13C4E41: my_malloc (my_malloc.c:101)
==13576==    by 0x13B771C: alloc_root (my_alloc.c:251)
==13576==    by 0x13B7A16: multi_alloc_root (my_alloc.c:325)
==13576==    by 0x8A0824: create_tmp_table(THD*, TMP_TABLE_PARAM*, List<Item>&, st_order*, bool, bool, unsigned long long, unsigned long long, st_mysql_const_lex_string const*, bool, bool) (sql_select.cc:17559)
==13576==    by 0xA07C69: setup_sj_materialization_part1(st_join_table*) (opt_subselect.cc:3852)
==13576==    by 0x88F331: make_outerjoin_info(JOIN*) (sql_select.cc:10516)
==13576==    by 0x87574A: JOIN::optimize_stage2() (sql_select.cc:2088)
==13576==    by 0x875094: JOIN::optimize_inner() (sql_select.cc:1985)
==13576==    by 0x873433: JOIN::optimize() (sql_select.cc:1501)
==13576==    by 0xC3C7FB: subselect_single_select_engine::exec() (item_subselect.cc:3840)
==13576==    by 0xC2FF4E: Item_subselect::exec() (item_subselect.cc:740)
==13576==    by 0xC31E59: Item_singlerow_subselect::val_int() (item_subselect.cc:1333)
==13576==    by 0xB82D7B: Item::save_int_in_field(Field*, bool) (item.cc:6878)
==13576==    by 0xA34DB9: Type_handler_int_result::Item_save_in_field(Item*, Field*, bool) const (sql_type.cc:2592)

Comment by Igor Babaev [ 2021-02-18 ]

Debugging shows that the mem-root allocated for the temporary table to store the rows of v1 is never freed. It happens because the cleanup method for the unit
select count(a) as c from t1 group by aselect count(a) as c from t1 group by a
is never called.
When tracing the code of the function wrap_tvc() we see that before the call
mysql_new_select(lex, 1, tvc_sl)
points to some unit and this unit is

(gdb) p dbug_print_unit(tvc_sl->first_inner_unit())
$166 = 0x555557551220 <dbug_item_print_buf> "select `*` from test.t1 where a > 10 and a in (subquery#4)"

However after the call we have

(gdb) p tvc_sl->first_inner_unit()
$167 = (st_select_lex_unit *) 0x0

It means that the unit is lost in the select tree.

Comment by Igor Babaev [ 2021-02-18 ]

All above observations forces us to re-engineer the code of the function wrap_tvc().

Comment by Oleksandr Byelkin [ 2021-02-22 ]

f3eafa5e05c924bfcf152a3200009d3f3c5819fc OK to push

Comment by Oleksandr Byelkin [ 2021-02-25 ]

OK

Comment by Igor Babaev [ 2021-02-25 ]

The problem with EXPLAIN has nothing to do with this bug and MDEV-24936 was created for it, then fixed and closed.

Comment by Igor Babaev [ 2021-02-25 ]

A fix for this bug was pushed into 10.3.

Generated at Thu Feb 08 09:33:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.