[MDEV-28124] Server crashes in Explain_aggr_filesort::print_json_members upon SHOW ANALYZE/EXPLAIN FORMAT=JSON Created: 2022-03-18  Updated: 2022-05-02  Resolved: 2022-04-12

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: N/A
Fix Version/s: 10.9.1

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-27021 Extend SHOW EXPLAIN to support SHOW A... Closed

 Description   

The test case is for reproducing/debugging only, don't put it into the regression suite!
Create a deterministic synchronized one instead.

The test case is obviously non-deterministic. If it doesn't fail for you right away, try to adjust run and show values.

--connect (con1,localhost,root,,test)
--let $conid= `SELECT CONNECTION_ID()`
 
--let $run=100
while ($run)
{
  send
    SELECT * FROM sys.waits_by_user_by_latency;
 
  --connection default
  --let $show=100
  while ($show)
  {
    --error 0,ER_TARGET_NOT_EXPLAINABLE
     eval SHOW EXPLAIN FORMAT=JSON FOR $conid;
     --dec $show
  }
  --connection con1
  --reap
  --dec $run
}
 
# Cleanup
--disconnect con1

preview-10.9-MDEV-27021-explain cc5d67c02

#3  <signal handler called>
#4  0x000055882b57378b in append_item_to_str (out=0x7ffa57f67c90, item=0x7ffa4c01df28) at /data/src/preview-10.9-explain-gcov/sql/sql_explain.cc:1546
#5  0x000055882b56fa70 in Explain_aggr_filesort::print_json_members (this=0x7ffa4c070680, writer=0x7ffa57f68050, is_analyze=false) at /data/src/preview-10.9-explain-gcov/sql/sql_explain.cc:1059
#6  0x000055882b574c03 in Explain_table_access::print_explain_json (this=0x7ffa4c070120, query=0x7ffa4c06da88, writer=0x7ffa57f68050, is_analyze=false) at /data/src/preview-10.9-explain-gcov/sql/sql_explain.cc:1736
#7  0x000055882b5701e3 in Explain_basic_join::print_explain_json_interns (this=0x7ffa4c06ffe8, query=0x7ffa4c06da88, writer=0x7ffa57f68050, is_analyze=false) at /data/src/preview-10.9-explain-gcov/sql/sql_explain.cc:1115
#8  0x000055882b56f4f2 in Explain_select::print_explain_json (this=0x7ffa4c06ffe8, query=0x7ffa4c06da88, writer=0x7ffa57f68050, is_analyze=false) at /data/src/preview-10.9-explain-gcov/sql/sql_explain.cc:1010
#9  0x000055882b56926c in Explain_query::print_explain_json (this=0x7ffa4c06da88, output=0x7ffa48018448, is_analyze=false, query_time_in_progress_ms=57) at /data/src/preview-10.9-explain-gcov/sql/sql_explain.cc:238
#10 0x000055882b132e3c in LEX::print_explain (this=0x7ffa4c005100, output=0x7ffa48018448, explain_flags=0 '\000', is_analyze=false, is_json_format=true, printed_anything=0x7ffa57f6822f) at /data/src/preview-10.9-explain-gcov/sql/sql_lex.cc:5856
#11 0x000055882b30fdc0 in Show_explain_request::call_in_target_thread (this=0x7ffa57fb2250) at /data/src/preview-10.9-explain-gcov/sql/sql_show.cc:3003
#12 0x000055882b5db21e in Apc_target::process_apc_requests (this=0x7ffa4c0050a8) at /data/src/preview-10.9-explain-gcov/sql/my_apc.cc:222
#13 0x000055882b57da5f in Apc_target::disable (this=0x7ffa4c0050a8) at /data/src/preview-10.9-explain-gcov/sql/my_apc.h:73
#14 0x000055882b5681c5 in Explain_query::~Explain_query (this=0x7ffa4c06da88, __in_chrg=<optimized out>) at /data/src/preview-10.9-explain-gcov/sql/sql_explain.cc:68
#15 0x000055882b57be06 in delete_explain_query (lex=0x7ffa4c005100) at /data/src/preview-10.9-explain-gcov/sql/sql_explain.cc:2584
#16 0x000055882b18ac30 in log_slow_statement (thd=0x7ffa4c000db8) at /data/src/preview-10.9-explain-gcov/sql/sql_parse.cc:2561
#17 0x000055882b189f11 in dispatch_command (command=COM_QUERY, thd=0x7ffa4c000db8, packet=0x7ffa4c00b9a9 "", packet_length=42, blocking=true) at /data/src/preview-10.9-explain-gcov/sql/sql_parse.cc:2427
#18 0x000055882b18350d in do_command (thd=0x7ffa4c000db8, blocking=true) at /data/src/preview-10.9-explain-gcov/sql/sql_parse.cc:1403
#19 0x000055882b4e55db in do_handle_one_connection (connect=0x558830facaf8, put_in_cache=true) at /data/src/preview-10.9-explain-gcov/sql/sql_connect.cc:1418
#20 0x000055882b4e4de2 in handle_one_connection (arg=0x558830facaf8) at /data/src/preview-10.9-explain-gcov/sql/sql_connect.cc:1312
#21 0x000055882bde30cb in pfs_spawn_thread (arg=0x558830facbd8) at /data/src/preview-10.9-explain-gcov/storage/perfschema/pfs.cc:2201
#22 0x00007ffa69b04ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#23 0x00007ffa69703def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Also reproducible with SHOW ANALYZE FORMAT=JSON



 Comments   
Comment by Sergei Petrunia [ 2022-03-21 ]

Reproducible for me.

Indeed, item seems to point to free'd memory:

(gdb) p *item
  $5 = {<Value_source> = {<No data fields>}, <Type_all_attributes> = {<Type_std_attributes> =
{<Type_numeric_attributes> = {max_length = 2408550287, decimals = 36751, unsigned_flag = 143},
 collation = {collation = 0x8f8f8f8f8f8f8f8f, derivation = 2408550287, repertoire = 2408550287}}, 
_vptr.Type_all_attributes = 0x8f8f8f8f8f8f8f8f}

Comment by Sergei Petrunia [ 2022-03-21 ]

It crashes here:

  #1  0x000055555612285a in Explain_table_access::tag_to_json 
  #2  0x0000555556123766 in Explain_table_access::print_explain_json 
  #3  0x0000555556120b61 in Explain_basic_join::print_explain_json_interns 
  #4  0x000055555612053e in Explain_select::print_explain_json 
  #5  0x000055555611d6f9 in Explain_query::print_explain_json 
  #6  0x0000555555ed5e08 in LEX::print_explain 
  #7  0x0000555555fd37bc in Show_explain_request::call_in_target_thread 
  #8  0x000055555615b49c in Apc_target::process_apc_requests 
  #9  0x0000555556126e13 in Apc_target::disable 
  #10 0x000055555611cdd7 in Explain_query::~Explain_query 
  #11 0x0000555556125ee5 in delete_explain_query 
  #12 0x0000555555f051a4 in log_slow_statement 

log_slow_statement is where --log_slow_verbosity=explain prints tabular ANALYZE output.

Tabular output is "self-contained" (it checks if the attached conditions are NULL but does not print them).

For JSON output, one needs to access items to print the conditions. Could it be that the cut-off for printing the conditions is at some earlier stage in the query processing?

Comment by Sergei Petrunia [ 2022-03-21 ]

Items are freed in free_items()...
which is called from THD::cleanup_after_query() ...

Looking at the code:

dispatch_command()
{
  ...
  mysql_parse(...);
  log_slow_statement();
}

mysql_parse()
{
  ...
  mysql_execute_command()
  ...
  thd->cleanup_after_query(); // this calls free_items()...
}

Indeed, it seems like it's possible to try producing EXPLAIN FORMAT=JSON output after free_items() has been called.

Comment by Sergei Petrunia [ 2022-03-21 ]

First fix suggestion: disallow producing EXPLAIN FORMAT=JSON output before making the thd->cleanup_after_query() call.

Comment by Sergei Petrunia [ 2022-03-21 ]

https://gist.github.com/spetrunia/3e2fa7081caf2a13f843e9fa5f93103b - If I apply this, I start getting a different kind of crash: https://gist.github.com/spetrunia/d3342301e28b89096e12e853d024de8c .

I've encountered something similar to this for ANALYZE. Check out comments for the Item_temptable_field.

Note there:

Step #4 is done after step #3, so "ANALYZE stmt" has no problem printing
Item_field objects that refer to regular tables.

is this true anymore?

Comment by Sergei Petrunia [ 2022-03-22 ]

This is where the Field object gets freed:

  #4  0x0000555556b3bbc6 in root_free (root=0x7ffff010dd20, ptr=0x7fff7c0082d8, size=1016) at /home/psergey/dev-git2/10.9-review-analyze-oleg/mysys/my_alloc.c:78
  #5  0x0000555556b3c997 in free_root (root=0x7ffff010dd20, MyFlags=0) at /home/psergey/dev-git2/10.9-review-analyze-oleg/mysys/my_alloc.c:495
  #6  0x0000555555fa5562 in free_tmp_table (thd=0x7fff7c000d78, entry=0x7fff7ce0e0b0) at /home/psergey/dev-git2/10.9-review-analyze-oleg/sql/sql_select.cc:20406
  #7  0x0000555555e53640 in close_thread_tables (thd=0x7fff7c000d78) at /home/psergey/dev-git2/10.9-review-analyze-oleg/sql/sql_base.cc:840
  #8  0x0000555555f11165 in mysql_execute_command (thd=0x7fff7c000d78, is_called_from_prepared_stmt=false) at /home/psergey/dev-git2/10.9-review-analyze-oleg/sql/sql_parse.cc:6056
  #9  0x0000555555f16bdd in mysql_parse (thd=0x7fff7c000d78, rawbuf=0x7fff7c014220 "SELECT column_name  FROM information_schema.columns  WHERE table_schema IN ('mysql','information_schema','performance_schema','sys')", length=132, parser_state=0x7ffff010e500) at /home/psergey/dev-git2/10.9-review-analyze-oleg/sql/sql_parse.cc:8029
  #10 0x0000555555f03067 in dispatch_command (command=COM_QUERY, thd=0x7fff7c000d78, packet=0x7fff7c00ba89 "", packet_length=132, blocking=true) at /home/psergey/dev-git2/10.9-review-analyze-oleg/sql/sql_parse.cc:1895
  #11 0x0000555555f01934 in do_command (thd=0x7fff7c000d78, blocking=true) at /home/psergey/dev-git2/10.9-review-analyze-oleg/sql/sql_parse.cc:1403
  #12 0x00005555560d59a6 in do_handle_one_connection (connect=0x555558c75748, put_in_cache=true) at /home/psergey/dev-git2/10.9-review-analyze-oleg/sql/sql_connect.cc:1418
  #13 0x00005555560d563c in handle_one_connection (arg=0x555558c75748) at /home/psergey/dev-git2/10.9-review-analyze-oleg/sql/sql_connect.cc:1312
  #14 0x00005555565ce79f in pfs_spawn_thread (arg=0x555558b6d778) at /home/psergey/dev-git2/10.9-review-analyze-oleg/storage/perfschema/pfs.cc:2201

Note that the table is freed in free_tmp_table(), that is, it's a temporary table.
However, the Item_field object pointing to it is NOT an Item_temptable_field..

(gdb) p $i
  $113 = (Item_func_in * const) 0x7fff74017b58
(gdb) p $i->args[0]
  $114 = (Item_field *) 0x7fff74017748

Comment by Sergei Petrunia [ 2022-03-22 ]

Fix candidate : https://gist.github.com/spetrunia/211ae821160d5ecdbb403609d536f538 . It also fixes MDEV-28125.

Comment by Sergei Petrunia [ 2022-04-12 ]

Fix pushed into preview-10.9-MDEV-27021-explain. See also a closely related MDEV-28201.

Comment by Marko Mäkelä [ 2022-05-02 ]

Because the sys schema depends on PLUGIN_PERFSCHEMA,

--source include/have_perfschema.inc

should have been included in the tests.

Generated at Thu Feb 08 09:58:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.