[MDEV-22042] Server crash in Item_field::print on ANALYZE FORMAT=JSON Created: 2020-03-26  Updated: 2020-08-25  Resolved: 2020-06-09

Status: Closed
Project: MariaDB Server
Component/s: Optimizer - CTE
Affects Version/s: 10.3.14
Fix Version/s: 10.5.4, 10.2.33, 10.3.24, 10.4.14

Type: Bug Priority: Critical
Reporter: Valerii Kravchuk Assignee: Igor Babaev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-22121 server crashes in Item_field::print a... Closed

 Description   

Server crashes while executing ANALYZE FORMAT=JSON for complex enough query having WITH RECURSIVE clause. Stack trace is:

Server version: 10.3.14-MariaDB-log
key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=18
max_threads=5010
thread_count=15
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 30832274 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f56400e1ae8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f56e34e2dd8 thread_stack 0x49000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x2b)[0x55b150cfa30b]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x537)[0x55b1507e7f47]
/lib64/libpthread.so.0(+0xf5d0)[0x7f59792875d0]
/usr/local/mysql/bin/mysqld(_ZN10Item_field5printEP6String15enum_query_type+0x14)[0x55b1507ffd24]
/usr/local/mysql/bin/mysqld(_ZN19Item_func_isnotnull5printEP6String15enum_query_type+0x3d)[0x55b15081496d]
sql/item.cc:7715(Item_field::print(String*, enum_query_type))[0x55b150714166]
sql/item_cmpfunc.cc:5200(Item_func_isnotnull::print(String*, enum_query_type))[0x55b150717b36]
sql/sql_string.h:218(String::c_ptr_safe())[0x55b15071862f]
sql/sql_explain.cc:1550(Explain_table_access::print_explain_json(Explain_query*, Json_writer*, bool))[0x55b150718ae7]
sql/sql_explain.cc:1058(Explain_basic_join::print_explain_json_interns(Explain_query*, Json_writer*, bool))[0x55b150715336]
sql/sql_explain.cc:955(Explain_select::print_explain_json(Explain_query*, Json_writer*, bool))[0x55b150717f21]
sql/sql_explain.cc:1753(Explain_table_access::print_explain_json(Explain_query*, Json_writer*, bool))[0x55b15071862f]
sql/sql_explain.cc:1058(Explain_basic_join::print_explain_json_interns(Explain_query*, Json_writer*, bool))[0x55b1507189d7]
sql/sql_explain.cc:955(Explain_select::print_explain_json(Explain_query*, Json_writer*, bool))[0x55b150717f21]
sql/sql_explain.cc:1753(Explain_table_access::print_explain_json(Explain_query*, Json_writer*, bool))[0x55b15071862f]
sql/sql_explain.cc:1058(Explain_basic_join::print_explain_json_interns(Explain_query*, Json_writer*, bool))[0x55b150718ae7]
sql/sql_explain.cc:955(Explain_select::print_explain_json(Explain_query*, Json_writer*, bool))[0x55b150714a0c]
sql/sql_explain.cc:229(Explain_query::print_explain_json(select_result_sink*, bool))[0x55b150714af0]
sql/sql_explain.cc:179(Explain_query::send_explain(THD*))[0x55b1505447e1]
sql/sql_parse.cc:6560(execute_sqlcom_select(THD*, TABLE_LIST*))[0x55b15062da69]
sql/sql_parse.cc:3821(mysql_execute_command(THD*))[0x55b15062fd19]
sql/sql_class.h:1820(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55b150631e95]
sql/sql_parse.cc:1857(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55b15063263e]
sql/sql_parse.cc:1405(do_command(THD*))[0x55b1507331b7]
sql/threadpool_common.cc:358(threadpool_process_request)[0x55b1507be2c8]
sql/threadpool_generic.cc:1601(worker_main(void*))[0x55b15095f35d]
/lib64/libpthread.so.0(+0x7dd5)[0x7f597927fdd5]
/lib64/libc.so.6(clone+0x6d)[0x7f597815aead]

It looks very similar to the one in MDEV-8864 fixed long time ago.



 Comments   
Comment by Alice Sherepa [ 2020-04-02 ]

I failed to reproduce the issue, as I am getting MDEV-22121 (or sometimes MDEV-16230 because of the window function). It might be a variation of the same problem or it should be investigated further after the fix MDEV-22121.

Comment by Elena Stepanova [ 2020-04-05 ]

With the test case from MDEV-22121 on 10.3.14 non-debug build with ASAN I am getting this:

10.3.14

==18302==ERROR: AddressSanitizer: heap-use-after-free on address 0x6190000e15c0 at pc 0x561ac0780ea9 bp 0x7fd0745f1850 sp 0x7fd0745f1848
READ of size 8 at 0x6190000e15c0 thread T27
    #0 0x561ac0780ea8 in Item_field::print(String*, enum_query_type) /data/src/10.3-bug/sql/item.cc:7715
    #1 0x561ac07d427b in Item_func_isnotnull::print(String*, enum_query_type) /data/src/10.3-bug/sql/item_cmpfunc.cc:5199
    #2 0x561ac04989ee in write_item /data/src/10.3-bug/sql/sql_explain.cc:1407
    #3 0x561ac04a98b4 in Explain_table_access::print_explain_json(Explain_query*, Json_writer*, bool) /data/src/10.3-bug/sql/sql_explain.cc:1712
    #4 0x561ac04aaaf8 in Explain_basic_join::print_explain_json_interns(Explain_query*, Json_writer*, bool) /data/src/10.3-bug/sql/sql_explain.cc:1056
    #5 0x561ac04ab132 in Explain_select::print_explain_json(Explain_query*, Json_writer*, bool) /data/src/10.3-bug/sql/sql_explain.cc:955
    #6 0x561ac049e9c6 in Explain_union::print_explain_json(Explain_query*, Json_writer*, bool) /data/src/10.3-bug/sql/sql_explain.cc:611
    #7 0x561ac04aa003 in Explain_table_access::print_explain_json(Explain_query*, Json_writer*, bool) /data/src/10.3-bug/sql/sql_explain.cc:1752
    #8 0x561ac04aaaf8 in Explain_basic_join::print_explain_json_interns(Explain_query*, Json_writer*, bool) /data/src/10.3-bug/sql/sql_explain.cc:1056
    #9 0x561ac04ab264 in Explain_basic_join::print_explain_json(Explain_query*, Json_writer*, bool) /data/src/10.3-bug/sql/sql_explain.cc:1039
    #10 0x561ac04a9bc7 in Explain_table_access::print_explain_json(Explain_query*, Json_writer*, bool) /data/src/10.3-bug/sql/sql_explain.cc:1770
    #11 0x561ac04aaaf8 in Explain_basic_join::print_explain_json_interns(Explain_query*, Json_writer*, bool) /data/src/10.3-bug/sql/sql_explain.cc:1056
    #12 0x561ac04ab132 in Explain_select::print_explain_json(Explain_query*, Json_writer*, bool) /data/src/10.3-bug/sql/sql_explain.cc:955
    #13 0x561ac049beb1 in Explain_query::print_explain_json(select_result_sink*, bool) /data/src/10.3-bug/sql/sql_explain.cc:229
    #14 0x561ac049c209 in Explain_query::send_explain(THD*) /data/src/10.3-bug/sql/sql_explain.cc:172
    #15 0x561ac012a157 in execute_sqlcom_select /data/src/10.3-bug/sql/sql_parse.cc:6560
    #16 0x561ac0141e9a in mysql_execute_command(THD*) /data/src/10.3-bug/sql/sql_parse.cc:3821
    #17 0x561ac015cc89 in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /data/src/10.3-bug/sql/sql_parse.cc:8091
    #18 0x561ac0163b9c in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /data/src/10.3-bug/sql/sql_parse.cc:1856
    #19 0x561ac016652f in do_command(THD*) /data/src/10.3-bug/sql/sql_parse.cc:1402
    #20 0x561ac04447fd in do_handle_one_connection(CONNECT*) /data/src/10.3-bug/sql/sql_connect.cc:1402
    #21 0x561ac0444e4a in handle_one_connection /data/src/10.3-bug/sql/sql_connect.cc:1308
    #22 0x561ac15e8206 in pfs_spawn_thread /data/src/10.3-bug/storage/perfschema/pfs.cc:1862
    #23 0x7fd08aabffa2 in start_thread /build/glibc-vjB4T1/glibc-2.28/nptl/pthread_create.c:486
    #24 0x7fd08a30e4ce in clone (/lib/x86_64-linux-gnu/libc.so.6+0xf94ce)
 
0x6190000e15c0 is located 64 bytes inside of 992-byte region [0x6190000e1580,0x6190000e1960)
freed by thread T27 here:
    #0 0x7fd08abc1fb0 in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.5+0xe8fb0)
    #1 0x561ac169641b in free_root /data/src/10.3-bug/mysys/my_alloc.c:428
 
previously allocated by thread T27 here:
    #0 0x7fd08abc2330 in __interceptor_malloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9330)
    #1 0x561ac16a95dc in my_malloc /data/src/10.3-bug/mysys/my_malloc.c:101
 
Thread T27 created by T0 here:
    #0 0x7fd08ab29db0 in __interceptor_pthread_create (/lib/x86_64-linux-gnu/libasan.so.5+0x50db0)
    #1 0x561ac15ef832 in spawn_thread_v1 /data/src/10.3-bug/storage/perfschema/pfs.cc:1912
 
SUMMARY: AddressSanitizer: heap-use-after-free /data/src/10.3-bug/sql/item.cc:7715 in Item_field::print(String*, enum_query_type)
Shadow bytes around the buggy address:
  0x0c3280014260: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3280014270: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3280014280: fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c3280014290: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c32800142a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c32800142b0: fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd
  0x0c32800142c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c32800142d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c32800142e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c32800142f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c3280014300: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==18302==ABORTING

I would say it is as close to the description as it ever gets, given that SIGSEGV on an optimized build is a matter of luck and non-perfect stack trace.

Comment by Igor Babaev [ 2020-06-02 ]

Changed the population of the table t2 in the first test case of MDEV-22121 to stabilize the chosen execution plan:

insert into t2 values (1,1),(2,2),(3,3),(4,4);

Compiled with Valgrind

cmake . -DCMAKE_BUILD_TYPE=Debug -DWITHOUT_MROONGA:bool=1 -DWITHOUT_TOKUDB:bool=1 -DWITH_VALGRIND:BOOL=1

The test of MDEV-22121 passes. Yet Valgrind resports the following problem

==32444== Invalid read of size 8
==32444==    at 0xB7E9F8: Item_field::print(String*, enum_query_type) (item.cc:7769)
==32444==    by 0xB67838: Item::print_parenthesised(String*, enum_query_type, precedence) (item.cc:611)
==32444==    by 0xBA3372: Item_func_isnotnull::print(String*, enum_query_type) (item_cmpfunc.cc:5214)
==32444==    by 0x9E1457: write_item(Json_writer*, Item*) (sql_explain.cc:1414)
==32444==    by 0x9E1739: Explain_table_access::tag_to_json(Json_writer*, explain_extra_tag) (sql_explain.cc:1472)
==32444==    by 0x9E2353: Explain_table_access::print_explain_json(Explain_query*, Json_writer*, bool) (sql_explain.cc:1719)
==32444==    by 0x9E0078: Explain_basic_join::print_explain_json_interns(Explain_query*, Json_writer*, bool) (sql_explain.cc:1063)
==32444==    by 0x9DFAEB: Explain_select::print_explain_json(Explain_query*, Json_writer*, bool) (sql_explain.cc:962)
==32444==    by 0x9DE9C5: Explain_union::print_explain_json(Explain_query*, Json_writer*, bool) (sql_explain.cc:613)
==32444==    by 0x9E2687: Explain_table_access::print_explain_json(Explain_query*, Json_writer*, bool) (sql_explain.cc:1764)
==32444==    by 0x9E0078: Explain_basic_join::print_explain_json_interns(Explain_query*, Json_writer*, bool) (sql_explain.cc:1063)
==32444==    by 0x9DFFC8: Explain_basic_join::print_explain_json(Explain_query*, Json_writer*, bool) (sql_explain.cc:1046)
==32444==    by 0x9E27F4: Explain_table_access::print_explain_json(Explain_query*, Json_writer*, bool) (sql_explain.cc:1782)
==32444==    by 0x9E0078: Explain_basic_join::print_explain_json_interns(Explain_query*, Json_writer*, bool) (sql_explain.cc:1063)
==32444==    by 0x9DFAEB: Explain_select::print_explain_json(Explain_query*, Json_writer*, bool) (sql_explain.cc:962)
==32444==    by 0x9DCFA7: Explain_query::print_explain_json(select_result_sink*, bool) (sql_explain.cc:229)
==32444==  Address 0x19a4d630 is 64 bytes inside a block of size 264 free'd
==32444==    at 0x4C2A37C: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==32444==    by 0x13B973E: my_free (my_malloc.c:223)
==32444==    by 0x13AC0DC: free_root (my_alloc.c:420)
==32444==    by 0x8A0E5B: free_tmp_table(THD*, TABLE*) (sql_select.cc:19017)
==32444==    by 0x78C5FD: close_thread_tables(THD*) (sql_base.cc:798)
==32444==    by 0x747294: sp_lex_keeper::reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*) (sp_head.cc:3359)
==32444==    by 0x747DB8: sp_instr_set::execute(THD*, unsigned int*) (sp_head.cc:3626)
==32444==    by 0x74150D: sp_head::execute(THD*, bool) (sp_head.cc:1371)
==32444==    by 0x743109: sp_head::execute_function(THD*, Item**, unsigned int, Field*, sp_rcontext**, Query_arena*) (sp_head.cc:2029)
==32444==    by 0xB6F868: Item_sp::execute_impl(THD*, Item**, unsigned int) (item.cc:3020)
==32444==    by 0xB6F42A: Item_sp::execute(THD*, bool*, Item**, unsigned int) (item.cc:2933)
==32444==    by 0xBEA5AB: Item_func_sp::execute() (item_func.cc:6410)
==32444==    by 0xBF0E13: Item_func_sp::val_str(String*) (item_func.h:2959)
==32444==    by 0xA342DA: Type_handler::Item_send_str(Item*, Protocol*, st_value*) const (sql_type.cc:5380)
==32444==    by 0xA39339: Type_handler_string_result::Item_send(Item*, Protocol*, st_value*) const (sql_type.h:2250)
==32444==    by 0x714B97: Item::send(Protocol*, st_value*) (item.h:880)

As we can see the problem appears exactly with the same stack as in the reported crash.

Comment by Oleksandr Byelkin [ 2020-06-05 ]

OK to push

Generated at Thu Feb 08 09:11:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.