[MDEV-23160] SIGSEGV in Explain_node::print_explain_for_children on UNION SELECT or on EXPLAIN EXTENDED (on optimized builds) Created: 2020-07-14  Updated: 2022-11-22  Resolved: 2022-10-23

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9
Fix Version/s: 10.3.37, 10.4.27, 10.5.18, 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.1

Type: Bug Priority: Critical
Reporter: Roel Van de Paar Assignee: Sergei Petrunia
Resolution: Fixed Votes: 1
Labels: affects-tests

Issue Links:
Duplicate
is duplicated by MDEV-29039 ANALYZE on RECURSIVE CTE crashes Mari... Closed
Relates
relates to MDEV-9513 Assertion `join->group_list || !join-... Closed
relates to MDEV-18157 Server crashes in Explain_node::print... Closed
relates to MDEV-25564 Server crashed on running some EXPLAI... Closed

 Description   

USE test;
SET GLOBAL slow_query_log=ON;
SET SESSION log_queries_not_using_indexes=TRUE;
SET SESSION slow_query_log=ON;
SET SESSION log_slow_verbosity='innodb,query_plan,explain';
CREATE TABLE t (a int);
SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12);  # Will correctly generate "ERROR 1054 (42S22): Unknown column 'b' in 'where clause'" error
SELECT SLEEP(10);  # Sever will crash after small delay

Leads to:

10.5.5 e1013725ce0f3f947e728491eef75d9985e8db2f (Optimized)

Core was generated by `/test/MD250620-mariadb-10.5.5-linux-x86_64-opt/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
[Current thread is 1 (Thread 0x152b4a60c700 (LWP 489165))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
#1  0x0000563fdaedd0e7 in my_write_core (sig=sig@entry=11) at /test/10.5_opt/mysys/stacktrace.c:518
#2  0x0000563fda8a74ca in handle_fatal_signal (sig=11) at /test/10.5_opt/sql/signal_handler.cc:330
#3  <signal handler called>
#4  Explain_node::print_explain_for_children (this=this@entry=0x152b25d24f08, query=query@entry=0x152b25c4c770, output=output@entry=0x152b4a60a9e0, explain_flags=explain_flags@entry=0 '\000', is_analyze=is_analyze@entry=true) at /test/10.5_opt/sql/sql_explain.cc:663
#5  0x0000563fda7bf78d in Explain_union::print_explain (this=0x152b25d24f08, query=0x152b25c4c770, output=0x152b4a60a9e0, explain_flags=<optimized out>, is_analyze=<optimized out>) at /test/10.5_opt/sql/sql_explain.cc:588
#6  0x0000563fda7bec11 in Explain_query::print_explain_str (this=0x152b25c4c770, thd=thd@entry=0x152b25c12018, out_str=out_str@entry=0x152b4a60ab80, is_analyze=is_analyze@entry=true) at /test/10.5_opt/sql/sql_explain.cc:265
#7  0x0000563fda7bec56 in print_explain_for_slow_log (lex=<optimized out>, thd=thd@entry=0x152b25c12018, str=str@entry=0x152b4a60ab80) at /test/10.5_opt/sql/sql_explain.cc:249
#8  0x0000563fda9890a3 in MYSQL_QUERY_LOG::write (this=this@entry=0x152b47458c28, thd=thd@entry=0x152b25c12018, current_time=<optimized out>, current_time@entry=<error reading variable: That operation is not available on integers of more than 8 bytes.>, user_host=user_host@entry=0x152b4a60b110 "root[root] @ localhost []", user_host_len=user_host_len@entry=25, query_utime=query_utime@entry=855, lock_utime=651, is_command=false, sql_text=0x152b25c47030 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", sql_text_len=77) at /test/10.5_opt/sql/log.cc:3252
#9  0x0000563fda9894e1 in Log_to_file_event_handler::log_slow (this=0x152b47458800, thd=0x152b25c12018, current_time=<optimized out>, user_host=0x152b4a60b110 "root[root] @ localhost []", user_host_len=25, query_utime=855, lock_utime=651, is_command=false, sql_text=0x152b25c47030 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", sql_text_len=77) at /test/10.5_opt/sql/log.cc:1092
#10 0x0000563fda9838aa in LOGGER::slow_log_print (this=this@entry=0x563fdba7d260 <logger>, thd=thd@entry=0x152b25c12018, query=0x152b25c47030 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", query_length=77, current_utime=<optimized out>) at /test/10.5_opt/sql/log.cc:1366
#11 0x0000563fda989c3c in slow_log_print (thd=thd@entry=0x152b25c12018, query=<optimized out>, query_length=<optimized out>, current_utime=<optimized out>) at /test/10.5_opt/sql/log.cc:6798
#12 0x0000563fda6aced9 in log_slow_statement (thd=thd@entry=0x152b25c12018) at /test/10.5_opt/sql/sql_parse.cc:2612
#13 0x0000563fda6aac1b in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x152b25c12018, packet=<optimized out>, packet@entry=0x152b25c3a019 "", packet_length=<optimized out>, packet_length@entry=77, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.5_opt/sql/sql_parse.cc:2484
#14 0x0000563fda6aa034 in do_command (thd=0x152b25c12018) at /test/10.5_opt/sql/sql_parse.cc:1355
#15 0x0000563fda79fc51 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x152b474338f8, put_in_cache=put_in_cache@entry=true) at /test/10.5_opt/sql/sql_connect.cc:1411
#16 0x0000563fda79ffb4 in handle_one_connection (arg=arg@entry=0x152b474338f8) at /test/10.5_opt/sql/sql_connect.cc:1313
#17 0x0000563fdab10daa in pfs_spawn_thread (arg=0x152b4744f218) at /test/10.5_opt/storage/perfschema/pfs.cc:2201
#18 0x0000152b495856db in start_thread (arg=0x152b4a60c700) at pthread_create.c:463
#19 0x0000152b48983a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Debug stack;

10.5.5 e1013725ce0f3f947e728491eef75d9985e8db2f (Debug)

Core was generated by `/test/MD250620-mariadb-10.5.5-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
[Current thread is 1 (Thread 0x15166d383700 (LWP 510542))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
#1  0x000055f907ae27d0 in my_write_core (sig=sig@entry=11) at /test/10.5_dbg/mysys/stacktrace.c:518
#2  0x000055f90729b47a in handle_fatal_signal (sig=11) at /test/10.5_dbg/sql/signal_handler.cc:330
#3  <signal handler called>
#4  0x000055f90717ba38 in Explain_node::print_explain_for_children (this=this@entry=0x1516491ad180, query=query@entry=0x1516490797e0, output=output@entry=0x15166d3816e0, explain_flags=explain_flags@entry=0 '\000', is_analyze=is_analyze@entry=true) at /test/10.5_dbg/sql/sql_explain.cc:663
#5  0x000055f90717c5d5 in Explain_union::print_explain (this=0x1516491ad180, query=0x1516490797e0, output=0x15166d3816e0, explain_flags=<optimized out>, is_analyze=<optimized out>) at /test/10.5_dbg/sql/sql_explain.cc:588
#6  0x000055f90717b295 in Explain_query::print_explain (this=this@entry=0x1516490797e0, output=output@entry=0x15166d3816e0, explain_flags=<optimized out>, is_analyze=is_analyze@entry=true) at /test/10.5_dbg/sql/sql_explain.cc:211
#7  0x000055f90717b7a6 in Explain_query::print_explain_str (this=0x1516490797e0, thd=thd@entry=0x151649015088, out_str=out_str@entry=0x15166d3818b0, is_analyze=is_analyze@entry=true) at /test/10.5_dbg/sql/sql_explain.cc:265
#8  0x000055f90717b7eb in print_explain_for_slow_log (lex=<optimized out>, thd=thd@entry=0x151649015088, str=str@entry=0x15166d3818b0) at /test/10.5_dbg/sql/sql_explain.cc:249
#9  0x000055f9073d05e7 in MYSQL_QUERY_LOG::write (this=this@entry=0x15166a083128, thd=thd@entry=0x151649015088, current_time=<optimized out>, user_host=user_host@entry=0x15166d381e40 "root[root] @ localhost []", user_host_len=user_host_len@entry=25, query_utime=query_utime@entry=5745, lock_utime=4981, is_command=false, sql_text=0x1516490740a0 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", sql_text_len=77) at /test/10.5_dbg/sql/log.cc:3252
#10 0x000055f9073d0ae4 in Log_to_file_event_handler::log_slow (this=0x15166a082c00, thd=0x151649015088, current_time=<optimized out>, user_host=0x15166d381e40 "root[root] @ localhost []", user_host_len=25, query_utime=5745, lock_utime=4981, is_command=false, sql_text=0x1516490740a0 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", sql_text_len=77) at /test/10.5_dbg/sql/log.cc:1092
#11 0x000055f9073c8e62 in LOGGER::slow_log_print (this=this@entry=0x55f908864ea0 <logger>, thd=thd@entry=0x151649015088, query=0x1516490740a0 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", query_length=77, current_utime=<optimized out>) at /test/10.5_dbg/sql/log.cc:1366
#12 0x000055f9073d1530 in slow_log_print (thd=thd@entry=0x151649015088, query=<optimized out>, query_length=<optimized out>, current_utime=<optimized out>) at /test/10.5_dbg/sql/log.cc:6798
#13 0x000055f906fecf14 in log_slow_statement (thd=thd@entry=0x151649015088) at /test/10.5_dbg/sql/sql_parse.cc:2612
#14 0x000055f906fec68a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x151649015088, packet=<optimized out>, packet@entry=0x151649067089 "", packet_length=<optimized out>, packet_length@entry=77, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.5_dbg/sql/sql_parse.cc:2484
#15 0x000055f906fe88ea in do_command (thd=0x151649015088) at /test/10.5_dbg/sql/sql_parse.cc:1355
#16 0x000055f90714415f in do_handle_one_connection (connect=<optimized out>, connect@entry=0x15164c4d2808, put_in_cache=put_in_cache@entry=true) at /test/10.5_dbg/sql/sql_connect.cc:1411
#17 0x000055f90714487b in handle_one_connection (arg=arg@entry=0x15164c4d2808) at /test/10.5_dbg/sql/sql_connect.cc:1313
#18 0x000055f9075a711c in pfs_spawn_thread (arg=0x15166a046508) at /test/10.5_dbg/storage/perfschema/pfs.cc:2201
#19 0x000015166c2fc6db in start_thread (arg=0x15166d383700) at pthread_create.c:463
#20 0x000015166b6faa3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bug confirmed present in:
MariaDB: 10.3.24 (dbg), 10.3.24 (opt), 10.4.14 (dbg), 10.4.14 (opt), 10.5.5 (dbg), 10.5.5 (opt)

Bug confirmed not present in:
MariaDB: 10.1.46 (dbg), 10.1.46 (opt), 10.2.33 (dbg), 10.2.33 (opt)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.47 (dbg), 5.6.47 (opt), 5.7.29 (dbg), 5.7.29 (opt), 8.0.19 (dbg), 8.0.19 (opt)



 Comments   
Comment by Roel Van de Paar [ 2020-08-13 ]

USE test;
CREATE TABLE t (a INT);
EXPLAIN EXTENDED SELECT * FROM t WHERE a IN (SELECT a FROM t UNION SELECT a FROM t ORDER BY (SELECT a)) UNION SELECT * FROM t ORDER BY (SELECT a);

Leads to:

10.4.15 eae968f62d285de97ed607c87bc131cd863d5d03 (Optimized)

Core was generated by `/test/MD110820-mariadb-10.4.15-linux-x86_64-opt/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
[Current thread is 1 (Thread 0x151170155700 (LWP 20297))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
#1  0x00005572a7e6fa77 in my_write_core (sig=sig@entry=11) at /test/10.4_opt/mysys/stacktrace.c:482
#2  0x00005572a784762a in handle_fatal_signal (sig=11) at /test/10.4_opt/sql/signal_handler.cc:343
#3  <signal handler called>
#4  Explain_node::print_explain_for_children (this=this@entry=0x151149d5b208, query=query@entry=0x151149cc8e18, output=output@entry=0x151149cc6200, explain_flags=explain_flags@entry=3 '\003', is_analyze=is_analyze@entry=false) at /test/10.4_opt/sql/sql_explain.cc:663
#5  0x00005572a773d81d in Explain_union::print_explain (this=0x151149d5b208, query=0x151149cc8e18, output=0x151149cc6200, explain_flags=<optimized out>, is_analyze=<optimized out>) at /test/10.4_opt/sql/sql_explain.cc:588
#6  0x00005572a773cf7a in Explain_node::print_explain_for_children (this=this@entry=0x151149d5db40, query=query@entry=0x151149cc8e18, output=output@entry=0x151149cc6200, explain_flags=explain_flags@entry=3 '\003', is_analyze=is_analyze@entry=false) at /test/10.4_opt/sql/sql_explain.cc:663
#7  0x00005572a7740dfd in Explain_select::print_explain (this=<optimized out>, query=0x151149cc8e18, output=0x151149cc6200, explain_flags=<optimized out>, is_analyze=<optimized out>) at /test/10.4_opt/sql/sql_explain.cc:855
#8  0x00005572a773d119 in Explain_union::print_explain (this=0x151149d5fca8, query=0x151149cc8e18, output=0x151149cc6200, explain_flags=3 '\003', is_analyze=false) at /test/10.4_opt/sql/sql_explain.cc:503
#9  0x00005572a762efd2 in execute_sqlcom_select (thd=thd@entry=0x151149c12008, all_tables=<optimized out>) at /test/10.4_opt/sql/sql_parse.cc:6310
#10 0x00005572a7636d7a in mysql_execute_command (thd=thd@entry=0x151149c12008) at /test/10.4_opt/sql/sql_parse.cc:3889
#11 0x00005572a763e22a in mysql_parse (thd=0x151149c12008, rawbuf=<optimized out>, length=145, parser_state=0x1511701544d0, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /test/10.4_opt/sql/sql_parse.cc:7896
#12 0x00005572a7640685 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x151149c12008, packet=packet@entry=0x151149c32009 "", packet_length=packet_length@entry=145, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.4_opt/sql/sql_parse.cc:1834
#13 0x00005572a7641e04 in do_command (thd=0x151149c12008) at /test/10.4_opt/sql/sql_parse.cc:1352
#14 0x00005572a771edbe in do_handle_one_connection (connect=connect@entry=0x15116d032748) at /test/10.4_opt/sql/sql_connect.cc:1412
#15 0x00005572a771ee7d in handle_one_connection (arg=0x15116d032748) at /test/10.4_opt/sql/sql_connect.cc:1316
#16 0x000015116f3536db in start_thread (arg=0x151170155700) at pthread_create.c:463
#17 0x000015116e4cda3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bug confirmed present in:
MariaDB: 10.1.47 (dbg), 10.1.47 (opt), 10.2.34 (dbg), 10.2.34 (opt), 10.3.25 (dbg), 10.3.25 (opt), 10.4.15 (dbg), 10.4.15 (opt), 10.5.6 (dbg), 10.5.6 (opt)

Bug confirmed not present in:
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.47 (dbg), 5.6.47 (opt), 5.7.29 (dbg), 5.7.29 (opt), 8.0.19 (dbg), 8.0.19 (opt)

Comment by Varun Gupta (Inactive) [ 2020-08-13 ]

Well i think this is a something that was not taken into account while fixing MDEV-9513.

Comment by Varun Gupta (Inactive) [ 2020-08-13 ]

Dataset :

CREATE TABLE t1 (a int, b INT);
 INSERT INTO t1 VALUES (1),(2),(3);

After the fix for MDEV-9513, the ORDER BY clause is removed from the UNION in the IN subquery, so let say we have a query like:

EXPLAIN SELECT * FROM t1 WHERE a IN (SELECT a FROM t1 A UNION SELECT a FROM t1 B ORDER BY (SELECT a));

this is re-written as

EXPLAIN SELECT * FROM t1 WHERE a IN (SELECT a FROM t1 A UNION SELECT a FROM t1 B);

If I run the query without the ORDER BY clause, the query does not crash. So looks like some cleanup is missing while the ORDER BY clause was removed.

Comment by Roel Van de Paar [ 2020-08-13 ]

Two hopefully helpful thoughts;
1) The original report was from before MDEV-9513 fix it seems
2) Is the fix for MDEV-9513 included in 10.4 at eae968f62d285de97ed607c87bc131cd863d5d03?

Comment by Varun Gupta (Inactive) [ 2020-08-14 ]

Roel I checked the fix for MDEV-9513 is in 10.4 when you tested against eae968f62d285de97ed607c87bc131cd863d5d03.

Comment by Roel Van de Paar [ 2022-02-26 ]

This SQL produces a slightly different stack:

# Repeat till a crash is seen
INSTALL PLUGIN spider SONAME 'ha_spider.so';
SET SQL_mode='';
SET LOCAL slow_query_log=ON;
SET log_slow_verbosity='EXPLAIN';
SET GLOBAL slow_query_log=ON;
SET max_statement_time=0.0001;
SET GLOBAL log_output='TABLE,FILE';
SET long_query_time=0.000001;
VALUES ( (VALUES (2))) UNION VALUES ( (SELECT 3));

10.8.0 e222e44d1bfc995870430bb90d8ac97e91f66cb4 (Optimized)

Core was generated by `/test/MD190122-mariadb-10.8.0-linux-x86_64-opt/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  Explain_node::print_explain_for_children (this=this@entry=0x14e7a001c0c8,
    query=query@entry=0x14e7a001bf28, output=output@entry=0x14e85c88c9e0,
    explain_flags=explain_flags@entry=0 '\000',
    is_analyze=is_analyze@entry=true) at /test/10.8_opt/sql/sql_explain.cc:665
 
warning: Source file is more recent than executable.
665         if (node->print_explain(query, output, explain_flags, is_analyze))
[Current thread is 1 (Thread 0x14e85c88e700 (LWP 899588))]
(gdb) bt
#0  Explain_node::print_explain_for_children (this=this@entry=0x14e7a001c0c8, query=query@entry=0x14e7a001bf28, output=output@entry=0x14e85c88c9e0, explain_flags=explain_flags@entry=0 '\000', is_analyze=is_analyze@entry=true) at /test/10.8_opt/sql/sql_explain.cc:665
#1  0x000055ce3e0f2142 in Explain_select::print_explain (this=<optimized out>, query=0x14e7a001bf28, output=<optimized out>, explain_flags=<optimized out>, is_analyze=<optimized out>) at /test/10.8_opt/sql/sql_explain.cc:857
#2  0x000055ce3e0eec3d in Explain_query::print_explain_str (this=0x14e7a001bf28, thd=thd@entry=0x14e7a0000c58, out_str=out_str@entry=0x14e85c88cb80, is_analyze=is_analyze@entry=true) at /test/10.8_opt/sql/sql_explain.cc:267
#3  0x000055ce3e0eec85 in print_explain_for_slow_log (lex=<optimized out>, thd=thd@entry=0x14e7a0000c58, str=str@entry=0x14e85c88cb80) at /test/10.8_opt/sql/sql_explain.cc:251
#4  0x000055ce3e304b0a in MYSQL_QUERY_LOG::write (this=0x55ce4130e398, thd=0x14e7a0000c58, current_time=<optimized out>, user_host=<optimized out>, user_host_len=<optimized out>, query_utime=470, lock_utime=0, is_command=false, sql_text=0x14e7a00108e0 "VALUES ( (VALUES (2))) UNION VALUES ( (SELECT 3))", sql_text_len=49) at /test/10.8_opt/sql/log.cc:3247
#5  0x000055ce3e3052a6 in Log_to_file_event_handler::log_slow (this=0x55ce4130df70, thd=0x14e7a0000c58, current_time=<optimized out>, user_host=0x14e85c88d0e0 "root[root] @ localhost []", user_host_len=25, query_utime=470, lock_utime=0, is_command=false, sql_text=0x14e7a00108e0 "VALUES ( (VALUES (2))) UNION VALUES ( (SELECT 3))", sql_text_len=49) at /test/10.8_opt/sql/log.cc:1077
#6  0x000055ce3e2fe87a in LOGGER::slow_log_print (this=0x55ce3f14fb00 <logger>, thd=0x14e7a0000c58, query=0x14e7a00108e0 "VALUES ( (VALUES (2))) UNION VALUES ( (SELECT 3))", query_length=49, current_utime=<optimized out>) at /test/10.8_opt/sql/log.cc:1354
#7  0x000055ce3dfa6d67 in log_slow_statement (thd=thd@entry=0x14e7a0000c58) at /test/10.8_opt/sql/sql_class.h:1360
#8  0x000055ce3dfac586 in dispatch_command (command=COM_QUERY, thd=0x14e7a0000c58, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /test/10.8_opt/sql/sql_parse.cc:2426
#9  0x000055ce3dfaf387 in do_command (thd=0x14e7a0000c58, blocking=blocking@entry=true) at /test/10.8_opt/sql/sql_parse.cc:1402
#10 0x000055ce3e0cd987 in do_handle_one_connection (connect=<optimized out>, put_in_cache=true) at /test/10.8_opt/sql/sql_connect.cc:1418
#11 0x000055ce3e0cdccd in handle_one_connection (arg=arg@entry=0x55ce419ce5d8) at /test/10.8_opt/sql/sql_connect.cc:1312
#12 0x000055ce3e43bdb8 in pfs_spawn_thread (arg=0x55ce4193acf8) at /test/10.8_opt/storage/perfschema/pfs.cc:2201
#13 0x000014e87ac4a609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x000014e87a838293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment by Roel Van de Paar [ 2022-02-26 ]

psergei is this waiting for review?

Comment by Roel Van de Paar [ 2022-04-21 ]

Note also MDEV-18157

Comment by Sergei Petrunia [ 2022-04-22 ]

Debugging Varun's testcase, I observe that the subquery in the ORDER BY list is added as a child select (with id=4), but I can't find where the Item_singlerow_subselect is attached.

It is not attached to select#3, it is not attached to union's global_parameters().

Actually it has fixed=0. Proof that the item is never fixed:

CREATE TABLE t1 (a int, b INT);
INSERT INTO t1 VALUES (1,1),(2,3),(3,3);
SELECT * FROM t1 WHERE a IN (SELECT a FROM t1 A UNION SELECT a FROM t1 B ORDER BY (SELECT abcd+1)); 
+------+------+
| a    | b    |
+------+------+
|    1 |    1 |
|    2 |    3 |
|    3 |    3 |
+------+------+

Comment by Sergei Petrunia [ 2022-04-23 ]

Ok, the item cannot be find anywhere because it is removed in st_select_lex_unit::prepare:

  /*
    If we are reading UNION output and the UNION is in the
    IN/ANY/ALL/EXISTS subquery, then ORDER BY is redundant and hence should
    be removed.
...
  */
  if (is_union() && item &&
      (item->is_in_predicate() || item->is_exists_predicate()))
  {
    global_parameters()->order_list.first= NULL;
    global_parameters()->order_list.elements= 0;
  }

But the removal doesn't remove the subquery from the SELECT_LEX tree, and it doesn't set item_subselect->eliminated.

Comment by Sergei Petrunia [ 2022-04-25 ]

Looking at Varun's patch...

st_select_lex_unit::prepare()

  • No longer removes the UNION's ORDER BY clause. (It used to remove it. This happens before fix_fields is called on the item.

st_select_lex_unit::init_prepare_fake_select_lex()

  • marks the item as removed. Note that the item is not fixed, yet.

Note: there are TWO functions related to item removal:

  • Item::eliminate_subselect_processor() - removes the item from e.g. select tree.
  • Item::mark_as_eliminated_processor() - just sets item->eliminated=1.

here, mark_as_eliminated_processor() is used. It is less intrusive so it should still be possible to call fix_fields on the item.

remove_redundant_subquery_clauses()

  • removes ORDER BY clause from fake_select_lex's JOIN object.
  • Note: doesn't remove it from select_lex.

st_select_lex_unit::save_union_explain():
used to analyze select_lex's ORDER BY clause, now needs adjustments:

@@ -4678,7 +4712,7 @@ int st_select_lex_unit::save_union_explain(Explain_query *output)
     eu->add_select(sl->select_number);
 
   eu->fake_select_type= "UNION RESULT";
-  eu->using_filesort= MY_TEST(global_parameters()->order_list.first);
+  eu->using_filesort= MY_TEST(!is_order_by_clause_redundant());
   eu->using_tmp= union_needs_tmp_table();

Comment by Sergei Petrunia [ 2022-10-14 ]

Taking a step back and looking at the issue. The crashes we observe in this and related bugs are caused by inconsistency in EXPLAIN data structures. The inconsistency is of this kind:

There is a parent-child link node_N -> node_M, but node_M is missing.

This is caused by this workflow:

  • When saving plan for nodeN, we also add all links to its children, node_N -> node_Ns_child
  • For some reason, the child node never gets the query plan. Possible reasons are:
    = The child node is eliminated. See this MDEV-19157 and this MDEV for examples.
    = The child node is degenerate (see MDEV-25564 for example)

What if we changed the approach to be as follows:

  • When saving a plan for node_M, add a link to node_M's parent.

This will make sure there are no hanging parent->... links.

What about hanging child->... links? These are possible:

child2.parent -> child1
child1 is eliminated, the optimizer doesn't save the query plan for it. child1->parent link is not present
Parent node is present, with no links to it.

The hanging children should be ignored. They will be, because the Explain data structure is traversed in a bottom-up way.

Comment by Sergei Petrunia [ 2022-10-18 ]

Attempt to implement the above makes the patch too large for released versions. There are a number of issues, for example,
INSERT/UPDATE/DELETE statements do not have a top-level data structure (but they do have a top-level data structure in Explain). This means the code has to handle various special cases.
Will discuss with Sanja tomorrow.

Comment by Sergei Petrunia [ 2022-10-21 ]

bb-10.3-mdev23160 . sanja, please review

Comment by Oleksandr Byelkin [ 2022-10-21 ]

OK to push

Generated at Thu Feb 08 09:20:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.