Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23160

SIGSEGV in Explain_node::print_explain_for_children on UNION SELECT or on EXPLAIN EXTENDED (on optimized builds)

Details

    Description

      USE test;
      SET GLOBAL slow_query_log=ON;
      SET SESSION log_queries_not_using_indexes=TRUE;
      SET SESSION slow_query_log=ON;
      SET SESSION log_slow_verbosity='innodb,query_plan,explain';
      CREATE TABLE t (a int);
      SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12);  # Will correctly generate "ERROR 1054 (42S22): Unknown column 'b' in 'where clause'" error
      SELECT SLEEP(10);  # Sever will crash after small delay
      

      Leads to:

      10.5.5 e1013725ce0f3f947e728491eef75d9985e8db2f (Optimized)

      Core was generated by `/test/MD250620-mariadb-10.5.5-linux-x86_64-opt/bin/mysqld --no-defaults --core-'.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
          at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
      [Current thread is 1 (Thread 0x152b4a60c700 (LWP 489165))]
      (gdb) bt
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
      #1  0x0000563fdaedd0e7 in my_write_core (sig=sig@entry=11) at /test/10.5_opt/mysys/stacktrace.c:518
      #2  0x0000563fda8a74ca in handle_fatal_signal (sig=11) at /test/10.5_opt/sql/signal_handler.cc:330
      #3  <signal handler called>
      #4  Explain_node::print_explain_for_children (this=this@entry=0x152b25d24f08, query=query@entry=0x152b25c4c770, output=output@entry=0x152b4a60a9e0, explain_flags=explain_flags@entry=0 '\000', is_analyze=is_analyze@entry=true) at /test/10.5_opt/sql/sql_explain.cc:663
      #5  0x0000563fda7bf78d in Explain_union::print_explain (this=0x152b25d24f08, query=0x152b25c4c770, output=0x152b4a60a9e0, explain_flags=<optimized out>, is_analyze=<optimized out>) at /test/10.5_opt/sql/sql_explain.cc:588
      #6  0x0000563fda7bec11 in Explain_query::print_explain_str (this=0x152b25c4c770, thd=thd@entry=0x152b25c12018, out_str=out_str@entry=0x152b4a60ab80, is_analyze=is_analyze@entry=true) at /test/10.5_opt/sql/sql_explain.cc:265
      #7  0x0000563fda7bec56 in print_explain_for_slow_log (lex=<optimized out>, thd=thd@entry=0x152b25c12018, str=str@entry=0x152b4a60ab80) at /test/10.5_opt/sql/sql_explain.cc:249
      #8  0x0000563fda9890a3 in MYSQL_QUERY_LOG::write (this=this@entry=0x152b47458c28, thd=thd@entry=0x152b25c12018, current_time=<optimized out>, current_time@entry=<error reading variable: That operation is not available on integers of more than 8 bytes.>, user_host=user_host@entry=0x152b4a60b110 "root[root] @ localhost []", user_host_len=user_host_len@entry=25, query_utime=query_utime@entry=855, lock_utime=651, is_command=false, sql_text=0x152b25c47030 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", sql_text_len=77) at /test/10.5_opt/sql/log.cc:3252
      #9  0x0000563fda9894e1 in Log_to_file_event_handler::log_slow (this=0x152b47458800, thd=0x152b25c12018, current_time=<optimized out>, user_host=0x152b4a60b110 "root[root] @ localhost []", user_host_len=25, query_utime=855, lock_utime=651, is_command=false, sql_text=0x152b25c47030 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", sql_text_len=77) at /test/10.5_opt/sql/log.cc:1092
      #10 0x0000563fda9838aa in LOGGER::slow_log_print (this=this@entry=0x563fdba7d260 <logger>, thd=thd@entry=0x152b25c12018, query=0x152b25c47030 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", query_length=77, current_utime=<optimized out>) at /test/10.5_opt/sql/log.cc:1366
      #11 0x0000563fda989c3c in slow_log_print (thd=thd@entry=0x152b25c12018, query=<optimized out>, query_length=<optimized out>, current_utime=<optimized out>) at /test/10.5_opt/sql/log.cc:6798
      #12 0x0000563fda6aced9 in log_slow_statement (thd=thd@entry=0x152b25c12018) at /test/10.5_opt/sql/sql_parse.cc:2612
      #13 0x0000563fda6aac1b in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x152b25c12018, packet=<optimized out>, packet@entry=0x152b25c3a019 "", packet_length=<optimized out>, packet_length@entry=77, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.5_opt/sql/sql_parse.cc:2484
      #14 0x0000563fda6aa034 in do_command (thd=0x152b25c12018) at /test/10.5_opt/sql/sql_parse.cc:1355
      #15 0x0000563fda79fc51 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x152b474338f8, put_in_cache=put_in_cache@entry=true) at /test/10.5_opt/sql/sql_connect.cc:1411
      #16 0x0000563fda79ffb4 in handle_one_connection (arg=arg@entry=0x152b474338f8) at /test/10.5_opt/sql/sql_connect.cc:1313
      #17 0x0000563fdab10daa in pfs_spawn_thread (arg=0x152b4744f218) at /test/10.5_opt/storage/perfschema/pfs.cc:2201
      #18 0x0000152b495856db in start_thread (arg=0x152b4a60c700) at pthread_create.c:463
      #19 0x0000152b48983a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Debug stack;

      10.5.5 e1013725ce0f3f947e728491eef75d9985e8db2f (Debug)

      Core was generated by `/test/MD250620-mariadb-10.5.5-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
          at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
      [Current thread is 1 (Thread 0x15166d383700 (LWP 510542))]
      (gdb) bt
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
      #1  0x000055f907ae27d0 in my_write_core (sig=sig@entry=11) at /test/10.5_dbg/mysys/stacktrace.c:518
      #2  0x000055f90729b47a in handle_fatal_signal (sig=11) at /test/10.5_dbg/sql/signal_handler.cc:330
      #3  <signal handler called>
      #4  0x000055f90717ba38 in Explain_node::print_explain_for_children (this=this@entry=0x1516491ad180, query=query@entry=0x1516490797e0, output=output@entry=0x15166d3816e0, explain_flags=explain_flags@entry=0 '\000', is_analyze=is_analyze@entry=true) at /test/10.5_dbg/sql/sql_explain.cc:663
      #5  0x000055f90717c5d5 in Explain_union::print_explain (this=0x1516491ad180, query=0x1516490797e0, output=0x15166d3816e0, explain_flags=<optimized out>, is_analyze=<optimized out>) at /test/10.5_dbg/sql/sql_explain.cc:588
      #6  0x000055f90717b295 in Explain_query::print_explain (this=this@entry=0x1516490797e0, output=output@entry=0x15166d3816e0, explain_flags=<optimized out>, is_analyze=is_analyze@entry=true) at /test/10.5_dbg/sql/sql_explain.cc:211
      #7  0x000055f90717b7a6 in Explain_query::print_explain_str (this=0x1516490797e0, thd=thd@entry=0x151649015088, out_str=out_str@entry=0x15166d3818b0, is_analyze=is_analyze@entry=true) at /test/10.5_dbg/sql/sql_explain.cc:265
      #8  0x000055f90717b7eb in print_explain_for_slow_log (lex=<optimized out>, thd=thd@entry=0x151649015088, str=str@entry=0x15166d3818b0) at /test/10.5_dbg/sql/sql_explain.cc:249
      #9  0x000055f9073d05e7 in MYSQL_QUERY_LOG::write (this=this@entry=0x15166a083128, thd=thd@entry=0x151649015088, current_time=<optimized out>, user_host=user_host@entry=0x15166d381e40 "root[root] @ localhost []", user_host_len=user_host_len@entry=25, query_utime=query_utime@entry=5745, lock_utime=4981, is_command=false, sql_text=0x1516490740a0 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", sql_text_len=77) at /test/10.5_dbg/sql/log.cc:3252
      #10 0x000055f9073d0ae4 in Log_to_file_event_handler::log_slow (this=0x15166a082c00, thd=0x151649015088, current_time=<optimized out>, user_host=0x15166d381e40 "root[root] @ localhost []", user_host_len=25, query_utime=5745, lock_utime=4981, is_command=false, sql_text=0x1516490740a0 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", sql_text_len=77) at /test/10.5_dbg/sql/log.cc:1092
      #11 0x000055f9073c8e62 in LOGGER::slow_log_print (this=this@entry=0x55f908864ea0 <logger>, thd=thd@entry=0x151649015088, query=0x1516490740a0 "SELECT * FROM t UNION SELECT * FROM t ORDER BY (SELECT a FROM t WHERE b = 12)", query_length=77, current_utime=<optimized out>) at /test/10.5_dbg/sql/log.cc:1366
      #12 0x000055f9073d1530 in slow_log_print (thd=thd@entry=0x151649015088, query=<optimized out>, query_length=<optimized out>, current_utime=<optimized out>) at /test/10.5_dbg/sql/log.cc:6798
      #13 0x000055f906fecf14 in log_slow_statement (thd=thd@entry=0x151649015088) at /test/10.5_dbg/sql/sql_parse.cc:2612
      #14 0x000055f906fec68a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x151649015088, packet=<optimized out>, packet@entry=0x151649067089 "", packet_length=<optimized out>, packet_length@entry=77, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.5_dbg/sql/sql_parse.cc:2484
      #15 0x000055f906fe88ea in do_command (thd=0x151649015088) at /test/10.5_dbg/sql/sql_parse.cc:1355
      #16 0x000055f90714415f in do_handle_one_connection (connect=<optimized out>, connect@entry=0x15164c4d2808, put_in_cache=put_in_cache@entry=true) at /test/10.5_dbg/sql/sql_connect.cc:1411
      #17 0x000055f90714487b in handle_one_connection (arg=arg@entry=0x15164c4d2808) at /test/10.5_dbg/sql/sql_connect.cc:1313
      #18 0x000055f9075a711c in pfs_spawn_thread (arg=0x15166a046508) at /test/10.5_dbg/storage/perfschema/pfs.cc:2201
      #19 0x000015166c2fc6db in start_thread (arg=0x15166d383700) at pthread_create.c:463
      #20 0x000015166b6faa3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Bug confirmed present in:
      MariaDB: 10.3.24 (dbg), 10.3.24 (opt), 10.4.14 (dbg), 10.4.14 (opt), 10.5.5 (dbg), 10.5.5 (opt)

      Bug confirmed not present in:
      MariaDB: 10.1.46 (dbg), 10.1.46 (opt), 10.2.33 (dbg), 10.2.33 (opt)
      MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.47 (dbg), 5.6.47 (opt), 5.7.29 (dbg), 5.7.29 (opt), 8.0.19 (dbg), 8.0.19 (opt)

      Attachments

        Issue Links

          Activity

            Looking at Varun's patch...

            st_select_lex_unit::prepare()

            • No longer removes the UNION's ORDER BY clause. (It used to remove it. This happens before fix_fields is called on the item.

            st_select_lex_unit::init_prepare_fake_select_lex()

            • marks the item as removed. Note that the item is not fixed, yet.

            Note: there are TWO functions related to item removal:

            • Item::eliminate_subselect_processor() - removes the item from e.g. select tree.
            • Item::mark_as_eliminated_processor() - just sets item->eliminated=1.

            here, mark_as_eliminated_processor() is used. It is less intrusive so it should still be possible to call fix_fields on the item.

            remove_redundant_subquery_clauses()

            • removes ORDER BY clause from fake_select_lex's JOIN object.
            • Note: doesn't remove it from select_lex.

            st_select_lex_unit::save_union_explain():
            used to analyze select_lex's ORDER BY clause, now needs adjustments:

            @@ -4678,7 +4712,7 @@ int st_select_lex_unit::save_union_explain(Explain_query *output)
                 eu->add_select(sl->select_number);
             
               eu->fake_select_type= "UNION RESULT";
            -  eu->using_filesort= MY_TEST(global_parameters()->order_list.first);
            +  eu->using_filesort= MY_TEST(!is_order_by_clause_redundant());
               eu->using_tmp= union_needs_tmp_table();
            

            psergei Sergei Petrunia added a comment - Looking at Varun's patch... st_select_lex_unit::prepare() No longer removes the UNION's ORDER BY clause. (It used to remove it. This happens before fix_fields is called on the item. st_select_lex_unit::init_prepare_fake_select_lex() marks the item as removed. Note that the item is not fixed, yet. Note: there are TWO functions related to item removal: Item::eliminate_subselect_processor() - removes the item from e.g. select tree. Item::mark_as_eliminated_processor() - just sets item->eliminated=1. here, mark_as_eliminated_processor() is used. It is less intrusive so it should still be possible to call fix_fields on the item. remove_redundant_subquery_clauses() removes ORDER BY clause from fake_select_lex's JOIN object. Note: doesn't remove it from select_lex. st_select_lex_unit::save_union_explain(): used to analyze select_lex's ORDER BY clause, now needs adjustments: @@ -4678,7 +4712,7 @@ int st_select_lex_unit::save_union_explain(Explain_query *output) eu->add_select(sl->select_number); eu->fake_select_type= "UNION RESULT"; - eu->using_filesort= MY_TEST(global_parameters()->order_list.first); + eu->using_filesort= MY_TEST(!is_order_by_clause_redundant()); eu->using_tmp= union_needs_tmp_table();

            Taking a step back and looking at the issue. The crashes we observe in this and related bugs are caused by inconsistency in EXPLAIN data structures. The inconsistency is of this kind:

            There is a parent-child link node_N -> node_M, but node_M is missing.
            

            This is caused by this workflow:

            • When saving plan for nodeN, we also add all links to its children, node_N -> node_Ns_child
            • For some reason, the child node never gets the query plan. Possible reasons are:
              = The child node is eliminated. See this MDEV-19157 and this MDEV for examples.
              = The child node is degenerate (see MDEV-25564 for example)

            What if we changed the approach to be as follows:

            • When saving a plan for node_M, add a link to node_M's parent.

            This will make sure there are no hanging parent->... links.

            What about hanging child->... links? These are possible:

            child2.parent -> child1
            child1 is eliminated, the optimizer doesn't save the query plan for it. child1->parent link is not present
            Parent node is present, with no links to it.
            

            The hanging children should be ignored. They will be, because the Explain data structure is traversed in a bottom-up way.

            psergei Sergei Petrunia added a comment - Taking a step back and looking at the issue. The crashes we observe in this and related bugs are caused by inconsistency in EXPLAIN data structures. The inconsistency is of this kind: There is a parent-child link node_N -> node_M, but node_M is missing. This is caused by this workflow: When saving plan for nodeN, we also add all links to its children, node_N -> node_Ns_child For some reason, the child node never gets the query plan. Possible reasons are: = The child node is eliminated. See this MDEV-19157 and this MDEV for examples. = The child node is degenerate (see MDEV-25564 for example) What if we changed the approach to be as follows: When saving a plan for node_M, add a link to node_M's parent. This will make sure there are no hanging parent->... links. What about hanging child->... links? These are possible: child2.parent -> child1 child1 is eliminated, the optimizer doesn't save the query plan for it. child1->parent link is not present Parent node is present, with no links to it. The hanging children should be ignored. They will be, because the Explain data structure is traversed in a bottom-up way.

            Attempt to implement the above makes the patch too large for released versions. There are a number of issues, for example,
            INSERT/UPDATE/DELETE statements do not have a top-level data structure (but they do have a top-level data structure in Explain). This means the code has to handle various special cases.
            Will discuss with Sanja tomorrow.

            psergei Sergei Petrunia added a comment - Attempt to implement the above makes the patch too large for released versions. There are a number of issues, for example, INSERT/UPDATE/DELETE statements do not have a top-level data structure (but they do have a top-level data structure in Explain). This means the code has to handle various special cases. Will discuss with Sanja tomorrow.

            bb-10.3-mdev23160 . sanja, please review

            psergei Sergei Petrunia added a comment - bb-10.3-mdev23160 . sanja , please review

            OK to push

            sanja Oleksandr Byelkin added a comment - OK to push

            People

              psergei Sergei Petrunia
              Roel Roel Van de Paar
              Votes:
              1 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.