Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13286

Floating point exception in Filesort_tracker::print_json_members(Json_writer*)

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.2.7
    • 10.2.8
    • Optimizer
    • None
    • bb-10.2-compatibility/build-14921

    Description

      Crasch happens on "ANALYZE FORMAT=JSON ..." on a specific query involving views

      170710 19:49:05 [ERROR] mysqld got signal 8 ;
      [...]
      /usr/sbin/mysqld(Filesort_tracker::print_json_members(Json_writer*)+0xbc)[0x7f85b97ec15c]
      /usr/sbin/mysqld(Explain_aggr_filesort::print_json_members(Json_writer*, bool)+0x174)[0x7f85b97e6884]
      /usr/sbin/mysqld(Explain_aggr_window_funcs::print_json_members(Json_writer*, bool)+0x65)[0x7f85b97e6935]
      /usr/sbin/mysqld(Explain_select::print_explain_json(Explain_query*, Json_writer*, bool)+0x223)[0x7f85b97eb443]
      /usr/sbin/mysqld(Explain_table_access::print_explain_json(Explain_query*, Json_writer*, bool)+0x760)[0x7f85b97eab90]
      /usr/sbin/mysqld(Explain_basic_join::print_explain_json_interns(Explain_query*, Json_writer*, bool)+0x3f)[0x7f85b97eb18f]
      /usr/sbin/mysqld(Explain_select::print_explain_json(Explain_query*, Json_writer*, bool)+0x397)[0x7f85b97eb5b7]
      /usr/sbin/mysqld(Explain_query::print_explain_json(select_result_sink*, bool)+0x2b2)[0x7f85b97e53f2]
      /usr/sbin/mysqld(Explain_query::send_explain(THD*)+0xa8)[0x7f85b97e5528]
      

      The actual crash is probably due at

      33     writer->add_member("r_total_time_ms").
       34             add_double(time_tracker.get_time_ms());
      

      in inlined member function get_time_ms():

       75   double get_time_ms() const
       76   {
       77     // convert 'cycles' to milliseconds.
       78     return 1000 * ((double)cycles) / sys_timer_info.cycles.frequency;
       79   }
      

      Or at least that's the only floating point operation in print_json_members() that I can identify.

      A quick naive check shows that there may indeed situations where "frequency" is zero and so may cause a division by zero exception:

      $ sourcefind -n "cycles.frequency= 0"
      ./mysys/my_rdtsc.c:580:    mti->cycles.frequency= 0;
      

      Attachments

        Activity

          Hello,

          A quick naive check shows that there may indeed situations where "frequency" is zero and so may cause a division by zero exception:

          $ sourcefind -n "cycles.frequency= 0"
          ./mysys/my_rdtsc.c:580:    mti->cycles.frequency= 0;
          

          Looking at the code there, this line of code seems to be handling the case when the system haven't found any way to determine current time. I don't think this is the cause of this bug (if this was the case, any ANALYZE command would have failed, and probably this would not be limited to just ANALYZE statement)

          psergei Sergei Petrunia added a comment - Hello, A quick naive check shows that there may indeed situations where "frequency" is zero and so may cause a division by zero exception: $ sourcefind -n "cycles.frequency= 0" ./mysys/my_rdtsc.c:580: mti->cycles.frequency= 0; Looking at the code there, this line of code seems to be handling the case when the system haven't found any way to determine current time. I don't think this is the cause of this bug (if this was the case, any ANALYZE command would have failed, and probably this would not be limited to just ANALYZE statement)

          Let's pursue the other direction:

          Filesort_tracker::print_json_members() does division by get_r_loops() like so:

                writer->add_ll((longlong) rint(r_limit/get_r_loops()));
          

          Looking at where that value comes from

            ulonglong get_r_loops() const { return time_tracker.get_loops(); }
          

          Could it be that Filesort_tracker was created but filesort never executed (or somehow started and immediately exited, without incrementing the counter) and time_tracker.get_loops() remained zero?

          psergei Sergei Petrunia added a comment - Let's pursue the other direction: Filesort_tracker::print_json_members() does division by get_r_loops() like so: writer->add_ll((longlong) rint(r_limit/get_r_loops())); Looking at where that value comes from ulonglong get_r_loops() const { return time_tracker.get_loops(); } Could it be that Filesort_tracker was created but filesort never executed (or somehow started and immediately exited, without incrementing the counter) and time_tracker.get_loops() remained zero?

          I managed to construct an example showing the problem. I am not sure what happens at the site of the original reporter of the bug, but the below shows that the code has a problem and needs to be fixed:

          create table ten(a int);
          insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
           
          create table t1 (a int, b int, c int);
          insert into t1 select a,a,a from ten;
          create table t2 as select * from t1;
          

          explain select a, (select t2.b from t2 where t2.a<t1.a order by t2.c limit 1) from t1 where t1.a<0;
          +------+--------------------+-------+------+---------------+------+---------+------+------+-----------------------------+
          | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows | Extra                       |
          +------+--------------------+-------+------+---------------+------+---------+------+------+-----------------------------+
          |    1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL |    0 | Using where                 |
          |    2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL |   26 | Using where; Using filesort |
          +------+--------------------+-------+------+---------------+------+---------+------+------+-----------------------------+
          

          analyze format=json select a, (select t2.b from t2 where t2.a<t1.a order by t2.c limit 1) from t1 where t1.a<0;
          

            Program received signal SIGFPE, Arithmetic exception.
            0x0000555555c8deda in Filesort_tracker::print_json_members (this=0x7fffc002e858, writer=0x7ffff4eb4ff0) at /home/psergey/dev-git/10.2/sql/sql_analyze_stmt.cc:53
          (gdb) c
            Continuing.
            170710 22:02:27 [ERROR] mysqld got signal 8 ;
            This could be because you hit a bug. It is also possible that this binary
            ...
            Program received signal SIGFPE, Arithmetic exception.
            0x0000555555c8deda in Filesort_tracker::print_json_members (this=0x7fffc002e858, writer=0x7ffff4eb4ff0) at /home/psergey/dev-git/10.2/sql/sql_analyze_stmt.cc:53
           
          (gdb) wher
            #0  0x0000555555c8deda in Filesort_tracker::print_json_members (this=0x7fffc002e858, writer=0x7ffff4eb4ff0) at /home/psergey/dev-git/10.2/sql/sql_analyze_stmt.cc:53
            #1  0x0000555555c87984 in Explain_aggr_filesort::print_json_members (this=0x7fffc002e830, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:963
            #2  0x0000555555c896b6 in Explain_table_access::print_explain_json (this=0x7fffc002e320, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:1529
            #3  0x0000555555c87bab in Explain_basic_join::print_explain_json_interns (this=0x7fffc002e220, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:1007
            #4  0x0000555555c876c7 in Explain_select::print_explain_json (this=0x7fffc002e220, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:912
            #5  0x0000555555c86a54 in Explain_node::print_explain_json_for_children (this=0x7fffc002eba8, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:675
            #6  0x0000555555c87c03 in Explain_basic_join::print_explain_json_interns (this=0x7fffc002eba8, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:1012
            #7  0x0000555555c876c7 in Explain_select::print_explain_json (this=0x7fffc002eba8, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:912
            #8  0x0000555555c84de6 in Explain_query::print_explain_json (this=0x7fffc002e0a8, output=0x7fffc002f1d8, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:228
            #9  0x0000555555c84b41 in Explain_query::send_explain (this=0x7fffc002e0a8, thd=0x7fffc0000b00) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:171
            #10 0x0000555555b0c489 in execute_sqlcom_select (thd=0x7fffc0000b00, all_tables=0x7fffc0014ea8) at /home/psergey/dev-git/10.2/sql/sql_parse.cc:6455
            #11 0x0000555555b023c1 in mysql_execute_command (thd=0x7fffc0000b00) at /home/psergey/dev-git/10.2/sql/sql_parse.cc:3458
            #12 0x0000555555b0fd42 in mysql_parse (thd=0x7fffc0000b00, rawbuf=0x7fffc0013208 "analyze format=json select a, (select t2.b from t2 where t2.a<t1.a order by t2.c limit 1) from t1 where t1.a<0", length=110, parser_state=0x7ffff4eb6210, is_com_multi=false, is_next_command=false) at /home/psergey/dev-git/10.2/sql/sql_parse.cc:7879
          

          psergei Sergei Petrunia added a comment - I managed to construct an example showing the problem. I am not sure what happens at the site of the original reporter of the bug, but the below shows that the code has a problem and needs to be fixed: create table ten(a int ); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);   create table t1 (a int , b int , c int ); insert into t1 select a,a,a from ten; create table t2 as select * from t1; explain select a, (select t2.b from t2 where t2.a<t1.a order by t2.c limit 1) from t1 where t1.a<0; +------+--------------------+-------+------+---------------+------+---------+------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+-------+------+---------------+------+---------+------+------+-----------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 0 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 26 | Using where; Using filesort | +------+--------------------+-------+------+---------------+------+---------+------+------+-----------------------------+ analyze format=json select a, ( select t2.b from t2 where t2.a<t1.a order by t2.c limit 1) from t1 where t1.a<0; Program received signal SIGFPE, Arithmetic exception. 0x0000555555c8deda in Filesort_tracker::print_json_members (this=0x7fffc002e858, writer=0x7ffff4eb4ff0) at /home/psergey/dev-git/10.2/sql/sql_analyze_stmt.cc:53 (gdb) c Continuing. 170710 22:02:27 [ERROR] mysqld got signal 8 ; This could be because you hit a bug. It is also possible that this binary ... Program received signal SIGFPE, Arithmetic exception. 0x0000555555c8deda in Filesort_tracker::print_json_members (this=0x7fffc002e858, writer=0x7ffff4eb4ff0) at /home/psergey/dev-git/10.2/sql/sql_analyze_stmt.cc:53   (gdb) wher #0 0x0000555555c8deda in Filesort_tracker::print_json_members (this=0x7fffc002e858, writer=0x7ffff4eb4ff0) at /home/psergey/dev-git/10.2/sql/sql_analyze_stmt.cc:53 #1 0x0000555555c87984 in Explain_aggr_filesort::print_json_members (this=0x7fffc002e830, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:963 #2 0x0000555555c896b6 in Explain_table_access::print_explain_json (this=0x7fffc002e320, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:1529 #3 0x0000555555c87bab in Explain_basic_join::print_explain_json_interns (this=0x7fffc002e220, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:1007 #4 0x0000555555c876c7 in Explain_select::print_explain_json (this=0x7fffc002e220, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:912 #5 0x0000555555c86a54 in Explain_node::print_explain_json_for_children (this=0x7fffc002eba8, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:675 #6 0x0000555555c87c03 in Explain_basic_join::print_explain_json_interns (this=0x7fffc002eba8, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:1012 #7 0x0000555555c876c7 in Explain_select::print_explain_json (this=0x7fffc002eba8, query=0x7fffc002e0a8, writer=0x7ffff4eb4ff0, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:912 #8 0x0000555555c84de6 in Explain_query::print_explain_json (this=0x7fffc002e0a8, output=0x7fffc002f1d8, is_analyze=true) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:228 #9 0x0000555555c84b41 in Explain_query::send_explain (this=0x7fffc002e0a8, thd=0x7fffc0000b00) at /home/psergey/dev-git/10.2/sql/sql_explain.cc:171 #10 0x0000555555b0c489 in execute_sqlcom_select (thd=0x7fffc0000b00, all_tables=0x7fffc0014ea8) at /home/psergey/dev-git/10.2/sql/sql_parse.cc:6455 #11 0x0000555555b023c1 in mysql_execute_command (thd=0x7fffc0000b00) at /home/psergey/dev-git/10.2/sql/sql_parse.cc:3458 #12 0x0000555555b0fd42 in mysql_parse (thd=0x7fffc0000b00, rawbuf=0x7fffc0013208 "analyze format=json select a, (select t2.b from t2 where t2.a<t1.a order by t2.c limit 1) from t1 where t1.a<0", length=110, parser_state=0x7ffff4eb6210, is_com_multi=false, is_next_command=false) at /home/psergey/dev-git/10.2/sql/sql_parse.cc:7879

          The example in the previous comment shows that it is a valid situation when Filesort_tracker has r_loops=0. Filesort_tracker::print_json_members has a bug - it doesn't handle this case. It should.

          psergei Sergei Petrunia added a comment - The example in the previous comment shows that it is a valid situation when Filesort_tracker has r_loops=0 . Filesort_tracker::print_json_members has a bug - it doesn't handle this case. It should.

          varun, I hope it's clear how to fix this.

          psergei Sergei Petrunia added a comment - varun , I hope it's clear how to fix this.

          Pushed to 10.2 
          commit 9b4d281ecd01b0d1d4a55dfaff5f27720037cb77
          

          varun Varun Gupta (Inactive) added a comment - Pushed to 10.2 commit 9b4d281ecd01b0d1d4a55dfaff5f27720037cb77

          People

            varun Varun Gupta (Inactive)
            hholzgra Hartmut Holzgraefe
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.