  1. MariaDB Server
  2. MDEV-9991

main.mysql_client_test_nonblock stack corruption with HAVE_BACKTRACE

Details

    • 5.5.50

    Description

      Compile script:

      cmake -DCMAKE_BUILD_TYPE=Debug -DMYSQL_MAINTAINER_MODE=ON -DNOT_FOR_DISTRIBUTION=ON -DWITH_EMBEDDED_SERVER=ON ../mariadb && make
      

      Test output:

      ./mtr mysql_client_test_nonblock
      Logging: /home/svoj/devel/maria/mariadb/mysql-test/mysql-test-run.pl  mysql_client_test_nonblock
      vardir: /home/svoj/devel/maria/debug/mysql-test/var
      Checking leftover processes...
      Removing old var directory...
      Creating var directory '/home/svoj/devel/maria/debug/mysql-test/var'...
      Checking supported features...
      MariaDB Version 5.5.50-MariaDB-debug
      Installing system database...
       - skipping SSL, mysqld not compiled with SSL
       - binaries are debug compiled
      Collecting tests...
       
      ==============================================================================
       
      TEST                                      RESULT   TIME (ms) or COMMENT
      --------------------------------------------------------------------------
       
      worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
      main.mysql_client_test_nonblock          [ fail ]
              Test ended at 2016-04-26 12:24:37
       
      CURRENT_TEST: main.mysql_client_test_nonblock
      mysqltest: At line 17: exec of '/home/svoj/devel/maria/debug/tests/mysql_client_test --defaults-file=/home/svoj/devel/maria/debug/mysql-test/var/my.cnf --testcase --vardir=/home/svoj/devel/maria/debug/mysql-test/var --non-blocking-api --getopt-ll-test=25600M >> /home/svoj/devel/maria/debug/mysql-test/var/log/mysql_client_test.out.log 2>&1' failed, error: 35584, status: 139, errno: 0
      Output from before failure:
      SET @old_slow_query_log= @@global.slow_query_log;
       
       
       
      The result from queries just before the failure was:
      SET @old_general_log= @@global.general_log;
      SET @old_slow_query_log= @@global.slow_query_log;
      

      GDB session:

      $ gdb tests/mysql_client_test
      GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10
      Copyright (C) 2015 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
      and "show warranty" for details.
      This GDB was configured as "x86_64-linux-gnu".
      Type "show configuration" for configuration details.
      For bug reporting instructions, please see:
      <http://www.gnu.org/software/gdb/bugs/>.
      Find the GDB manual and other documentation resources online at:
      <http://www.gnu.org/software/gdb/documentation/>.
      For help, type "help".
      Type "apropos word" to search for commands related to "word"...
      Reading symbols from tests/mysql_client_test...done.
      (gdb) b my_context_spawn
      Breakpoint 1 at 0x4b6230: file /home/svoj/devel/maria/mariadb/mysys/my_context.c, line 194.
      (gdb) r --defaults-file=/home/svoj/devel/maria/debug/mysql-test/var/my.cnf --testcase --vardir=/home/svoj/devel/maria/debug/mysql-test/var --non-blocking-api --getopt-ll-test=25600M
      Starting program: /home/svoj/devel/maria/debug/tests/mysql_client_test --defaults-file=/home/svoj/devel/maria/debug/mysql-test/var/my.cnf --testcase --vardir=/home/svoj/devel/maria/debug/mysql-test/var --non-blocking-api --getopt-ll-test=25600M
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      [New Thread 0x7ffff6b7b700 (LWP 3818)]
      [Thread 0x7ffff6b7b700 (LWP 3818) exited]
       
       
      #####################################
      client_connect
      #####################################
       
       
      Breakpoint 1, my_context_spawn (c=0x987f48, f=0x478b5f <mysql_real_connect_start_internal>, d=0x7fffffffe0f0) at /home/svoj/devel/maria/mariadb/mysys/my_context.c:194
      194       DBUG_SWAP_CODE_STATE(&c->dbug_state);
      (gdb) n
      251          : [stack] "a" (c->stack_top),
      (gdb)
      253            [save] "b" (&c->save[0])
      (gdb)
      205       __asm__ __volatile__
      (gdb)
       
      Program received signal SIGSEGV, Segmentation fault.
      0x00007ffff6b8acf9 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
      (gdb) bt
      #0  0x00007ffff6b8acf9 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
      #1  0x00007ffff6b8c618 in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
      #2  0x00007ffff6ea85f6 in __GI___backtrace (array=<optimized out>, size=10) at ../sysdeps/x86_64/backtrace.c:109
      #3  0x00000000004b671e in sf_malloc (size=7) at /home/svoj/devel/maria/mariadb/mysys/safemalloc.c:126
      #4  0x00000000004a41d0 in my_malloc (size=7, my_flags=16) at /home/svoj/devel/maria/mariadb/mysys/my_malloc.c:41
      #5  0x00000000004a469b in my_strdup (from=0x4e6cb6 "latin1", my_flags=16) at /home/svoj/devel/maria/mariadb/mysys/my_malloc.c:141
      #6  0x0000000000481456 in mysql_init_character_set (mysql=0x9879b0) at /home/svoj/devel/maria/mariadb/sql-common/client.c:2306
      #7  0x0000000000483d99 in mysql_real_connect (mysql=0x9879b0, host=0x4e7274 "localhost", user=0x983e07 "root", passwd=0x983f90 "", db=0x4ccaf3 "test", port=16000,
          unix_socket=0x983e29 "/home/svoj/devel/maria/debug/mysql-test/var/tmp/mysqld.1.sock", client_flag=0) at /home/svoj/devel/maria/mariadb/sql-common/client.c:3451
      #8  0x0000000000478bd6 in mysql_real_connect_start_internal (d=0x7fffffffe0f0) at /home/svoj/devel/maria/mariadb/sql-common/mysql_async.c:411
      #9  0x00000000004b6290 in my_context_spawn (c=0x0, f=0x5, d=0x7ffff7de89ed <_dl_fixup+237>) at /home/svoj/devel/maria/mariadb/mysys/my_context.c:205
      #10 0x00000000004a4030 in my_thread_var_dbug () at /home/svoj/devel/maria/mariadb/mysys/my_thr_init.c:464
      Backtrace stopped: previous frame inner to this frame (corrupt stack?)
      (gdb)
      


        Activity

          Note that the stack corruption happens somewhere in the my_context_spawn() assembler code.

          svoj Sergey Vojtovich added the comment above.

          I do not think it is stack corruption; it looks rather like a bug in _Unwind_Backtrace(), which makes assumptions about the layout of the caller's stack frame and does unchecked memory accesses.

          (my_context_spawn is creating a new co-routine and thus switching stacks).

          Maybe something can be done to appease _Unwind_Backtrace(); there is already something in the code that tries to do that. After all, _Unwind_Backtrace() is supposedly able to avoid crashing when it reaches the bottom of the stack of a pthread-spawned thread. Here is what is currently done:

          #if __GNUC__ >= 4 && __GNUC_MINOR__ >= 4 && !defined(__INTEL_COMPILER)
               /*
                 This emits a DWARF DW_CFA_undefined directive to make the return address
                 undefined. This indicates that this is the top of the stack frame, and
                 helps tools that use DWARF stack unwinding to obtain stack traces.
                 (I use numeric constant to avoid a dependency on libdwarf includes).
               */
               ".cfi_escape 0x07, 16\n\t"
          #endif
          

          It does not crash for me. Can you check whether that DWARF directive is correctly emitted in the environment that experiences the failure?

          knielsen Kristian Nielsen added the comment above.

          It can be a bug in backtrace() indeed, but gdb seems to be affected as well (see the debugging session above). The stack trace is definitely wrong, and gdb issues a warning: "Backtrace stopped: previous frame inner to this frame (corrupt stack?)"

          Also note that it has been failing for me since I joined back in 2013, across a few OS upgrades.

          Did you compile according to the instructions? It won't crash for me either if I remove e.g. MYSQL_MAINTAINER_MODE or NOT_FOR_DISTRIBUTION.

          But that was a good catch: this #if doesn't cover my compiler version (5.2.1) properly, since its minor version is below 4. This patch (though it is not completely right) fixes the issue:

          diff --git a/mysys/my_context.c b/mysys/my_context.c
          index 60c0014..46eda34 100644
          --- a/mysys/my_context.c
          +++ b/mysys/my_context.c
          @@ -206,7 +206,7 @@ my_context_spawn(struct my_context *c, void (*f)(void *), void *d)
               (
                "movq %%rsp, (%[save])\n\t"
                "movq %[stack], %%rsp\n\t"
          -#if __GNUC__ >= 4 && __GNUC_MINOR__ >= 4 && !defined(__INTEL_COMPILER)
          +#if __GNUC__ >= 4 && !defined(__INTEL_COMPILER)
                /*
                  This emits a DWARF DW_CFA_undefined directive to make the return address
                  undefined. This indicates that this is the top of the stack frame, and
          

          svoj Sergey Vojtovich added the comment above.

          Ah, you're right, the check is wrong and doesn't catch gcc 5.X! (I have gcc
          4.9.2 so do not see this).

          I think this should be the correct check, can you try it?

          #if (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4)) && !defined(__INTEL_COMPILER)
          

          As you said, even if libunwind didn't crash on a bad pointer, it is still
          much better for the stack trace to terminate correctly, so we should
          definitely get this fixed.

          knielsen Kristian Nielsen added the comment above.

          It worked, thanks! I believe we should fix i386 my_context_spawn() the very same way.

          svoj Sergey Vojtovich added the comment above.

          Pushed to 5.5, 10.0, and 10.1.

          knielsen Kristian Nielsen added the comment above.

          People

            svoj Sergey Vojtovich