[MDEV-9991] main.mysql_client_test_nonblock stack corruption with HAVE_BACKTRACE
Created: 2016-04-26  Updated: 2016-06-08  Resolved: 2016-06-08

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 5.5, 10.0, 10.1, 10.2
Fix Version/s: 5.5.50, 10.0.26, 10.1.15

Type: Bug
Priority: Major
Reporter: Sergey Vojtovich
Assignee: Sergey Vojtovich
Resolution: Fixed
Votes: 0
Labels: foundation

Sprint: 5.5.50

Description

Compile script:

cmake -DCMAKE_BUILD_TYPE=Debug -DMYSQL_MAINTAINER_MODE=ON -DNOT_FOR_DISTRIBUTION=ON -DWITH_EMBEDDED_SERVER=ON ../mariadb && make

Test output:

./mtr mysql_client_test_nonblock
Logging: /home/svoj/devel/maria/mariadb/mysql-test/mysql-test-run.pl  mysql_client_test_nonblock
vardir: /home/svoj/devel/maria/debug/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/svoj/devel/maria/debug/mysql-test/var'...
Checking supported features...
MariaDB Version 5.5.50-MariaDB-debug
Installing system database...
 - skipping SSL, mysqld not compiled with SSL
 - binaries are debug compiled
Collecting tests...
 
==============================================================================
 
TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------
 
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
main.mysql_client_test_nonblock          [ fail ]
        Test ended at 2016-04-26 12:24:37
 
CURRENT_TEST: main.mysql_client_test_nonblock
mysqltest: At line 17: exec of '/home/svoj/devel/maria/debug/tests/mysql_client_test --defaults-file=/home/svoj/devel/maria/debug/mysql-test/var/my.cnf --testcase --vardir=/home/svoj/devel/maria/debug/mysql-test/var --non-blocking-api --getopt-ll-test=25600M >> /home/svoj/devel/maria/debug/mysql-test/var/log/mysql_client_test.out.log 2>&1' failed, error: 35584, status: 139, errno: 0
Output from before failure:
SET @old_slow_query_log= @@global.slow_query_log;
 
 
 
The result from queries just before the failure was:
SET @old_general_log= @@global.general_log;
SET @old_slow_query_log= @@global.slow_query_log;

GDB session:

$ gdb tests/mysql_client_test
GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from tests/mysql_client_test...done.
(gdb) b my_context_spawn
Breakpoint 1 at 0x4b6230: file /home/svoj/devel/maria/mariadb/mysys/my_context.c, line 194.
(gdb) r --defaults-file=/home/svoj/devel/maria/debug/mysql-test/var/my.cnf --testcase --vardir=/home/svoj/devel/maria/debug/mysql-test/var --non-blocking-api --getopt-ll-test=25600M
Starting program: /home/svoj/devel/maria/debug/tests/mysql_client_test --defaults-file=/home/svoj/devel/maria/debug/mysql-test/var/my.cnf --testcase --vardir=/home/svoj/devel/maria/debug/mysql-test/var --non-blocking-api --getopt-ll-test=25600M
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6b7b700 (LWP 3818)]
[Thread 0x7ffff6b7b700 (LWP 3818) exited]
 
 
#####################################
client_connect
#####################################
 
 
Breakpoint 1, my_context_spawn (c=0x987f48, f=0x478b5f <mysql_real_connect_start_internal>, d=0x7fffffffe0f0) at /home/svoj/devel/maria/mariadb/mysys/my_context.c:194
194       DBUG_SWAP_CODE_STATE(&c->dbug_state);
(gdb) n
251          : [stack] "a" (c->stack_top),
(gdb)
253            [save] "b" (&c->save[0])
(gdb)
205       __asm__ __volatile__
(gdb)
 
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6b8acf9 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
(gdb) bt
#0  0x00007ffff6b8acf9 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#1  0x00007ffff6b8c618 in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#2  0x00007ffff6ea85f6 in __GI___backtrace (array=<optimized out>, size=10) at ../sysdeps/x86_64/backtrace.c:109
#3  0x00000000004b671e in sf_malloc (size=7) at /home/svoj/devel/maria/mariadb/mysys/safemalloc.c:126
#4  0x00000000004a41d0 in my_malloc (size=7, my_flags=16) at /home/svoj/devel/maria/mariadb/mysys/my_malloc.c:41
#5  0x00000000004a469b in my_strdup (from=0x4e6cb6 "latin1", my_flags=16) at /home/svoj/devel/maria/mariadb/mysys/my_malloc.c:141
#6  0x0000000000481456 in mysql_init_character_set (mysql=0x9879b0) at /home/svoj/devel/maria/mariadb/sql-common/client.c:2306
#7  0x0000000000483d99 in mysql_real_connect (mysql=0x9879b0, host=0x4e7274 "localhost", user=0x983e07 "root", passwd=0x983f90 "", db=0x4ccaf3 "test", port=16000,
    unix_socket=0x983e29 "/home/svoj/devel/maria/debug/mysql-test/var/tmp/mysqld.1.sock", client_flag=0) at /home/svoj/devel/maria/mariadb/sql-common/client.c:3451
#8  0x0000000000478bd6 in mysql_real_connect_start_internal (d=0x7fffffffe0f0) at /home/svoj/devel/maria/mariadb/sql-common/mysql_async.c:411
#9  0x00000000004b6290 in my_context_spawn (c=0x0, f=0x5, d=0x7ffff7de89ed <_dl_fixup+237>) at /home/svoj/devel/maria/mariadb/mysys/my_context.c:205
#10 0x00000000004a4030 in my_thread_var_dbug () at /home/svoj/devel/maria/mariadb/mysys/my_thr_init.c:464
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
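
Frames #2 and #3 above show sf_malloc() calling backtrace() with size=10 while allocating, which is how a plain my_strdup() ends up inside the unwinder. A rough sketch of that pattern follows, assuming glibc's backtrace() from <execinfo.h> and a C11 compiler. The names traced_malloc, traced_depth, traced_free and struct traced_block are hypothetical; this is not the safemalloc.c code.

```c
#include <execinfo.h>
#include <stddef.h>
#include <stdlib.h>

#define TRACE_DEPTH 10  /* same depth as size=10 in frame #2 (illustrative) */

/* Hypothetical header placed in front of every user allocation. */
struct traced_block
{
  /* Aligning the first member keeps the user data suitably aligned. */
  _Alignas(max_align_t) void *frames[TRACE_DEPTH];
  int nframes;            /* number of return addresses captured */
};

/* Allocate size bytes and record the caller's stack in the header. */
static void *traced_malloc(size_t size)
{
  struct traced_block *b = malloc(sizeof(*b) + size);
  if (!b)
    return NULL;
  /* backtrace() walks the current stack; it relies on the unwinder
     being able to tell where the stack ends. */
  b->nframes = backtrace(b->frames, TRACE_DEPTH);
  return b + 1;
}

/* Number of frames recorded for a pointer returned by traced_malloc(). */
static int traced_depth(const void *ptr)
{
  return ((const struct traced_block *) ptr - 1)->nframes;
}

/* Release a pointer returned by traced_malloc(). */
static void traced_free(void *ptr)
{
  if (ptr)
    free((struct traced_block *) ptr - 1);
}
```

Because the stack walk happens inside the allocator, any allocation made while running on a stack the unwinder cannot follow can fault, even though the allocation itself is harmless.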



Comments
Comment by Sergey Vojtovich [ 2016-04-26 ]

Note that the stack corruption happens somewhere in the inline assembler of my_context_spawn().

Comment by Kristian Nielsen [ 2016-04-26 ]

I do not think this is stack corruption. It looks more like a bug in _Unwind_Backtrace(): it makes assumptions about the layout of the caller's stack frame and performs unchecked memory accesses.

(my_context_spawn is creating a new co-routine and thus switching stacks).
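
To illustrate what switching stacks for a co-routine involves, here is a minimal sketch using the POSIX ucontext API rather than the hand-written assembly in my_context.c. The names coro_body, spawn_and_resume and steps_run are hypothetical, and the 64 KiB stack size is an arbitrary assumption; this is not the MariaDB implementation.

```c
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t caller_ctx;  /* saved state of the spawning code */
static ucontext_t coro_ctx;    /* saved state of the co-routine */
static int steps_run;          /* how many parts of coro_body have run */

/* Body of the co-routine; it executes on its own private stack. */
static void coro_body(void)
{
  steps_run++;
  swapcontext(&coro_ctx, &caller_ctx);  /* yield back to the spawner */
  steps_run++;
  /* Returning from here resumes the context named by uc_link. */
}

/* Spawn the co-routine, let it yield once, then run it to completion.
   Returns the number of steps executed, or -1 on failure. */
static int spawn_and_resume(void)
{
  const size_t stack_size = 64 * 1024;
  void *stack = malloc(stack_size);
  if (!stack)
    return -1;

  steps_run = 0;
  if (getcontext(&coro_ctx) != 0)
  {
    free(stack);
    return -1;
  }
  coro_ctx.uc_stack.ss_sp = stack;       /* the new, separate stack */
  coro_ctx.uc_stack.ss_size = stack_size;
  coro_ctx.uc_link = &caller_ctx;        /* where to go when the body returns */
  makecontext(&coro_ctx, coro_body, 0);

  swapcontext(&caller_ctx, &coro_ctx);   /* spawn: runs until the yield */
  swapcontext(&caller_ctx, &coro_ctx);   /* continue: runs to the end */

  free(stack);
  return steps_run;
}
```

While coro_body() runs, its frames live on the malloc'ed stack, and the spawner's frames are not adjacent to them in memory. A stack unwinder that walks past the first frame of the new stack without an explicit end marker will read unrelated memory.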

Maybe something can be done to appease _Unwind_Backtrace(); the code already contains something intended to do that. After all, _Unwind_Backtrace() is supposedly able to avoid crashing when it reaches the bottom of the stack of a pthread-spawned thread. Here is what is currently done:

#if __GNUC__ >= 4 && __GNUC_MINOR__ >= 4 && !defined(__INTEL_COMPILER)
     /*
       This emits a DWARF DW_CFA_undefined directive to make the return address
       undefined. This indicates that this is the top of the stack frame, and
       helps tools that use DWARF stack unwinding to obtain stack traces.
       (I use numeric constant to avoid a dependency on libdwarf includes).
     */
     ".cfi_escape 0x07, 16\n\t"
#endif

It does not crash for me. Can you check whether that DWARF directive is correctly emitted in the environment that experiences the failure?

Comment by Sergey Vojtovich [ 2016-04-26 ]

It may indeed be a bug in backtrace(), but gdb seems to be affected as well (see the debugging session above). The stack trace is definitely wrong, and gdb issues a warning: "Backtrace stopped: previous frame inner to this frame (corrupt stack?)"

Also note that this has been failing for me since I joined back in 2013, across several OS upgrades.

Did you compile according to the instructions above? It won't crash for me either if I remove, for example, MYSQL_MAINTAINER_MODE or NOT_FOR_DISTRIBUTION.

But good catch: this #if doesn't cover my compiler version (gcc 5.2.1) properly. This patch (though it is not completely right) fixes the issue:

diff --git a/mysys/my_context.c b/mysys/my_context.c
index 60c0014..46eda34 100644
--- a/mysys/my_context.c
+++ b/mysys/my_context.c
@@ -206,7 +206,7 @@ my_context_spawn(struct my_context *c, void (*f)(void *), void *d)
     (
      "movq %%rsp, (%[save])\n\t"
      "movq %[stack], %%rsp\n\t"
-#if __GNUC__ >= 4 && __GNUC_MINOR__ >= 4 && !defined(__INTEL_COMPILER)
+#if __GNUC__ >= 4 && !defined(__INTEL_COMPILER)
      /*
        This emits a DWARF DW_CFA_undefined directive to make the return address
        undefined. This indicates that this is the top of the stack frame, and

Comment by Kristian Nielsen [ 2016-04-26 ]

Ah, you're right, the check is wrong and doesn't catch gcc 5.X! (I have gcc
4.9.2, so I do not see this.)

I think this should be the correct check, can you try it?

#if (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4)) && !defined(__INTEL_COMPILER)
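
The difference between the two conditions can be checked with ordinary integers. In the sketch below, old_check and new_check are hypothetical helper names that mirror the two preprocessor expressions, with the version numbers passed as arguments. For gcc 5.2.1, __GNUC__ is 5 and __GNUC_MINOR__ is 2, so the original test fails on the minor version alone.

```c
/* Original condition: the minor version must be >= 4 regardless of the
   major version, so 5.0 through 5.3 are wrongly rejected. */
static int old_check(int major, int minor)
{
  return major >= 4 && minor >= 4;
}

/* Corrected condition: any major version above 4 qualifies, and 4.x
   qualifies only when x >= 4. */
static int new_check(int major, int minor)
{
  return major > 4 || (major == 4 && minor >= 4);
}
```

Here old_check(5, 2) evaluates to 0 while new_check(5, 2) evaluates to 1. Both functions agree for 4.3, 4.4 and 4.9, which is why the problem does not show up with gcc 4.9.2.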

As you said, even if libunwind did not crash on a bad pointer, it is still
much better for the stack trace to terminate correctly, so we should
definitely get this fixed.

Comment by Sergey Vojtovich [ 2016-04-26 ]

It worked, thanks! I believe we should fix the i386 my_context_spawn() in the same way.

Comment by Kristian Nielsen [ 2016-06-08 ]

Pushed to 5.5, 10.0, and 10.1.
