[MDEV-32251] Crash in stack unwinding during pthread_exit() in centos7-bintar Created: 2023-09-26  Updated: 2023-12-11  Resolved: 2023-12-11

Status: Closed
Project: MariaDB Server
Component/s: Compiling
Affects Version/s: 10.4.31
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Kristian Nielsen
Assignee: Kristian Nielsen
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-25633 MariaDB crashes when compiled with li... In Review

 Description   

On centos7-bintar in buildbot, the test case binlog_encryption.rpl_special_charset crashes during server exit inside the stack unwinding in pthread_exit():

Thread 1 (Thread 0x7ff1fa7fc700 (LWP 72336)):
#0  0x00007ff21f00caa1 in pthread_kill () from /lib64/libpthread.so.0
#1  0x000055eb81142eae in handle_fatal_signal (sig=6) at /home/buildbot/knielsen/mariadb-10.4.32/sql/signal_handler.cc:372
#2  <signal handler called>
#3  0x00007ff21e436387 in raise () from /lib64/libc.so.6
#4  0x00007ff21e437a78 in abort () from /lib64/libc.so.6
#5  0x000055eb818d03de in _Unwind_SetGR ()
#6  0x000055eb81874628 in __gxx_personality_v0 ()
#7  0x00007ff21bc0f9f4 in ?? () from /lib64/libgcc_s.so.1
#8  0x00007ff21bc0fd44 in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
#9  0x00007ff21f00e362 in __pthread_unwind () from /lib64/libpthread.so.0
#10 0x00007ff21f008ef7 in pthread_exit () from /lib64/libpthread.so.0
#11 0x000055eb80e39d05 in os_thread_exit (detach=detach@entry=true) at /home/buildbot/knielsen/mariadb-10.4.32/storage/innobase/os/os0thread.cc:199
#12 0x000055eb815ad871 in btr_defragment_thread () at /home/buildbot/knielsen/mariadb-10.4.32/storage/innobase/btr/btr0defragment.cc:801
#13 0x00007ff21f007ea5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007ff21e4feb0d in clone () from /lib64/libc.so.6

The problem is that the unwind code gets an invalid register number and asserts.

The build is made with static linking of some system libraries:

cmake . -DWITH_READLINE=1 -DBUILD_CONFIG=mysql_release -DCMAKE_C_FLAGS="-static-libgcc -static-libstdc++ -DGNUTLS_NO_SIGNAL=0" -DCMAKE_CXX_FLAGS="-static-libgcc -static-libstdc++ -DGNUTLS_NO_SIGNAL=0" -DWITH_SSL=bundled -DPLATFORM=linux-systemd && make -j10 package

Removing the two flags -static-libgcc and -static-libstdc++ makes the problem go away.

Linking only a few system libraries statically like this seems problematic. I suggest removing those options.
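For illustration, the configure step with the two static-link flags dropped could look like the sketch below. It is derived from the command line quoted above and is not the actual buildbot change; `EXTRA_FLAGS` and `configure_and_package` are illustrative names introduced here.

```shell
# Same options as the quoted build command, minus -static-libgcc and
# -static-libstdc++. EXTRA_FLAGS is a hypothetical variable name.
EXTRA_FLAGS="-DGNUTLS_NO_SIGNAL=0"

# Hypothetical wrapper; run it from the top of the unpacked source tree.
configure_and_package() {
    cmake . -DWITH_READLINE=1 -DBUILD_CONFIG=mysql_release \
        -DCMAKE_C_FLAGS="$EXTRA_FLAGS" \
        -DCMAKE_CXX_FLAGS="$EXTRA_FLAGS" \
        -DWITH_SSL=bundled -DPLATFORM=linux-systemd &&
    make -j10 package
}
```

Defining the function does not start a build; calling `configure_and_package` does.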

Exact commands to reproduce:

  ssh bb-amd64.mariadb.org
 
  docker run -it --user root quay.io/mariadb-foundation/bb-worker:centos7-bintar bash
  su - buildbot
  wget https://ci.mariadb.org/38760/mariadb-10.4.32.tar.gz
  tar xf mariadb-10.4.32.tar.gz 
  cd mariadb-10.4.32
  cmake . -DWITH_READLINE=1 -DBUILD_CONFIG=mysql_release -DCMAKE_C_FLAGS="-static-libgcc -static-libstdc++ -DGNUTLS_NO_SIGNAL=0" -DCMAKE_CXX_FLAGS="-static-libgcc -static-libstdc++ -DGNUTLS_NO_SIGNAL=0" -DWITH_SSL=bundled -DPLATFORM=linux-systemd && make -j10
  (cd mysql-test/ && perl mysql-test-run.pl --mysqld=--binlog-format=mixed binlog_encryption.rpl_special_charset)



 Comments   
Comment by Daniel Black [ 2023-09-27 ]

As it looks like we aren't doing anything particularly wrong, if it's reproducible on RHEL7 we can file a support ticket for Red Hat to fix.

Comment by Kristian Nielsen [ 2023-10-28 ]

I think we are doing something wrong.

Looking deeper at the stack trace (different binary):

  #4  0x00007f85a7f16a78 in abort () from /lib64/libc.so.6
  #5  0x0000555b00ebbb0e in _Unwind_SetGR ()
  #6  0x0000555b00e5fd58 in __gxx_personality_v0 ()
  #7  0x00007f85a69a19f4 in ?? () from /lib64/libgcc_s.so.1
  #8  0x00007f85a69a1d44 in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
  #9  0x00007f85a87c1362 in __pthread_unwind () from /lib64/libpthread.so.0
  #10 0x00007f85a87bbef7 in pthread_exit () from /lib64/libpthread.so.0
  #11 0x0000555b0049fd51 in handle_slave_sql (arg=arg@entry=0x555b02b6ee40) at /home/buildbot/knielsen/mariadb-10.4.32/sql/slave.cc:5759
 
(gdb) info symbol 0x0000555b00e5fd58
__gxx_personality_v0 + 216 in section .text of /home/buildbot/knielsen/mariadb-10.4.32/sql/mysqld
(gdb) info symbol 0x0000555b00ebbb0e
_Unwind_SetGR + 46 in section .text of /home/buildbot/knielsen/mariadb-10.4.32/sql/mysqld
 
(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
...
0x00007f85a6994ad0  0x00007f85a69a4285  Yes (*)     /lib64/libgcc_s.so.1
 
$ ls -l /lib64/libgcc*
-rwxr-xr-x 1 root root 88720 Sep 30  2020 /lib64/libgcc_s-4.8.5-20150702.so.1
lrwxrwxrwx 1 root root    28 Nov 13  2020 /lib64/libgcc_s.so.1 -> libgcc_s-4.8.5-20150702.so.1
$ ls -l /usr/lib/gcc/x86_64-redhat-linux/4.8.2/libgcc*
-rw-r--r-- 1 root root 3026222 Sep 30  2020 /usr/lib/gcc/x86_64-redhat-linux/4.8.2/libgcc.a
-rw-r--r-- 1 root root   53552 Sep 30  2020 /usr/lib/gcc/x86_64-redhat-linux/4.8.2/libgcc_eh.a
lrwxrwxrwx 1 root root      20 Oct 19  2022 /usr/lib/gcc/x86_64-redhat-linux/4.8.2/libgcc_s.so -> /lib64/libgcc_s.so.1

So we see that libpthread is linked dynamically, and it's calling into the
(dynamic) libgcc. But then this calls into __gxx_personality_v0 and
_Unwind_SetGR, which are from the mysqld binary, presumably from the
statically linked libstdc++ and libgcc.

And the static libgcc is version 4.8.2, while the dynamic libgcc is a
different version 4.8.5.

So we are mixing internal functions of two different versions of libgcc.
That's certainly wrong, and probably the cause of the bug here.

Let's just remove those static link options. We'll get rid of a lot of very
annoying test failures in buildbot, and it can't be worse than shipping a
known broken binary that mixes two incompatible halves of libgcc...
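A quick way to check a binary for the static half of this mix, without gdb, is to look for unwinder functions defined in the binary itself. The following is only a sketch: the function names `defines_unwinder` and `has_static_unwinder` are made up for this example, and it assumes GNU `nm` on a binary that has not been stripped. Whether the shared libgcc_s.so.1 is also loaded at run time still has to be confirmed separately, for example with `info sharedlibrary` as above.

```shell
# Reads `nm` output on stdin. Succeeds if some _Unwind_* function is defined
# in a text section (symbol type T or t), i.e. the unwinder was linked in.
defines_unwinder() {
    grep -q ' [Tt] _Unwind_'
}

# has_static_unwinder BINARY: prints a one-line verdict for BINARY.
has_static_unwinder() {
    if nm --defined-only "$1" 2>/dev/null | defines_unwinder; then
        echo "$1: defines _Unwind_* itself (static libgcc unwinder)"
    else
        echo "$1: no _Unwind_* definitions found"
    fi
}
```

For example, `has_static_unwinder sql/mysqld` in the build tree; a binary built without -static-libgcc would be expected to show no such definitions.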

Comment by Kristian Nielsen [ 2023-12-11 ]

This appears to have been fixed: the centos7-bintar builder in buildbot no longer uses the problematic -static-libgcc -static-libstdc++ options, and the builds are looking good now.
