MariaDB Server / MDEV-32251

Crash in stack unwinding during pthread_exit() in centos7-bintar

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.4.31
    • Fix Version/s: N/A
    • Component/s: Compiling
    • Labels: None

    Description

      On centos7-bintar in buildbot, the test case binlog_encryption.rpl_special_charset crashes during server exit inside the stack unwinding in pthread_exit():

      Thread 1 (Thread 0x7ff1fa7fc700 (LWP 72336)):
      #0  0x00007ff21f00caa1 in pthread_kill () from /lib64/libpthread.so.0
      #1  0x000055eb81142eae in handle_fatal_signal (sig=6) at /home/buildbot/knielsen/mariadb-10.4.32/sql/signal_handler.cc:372
      #2  <signal handler called>
      #3  0x00007ff21e436387 in raise () from /lib64/libc.so.6
      #4  0x00007ff21e437a78 in abort () from /lib64/libc.so.6
      #5  0x000055eb818d03de in _Unwind_SetGR ()
      #6  0x000055eb81874628 in __gxx_personality_v0 ()
      #7  0x00007ff21bc0f9f4 in ?? () from /lib64/libgcc_s.so.1
      #8  0x00007ff21bc0fd44 in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
      #9  0x00007ff21f00e362 in __pthread_unwind () from /lib64/libpthread.so.0
      #10 0x00007ff21f008ef7 in pthread_exit () from /lib64/libpthread.so.0
      #11 0x000055eb80e39d05 in os_thread_exit (detach=detach@entry=true) at /home/buildbot/knielsen/mariadb-10.4.32/storage/innobase/os/os0thread.cc:199
      #12 0x000055eb815ad871 in btr_defragment_thread () at /home/buildbot/knielsen/mariadb-10.4.32/storage/innobase/btr/btr0defragment.cc:801
      #13 0x00007ff21f007ea5 in start_thread () from /lib64/libpthread.so.0
      #14 0x00007ff21e4feb0d in clone () from /lib64/libc.so.6
      

      The problem is that the unwind code (_Unwind_SetGR, frame #5) is passed an invalid register number, fails an assertion, and calls abort().
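To see at a glance which module each frame of such a backtrace belongs to, a small filter can help. This is only a sketch: the function name frame_modules is hypothetical, it assumes gdb's default frame format, and it treats any frame without a "from <library>" suffix as belonging to the main binary, which is a heuristic.

```shell
# Hypothetical helper: read a gdb backtrace on stdin and print, for each
# frame, its number, function name and the module gdb attributed it to.
# Assumption: frames without "from <library>" live in the main binary.
frame_modules() {
    awk '
        $1 ~ /^#[0-9]+$/ {
            # Skip the "<signal handler called>" marker frame.
            if ($2 == "<signal") next
            # Frames look like "#N 0xADDR in FUNC (...)" or "#N FUNC (...)".
            fn = ($2 ~ /^0x/ && $3 == "in") ? $4 : $2
            module = "(main binary)"
            for (i = 2; i < NF; i++)
                if ($i == "from") module = $(i + 1)
            print $1, fn, module
        }
    '
}

# Usage sketch (file name is an assumption, not executed here):
# frame_modules < backtrace.txt
```

Fed frames #5 to #8 of the trace above, it reports _Unwind_SetGR and __gxx_personality_v0 as (main binary) and the two following frames as /lib64/libgcc_s.so.1.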

      The build is made with static linking of some system libraries:

      cmake . -DWITH_READLINE=1 -DBUILD_CONFIG=mysql_release -DCMAKE_C_FLAGS="-static-libgcc -static-libstdc++ -DGNUTLS_NO_SIGNAL=0" -DCMAKE_CXX_FLAGS="-static-libgcc -static-libstdc++ -DGNUTLS_NO_SIGNAL=0" -DWITH_SSL=bundled -DPLATFORM=linux-systemd && make -j10 package
      

      Removing the two flags -static-libgcc and -static-libstdc++ makes the problem go away.

      Linking only a few system libraries statically like this seems problematic. I suggest removing those options.
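As a sketch of what removing those options could look like, the helper below drops the two static-runtime flags from a flags string and keeps everything else. The function name strip_static_flags is hypothetical, and this is not necessarily how the buildbot configuration was actually changed.

```shell
# Hypothetical helper: print a flags string with -static-libgcc and
# -static-libstdc++ removed and every other flag kept in its original order.
strip_static_flags() {
    out=""
    for flag in $1; do   # intentional word splitting on whitespace
        case "$flag" in
            -static-libgcc|-static-libstdc++) ;;   # drop these two
            *) out="${out:+$out }$flag" ;;         # keep everything else
        esac
    done
    printf '%s\n' "$out"
}

# Usage sketch (not executed here): reuse the remaining flags for C and C++.
# FLAGS=$(strip_static_flags "-static-libgcc -static-libstdc++ -DGNUTLS_NO_SIGNAL=0")
# cmake . -DWITH_READLINE=1 -DBUILD_CONFIG=mysql_release \
#     -DCMAKE_C_FLAGS="$FLAGS" -DCMAKE_CXX_FLAGS="$FLAGS" \
#     -DWITH_SSL=bundled -DPLATFORM=linux-systemd && make -j10 package
```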

      Exact commands to reproduce:

        ssh bb-amd64.mariadb.org
       
        docker run -it --user root quay.io/mariadb-foundation/bb-worker:centos7-bintar bash
        su - buildbot
        wget https://ci.mariadb.org/38760/mariadb-10.4.32.tar.gz
        tar xf mariadb-10.4.32.tar.gz 
        cd mariadb-10.4.32
        cmake . -DWITH_READLINE=1 -DBUILD_CONFIG=mysql_release -DCMAKE_C_FLAGS="-static-libgcc -static-libstdc++ -DGNUTLS_NO_SIGNAL=0" -DCMAKE_CXX_FLAGS="-static-libgcc -static-libstdc++ -DGNUTLS_NO_SIGNAL=0" -DWITH_SSL=bundled -DPLATFORM=linux-systemd && make -j10
        (cd mysql-test/ && perl mysql-test-run.pl --mysqld=--binlog-format=mixed binlog_encryption.rpl_special_charset)
      

          Activity

            danblack Daniel Black added a comment -

            As it looks like we aren't doing anything particularly wrong, if it's reproducible on RHEL7 we can file a support ticket for Red Hat to fix it.

            knielsen Kristian Nielsen added a comment -

            I think we are doing something wrong.

            Looking deeper at the stack trace (different binary):

              #4  0x00007f85a7f16a78 in abort () from /lib64/libc.so.6
              #5  0x0000555b00ebbb0e in _Unwind_SetGR ()
              #6  0x0000555b00e5fd58 in __gxx_personality_v0 ()
              #7  0x00007f85a69a19f4 in ?? () from /lib64/libgcc_s.so.1
              #8  0x00007f85a69a1d44 in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
              #9  0x00007f85a87c1362 in __pthread_unwind () from /lib64/libpthread.so.0
              #10 0x00007f85a87bbef7 in pthread_exit () from /lib64/libpthread.so.0
              #11 0x0000555b0049fd51 in handle_slave_sql (arg=arg@entry=0x555b02b6ee40) at /home/buildbot/knielsen/mariadb-10.4.32/sql/slave.cc:5759
             
            (gdb) info symbol 0x0000555b00e5fd58
            __gxx_personality_v0 + 216 in section .text of /home/buildbot/knielsen/mariadb-10.4.32/sql/mysqld
            (gdb) info symbol 0x0000555b00ebbb0e
            _Unwind_SetGR + 46 in section .text of /home/buildbot/knielsen/mariadb-10.4.32/sql/mysqld
             
            (gdb) info sharedlibrary
            From                To                  Syms Read   Shared Object Library
            ...
            0x00007f85a6994ad0  0x00007f85a69a4285  Yes (*)     /lib64/libgcc_s.so.1
             
            $ ls -l /lib64/libgcc*
            -rwxr-xr-x 1 root root 88720 Sep 30  2020 /lib64/libgcc_s-4.8.5-20150702.so.1
            lrwxrwxrwx 1 root root    28 Nov 13  2020 /lib64/libgcc_s.so.1 -> libgcc_s-4.8.5-20150702.so.1
            $ ls -l /usr/lib/gcc/x86_64-redhat-linux/4.8.2/libgcc*
            -rw-r--r-- 1 root root 3026222 Sep 30  2020 /usr/lib/gcc/x86_64-redhat-linux/4.8.2/libgcc.a
            -rw-r--r-- 1 root root   53552 Sep 30  2020 /usr/lib/gcc/x86_64-redhat-linux/4.8.2/libgcc_eh.a
            lrwxrwxrwx 1 root root      20 Oct 19  2022 /usr/lib/gcc/x86_64-redhat-linux/4.8.2/libgcc_s.so -> /lib64/libgcc_s.so.1
            

            So we see that libpthread is linked dynamically, and it calls into the
            (dynamic) libgcc. But that in turn calls __gxx_personality_v0 and
            _Unwind_SetGR, which are in the mysqld binary, presumably from the
            statically linked libstdc++ and libgcc.

            And the static libgcc appears to be version 4.8.2 (judging by its install
            path), while the dynamic libgcc is a different version, 4.8.5.

            So we are mixing internal functions of two different versions of libgcc.
            That's certainly wrong, and probably the cause of the bug here.

            Let's just remove those static link options. We'll get rid of a lot of very
            annoying test failures in buildbot, and it can't be worse than shipping a
            known broken binary that mixes two incompatible halves of libgcc...
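One way to check whether a given binary still carries its own copy of these unwinder routines is to look at the symbols it defines. The sketch below filters the output of nm --defined-only for the two symbols seen in the trace. The function name embedded_unwinder_symbols is hypothetical, and the check assumes an unstripped binary.

```shell
# Hypothetical helper: read `nm --defined-only` output on stdin and print the
# unwinder-related symbols that the binary itself defines in its text section.
embedded_unwinder_symbols() {
    awk '($2 == "T" || $2 == "t") &&
         ($3 == "_Unwind_SetGR" || $3 == "__gxx_personality_v0") { print $3 }'
}

# Usage sketch (path taken from the gdb output above; not executed here):
# nm --defined-only sql/mysqld | embedded_unwinder_symbols
```

If this prints nothing, the binary resolves those routines from shared libraries at run time rather than from statically linked copies.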

            knielsen Kristian Nielsen added a comment -

            This appears to have been fixed: the centos7-bintar builder in buildbot no longer uses the problematic -static-libgcc and -static-libstdc++ options, and the builds look good now.

            People

              Assignee: knielsen Kristian Nielsen
              Reporter: knielsen Kristian Nielsen