Nikita Malyavin added a comment (edited):
Since the glibc 2.33 release, stat() is exported from the shared library (previously it lived in libc_nonshared.a as a wrapper that forwarded calls to __xstat). MSan on Linux intercepted only __xstat, so calls to stat() now bypass the interceptor, and the struct stat it fills in is reported as uninitialized. As a result, MSan is broken on every distribution release that ships this glibc version. In particular, the two latest Ubuntu releases and Arch Linux are affected. Debian sid is still on 2.32.
I have created a review request for LLVM: https://reviews.llvm.org/D111984
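To check whether a given system falls into the affected range, you can compare its glibc version against 2.33, the cutoff stated in this comment. The following is a minimal sketch. The helper name glibc_exports_stat() is hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the glibc version string (e.g. "2.33")
 * denotes release 2.33 or later, where stat() is exported from libc.so
 * instead of being a libc_nonshared.a wrapper around __xstat(); else 0.
 */
static int glibc_exports_stat(const char *version)
{
  int major = 0, minor = 0;

  /* Parse the leading "MAJOR.MINOR"; anything after it is ignored. */
  if (sscanf(version, "%d.%d", &major, &minor) != 2)
    return 0;
  return major > 2 || (major == 2 && minor >= 33);
}
```

At run time, gnu_get_libc_version() (declared in <gnu/libc-version.h>) returns the version string. At compile time, the __GLIBC__ and __GLIBC_MINOR__ macros give the same information. For example, glibc_exports_stat("2.32") returns 0, matching Debian sid as described above.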
Nikita Malyavin added a comment:
Until the LLVM review has been accepted, I will not patch the code base: we have no immediate need for this hotfix, because none of our build machines run Ubuntu hirsute or impish.
However, here is a patch for local use, in case someone runs into this problem on their own machine. Once a new clang version is released, the problem should be gone.
MDEV-24841_Ubuntu_hirsute_impish_and_arch_linux_MSan_build.patch
For now, I will close the issue as "Won't Fix": the original problem does not reproduce, and the newly discovered problem is a compiler issue that is expected to be fixed soon.
Marko Mäkelä added a comment:
It turns out that nikitamalyavin's fix to clang was incomplete: it did not cover the variants of the stat() functions where the file offset is explicitly 64 bits (such as stat64()).
Marko Mäkelä added a comment:
To be able to compile the code with clang 13 or 14, as noted in MDEV-20377, I added workarounds that declare the memory returned by a successful stat(), lstat(), or fstat() call as initialized.
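Such a workaround amounts to telling MSan, after each successful call, that the kernel has filled in the struct stat. Below is a minimal sketch. It assumes clang's public MSan interface function __msan_unpoison() from <sanitizer/msan_interface.h>. The macro MARK_DEFINED and the wrappers stat_msan(), lstat_msan() and fstat_msan() are hypothetical names for illustration, not the actual MariaDB change.

```c
#include <sys/stat.h>

/* Detect whether we are being built with clang's MemorySanitizer. */
#if defined(__has_feature)
# if __has_feature(memory_sanitizer)
#  include <sanitizer/msan_interface.h>
#  define MARK_DEFINED(p, n) __msan_unpoison((p), (n))
# endif
#endif
#ifndef MARK_DEFINED
/* Without MSan the annotation compiles to nothing. */
# define MARK_DEFINED(p, n) ((void) 0)
#endif

/* Hypothetical wrappers: on success, declare *st as initialized for MSan. */
static int stat_msan(const char *path, struct stat *st)
{
  int ret = stat(path, st);
  if (ret == 0)
    MARK_DEFINED(st, sizeof *st);
  return ret;
}

static int lstat_msan(const char *path, struct stat *st)
{
  int ret = lstat(path, st);
  if (ret == 0)
    MARK_DEFINED(st, sizeof *st);
  return ret;
}

static int fstat_msan(int fd, struct stat *st)
{
  int ret = fstat(fd, st);
  if (ret == 0)
    MARK_DEFINED(st, sizeof *st);
  return ret;
}
```

Outside an MSan build, the macro expands to nothing, so the wrappers behave exactly like the plain calls. Only the success path is annotated, because on failure the contents of the buffer are unspecified.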
Even with these changes, most 10.5 tests failed with SIGSEGV during the forced stack unwinding initiated by pthread_exit():
10.5 258c34f17cd5a06e29888498064bb46d019dc58f
#0 0x7f1a1e81fbe7 in unw_get_proc_info (/usr/lib/x86_64-linux-gnu/libunwind.so.1+0x1be7) (BuildId: 1fbb529fd34f80574daa43bf41c44876b1dfae42)
#1 0x7f1a1e8238cb in _Unwind_GetLanguageSpecificData (/usr/lib/x86_64-linux-gnu/libunwind.so.1+0x58cb) (BuildId: 1fbb529fd34f80574daa43bf41c44876b1dfae42)
#2 0x7f1a1e80dfcc in __gxx_personality_v0 (/usr/lib/x86_64-linux-gnu/libc++abi.so.1+0x27fcc) (BuildId: 4bd847b1f8d3dcd40106e2f5dd846f77632085e3)
#3 0x7f1a1e7d0ac5 (/lib/x86_64-linux-gnu/libgcc_s.so.1+0x16ac5) (BuildId: 57a2071bc064a943a1095dda6dd4963ea031782b)
#4 0x7f1a1e7d11bf in _Unwind_ForcedUnwind (/lib/x86_64-linux-gnu/libgcc_s.so.1+0x171bf) (BuildId: 57a2071bc064a943a1095dda6dd4963ea031782b)
#5 0x7f1a1ea35d1f in __pthread_unwind nptl/unwind.c:131:3
#6 0x7f1a1ea2e04b in __do_cancel nptl/pthreadP.h:306:3
#7 0x7f1a1ea2e04b in pthread_exit nptl/pthread_exit.c:28:3
#8 0x55719563c568 in os_thread_exit() /mariadb/10.5m/storage/innobase/os/os0thread.cc:103:2
#9 0x557195b2d19e in trx_rollback_all_recovered /mariadb/10.5m/storage/innobase/trx/trx0roll.cc:848:2
#10 0x7f1a1ea2cd7f in start_thread nptl/pthread_create.c:481:8
#11 0x7f1a1e6db76e in __clone misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
MemorySanitizer can not provide additional info.
SUMMARY: MemorySanitizer: SEGV (/usr/lib/x86_64-linux-gnu/libunwind.so.1+0x1be7) (BuildId: 1fbb529fd34f80574daa43bf41c44876b1dfae42) in unw_get_proc_info
On 10.6, this seriously affects the replication tests, but affects --suite=innodb much less.