MariaDB Server / MDEV-30936

clang 15.0.7 -fsanitize=memory fails massively

Details

    Description

      It appears that MSAN has fairly recently started marking memory as uninitialized in destructors. Here is a simple example of code that needs to be adjusted in response:

      diff --git a/storage/innobase/que/que0que.cc b/storage/innobase/que/que0que.cc
      index 5f5f527e06b..d910ee2a881 100644
      --- a/storage/innobase/que/que0que.cc
      +++ b/storage/innobase/que/que0que.cc
      @@ -236,9 +236,9 @@ que_graph_free_stat_list(
       	que_node_t*	node)	/*!< in: first query graph node in the list */
       {
       	while (node) {
      +		que_node_t* next = que_node_get_next(node);
       		que_graph_free_recursive(node);
      -
      -		node = que_node_get_next(node);
      +		node = next;
       	}
       }
       
      

      All such code needs to be fixed before it makes sense to upgrade to a newer MSAN environment on our CI systems.
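A self-contained sketch of the same pattern, assuming a plain singly linked list. `Node` and `free_list()` are hypothetical stand-ins for `que_node_t` and `que_graph_free_stat_list()`. The successor is loaded while the node is still alive. Once the destructor has run and the memory has been freed, reading `node->next` would be a use-after-destroy. Under poison_in_dtor=1, MSAN additionally poisons that storage when the destructor runs.

```cpp
// Hypothetical stand-in for que_node_t: a singly linked list node.
struct Node {
  Node *next = nullptr;
  static int destroyed;  // number of destructor calls, for illustration only
  ~Node() { ++destroyed; }
};
int Node::destroyed = 0;

// Hypothetical stand-in for que_graph_free_stat_list().
void free_list(Node *node) {
  while (node) {
    // Load the successor first. After `delete node`, the node's storage
    // has been poisoned by the destructor (with poison_in_dtor=1) and
    // freed, so reading node->next at that point would be a
    // use-after-destroy.
    Node *next = node->next;
    delete node;
    node = next;
  }
}
```

Reading `next` first is correct with or without MSAN. Accessing a member after `delete` is undefined behaviour regardless of instrumentation; MSAN merely makes it visible.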

      On my local system, for some reason, I do not get properly symbolized diagnostics, only something like this:

      ==3567352==WARNING: MemorySanitizer: use-of-uninitialized-value
      /usr/bin/llvm-symbolizer-15: symbol lookup error: /home/marko/libmsan-15/libgmp.so.10: undefined symbol: __msan_va_arg_overflow_size_tls
      ==3567352==WARNING: Can't read from symbolizer at fd 84
      ==3567352==WARNING: Can't write to symbolizer at fd 87
      /usr/bin/llvm-symbolizer-15: symbol lookup error: /home/marko/libmsan-15/libgmp.so.10: undefined symbol: __msan_va_arg_overflow_size_tls
      ==3567352==WARNING: Can't read from symbolizer at fd 84
      ==3567352==WARNING: Can't write to symbolizer at fd 87
      /usr/bin/llvm-symbolizer-15: symbol lookup error: /home/marko/libmsan-15/libgmp.so.10: undefined symbol: __msan_va_arg_overflow_size_tls
      ==3567352==WARNING: Can't read from symbolizer at fd 84
      ==3567352==WARNING: Can't write to symbolizer at fd 87
      /usr/bin/llvm-symbolizer-15: symbol lookup error: /home/marko/libmsan-15/libgmp.so.10: undefined symbol: __msan_va_arg_overflow_size_tls
      ==3567352==WARNING: Can't read from symbolizer at fd 84
      ==3567352==WARNING: Can't write to symbolizer at fd 87
      ==3567352==WARNING: Failed to use and restart external symbolizer!
          #0 0x5628a93c7759  (/dev/shm/10.6m/sql/mariadbd+0x3188759) (BuildId: bc73951134da47d9)
          #1 0x5628a9736471  (/dev/shm/10.6m/sql/mariadbd+0x34f7471) (BuildId: bc73951134da47d9)
          #2 0x5628a9966d5e  (/dev/shm/10.6m/sql/mariadbd+0x3727d5e) (BuildId: bc73951134da47d9)
          #3 0x5628a9967236  (/dev/shm/10.6m/sql/mariadbd+0x3728236) (BuildId: bc73951134da47d9)
          #4 0x5628a8555f2d  (/dev/shm/10.6m/sql/mariadbd+0x2316f2d) (BuildId: bc73951134da47d9)
          #5 0x5628a852378a  (/dev/shm/10.6m/sql/mariadbd+0x22e478a) (BuildId: bc73951134da47d9)
          #6 0x5628a87c2ce0  (/dev/shm/10.6m/sql/mariadbd+0x2583ce0) (BuildId: bc73951134da47d9)
          #7 0x5628a81208d4  (/dev/shm/10.6m/sql/mariadbd+0x1ee18d4) (BuildId: bc73951134da47d9)
          #8 0x5628a8109b71  (/dev/shm/10.6m/sql/mariadbd+0x1ecab71) (BuildId: bc73951134da47d9)
          #9 0x5628a8100316  (/dev/shm/10.6m/sql/mariadbd+0x1ec1316) (BuildId: bc73951134da47d9)
          #10 0x5628a810b33a  (/dev/shm/10.6m/sql/mariadbd+0x1ecc33a) (BuildId: bc73951134da47d9)
          #11 0x5628a87a294c  (/dev/shm/10.6m/sql/mariadbd+0x256394c) (BuildId: bc73951134da47d9)
          #12 0x5628a87a1f37  (/dev/shm/10.6m/sql/mariadbd+0x2562f37) (BuildId: bc73951134da47d9)
          #13 0x5628a8ff6ae7  (/dev/shm/10.6m/sql/mariadbd+0x2db7ae7) (BuildId: bc73951134da47d9)
          #14 0x7f320caa7fd3  (/lib/x86_64-linux-gnu/libc.so.6+0x88fd3) (BuildId: 4aff0f9d796e67d413e44f332edace9ac0ca2401)
          #15 0x7f320cb278cf  (/lib/x86_64-linux-gnu/libc.so.6+0x1088cf) (BuildId: 4aff0f9d796e67d413e44f332edace9ac0ca2401)
       
        Memory was marked as uninitialized
          #0 0x5628a7216ded  (/dev/shm/10.6m/sql/mariadbd+0xfd7ded) (BuildId: bc73951134da47d9)
          #1 0x5628a93c758a  (/dev/shm/10.6m/sql/mariadbd+0x318858a) (BuildId: bc73951134da47d9)
       
      SUMMARY: MemorySanitizer: use-of-uninitialized-value (/dev/shm/10.6m/sql/mariadbd+0x3188759) (BuildId: bc73951134da47d9) 
      Exiting
      

      The thread-local symbol in question (__msan_va_arg_overflow_size_tls) is defined in the global BSS of mariadbd. Luckily, ./mtr --rr works, and it suffices for me to set a breakpoint on __msan_warning_with_origin_noreturn to diagnose the failures with proper stack traces in rr replay.

          Activity

            Here is a reason for some remaining failures:

            10.6 216d99bb395c4fda43b4e3583672ef925103fae5

            #1  0x000056129ea96227 in __msan::PoisonMemory(void const*, unsigned long, __sanitizer::StackTrace*) ()
            #2  0x000056129ea43fd6 in __sanitizer_dtor_callback ()
            #3  0x000056129f407ba1 in Json_writer_struct::~Json_writer_struct (this=0x7f2a766d20c0) at /mariadb/10.6/sql/my_json_writer.h:402
            #4  Json_writer_object::~Json_writer_object (this=0x7f2a766d20c0) at /mariadb/10.6/sql/my_json_writer.h:460
            #5  sp_lex_keeper::reset_lex_and_exec_core (this=this@entry=0x7100000140d0, thd=thd@entry=0x72b00004d018, nextp=nextp@entry=0x7f2a766d225c, open_tables=false, instr=instr@entry=0x707000007ab0)
                at /mariadb/10.6/sql/sp_head.cc:3586
            #6  0x000056129f4153c3 in sp_lex_keeper::cursor_reset_lex_and_exec_core (this=0x7100000140d0, thd=0x72b00004d018, nextp=0x7f2a766d225c, open_tables=false, instr=0x707000007ab0)
                at /mariadb/10.6/sql/sp_head.cc:3601
            #7  sp_instr_copen::execute (this=0x707000007ab0, thd=0x72b00004d018, nextp=0x7f2a766d225c) at /mariadb/10.6/sql/sp_head.cc:4511
            #8  0x000056129f3e7ddf in sp_head::execute (this=this@entry=0x71e000015030, thd=thd@entry=0x72b00004d018, merge_da_on_success=true) at /mariadb/10.6/sql/sp_head.cc:1438
            #9  0x000056129f3f0c9f in sp_head::execute_procedure (this=0x71e000015030, thd=0x72b00004d018, args=0x72b0000522a8) at /mariadb/10.6/sql/sp_head.cc:2452
            #10 0x000056129f946b14 in do_execute_sp (thd=0x72b00004d018, sp=sp@entry=0x71e000015030) at /mariadb/10.6/sql/sql_parse.cc:3026
            #11 0x000056129f945c84 in Sql_cmd_call::execute (this=<optimized out>, thd=0x7f2a7668d000) at /mariadb/10.6/sql/sql_parse.cc:3272
            #12 0x000056129f94c7b5 in mysql_execute_command (thd=thd@entry=0x72b00004d018, is_called_from_prepared_stmt=false) at /mariadb/10.6/sql/sql_parse.cc:6002
            #13 0x000056129f935a52 in mysql_parse (thd=thd@entry=0x72b00004d018, rawbuf=0x70a00001b5b0 "call cur1()", length=11, parser_state=parser_state@entry=0x7f2a766d42a0) at /mariadb/10.6/sql/sql_parse.cc:8021
            

            This would cause an error in free_root() when comparing root->used to nullptr. I am not familiar enough with the code to say how this should best be fixed. A possible workaround might be as follows, but it feels like overkill:

            diff --git a/mysys/my_alloc.c b/mysys/my_alloc.c
            index aa0182c755e..b9071ad7eee 100644
            --- a/mysys/my_alloc.c
            +++ b/mysys/my_alloc.c
            @@ -415,14 +415,26 @@ void free_root(MEM_ROOT *root, myf MyFlags)
               if (!(MyFlags & MY_KEEP_PREALLOC))
                 root->pre_alloc=0;
             
            +#if __has_feature(memory_sanitizer)
            +  /* Work around MSAN_OPTIONS=poison_in_dtor=1 */
            +  MEM_MAKE_DEFINED(&root->used, sizeof root->used);
            +  MEM_MAKE_DEFINED(&root->free, sizeof root->free);
            +#endif
            +
               for (next=root->used; next ;)
               {
            +#if __has_feature(memory_sanitizer)
            +    MEM_MAKE_DEFINED(&next->next, sizeof next->next);
            +#endif
                 old=next; next= next->next ;
                 if (old != root->pre_alloc)
                   my_free(old);
               }
               for (next=root->free ; next ;)
               {
            +#if __has_feature(memory_sanitizer)
            +    MEM_MAKE_DEFINED(&next->next, sizeof next->next);
            +#endif
                 old=next; next= next->next;
                 if (old != root->pre_alloc)
                   my_free(old);
            
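For context: under MSAN, a macro like `MEM_MAKE_DEFINED()` presumably expands to `__msan_unpoison()` from `<sanitizer/msan_interface.h>`. That function marks a byte range as initialized in MSAN's shadow memory. The sketch below reproduces the shape of the workaround outside MariaDB. The macro `MAKE_DEFINED`, the `Block` type and `free_chain()` are hypothetical stand-ins for `MEM_MAKE_DEFINED`, the blocks chained from `root->used`/`root->free`, and the loops in `free_root()`. Each link is unpoisoned before `next` is loaded from it.

```cpp
#include <cstddef>
#include <cstdlib>

#ifndef __has_feature
# define __has_feature(x) 0  // compilers without __has_feature
#endif

#if __has_feature(memory_sanitizer)
# include <sanitizer/msan_interface.h>
// Mark [addr, addr + len) as initialized in MSAN's shadow memory.
# define MAKE_DEFINED(addr, len) __msan_unpoison(addr, len)
#else
// No-op without MSAN.
# define MAKE_DEFINED(addr, len) ((void) (addr), (void) (len))
#endif

// Hypothetical stand-in for a block chained from a MEM_ROOT.
struct Block {
  Block *next;
};

// Frees a chain of malloc()ed blocks, unpoisoning each link before it
// is read. Returns the number of blocks freed.
std::size_t free_chain(Block *head) {
  std::size_t n = 0;
  while (head) {
    MAKE_DEFINED(&head->next, sizeof head->next);
    Block *old = head;
    head = head->next;
    std::free(old);
    ++n;
  }
  return n;
}
```

This only silences the report at the point of use. The underlying question, why a destructor poisoned memory that free_root() still reads, remains open.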

            Another work-around might be to explicitly set MSAN_OPTIONS=poison_in_dtor=0 to revert the change of the default value.
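If setting the environment variable everywhere is inconvenient, the same default could presumably be compiled into the executable instead. The MSAN runtime provides a weak hook, `__msan_default_options()`, whose returned string is parsed before MSAN_OPTIONS, so MSAN_OPTIONS can still override it. This is a sketch under that assumption and is not part of the MariaDB build. Note that poison_in_dtor=0 only disables the poisoning at run time; the destructor instrumentation stays in the code.

```cpp
// Sketch: bake a default MSAN option into the executable.
// The MSAN runtime defines this hook as a weak function and parses the
// returned string before the MSAN_OPTIONS environment variable, so
// MSAN_OPTIONS can still override it at run time.
extern "C" const char *__msan_default_options() {
  // Do not poison object storage when destructors finish
  // (equivalent to MSAN_OPTIONS=poison_in_dtor=0).
  return "poison_in_dtor=0";
}
```

Because only the MSAN runtime consults this hook, defining it in a non-MSAN build is harmless.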

            Comment by Marko Mäkelä.

            The mysterious llvm-symbolizer failures actually started with llvm-symbolizer-14, which was the first version to link dynamically to libgmp. Because we are passing an MSAN-instrumented libgmp to all binaries, the newer llvm-symbolizer, which is not MSAN-instrumented, fails to load. In other words, the following errors indicate that loading the dynamic libraries failed while llvm-symbolizer was starting up:

            /usr/bin/llvm-symbolizer-15: symbol lookup error: /home/marko/libmsan-15/libgmp.so.10: undefined symbol: __msan_va_arg_overflow_size_tls
            ==3567352==WARNING: Can't read from symbolizer at fd 84
            ==3567352==WARNING: Can't write to symbolizer at fd 87
            

            The subsequent warnings appear to be a consequence of the failed popen() (or whatever mechanism the instrumented code used to invoke llvm-symbolizer).

            The fix to this is to write a simple wrapper script that discards the environment variable LD_LIBRARY_PATH:

            #!/bin/sh
            unset LD_LIBRARY_PATH
            exec llvm-symbolizer-15 "$@"
            

            Pointing the environment variable MSAN_SYMBOLIZER_PATH to this script yields properly symbolized stack traces for MSAN failures. I will also update MDEV-20377 and MDEV-26758 with this information.

            Comment by Marko Mäkelä.

            People

              Assignee: Marko Mäkelä
              Reporter: Marko Mäkelä
              Votes: 0
              Watchers: 2
