Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34625

MariaDB server crashes in 10.11.8 due to using uninitialized member variables

Details

    Description

      Issue:

      While building MariaDB 10.11.8 with `-O3` and some customized build flags, we noticed that mysqld crashes during MTR and could not perform any table operations. The issue caused by commit https://github.com/MariaDB/server/commit/a8a75ba2d so all branches have the commit are affected.

      When reproducing the issue, it seems the engine could crash in different scenarios and different code lines with different invalid pointer values.

      Example:

      (gdb) bt
      #0  sp_update_stmt_used_routines (belong_to_view=0x55870f6debb0 <end_of_list>, src=<optimized out>, prelocking_ctx=0x7fdceb20bb40, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sp.cc:2766
      #1  DML_prelocking_strategy::handle_view (this=<optimized out>, thd=0x7fdceb207958, prelocking_ctx=0x7fdceb20bb40, table_list=0x7fdceb22eda0, need_prelocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5220
      #2  0x000055870e8caf6f in open_and_process_table (ot_ctx=0x7fdd10f22900, has_prelocking_list=<optimized out>, prelocking_strategy=<optimized out>, flags=<optimized out>, counter=<optimized out>, tables=0x7fdceb22eda0, thd=0x7fdceb207958)
          at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4239
      #3  open_tables(THD*, DDL_options_st const&, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=thd@entry=0x7fdceb207958, options=..., start=start@entry=0x7fdd10f22988, counter=counter@entry=0x7fdd10f2299c, flags=flags@entry=0, 
          prelocking_strategy=0x7fdd10f22a20) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4634
      #4  0x000055870e8cd4d7 in open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=0x7fdceb207958, options=..., tables=<optimized out>, derived=true, flags=0, prelocking_strategy=<optimized out>)
          at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5607
      #5  0x000055870e0e0a95 in open_and_lock_tables (flags=<optimized out>, derived=<optimized out>, tables=<optimized out>, thd=<optimized out>, thd=<optimized out>, tables=<optimized out>, derived=<optimized out>, flags=<optimized out>)
          at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.h:514
      #6  execute_sqlcom_select (thd=thd@entry=0x7fdceb207958, all_tables=all_tables@entry=0x7fdceb22eda0) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:6341
      #7  0x000055870e0e90a1 in mysql_execute_command (thd=thd@entry=0x7fdceb207958, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:4023
      #8  0x000055870e0ea125 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8249
      #9  mysql_parse (thd=0x7fdceb207958, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8171
      #10 0x000055870e0eb3da in dispatch_command (command=COM_QUERY, thd=0x7fdceb207958, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_class.h:244
      #11 0x000055870e0d7503 in do_command (thd=0x7fdceb207958, blocking=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:1415
      l
      #12 0x000055870e24086d in do_handle_one_connection (connect=<optimized out>, connect@entry=0x7fdd10a131b8, put_in_cache=put_in_cache@entry=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1458
      #13 0x000055870e240b7b in handle_one_connection (arg=arg@entry=0x7fdd10a131b8) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1360
      #14 0x000055870e593bcd in pfs_spawn_thread (arg=0x7fdd10bc2718) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/storage/perfschema/pfs.cc:2201
      #15 0x00007fdd1132544b in start_thread () from /lib64/libpthread.so.0
      #16 0x00007fdd1106052f in clone () from /lib64/libc.so.6
      (gdb) l
      2761	void sp_update_stmt_used_routines(THD *thd, Query_tables_list *prelocking_ctx,
      2762	                                  SQL_I_List<Sroutine_hash_entry> *src,
      2763	                                  TABLE_LIST *belong_to_view)
      2764	{
      2765	  for (Sroutine_hash_entry *rt= src->first; rt; rt= rt->next)
      2766	    (void)sp_add_used_routine(prelocking_ctx, thd->active_stmt_arena_to_use(),
      2767	                              &rt->mdl_request.key, rt->m_handler,
      2768	                              belong_to_view);
      2769	}
      2770	
      (gdb) p rt
      $1 = (Sroutine_hash_entry *) 0x8
      (gdb) q
      

      Root cause:

      After further investigating, I figured out the root cause is "undefined behavior" of the using uninitialized variables. That's why we see it could crash at different code in different versions and with random invalid pointer values.

      Detailed analysis:
      Following code introduced in commit https://github.com/MariaDB/server/commit/a8a75ba2d mainly does three things:

      1. It allocates memory from mem_root using the calloc function implemented in THD class.
      2. In the same calloc , it sets all allocated memory to 0.
      3. In the placement new it initializes the TABLE_LIST at the memory allocated in step 1.

        void *memregion= thd->calloc(sizeof(TABLE_LIST));
        TABLE_LIST *ptr= new (memregion) TABLE_LIST(thd, db, fqtn, alias_str,
                                                    has_alias_ptr, table, lock_type,
                                                    mdl_type, table_options,
                                                    info_schema, this,
                                                    index_hints_arg, option);
      

      There's a critical issue in above code because it depends on step 2 to initialize all the member variables of TABLE_LIST that are not explicitly set in the constructor function. There's no guarantee that the memset/bfill in step 2 will always happen as compiler could optimize it out. Using the uninitialized value can lead to unpredictable issues which are hard to debug.

      It can be proved by this https://gcc.godbolt.org/z/5n87z1raG test which I wrote to test the assembly code of this behavior.
      The code in MariaDB can be abstracted to following but with many layers wrapped so the optimization is slightly different. ( and in default MariaDB build, it seems it's not optimized out so there's no issue, but there are chances to be optimized out with some customized flags which are happening in our build )
      To summarize, in x86, memset of following code will be optimized out with both -O2 and -O3 in GCC 13, only kept the memset in very old GCC 4.9.

      struct S {
        int i;     // uninitialized in consturctor
        S() {};
      };
       
      int bar() {
        void *buf = malloc(sizeof(S));
        memset(buf, 0, sizeof(S));       // optimized out
        S* s = new(buf) S;
        return s->i;
      }
      

      GCC13 -O3

      bar():
              sub     rsp, 8
              mov     edi, 4
              call    malloc
              mov     eax, DWORD PTR [rax]
              add     rsp, 8
              ret
      

      GCC 4.9 -O3

      bar():
              sub     rsp, 8
              mov     edi, 4
              call    malloc
              mov     DWORD PTR [rax], 0
              xor     eax, eax
              add     rsp, 8
              ret
      

      Fix:

      Including protective code in steps 1 or 2 is not effective and typically doesn't ensure anything. The proper solution is to ensure that the constructor function initializes the variables correctly. I found a reset() function in the class that performs the memset/bfill(0) operation.

      After the fix the crash is gone and I'm preparing a PR to use alloc to allocate the memory but inside the constructor do the reset first.

      I'm now preparing a Pull Request for it.

       

      Attachments

        Issue Links

          Activity

            wenhug Hugo Wen added a comment -

            This is indeed confusing and took me time to figure out the root cause.

            Submitted PR to fix it https://github.com/MariaDB/server/pull/3408 .

            wenhug Hugo Wen added a comment - This is indeed confusing and took me time to figure out the root cause. Submitted PR to fix it https://github.com/MariaDB/server/pull/3408 .
            Gosselin Dave Gosselin added a comment -

            Hi wenhug thank you for taking the time to track this down, we are always grateful for community involvement and PR submissions. I left a few of questions on your PR, please take a look, nothing critical but I'm curious. According to the Compiler Explorer example, g++ 4.9.0 leaves the memset in while the later version optimizes it away; IMHO that seems like a regression in the compiler.

            Gosselin Dave Gosselin added a comment - Hi wenhug thank you for taking the time to track this down, we are always grateful for community involvement and PR submissions. I left a few of questions on your PR, please take a look, nothing critical but I'm curious. According to the Compiler Explorer example, g++ 4.9.0 leaves the memset in while the later version optimizes it away; IMHO that seems like a regression in the compiler.

            Years ago, I fixed a similar bug MDEV-18749. The relevant compiler optimization was introduced in GCC 6; it is called -flifetime-dse. Writes that precede a call to a constructor may be optimized away, because the constructor is supposed to initialize all the fields of the object.

            For what it is worth, I think that it would be a good practice to use C++11 initialization expressions for all data members and let the compiler create or amend constructors. If all fields are initialized to zero, the compiler may emit SIMD instructions such as the x86-64 movaps or movups for initializing many fields at a time.

            marko Marko Mäkelä added a comment - Years ago, I fixed a similar bug MDEV-18749 . The relevant compiler optimization was introduced in GCC 6; it is called -flifetime-dse . Writes that precede a call to a constructor may be optimized away, because the constructor is supposed to initialize all the fields of the object. For what it is worth, I think that it would be a good practice to use C++11 initialization expressions for all data members and let the compiler create or amend constructors. If all fields are initialized to zero, the compiler may emit SIMD instructions such as the x86-64 movaps or movups for initializing many fields at a time.

            Back in 2019, I dug up the references for my commit message:

            GCC 6 and later appear to be able to optimize away the memset() that is part of mem_heap_zalloc() in the placement new call. Let us avoid using placement new in order to ensure that the objects will actually be initialized.

            https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71388

            https://gcc.gnu.org/ml/gcc/2016-02/msg00207.html

            While the latter reference hints that the optimization is only applicable to non-POD types (and dict_col_t does not define any member functions before 10.2), it is most consistent to use the same initialization across all versions.

            I do not disagree with the compiler developers. Dead code elimination (DCE), or its subset dead store elimination here, is a useful optimization technique that frees the programmer from having to pollute the code with #ifdef; MDEV-25340 is a good example where some loop bodies and maybe entire loops are being optimized away in non-debug builds. For the LLVM family of compilers, I found this paper on dead code elimination interesting.

            marko Marko Mäkelä added a comment - Back in 2019, I dug up the references for my commit message : GCC 6 and later appear to be able to optimize away the memset() that is part of mem_heap_zalloc() in the placement new call. Let us avoid using placement new in order to ensure that the objects will actually be initialized. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71388 https://gcc.gnu.org/ml/gcc/2016-02/msg00207.html While the latter reference hints that the optimization is only applicable to non-POD types (and dict_col_t does not define any member functions before 10.2), it is most consistent to use the same initialization across all versions. I do not disagree with the compiler developers. Dead code elimination (DCE), or its subset dead store elimination here, is a useful optimization technique that frees the programmer from having to pollute the code with #ifdef ; MDEV-25340 is a good example where some loop bodies and maybe entire loops are being optimized away in non-debug builds. For the LLVM family of compilers, I found this paper on dead code elimination interesting.
            wenhug Hugo Wen added a comment -

            Thank you marko for sharing a similar issue. The context and background information is super helpful. It's great to know about this `lifetime-dse` optimize option.

            https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html Looks it's enabled by default -flifetime-dse=2 and that explains why the memset() is optimized away.

            -fno-lifetime-dse In C++ the value of an object is only affected by changes within its lifetime: when the constructor begins, the object has an indeterminate value, and any changes during the lifetime of the object are dead when the object is destroyed. Normally dead store elimination will take advantage of this; if your code relies on the value of the object storage persisting beyond the lifetime of the object, you can use this flag to disable this optimization. To preserve stores before the constructor starts (e.g. because your operator new clears the object storage) but still treat the object as dead after the destructor, you can use -flifetime-dse=1. The default behavior can be explicitly selected with -flifetime-dse=2. -flifetime-dse=0 is equivalent to -fno-lifetime-dse.

            wenhug Hugo Wen added a comment - Thank you marko for sharing a similar issue. The context and background information is super helpful. It's great to know about this `lifetime-dse` optimize option. https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html Looks it's enabled by default -flifetime-dse=2 and that explains why the memset() is optimized away. -fno-lifetime-dse In C++ the value of an object is only affected by changes within its lifetime: when the constructor begins, the object has an indeterminate value, and any changes during the lifetime of the object are dead when the object is destroyed. Normally dead store elimination will take advantage of this; if your code relies on the value of the object storage persisting beyond the lifetime of the object, you can use this flag to disable this optimization. To preserve stores before the constructor starts (e.g. because your operator new clears the object storage) but still treat the object as dead after the destructor, you can use -flifetime-dse=1. The default behavior can be explicitly selected with -flifetime-dse=2. -flifetime-dse=0 is equivalent to -fno-lifetime-dse.

            People

              Gosselin Dave Gosselin
              wenhug Hugo Wen
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.