Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34625

MariaDB server crashes in 10.11.8 due to using uninitialized member variables

Details

    Description

      Issue:

      While building MariaDB 10.11.8 with `-O3` and some customized build flags, we noticed that mysqld crashes during MTR and could not perform any table operations. The issue caused by commit https://github.com/MariaDB/server/commit/a8a75ba2d so all branches have the commit are affected.

      When reproducing the issue, it seems the engine could crash in different scenarios and different code lines with different invalid pointer values.

      Example:

      (gdb) bt
      #0  sp_update_stmt_used_routines (belong_to_view=0x55870f6debb0 <end_of_list>, src=<optimized out>, prelocking_ctx=0x7fdceb20bb40, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sp.cc:2766
      #1  DML_prelocking_strategy::handle_view (this=<optimized out>, thd=0x7fdceb207958, prelocking_ctx=0x7fdceb20bb40, table_list=0x7fdceb22eda0, need_prelocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5220
      #2  0x000055870e8caf6f in open_and_process_table (ot_ctx=0x7fdd10f22900, has_prelocking_list=<optimized out>, prelocking_strategy=<optimized out>, flags=<optimized out>, counter=<optimized out>, tables=0x7fdceb22eda0, thd=0x7fdceb207958)
          at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4239
      #3  open_tables(THD*, DDL_options_st const&, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=thd@entry=0x7fdceb207958, options=..., start=start@entry=0x7fdd10f22988, counter=counter@entry=0x7fdd10f2299c, flags=flags@entry=0, 
          prelocking_strategy=0x7fdd10f22a20) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4634
      #4  0x000055870e8cd4d7 in open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=0x7fdceb207958, options=..., tables=<optimized out>, derived=true, flags=0, prelocking_strategy=<optimized out>)
          at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5607
      #5  0x000055870e0e0a95 in open_and_lock_tables (flags=<optimized out>, derived=<optimized out>, tables=<optimized out>, thd=<optimized out>, thd=<optimized out>, tables=<optimized out>, derived=<optimized out>, flags=<optimized out>)
          at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.h:514
      #6  execute_sqlcom_select (thd=thd@entry=0x7fdceb207958, all_tables=all_tables@entry=0x7fdceb22eda0) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:6341
      #7  0x000055870e0e90a1 in mysql_execute_command (thd=thd@entry=0x7fdceb207958, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:4023
      #8  0x000055870e0ea125 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8249
      #9  mysql_parse (thd=0x7fdceb207958, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8171
      #10 0x000055870e0eb3da in dispatch_command (command=COM_QUERY, thd=0x7fdceb207958, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_class.h:244
      #11 0x000055870e0d7503 in do_command (thd=0x7fdceb207958, blocking=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:1415
      l
      #12 0x000055870e24086d in do_handle_one_connection (connect=<optimized out>, connect@entry=0x7fdd10a131b8, put_in_cache=put_in_cache@entry=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1458
      #13 0x000055870e240b7b in handle_one_connection (arg=arg@entry=0x7fdd10a131b8) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1360
      #14 0x000055870e593bcd in pfs_spawn_thread (arg=0x7fdd10bc2718) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/storage/perfschema/pfs.cc:2201
      #15 0x00007fdd1132544b in start_thread () from /lib64/libpthread.so.0
      #16 0x00007fdd1106052f in clone () from /lib64/libc.so.6
      (gdb) l
      2761	void sp_update_stmt_used_routines(THD *thd, Query_tables_list *prelocking_ctx,
      2762	                                  SQL_I_List<Sroutine_hash_entry> *src,
      2763	                                  TABLE_LIST *belong_to_view)
      2764	{
      2765	  for (Sroutine_hash_entry *rt= src->first; rt; rt= rt->next)
      2766	    (void)sp_add_used_routine(prelocking_ctx, thd->active_stmt_arena_to_use(),
      2767	                              &rt->mdl_request.key, rt->m_handler,
      2768	                              belong_to_view);
      2769	}
      2770	
      (gdb) p rt
      $1 = (Sroutine_hash_entry *) 0x8
      (gdb) q
      

      Root cause:

      After further investigating, I figured out the root cause is "undefined behavior" of the using uninitialized variables. That's why we see it could crash at different code in different versions and with random invalid pointer values.

      Detailed analysis:
      Following code introduced in commit https://github.com/MariaDB/server/commit/a8a75ba2d mainly does three things:

      1. It allocates memory from mem_root using the calloc function implemented in THD class.
      2. In the same calloc , it sets all allocated memory to 0.
      3. In the placement new it initializes the TABLE_LIST at the memory allocated in step 1.

        void *memregion= thd->calloc(sizeof(TABLE_LIST));
        TABLE_LIST *ptr= new (memregion) TABLE_LIST(thd, db, fqtn, alias_str,
                                                    has_alias_ptr, table, lock_type,
                                                    mdl_type, table_options,
                                                    info_schema, this,
                                                    index_hints_arg, option);
      

      There's a critical issue in above code because it depends on step 2 to initialize all the member variables of TABLE_LIST that are not explicitly set in the constructor function. There's no guarantee that the memset/bfill in step 2 will always happen as compiler could optimize it out. Using the uninitialized value can lead to unpredictable issues which are hard to debug.

      It can be proved by this https://gcc.godbolt.org/z/5n87z1raG test which I wrote to test the assembly code of this behavior.
      The code in MariaDB can be abstracted to following but with many layers wrapped so the optimization is slightly different. ( and in default MariaDB build, it seems it's not optimized out so there's no issue, but there are chances to be optimized out with some customized flags which are happening in our build )
      To summarize, in x86, memset of following code will be optimized out with both -O2 and -O3 in GCC 13, only kept the memset in very old GCC 4.9.

      struct S {
        int i;     // uninitialized in consturctor
        S() {};
      };
       
      int bar() {
        void *buf = malloc(sizeof(S));
        memset(buf, 0, sizeof(S));       // optimized out
        S* s = new(buf) S;
        return s->i;
      }
      

      GCC13 -O3

      bar():
              sub     rsp, 8
              mov     edi, 4
              call    malloc
              mov     eax, DWORD PTR [rax]
              add     rsp, 8
              ret
      

      GCC 4.9 -O3

      bar():
              sub     rsp, 8
              mov     edi, 4
              call    malloc
              mov     DWORD PTR [rax], 0
              xor     eax, eax
              add     rsp, 8
              ret
      

      Fix:

      Including protective code in steps 1 or 2 is not effective and typically doesn't ensure anything. The proper solution is to ensure that the constructor function initializes the variables correctly. I found a reset() function in the class that performs the memset/bfill(0) operation.

      After the fix the crash is gone and I'm preparing a PR to use alloc to allocate the memory but inside the constructor do the reset first.

      I'm now preparing a Pull Request for it.

       

      Attachments

        Issue Links

          Activity

            wenhug Hugo Wen created issue -
            wenhug Hugo Wen made changes -
            Field Original Value New Value
            Description h3. Issue:

            *While building MariaDB 10.11.8 with `-O3` and some customized build flags, we noticed that mysqld crashes during MTR and could not perform any table operations. The issue caused by commit https://github.com/MariaDB/server/commit/a8a75ba2d so all branches have the commit are affected.*

            When reproducing the issue, it seems the engine could crash in different scenarios and different code lines with different invalid pointer values.

            Example:
            {code:java}
            (gdb) bt
            #0 sp_update_stmt_used_routines (belong_to_view=0x55870f6debb0 <end_of_list>, src=<optimized out>, prelocking_ctx=0x7fdceb20bb40, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sp.cc:2766
            #1 DML_prelocking_strategy::handle_view (this=<optimized out>, thd=0x7fdceb207958, prelocking_ctx=0x7fdceb20bb40, table_list=0x7fdceb22eda0, need_prelocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5220
            #2 0x000055870e8caf6f in open_and_process_table (ot_ctx=0x7fdd10f22900, has_prelocking_list=<optimized out>, prelocking_strategy=<optimized out>, flags=<optimized out>, counter=<optimized out>, tables=0x7fdceb22eda0, thd=0x7fdceb207958)
                at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4239
            #3 open_tables(THD*, DDL_options_st const&, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=thd@entry=0x7fdceb207958, options=..., start=start@entry=0x7fdd10f22988, counter=counter@entry=0x7fdd10f2299c, flags=flags@entry=0,
                prelocking_strategy=0x7fdd10f22a20) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4634
            #4 0x000055870e8cd4d7 in open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=0x7fdceb207958, options=..., tables=<optimized out>, derived=true, flags=0, prelocking_strategy=<optimized out>)
                at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5607
            #5 0x000055870e0e0a95 in open_and_lock_tables (flags=<optimized out>, derived=<optimized out>, tables=<optimized out>, thd=<optimized out>, thd=<optimized out>, tables=<optimized out>, derived=<optimized out>, flags=<optimized out>)
                at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.h:514
            #6 execute_sqlcom_select (thd=thd@entry=0x7fdceb207958, all_tables=all_tables@entry=0x7fdceb22eda0) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:6341
            #7 0x000055870e0e90a1 in mysql_execute_command (thd=thd@entry=0x7fdceb207958, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:4023
            #8 0x000055870e0ea125 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8249
            #9 mysql_parse (thd=0x7fdceb207958, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8171
            #10 0x000055870e0eb3da in dispatch_command (command=COM_QUERY, thd=0x7fdceb207958, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_class.h:244
            #11 0x000055870e0d7503 in do_command (thd=0x7fdceb207958, blocking=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:1415
            l
            #12 0x000055870e24086d in do_handle_one_connection (connect=<optimized out>, connect@entry=0x7fdd10a131b8, put_in_cache=put_in_cache@entry=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1458
            #13 0x000055870e240b7b in handle_one_connection (arg=arg@entry=0x7fdd10a131b8) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1360
            #14 0x000055870e593bcd in pfs_spawn_thread (arg=0x7fdd10bc2718) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/storage/perfschema/pfs.cc:2201
            #15 0x00007fdd1132544b in start_thread () from /lib64/libpthread.so.0
            #16 0x00007fdd1106052f in clone () from /lib64/libc.so.6
            (gdb) l
            2761 void sp_update_stmt_used_routines(THD *thd, Query_tables_list *prelocking_ctx,
            2762 SQL_I_List<Sroutine_hash_entry> *src,
            2763 TABLE_LIST *belong_to_view)
            2764 {
            2765 for (Sroutine_hash_entry *rt= src->first; rt; rt= rt->next)
            2766 (void)sp_add_used_routine(prelocking_ctx, thd->active_stmt_arena_to_use(),
            2767 &rt->mdl_request.key, rt->m_handler,
            2768 belong_to_view);
            2769 }
            2770
            (gdb) p rt
            $1 = (Sroutine_hash_entry *) 0x8
            (gdb) q
            {code}
            h3. Root cause:

            *After further investigating, I figured out the root cause is "undefined behavior" of the using uninitialized variables. That's why we see it could crash at different code in different versions and with random invalid pointer values.*

            *Detailed analysis:*
            [Following code|https://github.com/MariaDB/server/blob/a8a75ba2d0dd639b2b14337a6c4f495fcaaff7e4/sql/sql_parse.cc#L8352-L8357] introduced in commit [https://github.com/MariaDB/server/commit/a8a75ba2d] mainly does three things:
             # It allocates memory from mem_root using the *calloc* function implemented in *THD* class.
             # In the same *calloc* , it sets all allocated memory to 0.
             # In the *placement new* it initializes the *TABLE_LIST* at the memory allocated in step 1.

            {code:java}
              void *memregion= thd->calloc(sizeof(TABLE_LIST));
              TABLE_LIST *ptr= new (memregion) TABLE_LIST(thd, db, fqtn, alias_str,
                                                          has_alias_ptr, table, lock_type,
                                                          mdl_type, table_options,
                                                          info_schema, this,
                                                          index_hints_arg, option);
            {code}
            *There's a critical issue in above code because it depends on step 2 to initialize all the member variables of +TABLE_LIST+ that are not explicitly set in the constructor function. There's no guarantee that the memset/bfill in step 2 will always happen as compiler could optimize it out. Using the uninitialized value can lead to unpredictable issues which are hard to debug.*

            *It can be proved by this [https://gcc.godbolt.org/z/5n87z1raG] test which I wrote to test the assembly code of this behavior.*
            The code in MariaDB can be abstracted to following but with many layers wrapped so the optimization is slightly different. ( and in default MariaDB build, it seems it's not optimized out so there's no issue, but there are chances to be optimized out with some customized flags which are happening in our build )
            *To summarize, in x86, memset of following code will be optimized out with both -O2 and -O3 in GCC 13, only kept the memset in very old GCC 4.9.*
            {code:c}
            struct S {
              int i; // uninitialized in consturctor
              S() {};
            };

            int bar() {
              void *buf = malloc(sizeof(S));
              memset(buf, 0, sizeof(S)); // optimized out
              S* s = new(buf) S;
              return s->i;
            }
            {code}
            GCC13 -O3
            {code:c}
            bar():
                    sub rsp, 8
                    mov edi, 4
                    call malloc
                    mov eax, DWORD PTR [rax]
                    add rsp, 8
                    ret
            {code}
            GCC 4.9 -O3
            {code:c}
            bar():
                    sub rsp, 8
                    mov edi, 4
                    call malloc
                    mov DWORD PTR [rax], 0
                    xor eax, eax
                    add rsp, 8
                    ret
            {code}
            h3. Fix:

            Including protective code in steps 1 or 2 is not effective and typically doesn't ensure anything. The proper solution is to ensure that the constructor function initializes the variables correctly. I found a {{reset()}} function in the class that performs the memset/bfill(0) operation.

            After the fix the crash is gone and I'm preparing a PR to use {*}alloc to allocate the memory but inside the constructor do the reset first{*}.

             
            h3. Issue:

            *While building MariaDB 10.11.8 with `-O3` and some customized build flags, we noticed that mysqld crashes during MTR and could not perform any table operations. The issue caused by commit https://github.com/MariaDB/server/commit/a8a75ba2d so all branches have the commit are affected.*

            When reproducing the issue, it seems the engine could crash in different scenarios and different code lines with different invalid pointer values.

            Example:
            {code:java}
            (gdb) bt
            #0 sp_update_stmt_used_routines (belong_to_view=0x55870f6debb0 <end_of_list>, src=<optimized out>, prelocking_ctx=0x7fdceb20bb40, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sp.cc:2766
            #1 DML_prelocking_strategy::handle_view (this=<optimized out>, thd=0x7fdceb207958, prelocking_ctx=0x7fdceb20bb40, table_list=0x7fdceb22eda0, need_prelocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5220
            #2 0x000055870e8caf6f in open_and_process_table (ot_ctx=0x7fdd10f22900, has_prelocking_list=<optimized out>, prelocking_strategy=<optimized out>, flags=<optimized out>, counter=<optimized out>, tables=0x7fdceb22eda0, thd=0x7fdceb207958)
                at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4239
            #3 open_tables(THD*, DDL_options_st const&, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=thd@entry=0x7fdceb207958, options=..., start=start@entry=0x7fdd10f22988, counter=counter@entry=0x7fdd10f2299c, flags=flags@entry=0,
                prelocking_strategy=0x7fdd10f22a20) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4634
            #4 0x000055870e8cd4d7 in open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=0x7fdceb207958, options=..., tables=<optimized out>, derived=true, flags=0, prelocking_strategy=<optimized out>)
                at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5607
            #5 0x000055870e0e0a95 in open_and_lock_tables (flags=<optimized out>, derived=<optimized out>, tables=<optimized out>, thd=<optimized out>, thd=<optimized out>, tables=<optimized out>, derived=<optimized out>, flags=<optimized out>)
                at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.h:514
            #6 execute_sqlcom_select (thd=thd@entry=0x7fdceb207958, all_tables=all_tables@entry=0x7fdceb22eda0) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:6341
            #7 0x000055870e0e90a1 in mysql_execute_command (thd=thd@entry=0x7fdceb207958, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:4023
            #8 0x000055870e0ea125 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8249
            #9 mysql_parse (thd=0x7fdceb207958, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8171
            #10 0x000055870e0eb3da in dispatch_command (command=COM_QUERY, thd=0x7fdceb207958, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_class.h:244
            #11 0x000055870e0d7503 in do_command (thd=0x7fdceb207958, blocking=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:1415
            l
            #12 0x000055870e24086d in do_handle_one_connection (connect=<optimized out>, connect@entry=0x7fdd10a131b8, put_in_cache=put_in_cache@entry=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1458
            #13 0x000055870e240b7b in handle_one_connection (arg=arg@entry=0x7fdd10a131b8) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1360
            #14 0x000055870e593bcd in pfs_spawn_thread (arg=0x7fdd10bc2718) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/storage/perfschema/pfs.cc:2201
            #15 0x00007fdd1132544b in start_thread () from /lib64/libpthread.so.0
            #16 0x00007fdd1106052f in clone () from /lib64/libc.so.6
            (gdb) l
            2761 void sp_update_stmt_used_routines(THD *thd, Query_tables_list *prelocking_ctx,
            2762 SQL_I_List<Sroutine_hash_entry> *src,
            2763 TABLE_LIST *belong_to_view)
            2764 {
            2765 for (Sroutine_hash_entry *rt= src->first; rt; rt= rt->next)
            2766 (void)sp_add_used_routine(prelocking_ctx, thd->active_stmt_arena_to_use(),
            2767 &rt->mdl_request.key, rt->m_handler,
            2768 belong_to_view);
            2769 }
            2770
            (gdb) p rt
            $1 = (Sroutine_hash_entry *) 0x8
            (gdb) q
            {code}
            h3. Root cause:

            *After further investigating, I figured out the root cause is "undefined behavior" of the using uninitialized variables. That's why we see it could crash at different code in different versions and with random invalid pointer values.*

            *Detailed analysis:*
            [Following code|https://github.com/MariaDB/server/blob/a8a75ba2d0dd639b2b14337a6c4f495fcaaff7e4/sql/sql_parse.cc#L8352-L8357] introduced in commit [https://github.com/MariaDB/server/commit/a8a75ba2d] mainly does three things:
             # It allocates memory from mem_root using the *calloc* function implemented in *THD* class.
             # In the same *calloc* , it sets all allocated memory to 0.
             # In the *placement new* it initializes the *TABLE_LIST* at the memory allocated in step 1.

            {code:java}
              void *memregion= thd->calloc(sizeof(TABLE_LIST));
              TABLE_LIST *ptr= new (memregion) TABLE_LIST(thd, db, fqtn, alias_str,
                                                          has_alias_ptr, table, lock_type,
                                                          mdl_type, table_options,
                                                          info_schema, this,
                                                          index_hints_arg, option);
            {code}
            *There's a critical issue in above code because it depends on step 2 to initialize all the member variables of +TABLE_LIST+ that are not explicitly set in the constructor function. There's no guarantee that the memset/bfill in step 2 will always happen as compiler could optimize it out. Using the uninitialized value can lead to unpredictable issues which are hard to debug.*

            *It can be proved by this [https://gcc.godbolt.org/z/5n87z1raG] test which I wrote to test the assembly code of this behavior.*
            The code in MariaDB can be abstracted to following but with many layers wrapped so the optimization is slightly different. ( and in default MariaDB build, it seems it's not optimized out so there's no issue, but there are chances to be optimized out with some customized flags which are happening in our build )
            *To summarize, in x86, memset of following code will be optimized out with both -O2 and -O3 in GCC 13, only kept the memset in very old GCC 4.9.*
            {code:c}
            struct S {
              int i; // uninitialized in consturctor
              S() {};
            };

            int bar() {
              void *buf = malloc(sizeof(S));
              memset(buf, 0, sizeof(S)); // optimized out
              S* s = new(buf) S;
              return s->i;
            }
            {code}
            GCC13 -O3
            {code:c}
            bar():
                    sub rsp, 8
                    mov edi, 4
                    call malloc
                    mov eax, DWORD PTR [rax]
                    add rsp, 8
                    ret
            {code}
            GCC 4.9 -O3
            {code:c}
            bar():
                    sub rsp, 8
                    mov edi, 4
                    call malloc
                    mov DWORD PTR [rax], 0
                    xor eax, eax
                    add rsp, 8
                    ret
            {code}
            h3. Fix:

            Including protective code in steps 1 or 2 is not effective and typically doesn't ensure anything. The proper solution is to ensure that the constructor function initializes the variables correctly. I found a {{reset()}} function in the class that performs the memset/bfill(0) operation.

            After the fix the crash is gone and I'm preparing a PR to use {*}alloc to allocate the memory but inside the constructor do the reset first{*}.

            I'm now preparing a Pull Request for it.

             
            wlad Vladislav Vaintroub added a comment - - edited

            It is true that constructor should initialize all member variables, to avoid undefined behavior. But if it does not, and gcc optimizes away memset, it is gcc bug, which is what your short example wonderfully demonstrates. I fail to understand how it can be classified differently than a compiler bug, seriously. So, did you file your findings for gcc, too? There can be more of this stuff, and we do not know where it would strike next time.

            wlad Vladislav Vaintroub added a comment - - edited It is true that constructor should initialize all member variables, to avoid undefined behavior. But if it does not, and gcc optimizes away memset, it is gcc bug, which is what your short example wonderfully demonstrates. I fail to understand how it can be classified differently than a compiler bug, seriously. So, did you file your findings for gcc, too? There can be more of this stuff, and we do not know where it would strike next time.
            wenhug Hugo Wen added a comment -

            > But if it does not, and gcc optimizes away memset, it is gcc bug.

            Hi wlad, thank you for the quick response. While I don't consider this a "bug" in GCC, my understanding is that once a constructor function initializes an object, any uninitialized variables within that object are subject to undefined behavior. The state of memory before the constructor runs—whether it involves memset or was used for other purposes—is irrelevant after the placement new operation.

            wenhug Hugo Wen added a comment - > But if it does not, and gcc optimizes away memset, it is gcc bug. Hi wlad , thank you for the quick response. While I don't consider this a "bug" in GCC, my understanding is that once a constructor function initializes an object, any uninitialized variables within that object are subject to undefined behavior . The state of memory before the constructor runs—whether it involves memset or was used for other purposes—is irrelevant after the placement new operation.

            Nah, I have to disagree. If you have object with static or thread-local duration , members not explicitly initialized will be zeroed. So, people can rely on zero initialization in other cases, and explicitly zeroing memory sounds like a valid way to go. Now, at some optimization level compiler removes the memset call, while it thinks it would have no side effects, and it has side effects. Whose fault is it?

            wlad Vladislav Vaintroub added a comment - Nah, I have to disagree. If you have object with static or thread-local duration , members not explicitly initialized will be zeroed. So, people can rely on zero initialization in other cases, and explicitly zeroing memory sounds like a valid way to go. Now, at some optimization level compiler removes the memset call, while it thinks it would have no side effects, and it has side effects. Whose fault is it?

            Alas, https://stackoverflow.com/questions/68713608/c-placement-new-after-memset there is a specific discussion about this behavior, and some people think GCC is fine. But, that it does not even warn about unitialized members (unless -Weffc++ is set), and removes memsets, this is definitely confusing.

            wlad Vladislav Vaintroub added a comment - Alas, https://stackoverflow.com/questions/68713608/c-placement-new-after-memset there is a specific discussion about this behavior, and some people think GCC is fine. But, that it does not even warn about unitialized members (unless -Weffc++ is set), and removes memsets, this is definitely confusing.
            wenhug Hugo Wen added a comment -

            This is indeed confusing and took me time to figure out the root cause.

            Submitted PR to fix it https://github.com/MariaDB/server/pull/3408 .

            wenhug Hugo Wen added a comment - This is indeed confusing and took me time to figure out the root cause. Submitted PR to fix it https://github.com/MariaDB/server/pull/3408 .
            serg Sergei Golubchik made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 11.5 [ 29506 ]
            serg Sergei Golubchik made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            serg Sergei Golubchik made changes -
            Status Confirmed [ 10101 ] Open [ 1 ]
            serg Sergei Golubchik made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            serg Sergei Golubchik made changes -
            Assignee Dave Gosselin [ JIRAUSER52216 ]
            serg Sergei Golubchik made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            serg Sergei Golubchik made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            Gosselin Dave Gosselin made changes -
            Status Stalled [ 10000 ] In Review [ 10002 ]
            Gosselin Dave Gosselin added a comment -

            Hi wenhug thank you for taking the time to track this down, we are always grateful for community involvement and PR submissions. I left a few of questions on your PR, please take a look, nothing critical but I'm curious. According to the Compiler Explorer example, g++ 4.9.0 leaves the memset in while the later version optimizes it away; IMHO that seems like a regression in the compiler.

            Gosselin Dave Gosselin added a comment - Hi wenhug thank you for taking the time to track this down, we are always grateful for community involvement and PR submissions. I left a few of questions on your PR, please take a look, nothing critical but I'm curious. According to the Compiler Explorer example, g++ 4.9.0 leaves the memset in while the later version optimizes it away; IMHO that seems like a regression in the compiler.
            marko Marko Mäkelä made changes -
            Component/s Server [ 13907 ]
            Fix Version/s 10.6.19 [ 29833 ]
            Fix Version/s 10.11.9 [ 29834 ]
            Fix Version/s 11.1.6 [ 29835 ]
            Fix Version/s 11.2.5 [ 29836 ]
            Fix Version/s 11.4.3 [ 29837 ]
            Fix Version/s 11.5.2 [ 29838 ]
            Fix Version/s 11.6.1 [ 29847 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 11.5 [ 29506 ]
            Resolution Fixed [ 1 ]
            Status In Review [ 10002 ] Closed [ 6 ]

            Years ago, I fixed a similar bug MDEV-18749. The relevant compiler optimization was introduced in GCC 6; it is called -flifetime-dse. Writes that precede a call to a constructor may be optimized away, because the constructor is supposed to initialize all the fields of the object.

            For what it is worth, I think that it would be a good practice to use C++11 initialization expressions for all data members and let the compiler create or amend constructors. If all fields are initialized to zero, the compiler may emit SIMD instructions such as the x86-64 movaps or movups for initializing many fields at a time.

            marko Marko Mäkelä added a comment - Years ago, I fixed a similar bug MDEV-18749 . The relevant compiler optimization was introduced in GCC 6; it is called -flifetime-dse . Writes that precede a call to a constructor may be optimized away, because the constructor is supposed to initialize all the fields of the object. For what it is worth, I think that it would be a good practice to use C++11 initialization expressions for all data members and let the compiler create or amend constructors. If all fields are initialized to zero, the compiler may emit SIMD instructions such as the x86-64 movaps or movups for initializing many fields at a time.
            marko Marko Mäkelä made changes -

            Back in 2019, I dug up the references for my commit message:

            GCC 6 and later appear to be able to optimize away the memset() that is part of mem_heap_zalloc() in the placement new call. Let us avoid using placement new in order to ensure that the objects will actually be initialized.

            https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71388

            https://gcc.gnu.org/ml/gcc/2016-02/msg00207.html

            While the latter reference hints that the optimization is only applicable to non-POD types (and dict_col_t does not define any member functions before 10.2), it is most consistent to use the same initialization across all versions.

            I do not disagree with the compiler developers. Dead code elimination (DCE), or its subset dead store elimination here, is a useful optimization technique that frees the programmer from having to pollute the code with #ifdef; MDEV-25340 is a good example where some loop bodies and maybe entire loops are being optimized away in non-debug builds. For the LLVM family of compilers, I found this paper on dead code elimination interesting.

            marko Marko Mäkelä added a comment - Back in 2019, I dug up the references for my commit message : GCC 6 and later appear to be able to optimize away the memset() that is part of mem_heap_zalloc() in the placement new call. Let us avoid using placement new in order to ensure that the objects will actually be initialized. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71388 https://gcc.gnu.org/ml/gcc/2016-02/msg00207.html While the latter reference hints that the optimization is only applicable to non-POD types (and dict_col_t does not define any member functions before 10.2), it is most consistent to use the same initialization across all versions. I do not disagree with the compiler developers. Dead code elimination (DCE), or its subset dead store elimination here, is a useful optimization technique that frees the programmer from having to pollute the code with #ifdef ; MDEV-25340 is a good example where some loop bodies and maybe entire loops are being optimized away in non-debug builds. For the LLVM family of compilers, I found this paper on dead code elimination interesting.
            wenhug Hugo Wen added a comment -

            Thank you marko for sharing a similar issue. The context and background information is super helpful. It's great to know about this `lifetime-dse` optimize option.

            https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html Looks it's enabled by default -flifetime-dse=2 and that explains why the memset() is optimized away.

            -fno-lifetime-dse In C++ the value of an object is only affected by changes within its lifetime: when the constructor begins, the object has an indeterminate value, and any changes during the lifetime of the object are dead when the object is destroyed. Normally dead store elimination will take advantage of this; if your code relies on the value of the object storage persisting beyond the lifetime of the object, you can use this flag to disable this optimization. To preserve stores before the constructor starts (e.g. because your operator new clears the object storage) but still treat the object as dead after the destructor, you can use -flifetime-dse=1. The default behavior can be explicitly selected with -flifetime-dse=2. -flifetime-dse=0 is equivalent to -fno-lifetime-dse.

            wenhug Hugo Wen added a comment - Thank you marko for sharing a similar issue. The context and background information is super helpful. It's great to know about this `lifetime-dse` optimize option. https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html Looks it's enabled by default -flifetime-dse=2 and that explains why the memset() is optimized away. -fno-lifetime-dse In C++ the value of an object is only affected by changes within its lifetime: when the constructor begins, the object has an indeterminate value, and any changes during the lifetime of the object are dead when the object is destroyed. Normally dead store elimination will take advantage of this; if your code relies on the value of the object storage persisting beyond the lifetime of the object, you can use this flag to disable this optimization. To preserve stores before the constructor starts (e.g. because your operator new clears the object storage) but still treat the object as dead after the destructor, you can use -flifetime-dse=1. The default behavior can be explicitly selected with -flifetime-dse=2. -flifetime-dse=0 is equivalent to -fno-lifetime-dse.
            marko Marko Mäkelä made changes -

            People

              Gosselin Dave Gosselin
              wenhug Hugo Wen
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.