Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34625

MariaDB server crashes in 10.11.8 due to using uninitialized member variables

    XMLWordPrintable

Details

    Description

      Issue:

      While building MariaDB 10.11.8 with `-O3` and some customized build flags, we noticed that mysqld crashes during MTR and could not perform any table operations. The issue caused by commit https://github.com/MariaDB/server/commit/a8a75ba2d so all branches have the commit are affected.

      When reproducing the issue, it seems the engine could crash in different scenarios and different code lines with different invalid pointer values.

      Example:

      (gdb) bt
      #0  sp_update_stmt_used_routines (belong_to_view=0x55870f6debb0 <end_of_list>, src=<optimized out>, prelocking_ctx=0x7fdceb20bb40, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sp.cc:2766
      #1  DML_prelocking_strategy::handle_view (this=<optimized out>, thd=0x7fdceb207958, prelocking_ctx=0x7fdceb20bb40, table_list=0x7fdceb22eda0, need_prelocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5220
      #2  0x000055870e8caf6f in open_and_process_table (ot_ctx=0x7fdd10f22900, has_prelocking_list=<optimized out>, prelocking_strategy=<optimized out>, flags=<optimized out>, counter=<optimized out>, tables=0x7fdceb22eda0, thd=0x7fdceb207958)
          at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4239
      #3  open_tables(THD*, DDL_options_st const&, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=thd@entry=0x7fdceb207958, options=..., start=start@entry=0x7fdd10f22988, counter=counter@entry=0x7fdd10f2299c, flags=flags@entry=0, 
          prelocking_strategy=0x7fdd10f22a20) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4634
      #4  0x000055870e8cd4d7 in open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=0x7fdceb207958, options=..., tables=<optimized out>, derived=true, flags=0, prelocking_strategy=<optimized out>)
          at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5607
      #5  0x000055870e0e0a95 in open_and_lock_tables (flags=<optimized out>, derived=<optimized out>, tables=<optimized out>, thd=<optimized out>, thd=<optimized out>, tables=<optimized out>, derived=<optimized out>, flags=<optimized out>)
          at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.h:514
      #6  execute_sqlcom_select (thd=thd@entry=0x7fdceb207958, all_tables=all_tables@entry=0x7fdceb22eda0) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:6341
      #7  0x000055870e0e90a1 in mysql_execute_command (thd=thd@entry=0x7fdceb207958, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:4023
      #8  0x000055870e0ea125 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8249
      #9  mysql_parse (thd=0x7fdceb207958, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8171
      #10 0x000055870e0eb3da in dispatch_command (command=COM_QUERY, thd=0x7fdceb207958, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_class.h:244
      #11 0x000055870e0d7503 in do_command (thd=0x7fdceb207958, blocking=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:1415
      l
      #12 0x000055870e24086d in do_handle_one_connection (connect=<optimized out>, connect@entry=0x7fdd10a131b8, put_in_cache=put_in_cache@entry=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1458
      #13 0x000055870e240b7b in handle_one_connection (arg=arg@entry=0x7fdd10a131b8) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1360
      #14 0x000055870e593bcd in pfs_spawn_thread (arg=0x7fdd10bc2718) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/storage/perfschema/pfs.cc:2201
      #15 0x00007fdd1132544b in start_thread () from /lib64/libpthread.so.0
      #16 0x00007fdd1106052f in clone () from /lib64/libc.so.6
      (gdb) l
      2761	void sp_update_stmt_used_routines(THD *thd, Query_tables_list *prelocking_ctx,
      2762	                                  SQL_I_List<Sroutine_hash_entry> *src,
      2763	                                  TABLE_LIST *belong_to_view)
      2764	{
      2765	  for (Sroutine_hash_entry *rt= src->first; rt; rt= rt->next)
      2766	    (void)sp_add_used_routine(prelocking_ctx, thd->active_stmt_arena_to_use(),
      2767	                              &rt->mdl_request.key, rt->m_handler,
      2768	                              belong_to_view);
      2769	}
      2770	
      (gdb) p rt
      $1 = (Sroutine_hash_entry *) 0x8
      (gdb) q
      

      Root cause:

      After further investigating, I figured out the root cause is "undefined behavior" of the using uninitialized variables. That's why we see it could crash at different code in different versions and with random invalid pointer values.

      Detailed analysis:
      Following code introduced in commit https://github.com/MariaDB/server/commit/a8a75ba2d mainly does three things:

      1. It allocates memory from mem_root using the calloc function implemented in THD class.
      2. In the same calloc , it sets all allocated memory to 0.
      3. In the placement new it initializes the TABLE_LIST at the memory allocated in step 1.

        void *memregion= thd->calloc(sizeof(TABLE_LIST));
        TABLE_LIST *ptr= new (memregion) TABLE_LIST(thd, db, fqtn, alias_str,
                                                    has_alias_ptr, table, lock_type,
                                                    mdl_type, table_options,
                                                    info_schema, this,
                                                    index_hints_arg, option);
      

      There's a critical issue in above code because it depends on step 2 to initialize all the member variables of TABLE_LIST that are not explicitly set in the constructor function. There's no guarantee that the memset/bfill in step 2 will always happen as compiler could optimize it out. Using the uninitialized value can lead to unpredictable issues which are hard to debug.

      It can be proved by this https://gcc.godbolt.org/z/5n87z1raG test which I wrote to test the assembly code of this behavior.
      The code in MariaDB can be abstracted to following but with many layers wrapped so the optimization is slightly different. ( and in default MariaDB build, it seems it's not optimized out so there's no issue, but there are chances to be optimized out with some customized flags which are happening in our build )
      To summarize, in x86, memset of following code will be optimized out with both -O2 and -O3 in GCC 13, only kept the memset in very old GCC 4.9.

      struct S {
        int i;     // uninitialized in consturctor
        S() {};
      };
       
      int bar() {
        void *buf = malloc(sizeof(S));
        memset(buf, 0, sizeof(S));       // optimized out
        S* s = new(buf) S;
        return s->i;
      }
      

      GCC13 -O3

      bar():
              sub     rsp, 8
              mov     edi, 4
              call    malloc
              mov     eax, DWORD PTR [rax]
              add     rsp, 8
              ret
      

      GCC 4.9 -O3

      bar():
              sub     rsp, 8
              mov     edi, 4
              call    malloc
              mov     DWORD PTR [rax], 0
              xor     eax, eax
              add     rsp, 8
              ret
      

      Fix:

      Including protective code in steps 1 or 2 is not effective and typically doesn't ensure anything. The proper solution is to ensure that the constructor function initializes the variables correctly. I found a reset() function in the class that performs the memset/bfill(0) operation.

      After the fix the crash is gone and I'm preparing a PR to use alloc to allocate the memory but inside the constructor do the reset first.

      I'm now preparing a Pull Request for it.

       

      Attachments

        Issue Links

          Activity

            People

              Gosselin Dave Gosselin
              wenhug Hugo Wen
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.