Details
Description
Issue:
While building MariaDB 10.11.8 with `-O3` and some customized build flags, we noticed that mysqld crashes during MTR and could not perform any table operations. The issue caused by commit https://github.com/MariaDB/server/commit/a8a75ba2d so all branches have the commit are affected.
When reproducing the issue, it seems the engine could crash in different scenarios and different code lines with different invalid pointer values.
Example:
(gdb) bt
|
#0 sp_update_stmt_used_routines (belong_to_view=0x55870f6debb0 <end_of_list>, src=<optimized out>, prelocking_ctx=0x7fdceb20bb40, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sp.cc:2766 |
#1 DML_prelocking_strategy::handle_view (this=<optimized out>, thd=0x7fdceb207958, prelocking_ctx=0x7fdceb20bb40, table_list=0x7fdceb22eda0, need_prelocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5220 |
#2 0x000055870e8caf6f in open_and_process_table (ot_ctx=0x7fdd10f22900, has_prelocking_list=<optimized out>, prelocking_strategy=<optimized out>, flags=<optimized out>, counter=<optimized out>, tables=0x7fdceb22eda0, thd=0x7fdceb207958) |
at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4239 |
#3 open_tables(THD*, DDL_options_st const&, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=thd@entry=0x7fdceb207958, options=..., start=start@entry=0x7fdd10f22988, counter=counter@entry=0x7fdd10f2299c, flags=flags@entry=0, |
prelocking_strategy=0x7fdd10f22a20) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:4634 |
#4 0x000055870e8cd4d7 in open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) [clone .constprop.0] (thd=0x7fdceb207958, options=..., tables=<optimized out>, derived=true, flags=0, prelocking_strategy=<optimized out>) |
at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.cc:5607 |
#5 0x000055870e0e0a95 in open_and_lock_tables (flags=<optimized out>, derived=<optimized out>, tables=<optimized out>, thd=<optimized out>, thd=<optimized out>, tables=<optimized out>, derived=<optimized out>, flags=<optimized out>) |
at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_base.h:514 |
#6 execute_sqlcom_select (thd=thd@entry=0x7fdceb207958, all_tables=all_tables@entry=0x7fdceb22eda0) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:6341 |
#7 0x000055870e0e90a1 in mysql_execute_command (thd=thd@entry=0x7fdceb207958, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:4023 |
#8 0x000055870e0ea125 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x7fdceb207958) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8249 |
#9 mysql_parse (thd=0x7fdceb207958, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:8171 |
#10 0x000055870e0eb3da in dispatch_command (command=COM_QUERY, thd=0x7fdceb207958, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_class.h:244 |
#11 0x000055870e0d7503 in do_command (thd=0x7fdceb207958, blocking=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_parse.cc:1415 |
l
|
#12 0x000055870e24086d in do_handle_one_connection (connect=<optimized out>, connect@entry=0x7fdd10a131b8, put_in_cache=put_in_cache@entry=true) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1458 |
#13 0x000055870e240b7b in handle_one_connection (arg=arg@entry=0x7fdd10a131b8) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/sql/sql_connect.cc:1360 |
#14 0x000055870e593bcd in pfs_spawn_thread (arg=0x7fdd10bc2718) at /local/home/wenhug/workspace/RDSMariaDB/src/RDSMariaDB/storage/perfschema/pfs.cc:2201 |
#15 0x00007fdd1132544b in start_thread () from /lib64/libpthread.so.0 |
#16 0x00007fdd1106052f in clone () from /lib64/libc.so.6 |
(gdb) l
|
2761 void sp_update_stmt_used_routines(THD *thd, Query_tables_list *prelocking_ctx, |
2762 SQL_I_List<Sroutine_hash_entry> *src, |
2763 TABLE_LIST *belong_to_view) |
2764 { |
2765 for (Sroutine_hash_entry *rt= src->first; rt; rt= rt->next) |
2766 (void)sp_add_used_routine(prelocking_ctx, thd->active_stmt_arena_to_use(), |
2767 &rt->mdl_request.key, rt->m_handler, |
2768 belong_to_view); |
2769 } |
2770 |
(gdb) p rt
|
$1 = (Sroutine_hash_entry *) 0x8 |
(gdb) q
|
Root cause:
After further investigating, I figured out the root cause is "undefined behavior" of the using uninitialized variables. That's why we see it could crash at different code in different versions and with random invalid pointer values.
Detailed analysis:
Following code introduced in commit https://github.com/MariaDB/server/commit/a8a75ba2d mainly does three things:
- It allocates memory from mem_root using the calloc function implemented in THD class.
- In the same calloc , it sets all allocated memory to 0.
- In the placement new it initializes the TABLE_LIST at the memory allocated in step 1.
void *memregion= thd->calloc(sizeof(TABLE_LIST)); |
TABLE_LIST *ptr= new (memregion) TABLE_LIST(thd, db, fqtn, alias_str, |
has_alias_ptr, table, lock_type,
|
mdl_type, table_options,
|
info_schema, this, |
index_hints_arg, option);
|
There's a critical issue in above code because it depends on step 2 to initialize all the member variables of TABLE_LIST that are not explicitly set in the constructor function. There's no guarantee that the memset/bfill in step 2 will always happen as compiler could optimize it out. Using the uninitialized value can lead to unpredictable issues which are hard to debug.
It can be proved by this https://gcc.godbolt.org/z/5n87z1raG test which I wrote to test the assembly code of this behavior.
The code in MariaDB can be abstracted to following but with many layers wrapped so the optimization is slightly different. ( and in default MariaDB build, it seems it's not optimized out so there's no issue, but there are chances to be optimized out with some customized flags which are happening in our build )
To summarize, in x86, memset of following code will be optimized out with both -O2 and -O3 in GCC 13, only kept the memset in very old GCC 4.9.
struct S { |
int i; // uninitialized in consturctor |
S() {};
|
};
|
 |
int bar() { |
void *buf = malloc(sizeof(S)); |
memset(buf, 0, sizeof(S)); // optimized out |
S* s = new(buf) S; |
return s->i; |
}
|
GCC13 -O3
bar():
|
sub rsp, 8
|
mov edi, 4
|
call malloc |
mov eax, DWORD PTR [rax] |
add rsp, 8
|
ret
|
GCC 4.9 -O3
bar():
|
sub rsp, 8
|
mov edi, 4
|
call malloc |
mov DWORD PTR [rax], 0 |
xor eax, eax
|
add rsp, 8
|
ret
|
Fix:
Including protective code in steps 1 or 2 is not effective and typically doesn't ensure anything. The proper solution is to ensure that the constructor function initializes the variables correctly. I found a reset() function in the class that performs the memset/bfill(0) operation.
After the fix the crash is gone and I'm preparing a PR to use alloc to allocate the memory but inside the constructor do the reset first.
I'm now preparing a Pull Request for it.
Â
Attachments
Issue Links
- relates to
-
MDEV-18749 rec_get_converted_size_comp_prefix_low: Conditional jump or move depends on uninitialised value upon adding FULLTEXT index
- Closed
-
MDEV-35494 fil_space_t::fil_space_t() is potentially unsafe with GCC -flifetime-dse
- Closed