[MDEV-10835] Denial Of Service - Crash Any Galera Node - Race Condition: CREATE TABLE IF NOT EXISTS / INSERT Created: 2016-09-19  Updated: 2016-09-29  Resolved: 2016-09-29

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.1.17
Fix Version/s: 10.1.18

Type: Bug Priority: Critical
Reporter: Rob Brown Assignee: Nirbhay Choubey (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Occurs on CentOS 6 and CentOS 7 64-bit


Issue Links:
Duplicate
duplicates MDEV-9416 MariaDB galera got signal 11 when alt... Closed

 Description   

WARNING! This will crash any Galera node 100% of the time. The race condition is very easy to reproduce.



 Comments   
Comment by Rob Brown [ 2016-09-19 ]

Here are the steps to reproduce:

1. Set up a Galera cluster with at least two nodes, for example "node1" and "node2".

2. On node1, run this command:

[root@node1 ~]# perl -e 'for (1..100) { print qq{CREATE TABLE IF NOT EXISTS test.foo (id INT AUTO_INCREMENT, p INT, PRIMARY KEY (id));\n}; }' | mysql -f
ERROR 1213 (40001) at line 6: Deadlock found when trying to get lock; try restarting transaction
[root@node1 ~]#

3. While that "CREATE" grinder is running on node1, quickly run this on node2:

[root@node2 ~]# perl -e 'for (1..100) { print qq{INSERT INTO test.foo VALUES (NULL, $$)\n;}; }' | mysql -f
ERROR 1213 (40001) at line 1: Deadlock found when trying to get lock; try restarting transaction
ERROR 1213 (40001) at line 1: Deadlock found when trying to get lock; try restarting transaction
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
ERROR 2006 (HY000) at line 1: MySQL server has gone away
ERROR 2006 (HY000) at line 1: MySQL server has gone away
ERROR 2006 (HY000) at line 1: MySQL server has gone away
[...]
[root@node2 ~]#

BOOM!!! And node2 will surely crash.
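
For anyone without Perl on the test nodes, the two statement generators above can be sketched in plain shell. This is only a rough equivalent of the one-liners, not the exact commands used in the report: the function names gen_create and gen_insert are made up for illustration, and the shell's own PID stands in for Perl's $$.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the report).

# Emit N copies of the CREATE TABLE statement from step 2 (default: 100).
gen_create() {
    n="${1:-100}"
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s\n' 'CREATE TABLE IF NOT EXISTS test.foo (id INT AUTO_INCREMENT, p INT, PRIMARY KEY (id));'
        i=$((i + 1))
    done
}

# Emit N copies of the INSERT statement from step 3 (default: 100).
# The second column is the shell's PID, mirroring $$ in the Perl version.
gen_insert() {
    n="${1:-100}"
    i=0
    while [ "$i" -lt "$n" ]; do
        printf 'INSERT INTO test.foo VALUES (NULL, %s);\n' "$$"
        i=$((i + 1))
    done
}

# Intended use, on a disposable test cluster only:
#   node1:  gen_create | mysql -f
#   node2:  gen_insert | mysql -f
```

As in the original steps, the INSERT stream has to be started on node2 while the CREATE stream is still running on node1.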

Comment by Rob Brown [ 2016-09-19 ]

The mysqld.log on node2 suddenly shows the following message:

160919 16:41:39 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.1.17-MariaDB
key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=23
max_threads=401
thread_count=8
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 897175 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7f683236a008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f6836d7b950 thread_stack 0x20000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5581ccea6ebe]
/usr/sbin/mysqld(handle_fatal_signal+0x2d5)[0x5581cc9cd915]
/lib64/libpthread.so.0(+0xf100)[0x7f683698f100]
/usr/sbin/mysqld(_Z20ha_abort_transactionP3THDS0_c+0xa3)[0x5581cc9d8653]
/usr/sbin/mysqld(_Z15wsrep_abort_thdPvS_c+0x139)[0x5581cc97ef39]
/usr/sbin/mysqld(_Z25wsrep_grant_mdl_exceptionP11MDL_contextP10MDL_ticketPK7MDL_key+0x2a7)[0x5581cc96f937]
/usr/sbin/mysqld(_ZNK8MDL_lock14can_grant_lockE13enum_mdl_typeP11MDL_contextb+0x11f)[0x5581cc923e8f]
/usr/sbin/mysqld(_ZN11MDL_context21try_acquire_lock_implEP11MDL_requestPP10MDL_ticket+0xf1)[0x5581cc924b11]
/usr/sbin/mysqld(_ZN11MDL_context12acquire_lockEP11MDL_requestd+0x2e)[0x5581cc92507e]
/usr/sbin/mysqld(_ZN11MDL_context13acquire_locksEP8I_P_ListI11MDL_request16I_P_List_adapterIS1_XadL_ZNS1_12next_in_listEEEXadL_ZNS1_12prev_in_listEEEE16I_P_List_counter21I_P_List_no_push_backIS1_EEd+0xdd)[0x5581cc925b4d]
/usr/sbin/mysqld(_Z16lock_table_namesP3THDRK14DDL_options_stP10TABLE_LISTS5_mj+0x2ae)[0x5581cc8059ae]
/usr/sbin/mysqld(_Z11open_tablesP3THDRK14DDL_options_stPP10TABLE_LISTPjjP19Prelocking_strategy+0xde3)[0x5581cc808b63]
/usr/sbin/mysqld(_Z20open_and_lock_tablesP3THDRK14DDL_options_stP10TABLE_LISTbjP19Prelocking_strategy+0x34)[0x5581cc809084]
/usr/sbin/mysqld(_Z18mysql_create_tableP3THDP10TABLE_LISTP22Table_specification_stP10Alter_info+0x77)[0x5581cc8d5e07]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x857f)[0x5581cc8506cf]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x28e)[0x5581cc8519be]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x11e8)[0x5581ccaa45a8]
/usr/sbin/mysqld(wsrep_apply_cb+0x64c)[0x5581cc97bffc]
mysys/stacktrace.c:268(my_print_stacktrace)[0x7f6821333598]
sql/handler.cc:6133(ha_abort_transaction(THD*, THD*, char))[0x7f682136f93d]
sql/wsrep_thd.cc:623(wsrep_abort_thd(void*, void*, char))[0x7f68213723b0]
sql/wsrep_mysqld.cc:1742(wsrep_grant_mdl_exception(MDL_context*, MDL_ticket*, MDL_key const*))[0x7f68213753de]
sql/mdl.cc:1579(MDL_lock::can_grant_lock(enum_mdl_type, MDL_context*, bool) const)[0x7f68213524c8]
sql/sql_base.cc:4335(lock_table_names(THD*, DDL_options_st const&, TABLE_LIST*, TABLE_LIST*, unsigned long, unsigned int))[0x7f6821353c5c]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x6b)[0x7f68213758fb]
sql/sql_table.cc:4995(mysql_create_table(THD*, TABLE_LIST*, Table_specification_st*, Alter_info*))[0x7f6821384228]
sql/sql_parse.cc:3456(mysql_execute_command(THD*))[0x5581cc97cfa5]
sql/log_event.cc:4458(Query_log_event::do_apply_event(rpl_group_info*, char const*, unsigned int))[0x5581cc96d1c2]
/lib64/libpthread.so.0(+0x7dc5)[0x7f6836987dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f6834da9ced]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f67f0df256f): CREATE TABLE IF NOT EXISTS test.foo (id INT AUTO_INCREMENT, p INT, PRIMARY KEY (id))
Connection ID (thread ID): 1
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
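
To check whether a node's log contains the same crash signature, something along these lines may help. It is a minimal sketch that only matches the two line formats visible in the log above; the function names has_sig11 and crash_query are hypothetical, and the log path in the usage comment is an assumption that will differ per installation.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of MariaDB).

# Succeed (exit 0) if the log file contains the fatal-signal line shown above.
has_sig11() {
    grep -q 'mysqld got signal 11' "$1"
}

# Print the statement recorded on any "Query (0x...): ..." line of the crash dump.
crash_query() {
    sed -n 's/^Query (0x[0-9a-f]*): //p' "$1"
}

# Intended use (the path is an assumption):
#   has_sig11 /var/log/mysqld.log && crash_query /var/log/mysqld.log
```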

Comment by Rob Brown [ 2016-09-19 ]

Only the node that receives the "REPLACE INTO" or "INSERT INTO" statement from a MySQL client crashes. The node that receives the "CREATE TABLE" statement directly from a MySQL client never crashes, even though mysqld.log on the crashing node always mentions the "CREATE TABLE" query (which it must have received via replication from the other node).

Comment by Rob Brown [ 2016-09-19 ]

The only reason I suspect this is specific to Galera is that when the nodes are configured as a normal master ring (where each node is a slave of the other, rather than a member of a Galera cluster), none of the nodes crash, regardless of which one receives the CREATE TABLE and which one receives the INSERT. Both servers stay up.
