[MXS-2818] malloc deadlock in sigfatal_handler Created: 2019-12-31  Updated: 2021-08-24  Resolved: 2021-08-24

Status: Closed
Project: MariaDB MaxScale
Component/s: N/A
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: lishubing Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MXS-2584 Race condition between startup/shutdo... Closed

 Description   

The callback function sigfatal_handler calls backtrace(), which is not async-signal-safe. If a fatal signal is raised while a thread is inside malloc(), the handler deadlocks on the allocator lock. Here is the stack:

#0 0x00007f4f62499c0c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007f4f6241618d in _L_lock_15512 () from /lib64/libc.so.6
#2 0x00007f4f62413273 in malloc () from /lib64/libc.so.6 // call malloc again, deadlock
#3 0x00007f4f64d00540 in _dl_map_object_deps () from /lib64/ld-linux-x86-64.so.2
#4 0x00007f4f64d0727b in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#5 0x00007f4f64d02754 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#6 0x00007f4f64d06b0b in _dl_open () from /lib64/ld-linux-x86-64.so.2
#7 0x00007f4f624caa02 in do_dlopen () from /lib64/libc.so.6
#8 0x00007f4f64d02754 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9 0x00007f4f624caac2 in __libc_dlopen_mode () from /lib64/libc.so.6
#10 0x00007f4f624a1eb5 in init () from /lib64/libc.so.6
#11 0x00007f4f644b2e90 in pthread_once () from /lib64/libpthread.so.0
#12 0x00007f4f624a1fcc in backtrace () from /lib64/libc.so.6 // call backtrace
#13 0x0000000000407fcf in sigfatal_handler (i=11) at /root/MaxScale/server/core/gateway.cc:501
#14 <signal handler called> // sig fatal triggered
#15 0x00007f4f62410250 in _int_malloc () from /lib64/libc.so.6
#16 0x00007f4f62413ca4 in calloc () from /lib64/libc.so.6 // first, call calloc here
#17 0x00007f4f6497d5be in mxs_calloc (nmemb=<optimized out>, size=<optimized out>) at /root/MaxScale/server/core/alloc.cc:58
#18 0x00007f4f6497f109 in gwbuf_clone_one (buf=buf@entry=0xa89010) at /root/MaxScale/server/core/buffer.cc:304
#19 0x00007f4f6497f18a in gwbuf_clone (buf=0xa89010) at /root/MaxScale/server/core/buffer.cc:332
#20 0x00007f4f5ef1fe37 in handle_got_target (inst=inst@entry=0x9f1140, rses=rses@entry=0xa7ef10, querybuf=querybuf@entry=0xa89010, target=std::tr1::shared_ptr (count 2, weak 0) 0xa7e2d0, store=<optimized out>) at /root/MaxScale/server/modules/routing/readwritesplit/rwsplit_route_stmt.cc:1506
#21 0x00007f4f5ef2319f in route_single_stmt (inst=inst@entry=0x9f1140, rses=rses@entry=0xa7ef10, querybuf=querybuf@entry=0xa89010, info=...) at /root/MaxScale/server/modules/routing/readwritesplit/rwsplit_route_stmt.cc:428
#22 0x00007f4f5ef187b0 in routeQuery (instance=0x9f1140, router_session=0xa7ef10, querybuf=0xa89010) at /root/MaxScale/server/modules/routing/readwritesplit/readwritesplit.cc:904
#23 0x00007f4f5e0d28df in route_by_statement (p_readbuf=<optimized out>, capabilities=<optimized out>, session=<optimized out>) at /root/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:1845
#24 gw_read_finish_processing (capabilities=<optimized out>, read_buffer=0x0, dcb=<optimized out>) at /root/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:1200
#25 gw_read_normal_data (nbytes_read=<optimized out>, read_buffer=0xa89010, dcb=<optimized out>) at /root/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:1142
#26 gw_read_client_event (dcb=<optimized out>) at /root/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:546
#27 0x00007f4f649949c9 in dcb_process_poll_events (dcb=0xa7de10, events=5) at /root/MaxScale/server/core/dcb.cc:3242
#28 0x00007f4f64994c5b in dcb_handler (dcb=0xa7de10, events=5) at /root/MaxScale/server/core/dcb.cc:3329
#29 0x00007f4f649d54dd in maxscale::Worker::poll_waitevents (this=this@entry=0x96e840) at /root/MaxScale/server/core/worker.cc:1235
#30 0x00007f4f649d5773 in maxscale::Worker::run (this=0x96e840) at /root/MaxScale/server/core/worker.cc:901
#31 0x0000000000406af5 in main (argc=<optimized out>, argv=<optimized out>) at /root/MaxScale/server/core/gateway.cc:2276



 Comments   
Comment by markus makela [ 2019-12-31 ]

I think this will be fixed by commit 03d45c2ace45a94c6b33a2281571d5c38cb53578.

On closer inspection, the aforementioned commit won't prevent malloc deadlocks: the use of malloc is still not async-signal-safe, since it acquires locks.

From the backtrace(3) manpage:

* backtrace() and backtrace_symbols_fd() don't call malloc() explicitly, but they are part of libgcc, which gets loaded dynamically when first used. Dynamic loading usually triggers a call to malloc(3). If you need certain calls to these two functions to not allocate memory (in signal handlers, for example), you need to make sure libgcc is loaded beforehand.

Looks like the problem is that libgcc is loaded for the first time inside the signal handler, which causes malloc to be called. Making sure it is loaded before any signal handlers are installed would prevent that, although other parts of the signal handling also use non-async-signal-safe functions and could still deadlock.

Comment by lishubing [ 2020-01-03 ]

So, how can we make sure libgcc is loaded before backtrace is called?

Comment by markus makela [ 2020-01-16 ]

I don't think we'll solve this easily, as parts of the fatal signal handling code can allocate memory via malloc regardless of whether we initialize backtrace beforehand. I think we'd have to investigate this a bit more.

Comment by markus makela [ 2021-08-24 ]

I'm just going to close this as Won't Fix since we assume the system malloc never segfaults. Making the fatal signal handler not use malloc is more trouble than it's worth.
