[MXS-968] Maxinfo throws SIGSEGV and crashes maxscale Created: 2016-11-14  Updated: 2016-11-21  Resolved: 2016-11-21

Status: Closed
Project: MariaDB MaxScale
Component/s: maxinfo
Affects Version/s: 2.0.1
Fix Version/s: 2.0.2

Type: Bug
Priority: Blocker
Reporter: Christopher Swingler
Assignee: Esa Korhonen
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Ubuntu 14.04.5 LTS, 3.13.0-58-generic #97-Ubuntu SMP Wed Jul 8 02:56:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


Sprint: 2016-22

 Description   

I'm still working to get some specifics, but after maxinfo has been queried enough times, it attempts to dereference an invalid pointer and crashes. I'm working to determine whether this is specific to the SQL listener or the HTTP listener, as we're querying both. We dump information out of maxinfo roughly twice per second, but I'm able to reproduce this within 30 minutes or so by simply turning our polling rate way up.

Logs report:

2016-11-12 22:50:52   error  : Fatal: MaxScale 2.0.1 received fatal signal 11. Attempting backtrace.
2016-11-12 22:50:52   error  : Commit ID: fa2a66719554d13a00db5c81c5c9ffd5b3a2ce14 System name: Linux Release string: Ubuntu 14.04.5 LTS
2016-11-12 22:50:52   error  :   /usr/bin/maxscale() [0x403c80]
2016-11-12 22:50:52   error  :   /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330) [0x7fa77b15e330]
2016-11-12 22:50:52   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libmaxinfo.so(+0x314e) [0x7fa774bea14e]
2016-11-12 22:50:52   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libMySQLClient.so(+0x36c4) [0x7fa772ae96c4]
2016-11-12 22:50:52   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x32ab2) [0x7fa77b5fdab2]
2016-11-12 22:50:52   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x32716) [0x7fa77b5fd716]
2016-11-12 22:50:52   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(dcb_process_zombies+0x230) [0x7fa77b5fd51a]
2016-11-12 22:50:52   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(poll_waitevents+0x719) [0x7fa77b613ed4]
2016-11-12 22:50:52   error  :   /usr/bin/maxscale(main+0x191b) [0x406e29]
2016-11-12 22:50:52   error  :   /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fa77a9cef45]
2016-11-12 22:50:52   error  :   /usr/bin/maxscale() [0x403639]

I was able to get a core dump. As it may contain passwords/hashes that we'd like to avoid sharing, please reach out to me for a copy.

Loading the core dump in GDB gives this:

root@maxscale-cluster-a-rax02:/var/log/maxscale# gdb /usr/bin/maxscale core
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/maxscale...done.
[New LWP 14660]
[New LWP 14661]
[New LWP 14662]
[New LWP 14663]
[New LWP 14664]
[New LWP 14665]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/maxscale --user=maxscale'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f30f7cea14e in closeSession (instance=0x179bca0, router_session=0x137d14b0)
    at /home/vagrant/workspace/server/modules/routing/maxinfo/maxinfo.c:243
243			while (ptr && ptr->next != session)



 Comments   
Comment by Christopher Swingler [ 2016-11-14 ]

For what it's worth, it looks like this may be directly related to MXS-325. Running this script:

#!/usr/bin/env bash
 
ENDPOINTS=("variables" "status" "services" "listeners" "modules" "sessions" "clients" "servers" "event/times")
 
while true; do
  for e in "${ENDPOINTS[@]}"; do
    echo "testing $e..."
    ab -l -n 20000 -c 500 "http://localhost:8003/$e"
  done
done

against 2.0.1 results in memory consumption growing without bound until it segfaults with the information above. Switching to 1.4.4 and repeating the same test leaves memory utilization static.

Did the memory fix from MXS-325 ever end up getting back-patched?

Comment by Johan Wikman [ 2016-11-15 ]

Yes, the memory fix is in 2.0.1, so this seems to be something else. Thanks for reporting, we'll look into this.

Comment by Esa Korhonen [ 2016-11-16 ]

The leaking memory is allocated in httpd.c:362 and assigned to dcb->data. After the maxinfo query has been completed, the dcb is closed, as it should be. Usually, dcb->data is freed by calling a protocol-specific authenticator free function. However, because the HTTPD protocol in MaxScale does not specify an authenticator, an empty null authenticator is used and the data is never freed. Each subsequent query then creates another dcb and leaks more memory.

This should be quickly fixable, but does not fix the crash when thread count > 0.
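
For illustration only, the leak pattern described in this comment can be sketched in C as below. This is a simplified model, not MaxScale code: the DCB and AUTHENTICATOR structs, the free_data hook, and the tracked_alloc/tracked_free helpers are hypothetical stand-ins, and the fallback free in dcb_close_fixed is just one possible remedy, not necessarily the fix that went into 2.0.2.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real structures; names are assumptions. */
typedef struct authenticator {
    void (*free_data)(void *data);  /* NULL models the "null authenticator" */
} AUTHENTICATOR;

typedef struct dcb {
    void *data;                     /* per-connection protocol data */
    AUTHENTICATOR auth;
} DCB;

static int live_allocations = 0;    /* buffers allocated but not yet freed */

static void *tracked_alloc(size_t n) {
    void *p = calloc(1, n);
    if (p) live_allocations++;
    return p;
}

static void tracked_free(void *p) {
    if (p) { live_allocations--; free(p); }
}

/* Leaky close: only the authenticator hook is trusted to free dcb->data. */
static void dcb_close_leaky(DCB *dcb) {
    assert(dcb != NULL);
    if (dcb->auth.free_data) {
        dcb->auth.free_data(dcb->data);
        dcb->data = NULL;
    }
    /* With no hook, the buffer stays allocated once the dcb is discarded. */
}

/* One possible remedy: free the data directly when no hook is installed. */
static void dcb_close_fixed(DCB *dcb) {
    assert(dcb != NULL);
    if (dcb->auth.free_data) dcb->auth.free_data(dcb->data);
    else tracked_free(dcb->data);
    dcb->data = NULL;
}

/* Simulates n HTTP requests; returns the number of buffers left allocated. */
int simulate_requests(int n, int use_fix) {
    live_allocations = 0;
    for (int i = 0; i < n; i++) {
        DCB dcb = { .data = tracked_alloc(64), .auth = { NULL } };
        if (use_fix) dcb_close_fixed(&dcb);
        else dcb_close_leaky(&dcb);
    }
    return live_allocations;
}
```

In this model, simulate_requests(n, 0) leaves one buffer outstanding per request, which matches the steadily growing memory seen under the ab load test, while simulate_requests(n, 1) leaves none.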

Comment by Esa Korhonen [ 2016-11-17 ]

The crash with multiple threads executing the maxinfo variables query has been found to originate in maxinfo_exec.c:884. The address of a static variable is returned, so all threads modify the same data, causing the array index to go out of bounds. A fix is on the way.
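
As a rough sketch of the shared-static problem described in this comment (not the actual maxinfo_exec.c code), the C below uses a single static cursor for every caller. The names variables, shared_cursor, unsafe_context and next_row are hypothetical. Two clients are interleaved on one thread so the effect is deterministic. With real threads the bounds check and the array access can also interleave, which is how the index can run past the end of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variable table; the real maxinfo table differs. */
static const char *variables[] = { "version", "uptime", "threads" };
#define N_VARS (sizeof(variables) / sizeof(variables[0]))

/* Unsafe pattern: one static cursor whose address is handed to every caller. */
static size_t shared_cursor = 0;

size_t *unsafe_context(void) {
    return &shared_cursor;
}

/* Returns the next row for the given cursor, or NULL when exhausted. */
const char *next_row(size_t *cursor) {
    assert(cursor != NULL);
    if (*cursor >= N_VARS) return NULL;
    return variables[(*cursor)++];
}

/*
 * Interleaves two clients reading the full result set and returns how many
 * rows the first client received. With use_shared != 0 both clients advance
 * the same static cursor; otherwise each owns its own cursor.
 */
int rows_seen_by_first_client(int use_shared) {
    size_t a_local = 0, b_local = 0;
    size_t *a, *b;
    int a_rows = 0, a_done = 0, b_done = 0;

    shared_cursor = 0;
    a = use_shared ? unsafe_context() : &a_local;
    b = use_shared ? unsafe_context() : &b_local;

    while (!a_done || !b_done) {
        if (!a_done) {
            if (next_row(a) != NULL) a_rows++;
            else a_done = 1;
        }
        if (!b_done && next_row(b) == NULL) b_done = 1;
    }
    return a_rows;
}
```

With the shared cursor the first client receives only part of the result set because the second client consumes rows from the same position. Giving each caller its own cursor, for example allocated per query and passed in explicitly, removes the shared state.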
