[MXS-2416] Possible memory leak Created: 2019-04-02  Updated: 2019-04-12  Resolved: 2019-04-12

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 2.3.4, 2.3.5
Fix Version/s: 2.3.6

Type: Bug Priority: Major
Reporter: Tim Westervoorde Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS 7 / Kernel 4.4.169 / MariaDB 10.3.13 galera



 Description   

After we upgraded from 2.2.x to 2.3.x and moved our production cluster to this version maxscale keeps increasing it's memory usage until reaches 100% (we didn't let it OOM due to the production nature). Our configuration is as follows:

type=service
router=readwritesplit
servers=db1,db2,db3
connection_keepalive=30
max_sescmd_history=0
disable_sescmd_history=true
master_accept_reads=false
master_failure_mode=fail_on_write
enable_root_user=true
master_reconnection=false
prune_sescmd_history=true
causal_reads=true
causal_reads_timeout=2

What is the best way to debug this further? We handle about 2000 qps with a mix of short and long lived sessions



 Comments   
Comment by markus makela [ 2019-04-02 ]

A quick way to debug this would be to run it under valgrind. Most distros offer valgrind as a package and you can run MaxScale in the terminal under valgrind with the following command:

/usr/bin/sudo -u maxscale --  valgrind --leak-check=full /usr/bin/maxscale -d

Then once you see the memory growing, in another terminal, execute the following command:

/usr/bin/sudo -u maxscale vgdb --pid=$(pgrep memcheck) leak_check definiteleak any

You should receive output similar to the following:

==5553== LEAK SUMMARY:
==5553==    definitely lost: 0 bytes in 0 blocks
==5553==    indirectly lost: 0 bytes in 0 blocks
==5553==      possibly lost: 775,458 bytes in 951 blocks
==5553==    still reachable: 367,868 bytes in 5,180 blocks
==5553==         suppressed: 0 bytes in 0 blocks
==5553== Reachable blocks (those to which a pointer was found) are not shown.
==5553== To see them, add 'reachable any' args to leak_check
==5553== 

If you can produce that output, we can most likely fix it quite easily.

Comment by Tim Westervoorde [ 2019-04-03 ]

Just after a few seconds (and before maxscale under valgrind crashes)

==72468== 5,043,871 (4,165,503 direct, 878,368 indirect) bytes in 28,130 blocks are definitely lost in loss record 1,751 of 1,752
==72468==    at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==72468==    by 0x4E9138D: mxs_malloc (alloc.cc:32)
==72468==    by 0x4E93668: gwbuf_alloc (buffer.cc:48)
==72468==    by 0x4EB1FEC: dcb_basic_read (dcb.cc:709)
==72468==    by 0x4EB1FEC: dcb_read (dcb.cc:612)
==72468==    by 0xB7C8E66: gw_read_client_event(dcb*) (mysql_client.cc:492)
==72468==    by 0x4EB2776: dcb_process_poll_events(dcb*, unsigned int) (dcb.cc:3136)
==72468==    by 0x4EB2910: dcb_handler(dcb*, unsigned int) (dcb.cc:3221)
==72468==    by 0x4F0E544: maxbase::Worker::poll_waitevents() (worker.cc:842)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x671106F: ??? (in /usr/lib64/libstdc++.so.6.0.19)
==72468==    by 0x56DBDD4: start_thread (in /usr/lib64/libpthread-2.17.so)
==72468==    by 0x7737EAC: clone (in /usr/lib64/libc-2.17.so)
==72468==
==72468== 5,072,835 (4,185,123 direct, 887,712 indirect) bytes in 28,416 blocks are definitely lost in loss record 1,752 of 1,752
==72468==    at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==72468==    by 0x4E9138D: mxs_malloc (alloc.cc:32)
==72468==    by 0x4E93668: gwbuf_alloc (buffer.cc:48)
==72468==    by 0x4EB1FEC: dcb_basic_read (dcb.cc:709)
==72468==    by 0x4EB1FEC: dcb_read (dcb.cc:612)
==72468==    by 0xB7C8E66: gw_read_client_event(dcb*) (mysql_client.cc:492)
==72468==    by 0x4EB2776: dcb_process_poll_events(dcb*, unsigned int) (dcb.cc:3136)
==72468==    by 0x4EB2910: dcb_handler(dcb*, unsigned int) (dcb.cc:3221)
==72468==    by 0x4F0E544: maxbase::Worker::poll_waitevents() (worker.cc:842)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x4087F8: run (worker.hh:409)
==72468==    by 0x4087F8: main (gateway.cc:2256)

Comment by Tim Westervoorde [ 2019-04-03 ]

The following is logged on the main valgrind terminal

==72468== Mismatched free() / delete / delete []
==72468==    at 0x4C2B16D: operator delete(void*) (vg_replace_malloc.c:576)
==72468==    by 0x4F02367: _M_release (shared_ptr_base.h:144)
==72468==    by 0x4F02367: ~__shared_count (shared_ptr_base.h:546)
==72468==    by 0x4F02367: ~__shared_ptr (shared_ptr_base.h:781)
==72468==    by 0x4F02367: ~shared_ptr (shared_ptr.h:93)
==72468==    by 0x4F02367: ~QueryInfo (session.hh:64)
==72468==    by 0x4F02367: destroy<maxscale::Session::QueryInfo> (new_allocator.h:124)
==72468==    by 0x4F02367: _M_pop_back_aux (deque.tcc:510)
==72468==    by 0x4F02367: pop_back (stl_deque.h:1451)
==72468==    by 0x4F02367: maxscale::Session::retain_statement(gwbuf*) (session.cc:1499)
==72468==    by 0xB7C9B94: route_by_statement (mysql_client.cc:1673)
==72468==    by 0xB7C9B94: gw_read_finish_processing (mysql_client.cc:1212)
==72468==    by 0xB7C9B94: gw_read_normal_data (mysql_client.cc:1155)
==72468==    by 0xB7C9B94: gw_read_client_event(dcb*) (mysql_client.cc:547)
==72468==    by 0x4EB2776: dcb_process_poll_events(dcb*, unsigned int) (dcb.cc:3136)
==72468==    by 0x4EB2910: dcb_handler(dcb*, unsigned int) (dcb.cc:3221)
==72468==    by 0x4F0E544: maxbase::Worker::poll_waitevents() (worker.cc:842)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x4087F8: run (worker.hh:409)
==72468==    by 0x4087F8: main (gateway.cc:2256)
==72468==  Address 0xdfaa200 is 0 bytes inside a block of size 72 alloc'd
==72468==    at 0x4C2B955: calloc (vg_replace_malloc.c:711)
==72468==    by 0x4E913CD: mxs_calloc (alloc.cc:58)
==72468==    by 0x4E935C8: gwbuf_clone_one(gwbuf*) (buffer.cc:165)
==72468==    by 0x4E93789: gwbuf_clone (buffer.cc:199)
==72468==    by 0x4F01F6F: maxscale::Session::retain_statement(gwbuf*) (session.cc:1493)
==72468==    by 0xB7C9B94: route_by_statement (mysql_client.cc:1673)
==72468==    by 0xB7C9B94: gw_read_finish_processing (mysql_client.cc:1212)
==72468==    by 0xB7C9B94: gw_read_normal_data (mysql_client.cc:1155)
==72468==    by 0xB7C9B94: gw_read_client_event(dcb*) (mysql_client.cc:547)
==72468==    by 0x4EB2776: dcb_process_poll_events(dcb*, unsigned int) (dcb.cc:3136)
==72468==    by 0x4EB2910: dcb_handler(dcb*, unsigned int) (dcb.cc:3221)
==72468==    by 0x4F0E544: maxbase::Worker::poll_waitevents() (worker.cc:842)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x4087F8: run (worker.hh:409)
==72468==    by 0x4087F8: main (gateway.cc:2256)
==72468==
==72468== Mismatched free() / delete / delete []
==72468==    at 0x4C2B16D: operator delete(void*) (vg_replace_malloc.c:576)
==72468==    by 0x4F023E1: _M_release (shared_ptr_base.h:144)
==72468==    by 0x4F023E1: ~__shared_count (shared_ptr_base.h:546)
==72468==    by 0x4F023E1: ~__shared_ptr (shared_ptr_base.h:781)
==72468==    by 0x4F023E1: ~shared_ptr (shared_ptr.h:93)
==72468==    by 0x4F023E1: ~QueryInfo (session.hh:64)
==72468==    by 0x4F023E1: destroy<maxscale::Session::QueryInfo> (new_allocator.h:124)
==72468==    by 0x4F023E1: pop_back (stl_deque.h:1448)
==72468==    by 0x4F023E1: maxscale::Session::retain_statement(gwbuf*) (session.cc:1499)
==72468==    by 0xB7C9B94: route_by_statement (mysql_client.cc:1673)
==72468==    by 0xB7C9B94: gw_read_finish_processing (mysql_client.cc:1212)
==72468==    by 0xB7C9B94: gw_read_normal_data (mysql_client.cc:1155)
==72468==    by 0xB7C9B94: gw_read_client_event(dcb*) (mysql_client.cc:547)
==72468==    by 0x4EB2776: dcb_process_poll_events(dcb*, unsigned int) (dcb.cc:3136)
==72468==    by 0x4EB2910: dcb_handler(dcb*, unsigned int) (dcb.cc:3221)
==72468==    by 0x4F0E544: maxbase::Worker::poll_waitevents() (worker.cc:842)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x4087F8: run (worker.hh:409)
==72468==    by 0x4087F8: main (gateway.cc:2256)
==72468==  Address 0xdff3a40 is 0 bytes inside a block of size 72 alloc'd
==72468==    at 0x4C2B955: calloc (vg_replace_malloc.c:711)
==72468==    by 0x4E913CD: mxs_calloc (alloc.cc:58)
==72468==    by 0x4E935C8: gwbuf_clone_one(gwbuf*) (buffer.cc:165)
==72468==    by 0x4E93789: gwbuf_clone (buffer.cc:199)
==72468==    by 0x4F01F6F: maxscale::Session::retain_statement(gwbuf*) (session.cc:1493)
==72468==    by 0xB7C9B94: route_by_statement (mysql_client.cc:1673)
==72468==    by 0xB7C9B94: gw_read_finish_processing (mysql_client.cc:1212)
==72468==    by 0xB7C9B94: gw_read_normal_data (mysql_client.cc:1155)
==72468==    by 0xB7C9B94: gw_read_client_event(dcb*) (mysql_client.cc:547)
==72468==    by 0x4EB2776: dcb_process_poll_events(dcb*, unsigned int) (dcb.cc:3136)
==72468==    by 0x4EB2910: dcb_handler(dcb*, unsigned int) (dcb.cc:3221)
==72468==    by 0x4F0E544: maxbase::Worker::poll_waitevents() (worker.cc:842)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x4087F8: run (worker.hh:409)
==72468==    by 0x4087F8: main (gateway.cc:2256)
==72468==
==72468== Mismatched free() / delete / delete []
==72468==    at 0x4C2B16D: operator delete(void*) (vg_replace_malloc.c:576)
==72468==    by 0x4F04864: _M_release (shared_ptr_base.h:144)
==72468==    by 0x4F04864: ~__shared_count (shared_ptr_base.h:546)
==72468==    by 0x4F04864: ~__shared_ptr (shared_ptr_base.h:781)
==72468==    by 0x4F04864: ~shared_ptr (shared_ptr.h:93)
==72468==    by 0x4F04864: ~QueryInfo (session.hh:64)
==72468==    by 0x4F04864: _Destroy<maxscale::Session::QueryInfo> (stl_construct.h:93)
==72468==    by 0x4F04864: __destroy<maxscale::Session::QueryInfo*> (stl_construct.h:103)
==72468==    by 0x4F04864: _Destroy<maxscale::Session::QueryInfo*> (stl_construct.h:126)
==72468==    by 0x4F04864: _Destroy<maxscale::Session::QueryInfo*, maxscale::Session::QueryInfo> (stl_construct.h:151)
==72468==    by 0x4F04864: std::deque<maxscale::Session::QueryInfo, std::allocator<maxscale::Session::QueryInfo> >::_M_destroy_data_aux(std::_Deque_iterator<maxscale::Session::QueryInfo, maxscale::Session::QueryInfo&, maxscale::Session::QueryInfo*>, std::_Deque_iterator<maxscale::Session::QueryInfo, maxscale::Session::QueryInfo&, maxscale::Session::QueryInfo*>) (deque.tcc:811)
==72468==    by 0x4F01AEA: _M_destroy_data (stl_deque.h:1853)
==72468==    by 0x4F01AEA: ~deque (stl_deque.h:918)
==72468==    by 0x4F01AEA: maxscale::Session::~Session() (session.cc:1213)
==72468==    by 0x4F01CFC: session_final_free (session.cc:398)
==72468==    by 0x4F01CFC: session_free (session.cc:364)
==72468==    by 0x4F01CFC: session_put_ref (session.cc:823)
==72468==    by 0x4F01E4D: session_unlink_backend_dcb (session.cc:286)
==72468==    by 0x4EB39E0: dcb_maybe_add_persistent (dcb.cc:1335)
==72468==    by 0x4EB39E0: dcb_final_close (dcb.cc:1237)
==72468==    by 0x4EEC803: maxscale::RoutingWorker::delete_zombies() (routingworker.cc:553)
==72468==    by 0x4EED075: maxscale::RoutingWorker::epoll_tick() (routingworker.cc:638)
==72468==    by 0x4F0E6B5: maxbase::Worker::poll_waitevents() (worker.cc:884)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x4087F8: run (worker.hh:409)
==72468==    by 0x4087F8: main (gateway.cc:2256)
==72468==  Address 0xe0f29e0 is 0 bytes inside a block of size 72 alloc'd
==72468==    at 0x4C2B955: calloc (vg_replace_malloc.c:711)
==72468==    by 0x4E913CD: mxs_calloc (alloc.cc:58)
==72468==    by 0x4E935C8: gwbuf_clone_one(gwbuf*) (buffer.cc:165)
==72468==    by 0x4E93789: gwbuf_clone (buffer.cc:199)
==72468==    by 0x4F01F6F: maxscale::Session::retain_statement(gwbuf*) (session.cc:1493)
==72468==    by 0xB7C9B94: route_by_statement (mysql_client.cc:1673)
==72468==    by 0xB7C9B94: gw_read_finish_processing (mysql_client.cc:1212)
==72468==    by 0xB7C9B94: gw_read_normal_data (mysql_client.cc:1155)
==72468==    by 0xB7C9B94: gw_read_client_event(dcb*) (mysql_client.cc:547)
==72468==    by 0x4EB2776: dcb_process_poll_events(dcb*, unsigned int) (dcb.cc:3136)
==72468==    by 0x4EB2910: dcb_handler(dcb*, unsigned int) (dcb.cc:3221)
==72468==    by 0x4F0E544: maxbase::Worker::poll_waitevents() (worker.cc:842)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x4087F8: run (worker.hh:409)
==72468==    by 0x4087F8: main (gateway.cc:2256)
==72468==
==72468== Warning: invalid file descriptor 1031 in syscall socket()
==72468== Warning: invalid file descriptor 1031 in syscall socket()
==72468== Warning: invalid file descriptor 1031 in syscall socket()
<cut>
==72468== Mismatched free() / delete / delete []
==72468==    at 0x4C2B16D: operator delete(void*) (vg_replace_malloc.c:576)
==72468==    by 0x4F049FF: _M_release (shared_ptr_base.h:144)
==72468==    by 0x4F049FF: ~__shared_count (shared_ptr_base.h:546)
==72468==    by 0x4F049FF: ~__shared_ptr (shared_ptr_base.h:781)
==72468==    by 0x4F049FF: ~shared_ptr (shared_ptr.h:93)
==72468==    by 0x4F049FF: ~QueryInfo (session.hh:64)
==72468==    by 0x4F049FF: _Destroy<maxscale::Session::QueryInfo> (stl_construct.h:93)
==72468==    by 0x4F049FF: __destroy<maxscale::Session::QueryInfo*> (stl_construct.h:103)
==72468==    by 0x4F049FF: _Destroy<maxscale::Session::QueryInfo*> (stl_construct.h:126)
==72468==    by 0x4F049FF: _Destroy<maxscale::Session::QueryInfo*, maxscale::Session::QueryInfo> (stl_construct.h:151)
==72468==    by 0x4F049FF: std::deque<maxscale::Session::QueryInfo, std::allocator<maxscale::Session::QueryInfo> >::_M_destroy_data_aux(std::_Deque_iterator<maxscale::Session::QueryInfo, maxscale::Session::QueryInfo&, maxscale::Session::QueryInfo*>, std::_Deque_iterator<maxscale::Session::QueryInfo, maxscale::Session::QueryInfo&, maxscale::Session::QueryInfo*>) (deque.tcc:817)
==72468==    by 0x4F01AEA: _M_destroy_data (stl_deque.h:1853)
==72468==    by 0x4F01AEA: ~deque (stl_deque.h:918)
==72468==    by 0x4F01AEA: maxscale::Session::~Session() (session.cc:1213)
==72468==    by 0x4F01CFC: session_final_free (session.cc:398)
==72468==    by 0x4F01CFC: session_free (session.cc:364)
==72468==    by 0x4F01CFC: session_put_ref (session.cc:823)
==72468==    by 0x4F01E4D: session_unlink_backend_dcb (session.cc:286)
==72468==    by 0x4EB39E0: dcb_maybe_add_persistent (dcb.cc:1335)
==72468==    by 0x4EB39E0: dcb_final_close (dcb.cc:1237)
==72468==    by 0x4EEC803: maxscale::RoutingWorker::delete_zombies() (routingworker.cc:553)
==72468==    by 0x4EED075: maxscale::RoutingWorker::epoll_tick() (routingworker.cc:638)
==72468==    by 0x4F0E6B5: maxbase::Worker::poll_waitevents() (worker.cc:884)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x4087F8: run (worker.hh:409)
==72468==    by 0x4087F8: main (gateway.cc:2256)
==72468==  Address 0xd2129d0 is 0 bytes inside a block of size 72 alloc'd
==72468==    at 0x4C2B955: calloc (vg_replace_malloc.c:711)
==72468==    by 0x4E913CD: mxs_calloc (alloc.cc:58)
==72468==    by 0x4E935C8: gwbuf_clone_one(gwbuf*) (buffer.cc:165)
==72468==    by 0x4E93789: gwbuf_clone (buffer.cc:199)
==72468==    by 0x4F01F6F: maxscale::Session::retain_statement(gwbuf*) (session.cc:1493)
==72468==    by 0xB7C9B94: route_by_statement (mysql_client.cc:1673)
==72468==    by 0xB7C9B94: gw_read_finish_processing (mysql_client.cc:1212)
==72468==    by 0xB7C9B94: gw_read_normal_data (mysql_client.cc:1155)
==72468==    by 0xB7C9B94: gw_read_client_event(dcb*) (mysql_client.cc:547)
==72468==    by 0x4EB2776: dcb_process_poll_events(dcb*, unsigned int) (dcb.cc:3136)
==72468==    by 0x4EB2910: dcb_handler(dcb*, unsigned int) (dcb.cc:3221)
==72468==    by 0x4F0E544: maxbase::Worker::poll_waitevents() (worker.cc:842)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x4087F8: run (worker.hh:409)
==72468==    by 0x4087F8: main (gateway.cc:2256)
==72468==
==72468== Thread 4:
==72468== Mismatched free() / delete / delete []
==72468==    at 0x4C2B16D: operator delete(void*) (vg_replace_malloc.c:576)
==72468==    by 0x4F0492F: _M_release (shared_ptr_base.h:144)
==72468==    by 0x4F0492F: ~__shared_count (shared_ptr_base.h:546)
==72468==    by 0x4F0492F: ~__shared_ptr (shared_ptr_base.h:781)
==72468==    by 0x4F0492F: ~shared_ptr (shared_ptr.h:93)
==72468==    by 0x4F0492F: ~QueryInfo (session.hh:64)
==72468==    by 0x4F0492F: _Destroy<maxscale::Session::QueryInfo> (stl_construct.h:93)
==72468==    by 0x4F0492F: __destroy<maxscale::Session::QueryInfo*> (stl_construct.h:103)
==72468==    by 0x4F0492F: _Destroy<maxscale::Session::QueryInfo*> (stl_construct.h:126)
==72468==    by 0x4F0492F: _Destroy<maxscale::Session::QueryInfo*, maxscale::Session::QueryInfo> (stl_construct.h:151)
==72468==    by 0x4F0492F: std::deque<maxscale::Session::QueryInfo, std::allocator<maxscale::Session::QueryInfo> >::_M_destroy_data_aux(std::_Deque_iterator<maxscale::Session::QueryInfo, maxscale::Session::QueryInfo&, maxscale::Session::QueryInfo*>, std::_Deque_iterator<maxscale::Session::QueryInfo, maxscale::Session::QueryInfo&, maxscale::Session::QueryInfo*>) (deque.tcc:813)
==72468==    by 0x4F01AEA: _M_destroy_data (stl_deque.h:1853)
==72468==    by 0x4F01AEA: ~deque (stl_deque.h:918)
==72468==    by 0x4F01AEA: maxscale::Session::~Session() (session.cc:1213)
==72468==    by 0x4F01CFC: session_final_free (session.cc:398)
==72468==    by 0x4F01CFC: session_free (session.cc:364)
==72468==    by 0x4F01CFC: session_put_ref (session.cc:823)
==72468==    by 0x4F01E4D: session_unlink_backend_dcb (session.cc:286)
==72468==    by 0x4EB0947: dcb_final_free(dcb*) (dcb.cc:246)
==72468==    by 0x4EEC803: maxscale::RoutingWorker::delete_zombies() (routingworker.cc:553)
==72468==    by 0x4EED075: maxscale::RoutingWorker::epoll_tick() (routingworker.cc:638)
==72468==    by 0x4F0E6B5: maxbase::Worker::poll_waitevents() (worker.cc:884)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x671106F: ??? (in /usr/lib64/libstdc++.so.6.0.19)
==72468==    by 0x56DBDD4: start_thread (in /usr/lib64/libpthread-2.17.so)
==72468==  Address 0xd8c97e0 is 0 bytes inside a block of size 72 alloc'd
==72468==    at 0x4C2B955: calloc (vg_replace_malloc.c:711)
==72468==    by 0x4E913CD: mxs_calloc (alloc.cc:58)
==72468==    by 0x4E935C8: gwbuf_clone_one(gwbuf*) (buffer.cc:165)
==72468==    by 0x4E93789: gwbuf_clone (buffer.cc:199)
==72468==    by 0x4F01F6F: maxscale::Session::retain_statement(gwbuf*) (session.cc:1493)
==72468==    by 0xB7C9B94: route_by_statement (mysql_client.cc:1673)
==72468==    by 0xB7C9B94: gw_read_finish_processing (mysql_client.cc:1212)
==72468==    by 0xB7C9B94: gw_read_normal_data (mysql_client.cc:1155)
==72468==    by 0xB7C9B94: gw_read_client_event(dcb*) (mysql_client.cc:547)
==72468==    by 0x4EB2776: dcb_process_poll_events(dcb*, unsigned int) (dcb.cc:3136)
==72468==    by 0x4EB2910: dcb_handler(dcb*, unsigned int) (dcb.cc:3221)
==72468==    by 0x4F0E544: maxbase::Worker::poll_waitevents() (worker.cc:842)
==72468==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==72468==    by 0x671106F: ??? (in /usr/lib64/libstdc++.so.6.0.19)
==72468==    by 0x56DBDD4: start_thread (in /usr/lib64/libpthread-2.17.so)
==72468==
==72468== Warning: invalid file descriptor 1031 in syscall accept()
==72468== Warning: invalid file descriptor 1031 in syscall accept()
==72468== Warning: invalid file descriptor 1031 in syscall accept()

Comment by markus makela [ 2019-04-03 ]

OK, that looks like a definite memory leak. The client request appears to leak which means something isn't freeing it.

Comment by Tim Westervoorde [ 2019-04-03 ]

Anything else I can do to debug this further?

Comment by markus makela [ 2019-04-03 ]

If you can reproduce it with some SQL statement or other behavior, it would allow us to quickly fix it.

Comment by Tim Westervoorde [ 2019-04-03 ]

Just connecting and disconnecting gives me

==73614== 120 (88 direct, 32 indirect) bytes in 2 blocks are definitely lost in loss record 1,178 of 1,512
==73614==    at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==73614==    by 0x4E9138D: mxs_malloc (alloc.cc:32)
==73614==    by 0x4E93668: gwbuf_alloc (buffer.cc:48)
==73614==    by 0x4EB1FEC: dcb_basic_read (dcb.cc:709)
==73614==    by 0x4EB1FEC: dcb_read (dcb.cc:612)
==73614==    by 0xB7C8E66: gw_read_client_event(dcb*) (mysql_client.cc:492)
==73614==    by 0x4EB2776: dcb_process_poll_events(dcb*, unsigned int) (dcb.cc:3136)
==73614==    by 0x4EB2910: dcb_handler(dcb*, unsigned int) (dcb.cc:3221)
==73614==    by 0x4F0E544: maxbase::Worker::poll_waitevents() (worker.cc:842)
==73614==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==73614==    by 0x4087F8: run (worker.hh:409)
==73614==    by 0x4087F8: main (gateway.cc:2256)

and on the main window on disconnect

==73614== Mismatched free() / delete / delete []
==73614==    at 0x4C2B16D: operator delete(void*) (vg_replace_malloc.c:576)
==73614==    by 0x4F04864: _M_release (shared_ptr_base.h:144)
==73614==    by 0x4F04864: ~__shared_count (shared_ptr_base.h:546)
==73614==    by 0x4F04864: ~__shared_ptr (shared_ptr_base.h:781)
==73614==    by 0x4F04864: ~shared_ptr (shared_ptr.h:93)
==73614==    by 0x4F04864: ~QueryInfo (session.hh:64)
==73614==    by 0x4F04864: _Destroy<maxscale::Session::QueryInfo> (stl_construct.h:93)
==73614==    by 0x4F04864: __destroy<maxscale::Session::QueryInfo*> (stl_construct.h:103)
==73614==    by 0x4F04864: _Destroy<maxscale::Session::QueryInfo*> (stl_construct.h:126)
==73614==    by 0x4F04864: _Destroy<maxscale::Session::QueryInfo*, maxscale::Session::QueryInfo> (stl_construct.h:151)
==73614==    by 0x4F04864: std::deque<maxscale::Session::QueryInfo, std::allocator<maxscale::Session::QueryInfo> >::_M_destroy_data_aux(std::_Deque_iterator<maxscale::Session::QueryInfo, maxscale::Session::QueryInfo&, maxscale::Session::QueryInfo*>, std::_Deque_iterator<maxscale::Session::QueryInfo, maxscale::Session::QueryInfo&, maxscale::Session::QueryInfo*>) (deque.tcc:811)
==73614==    by 0x4F01AEA: _M_destroy_data (stl_deque.h:1853)
==73614==    by 0x4F01AEA: ~deque (stl_deque.h:918)
==73614==    by 0x4F01AEA: maxscale::Session::~Session() (session.cc:1213)
==73614==    by 0x4F01CFC: session_final_free (session.cc:398)
==73614==    by 0x4F01CFC: session_free (session.cc:364)
==73614==    by 0x4F01CFC: session_put_ref (session.cc:823)
==73614==    by 0x4F01E4D: session_unlink_backend_dcb (session.cc:286)
==73614==    by 0x4EB39E0: dcb_maybe_add_persistent (dcb.cc:1335)
==73614==    by 0x4EB39E0: dcb_final_close (dcb.cc:1237)
==73614==    by 0x4EEC803: maxscale::RoutingWorker::delete_zombies() (routingworker.cc:553)
==73614==    by 0x4EED075: maxscale::RoutingWorker::epoll_tick() (routingworker.cc:638)
==73614==    by 0x4F0E6B5: maxbase::Worker::poll_waitevents() (worker.cc:884)
==73614==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==73614==    by 0x4087F8: run (worker.hh:409)
==73614==    by 0x4087F8: main (gateway.cc:2256)
==73614==  Address 0xceb0e80 is 0 bytes inside a block of size 72 alloc'd
==73614==    at 0x4C2B955: calloc (vg_replace_malloc.c:711)
==73614==    by 0x4E913CD: mxs_calloc (alloc.cc:58)
==73614==    by 0x4E935C8: gwbuf_clone_one(gwbuf*) (buffer.cc:165)
==73614==    by 0x4E93789: gwbuf_clone (buffer.cc:199)
==73614==    by 0x4F01F6F: maxscale::Session::retain_statement(gwbuf*) (session.cc:1493)
==73614==    by 0xB7C9B94: route_by_statement (mysql_client.cc:1673)
==73614==    by 0xB7C9B94: gw_read_finish_processing (mysql_client.cc:1212)
==73614==    by 0xB7C9B94: gw_read_normal_data (mysql_client.cc:1155)
==73614==    by 0xB7C9B94: gw_read_client_event(dcb*) (mysql_client.cc:547)
==73614==    by 0x4EB2776: dcb_process_poll_events(dcb*, unsigned int) (dcb.cc:3136)
==73614==    by 0x4EB2910: dcb_handler(dcb*, unsigned int) (dcb.cc:3221)
==73614==    by 0x4F0E544: maxbase::Worker::poll_waitevents() (worker.cc:842)
==73614==    by 0x4F0E740: maxbase::Worker::run(maxbase::Semaphore*) (worker.cc:541)
==73614==    by 0x4087F8: run (worker.hh:409)
==73614==    by 0x4087F8: main (gateway.cc:2256)

mariadb server is configured with

init-connect='SET NAMES utf8mb4'

Comment by markus makela [ 2019-04-03 ]

Managed to reproduce it, now all that needs to be done is to fix it.

Comment by Tim Westervoorde [ 2019-04-03 ]

Thanks for your efforts!

Comment by markus makela [ 2019-04-03 ]

Do you have retain_last_statements enabled? Looks like it leaks memory when that alloc/dealloc mismatch happens.

Comment by Tim Westervoorde [ 2019-04-03 ]

[maxscale]
threads=auto
retain_last_statements=5
dump_last_statements=on_error

We do, so that could be the cause.

Comment by markus makela [ 2019-04-03 ]

I'd recommend commenting out both and seeing whether that helps. I suspect it will solve the problem. Meanwhile, we'll work on fixing the code.

Comment by Tim Westervoorde [ 2019-04-03 ]

So far that seems to fix my issue. I'll continue to monitor it the next hour or so to verify it stays stable. Thanks!

Generated at Thu Feb 08 04:13:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.