[MXS-4501] Maxscale crashing suddenly without changes
Created: 2023-02-06  Updated: 2023-05-22  Resolved: 2023-05-22

Status: Closed
Project: MariaDB MaxScale
Component/s: None
Affects Version/s: 6.1.4, 6.4.5
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Bryan Bancroft (Inactive)
Assignee: Unassigned
Resolution: Incomplete
Votes: 1
Labels: triage
Environment: AWS EC2

 Description   

MaxScale stayed up for 60 days. There are no logs of OS changes.

It now fails consistently on both noted versions after a short amount of time. No returns from maxctrl commands while "up" as they time out.

 
alert  : MaxScale 6.4.5 received fatal signal 6. Commit ID: e716c9cfc5f68f2e4ffada46c2d145918e7433bc System name: Linux Release  release 8.5 (Ootpa)
 
 
2023-02-06 20:53:44   alert  : MaxScale 6.4.5 received fatal signal 6. Commit ID: e716c9cfc5f68f2e4ffada46c2d145918e7433bc Systed Hat Enterprise Linux release 8.5 (Ootpa)
2023-02-06 20:53:44   alert  : Statement currently being classified: none/unknown
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f18d9e345d4 in read () from /lib64/libc.so.6
 
Thread 14 (Thread 0x7f18c17fa700 (LWP 1447096)):
#0  0x00007f18daac03fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f18dbe038f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x00007f18dce1976c in std::condition_variable::wait<maxbase::ThreadPool::Thread::main()::<lambda()> > (__p=..., __lock=..., mofey_turenko_mariadb_com/MaxScale/maxutils/maxbase/src/threadpool.cc:81
#3  maxbase::ThreadPool::Thread::main (this=0x7f18b40017d0) at /home/timofey_turenko_mariadb_com/MaxScale/maxutils/maxbase/src/t
#4  0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#5  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#6  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 13 (Thread 0x7f18c1ffb700 (LWP 1447095)):
#0  0x00007f18daac03fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f18dbe038f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x00007f18dce1976c in std::condition_variable::wait<maxbase::ThreadPool::Thread::main()::<lambda()> > (__p=..., __lock=..., mofey_turenko_mariadb_com/MaxScale/maxutils/maxbase/src/threadpool.cc:81
#3  maxbase::ThreadPool::Thread::main (this=0x7f18b4001300) at /home/timofey_turenko_mariadb_com/MaxScale/maxutils/maxbase/src/t
#4  0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#5  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#6  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 12 (Thread 0x7f18c27fc700 (LWP 1447094)):
#0  0x00007f18daac03fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f18dbe038f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x00007f18dce1976c in std::condition_variable::wait<maxbase::ThreadPool::Thread::main()::<lambda()> > (__p=..., __lock=..., mofey_turenko_mariadb_com/MaxScale/maxutils/maxbase/src/threadpool.cc:81
#3  maxbase::ThreadPool::Thread::main (this=0x7f18b4000e30) at /home/timofey_turenko_mariadb_com/MaxScale/maxutils/maxbase/src/t
#4  0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#5  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#6  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 11 (Thread 0x7f18c2ffd700 (LWP 1447093)):
#0  0x00007f18d9e440f7 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f18dcf59599 in maxbase::Worker::poll_waitevents (this=this@entry=0x17b32b8) at /home/timofey_turenko_mariadb_com/MaxS.cc:792
#2  0x00007f18dcf598bf in maxbase::Worker::run (this=0x17b32b8, pSem=0x7ffe80dd6d70) at /home/timofey_turenko_mariadb_com/MaxScac:554
#3  0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#4  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 10 (Thread 0x7f18c37fe700 (LWP 1447092)):
#0  0x00007f18daac074a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f18dcf05c71 in __gthread_cond_timedwait (__abs_timeout=0x7f18c37fb230, __mutex=0x7f18dd2e5c18 <_ZN12_GLOBAL__N_19thise8 <_ZN12_GLOBAL__N_19this_unitE+104>) at /usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:871
#2  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=<synthetic pinter>..., this=0x7f18dd2e5be8 <_ZN12_GLOBAL__N_19this_unitE+104>) at /usr/include/c++/8/condition_variable:178
#3  std::condition_variable::wait_until<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> ..., __lock=<synthetic pointer>..., this=0x7f18dd2e5be8 <_ZN12_GLOBAL__N_19this_unitE+104>) at /usr/include/c++/8/condition_vari
#4  std::condition_variable::wait_until<std::chrono::_V2::steady_clock, std::chrono::duration<long int, std::ratio<1, 1000000000:cleanup_thread_func()::<lambda()> > (__p=..., __atime=<synthetic pointer>..., __lock=<synthetic pointer>..., this=0x7f18dd2e5be104>) at /usr/include/c++/8/condition_variable:129
#5  HttpSql::ConnectionManager::cleanup_thread_func (this=0x7f18dd2e5b80 <_ZN12_GLOBAL__N_19this_unitE>) at /home/timofey_turenkre/sql_conn_manager.cc:163
#6  0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#7  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#8  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 9 (Thread 0x7f18c3fff700 (LWP 1447091)):
#0  0x00007f18d9e440f7 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f18dd01510a in MHD_epoll (daemon=daemon@entry=0x17bc720, may_block=may_block@entry=1) at daemon.c:4467
#2  0x00007f18dd016327 in MHD_polling_thread (cls=0x17bc720) at daemon.c:4746
#3  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 8 (Thread 0x7f18d08fd700 (LWP 1447090)):
#0  0x00007f18daac074a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f18dcf46543 in __gthread_cond_timedwait (__abs_timeout=0x7f18d08fa250, __mutex=0x17bc3b8, __cond=0x17bc388) at /usr/i/bits/gthr-default.h:871
#2  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=<synthetic pinter>..., this=0x17bc388) at /usr/include/c++/8/condition_variable:178
#3  std::condition_variable::wait_until<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> ..., __lock=<synthetic pointer>..., this=0x17bc388) at /usr/include/c++/8/condition_variable:119
#4  std::condition_variable::wait_until<std::chrono::_V2::steady_clock, std::chrono::duration<long int, std::ratio<1, 1000000000_thread_function()::<lambda()> > (__p=..., __atime=<synthetic pointer>..., __lock=<synthetic pointer>..., this=0x17bc388) at /usle:129
#5  MariaDBUserManager::updater_thread_function (this=0x17bc330) at /home/timofey_turenko_mariadb_com/MaxScale/server/modules/pr
#6  0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#7  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#8  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 7 (Thread 0x7f18d15a0700 (LWP 1447089)):
#0  0x00007f18d9e440f7 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f18dcf59599 in maxbase::Worker::poll_waitevents (this=this@entry=0x17838e0) at /home/timofey_turenko_mariadb_com/MaxS.cc:792
#2  0x00007f18dcf598bf in maxbase::Worker::run (this=0x17838e0, pSem=0x7ffe80dd9ca0) at /home/timofey_turenko_mariadb_com/MaxScac:554
#3  0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#4  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 6 (Thread 0x7f18d1da1700 (LWP 1447088)):
#0  0x00007f18d9e3a62b in ioctl () from /lib64/libc.so.6
#1  0x00007f18dce68577 in DCB::socket_bytes_readable (this=0x7f18cc03db60) at /home/timofey_turenko_mariadb_com/MaxScale/server/
#2  0x00007f18dce69de8 in DCB::read (this=this@entry=0x7f18cc03db60, head=head@entry=0x7f18d1d9b1f8, maxbytes=maxbytes@entry=0) _com/MaxScale/server/core/dcb.cc:336
#3  0x00007f18dce69f7c in DCB::read (this=0x7f18cc03db60, min_bytes=min_bytes@entry=4, max_bytes=max_bytes@entry=0) at /home/time/server/core/dcb.cc:262
#4  0x00007f18dcf17a7e in MariaDBBackendConnection::normal_read (this=0x7f18cc03f290) at /home/timofey_turenko_mariadb_com/MaxScaDB/mariadb_backend.cc:659
#5  0x00007f18dce680b4 in DCB::process_events (events=1, this=0x7f18cc03db60) at /home/timofey_turenko_mariadb_com/MaxScale/serv
#6  DCB::process_events (this=0x7f18cc03db60, events=1) at /home/timofey_turenko_mariadb_com/MaxScale/server/core/dcb.cc:1245
#7  0x00007f18dce681d9 in DCB::event_handler (dcb=0x7f18cc03db60, events=<optimized out>) at /home/timofey_turenko_mariadb_com/M
#8  0x00007f18dcf596a9 in maxbase::Worker::poll_waitevents (this=this@entry=0x1783390) at /home/timofey_turenko_mariadb_com/MaxS.cc:848
#9  0x00007f18dcf598bf in maxbase::Worker::run (this=0x1783390, pSem=0x7ffe80dd9ca0) at /home/timofey_turenko_mariadb_com/MaxScac:554
#10 0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#11 0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#12 0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 5 (Thread 0x7f18d25a2700 (LWP 1447087)):
#0  0x00007f18daac07e8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f18dcf57ee0 in maxbase::ConditionVariable::wait_for (d=..., guard=..., this=0x1783f10) at /usr/include/c++/8/bits/std
#2  maxbase::WatchdogNotifier::Dependent::Ticker::run (this=0x1783ed0) at /home/timofey_turenko_mariadb_com/MaxScale/maxutils/ma
#3  0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#4  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 4 (Thread 0x7f18d2da3700 (LWP 1447086)):
#0  0x00007f18daac07e8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f18dcf57ee0 in maxbase::ConditionVariable::wait_for (d=..., guard=..., this=0x175ad50) at /usr/include/c++/8/bits/std
#2  maxbase::WatchdogNotifier::Dependent::Ticker::run (this=0x175ad10) at /home/timofey_turenko_mariadb_com/MaxScale/maxutils/ma
#3  0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#4  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 3 (Thread 0x7f18d35a4700 (LWP 1447085)):
#0  0x00007f18daac07e8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f18dcf57763 in maxbase::ConditionVariable::wait_for (d=..., guard=..., this=0x7ffe80dda1e8) at /usr/include/c++/8/bit
#2  maxbase::WatchdogNotifier::run (this=0x7ffe80dda1b0) at /home/timofey_turenko_mariadb_com/MaxScale/maxutils/maxbase/src/watc
#3  0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#4  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 2 (Thread 0x7f18d4053700 (LWP 1447084)):
#0  0x00007f18daac07e8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f18dcf57ee0 in maxbase::ConditionVariable::wait_for (d=..., guard=..., this=0x175b390) at /usr/include/c++/8/bits/std
#2  maxbase::WatchdogNotifier::Dependent::Ticker::run (this=0x175b350) at /home/timofey_turenko_mariadb_com/MaxScale/maxutils/ma
#3  0x00007f18dbe09ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#4  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
 
Thread 1 (Thread 0x7f18dd4f6380 (LWP 1447083)):
#0  0x00007f18d9e345d4 in read () from /lib64/libc.so.6
#1  0x00007f18d9dc4418 in __GI__IO_file_underflow () from /lib64/libc.so.6
#2  0x00007f18d9dc5755 in _IO_default_xsgetn () from /lib64/libc.so.6
#3  0x00007f18d9db85eb in fread () from /lib64/libc.so.6
#4  0x000000000040f1f3 in (anonymous namespace)::get_command_output_cb (cb=cb@entry=0x40b6d0 <<lambda(char const*)>::_FUN(const -pid=%d -batch -nx -iex 'set auto-load off' -iex 'set print thread-events off' -ex 'thr a a bt'", format=0x412c88 "gdb --pid=%d ff' -iex 'set print thread-events off' -ex 'thr a a bt'") at /home/timofey_turenko_mariadb_com/MaxScale/maxutils/maxbase/src/sta
#5  0x000000000040f937 in maxbase::dump_gdb_stacktrace (handler=0x40b6d0 <<lambda(char const*)>::_FUN(const char *)>) at /home/tale/maxutils/maxbase/src/stacktrace.cc:242
#6  0x000000000040b3c1 in sigfatal_handler (i=6) at /home/timofey_turenko_mariadb_com/MaxScale/server/core/gateway.cc:482
#7  <signal handler called>
#8  0x00007f18daac2cd6 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#9  0x00007f18daac2dc8 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#10 0x00007f18dcecd012 in maxbase::Semaphore::wait (signal_approach=<optimized out>, this=<optimized out>) at /home/timofey_tures/maxbase/include/maxbase/semaphore.hh:144
#11 maxbase::Semaphore::wait_n (signal_approach=<optimized out>, n_wait=<optimized out>, this=<optimized out>) at /home/timofey_utils/maxbase/include/maxbase/semaphore.hh:176
#12 maxscale::RoutingWorker::execute_concurrently(std::function<void ()> const&) (func=...) at /home/timofey_turenko_mariadb_comker.cc:1191
#13 0x00007f18dcedbc43 in maxscale::WorkerGlobal<std::unordered_map<unsigned int, unsigned long, std::hash<unsigned int>, std::ecator<std::pair<unsigned int const, unsigned long> > > >::assign (t=..., this=0x17ac768) at /usr/include/c++/8/new:169
#14 Server::<lambda()>::operator() (__closure=<optimized out>) at /home/timofey_turenko_mariadb_com/MaxScale/server/core/server.
#15 std::_Function_handler<void(), Server::set_gtid_list(const std::vector<std::pair<unsigned int, long unsigned int> >&)::<lambny_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#16 0x00007f18dcf584b9 in std::function<void ()>::operator()() const (this=0x7f18ac0092e8) at /usr/include/c++/8/bits/std_functi
#17 maxbase::Worker::CustomTask::execute (worker=..., this=0x7f18ac0092e0) at /home/timofey_turenko_mariadb_com/MaxScale/maxutil
#18 maxbase::Worker::handle_message (this=<optimized out>, queue=..., msg=...) at /home/timofey_turenko_mariadb_com/MaxScale/max
#19 0x00007f18dcf5b368 in maxbase::MessageQueue::handle_poll_events (pWorker=<optimized out>, events=<optimized out>, this=0x177ariadb_com/MaxScale/maxutils/maxbase/src/messagequeue.cc:309
#20 maxbase::MessageQueue::handle_poll_events (this=0x177edc0, pWorker=<optimized out>, events=<optimized out>) at /home/timofeyxutils/maxbase/src/messagequeue.cc:288
#21 0x00007f18dcf596a9 in maxbase::Worker::poll_waitevents (this=this@entry=0x7ffe80dda560) at /home/timofey_turenko_mariadb_comorker.cc:848
#22 0x00007f18dcf598bf in maxbase::Worker::run (this=this@entry=0x7ffe80dda560, pSem=pSem@entry=0x0) at /home/timofey_turenko_maase/src/worker.cc:554
#23 0x0000000000409cfa in maxbase::Worker::run (this=0x7ffe80dda560) at /home/timofey_turenko_mariadb_com/MaxScale/maxutils/maxb1
#24 main (argc=<optimized out>, argv=<optimized out>) at /home/timofey_turenko_mariadb_com/MaxScale/server/core/gateway.cc:2236
[Inferior 1 (process 1447083) detached]
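
For triage, a dump like the one above can be condensed to one line per thread. The following is a minimal Python sketch, not part of MaxScale; the names `summarize`, `THREAD_RE`, `FRAME_RE` and `SAMPLE` are made up here, and the regular expressions assume the `thr a a bt` layout shown above (a `Thread N (Thread 0x... (LWP n)):` header followed by `#n` frame lines). The sample is Thread 9 copied from the trace.

```python
import re

# Assumed layout of a gdb "thread apply all bt" dump, as seen in this report.
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
# A frame line: "#N", an optional "0xADDR in ", then the function name up to " (".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.+?) \(")


def summarize(dump):
    """Map gdb thread number -> list of function names, innermost frame first."""
    threads = {}
    current = None
    for line in dump.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(2))
    return threads


SAMPLE = """\
Thread 9 (Thread 0x7f18c3fff700 (LWP 1447091)):
#0  0x00007f18d9e440f7 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f18dd01510a in MHD_epoll (daemon=daemon@entry=0x17bc720, may_block=may_block@entry=1) at daemon.c:4467
#2  0x00007f18dd016327 in MHD_polling_thread (cls=0x17bc720) at daemon.c:4746
#3  0x00007f18daaba17a in start_thread () from /lib64/libpthread.so.0
#4  0x00007f18d9e43dc3 in clone () from /lib64/libc.so.6
"""

# Print one line per thread, innermost frame on the left.
for thread_id, frames in sorted(summarize(SAMPLE).items()):
    print(thread_id, " <- ".join(frames))
```

Applied to the full dump, this would show which threads are idle in `epoll_wait` or a condition-variable wait and which are inside application code. Lines without an argument list, such as `<signal handler called>`, are skipped by this sketch.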



 Comments   
Comment by Johan Wikman [ 2023-02-07 ]

No returns from maxctrl commands while "up" as they time out.

You mean that even before MaxScale has crashed, all maxctrl calls will hang and time out? If so, is that the case immediately after startup?

Comment by Johan Wikman [ 2023-02-07 ]

Can anything noteworthy be observed from the outside; e.g. does the memory consumption or CPU usage of MaxScale go up before the crash? Can something be said about the traffic when the crash occurs?

Comment by Sunny Nagra [ 2023-02-07 ]

johan.wikman, that is correct regarding the maxctrl commands. They hang and then time out after 1000 seconds.

It's normal traffic coming through, and RAM usage is around 1G or slightly more, with the majority of it being cached on the OS side.

I have not observed anything notable when the crash happens. The watchdog kicks in after 1 minute and kills MaxScale with signal 6.
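
The behaviour described here is consistent with a heartbeat-style watchdog: each worker must report in once per interval, and the process is aborted (signal 6 is SIGABRT on Linux) when one does not. The sketch below is a toy Python model of that idea only; it is not MaxScale's implementation, and `ToyWatchdog`, `simulate`, and the worker names are hypothetical.

```python
class ToyWatchdog:
    """Toy model: every registered worker must tick once per interval."""

    def __init__(self, workers):
        self._ticked = {name: False for name in workers}

    def tick(self, name):
        # Called by a worker to show it is still making progress.
        self._ticked[name] = True

    def end_interval(self):
        # Return the workers that missed this interval, then reset all flags.
        stalled = sorted(name for name, ok in self._ticked.items() if not ok)
        for name in self._ticked:
            self._ticked[name] = False
        return stalled


def simulate(workers, blocked, intervals):
    """Return (interval, stalled_workers) for the first missed tick, or None."""
    watchdog = ToyWatchdog(workers)
    for interval in range(1, intervals + 1):
        for name in workers:
            if name not in blocked:  # a blocked worker never reaches tick()
                watchdog.tick(name)
        stalled = watchdog.end_interval()
        if stalled:
            # A real watchdog would abort the process at this point.
            return interval, stalled
    return None


# Hypothetical scenario: the main worker is blocked, the others keep ticking.
print(simulate(["main", "worker-1", "worker-2"], {"main"}, intervals=3))
```

In the trace above, Thread 1 is waiting in `maxbase::Semaphore::wait` under `RoutingWorker::execute_concurrently`, which is the kind of blocked worker this model represents. Whether that is the actual root cause is not established in this report.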

Comment by Bryan Bancroft (Inactive) [ 2023-02-07 ]

Needs feedback from the customer.
